...making Linux just a little more fun!
[ In reference to "Introduction to Shell Scripting, part 6" in LG#116 ]
Mr Dash Four [mr.dash.four at googlemail.com]
Hi, Ben,
> Please don't drop TAG from the CC list; we all get "paid" for our time
> by contributing our technical expertise to the Linux community, and that
> can't happen in a private email exchange.

Apologies, it wasn't intentional: I got your message and hit the 'reply' button without realising there was a CC as well.
>> Here is my plan (funnily enough, I was about to start doing this in about 2
>> hours and possibly spend the whole of tomorrow - Sunday - on it, depending
>> on what kind of a mess I end up in):
>>
>> 1. Back up my entire current /boot partition (it is about 52MiB).
>> 2. Restore a month-old backup of this /boot partition to a 'safe' location
>> (USB drive). As this backup is old, it won't contain whatever is now wrong
>> with the partition (apart from lacking the new kernel version), and my
>> first task will be to compare the files that could stop my partition from
>> booting (menu.lst etc.), as well as the boot sector. I would expect to see
>> changes and will ignore the ones caused by the kernel updates (like new
>> versions of the vmlinuz- file).
>> 3. If I find changes between the 'old' backup and the new one that
>> prevent me from booting the new partition, I will reverse them and
>> see if I can boot up.

The first two steps went off without a hitch. To my great surprise, I did NOT find any significant changes to the /boot partition. I've attached 'files.zip', which contains a few interesting files: 'boot_list.txt' and 'boot_grub_list.txt' list the contents of the '/' and '/grub' directories of my '/boot' partition. The only differences between the old (backed-up) and the new versions were file sizes and the version numbers of the kernel files (vmlinuz, initrd and the like).
>> 4. If there are NO changes I can find (the least favourable option for
>> me, as I will enter uncharted waters here!), then I will have no option
>> but to run grub-install /dev/sda from within the FC8 Live CD to restore
>> GRUB, in the hope of getting GRUB to load. If I can then boot normally
>> from the hard disk, I will compare what has been done (both in terms of
>> files and the boot sector, both on the /boot partition and at the very
>> start of /dev/sda) and see if I can find any differences. If not, well
>> ... it will remain a great mystery what really went wrong, sadly!
>>

Well, this was my almost-last-chance saloon. Given that 'grub-install' had messed up my MBR completely (read below), I was VERY reluctant to use it again to reinstall GRUB. So, after re-reading the original article on how to reinstall it (see my first email), and having got nowhere trying to boot from the CD via the GRUB menu (as opposed to typing each and every GRUB command by hand), I did the following:
1. Booted from a bare-bones CD containing only the grub/grub.conf, grub/menu.lst and grub/stage2_eltorito files (in addition to the standard files included when I made the bootable ISO image).
2. When I got the GRUB menu, I pressed 'c' to get to the GRUB command prompt.
3. root (hd0,6) - telling GRUB where my '/boot' partition is.
4. setup (hd0) - setting up and installing GRUB in my MBR.
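Steps 3 and 4 use GRUB Legacy's own device names, which count disks and partitions from 0. Here is a minimal sketch of the translation to Linux device nodes, assuming `(hd0)` is `/dev/sda` as it would be in a typical `device.map`; `grub_to_linux` and `DEVICE_MAP` are hypothetical names for illustration only. On that assumption, `(hd0,6)` is `/dev/sda7`.

```python
import re

# Hypothetical mapping, mirroring what a typical device.map would contain.
# Check your own /boot/grub/device.map before relying on it.
DEVICE_MAP = {"hd0": "/dev/sda", "hd1": "/dev/sdb"}

_GRUB_DEV = re.compile(r"^\((hd\d+)(?:,(\d+))?\)$")


def grub_to_linux(grub_dev: str, device_map: dict = DEVICE_MAP) -> str:
    """Translate a GRUB Legacy name such as "(hd0,6)" into a Linux device node."""
    match = _GRUB_DEV.match(grub_dev.strip())
    if match is None:
        raise ValueError(f"not a GRUB Legacy hard-disk name: {grub_dev!r}")
    disk, part = match.group(1), match.group(2)
    node = device_map[disk]
    if part is None:
        return node  # whole disk: (hd0) is where "setup (hd0)" writes the MBR
    return f"{node}{int(part) + 1}"  # GRUB Legacy counts partitions from 0, Linux from 1


if __name__ == "__main__":
    print(grub_to_linux("(hd0,6)"))  # /dev/sda7
    print(grub_to_linux("(hd0)"))    # /dev/sda
```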
That's it! I have to say that it was issuing that last 'setup' command that finally made me realise where the problem was. This is the output I got from GRUB:
    Checking if "/boot/grub/stage1" exists...no
    Checking if "/grub/stage1" exists...yes
    Checking if "/grub/stage2" exists...yes
    Checking if "/grub/e2fs_stage1_5" exists...yes   <--*1
    Running "embed /grub/e2fs_stage1_5 (hd0)"...24 sectors are embedded
    succeeded.   <--*2
    Running "install /grub/stage1 (hd0) (hd0)1+24 p (hd0,6)/grub/stage2 /grub/grub.conf"...succeeded   <--*3
    Done.
*1 - As soon as I saw that, the light bulb was well and truly on! Stage 1.5 is embedded right after the MBR, in an area of the disk that is otherwise unused, and holds additional boot-up code (this is on hd0 itself, NOT in the boot sector of my /boot partition). Depending on the disk geometry, the first track, apart from sector 0, is never used; in my configuration I have 63 sectors to play with, including sector 0 of hd0.
*2 - This is where it must have gone wrong. I had made a copy of my MBR (sector 0 of my HDD), completely forgetting that there is an additional part residing right after the MBR (Stage 1.5). When I ran 'grub-install /dev/sdb' (my floppy drive is USB, and FC8 maps it as /dev/sdb), intending to install GRUB onto the floppy, the daft script thought I wanted to install the floppy's stage1_5 onto my (current) hard disk and completely messed up that part of the boot process. Since my backup covered only the MBR (sector 0, but not sectors 1-24 used in this case), I had no way of knowing that the intermediate stage had been damaged.
*3 - This is where the link between all the devices comes in, and I have no way of recreating it on a CD. I suspect that is why I get Error 25 when I use the same grub.conf from my HDD on the CD; I have no other explanation.
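The `install` line above uses GRUB Legacy's block-list notation: `(hd0)1+24` means 24 sectors starting at sector 1 of the whole disk, which is exactly the Stage 1.5 area described in *2. A small sketch that decodes such a single-range block list and checks that it stays clear of the first partition, assuming 512-byte sectors and a first partition starting at sector 63 (the usual layout for 63-sectors-per-track disks of this era); `parse_blocklist` and `fits_before_partition` are hypothetical helpers, not GRUB functions.

```python
import re

SECTOR_SIZE = 512  # assumed classic 512-byte sectors

# A single-range GRUB Legacy block list, optionally prefixed by a device: "(hd0)1+24"
_BLOCKLIST = re.compile(r"^(?:\([^)]*\))?(\d+)\+(\d+)$")


def parse_blocklist(spec: str) -> tuple[int, int]:
    """Return (first_sector, sector_count) for a block list such as "(hd0)1+24"."""
    match = _BLOCKLIST.match(spec.strip())
    if match is None:
        raise ValueError(f"unsupported block list: {spec!r}")
    return int(match.group(1)), int(match.group(2))


def fits_before_partition(spec: str, first_partition_lba: int = 63) -> bool:
    """True if the block list lies between the MBR (sector 0) and the first partition."""
    start, count = parse_blocklist(spec)
    last = start + count - 1
    return start >= 1 and last < first_partition_lba


if __name__ == "__main__":
    start, count = parse_blocklist("(hd0)1+24")
    print(f"Stage 1.5: sectors {start}-{start + count - 1}, {count * SECTOR_SIZE} bytes")
    # Stage 1.5: sectors 1-24, 12288 bytes
    print(fits_before_partition("(hd0)1+24"))  # True
```

Only single ranges are handled here; GRUB also accepts comma-separated lists of ranges, which this sketch deliberately rejects.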
After doing all that and rebooting... voila! I got the GRUB menu and booted up as normal. The difference this time (compared to previous boots from the HDD) is that the boot-up process is now more verbose: I see messages about stage1_5 loading and about the kernel and initrd files being loaded, none of which I had seen before. I think this is because the stage1_5 and stage2 files differ (in size as well as functionality) from one kernel update to the next.
In conclusion: grub-install is tricky to work with, to put it mildly. It can seriously mess up your hard disk and should come with a MASSIVE, RED health warning! In my view, the cleanest way to manage GRUB is through GRUB itself, booted from a CD or another device, NOT from the hard disk.
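Note *2 is the real lesson here: a backup of sector 0 alone does not cover the embedded Stage 1.5. A minimal sketch of a broader backup, assuming 512-byte sectors and a first partition at sector 63 (check your own partition table, e.g. with `fdisk -lu`); `backup_boot_track` and the script name in the usage comment are hypothetical. The same copy is commonly made with `dd if=/dev/sda of=track0.bin bs=512 count=63`.

```python
import sys

SECTOR_SIZE = 512  # assumed classic 512-byte sectors


def backup_boot_track(device: str, out_path: str, sectors: int = 63) -> int:
    """Copy sectors 0..sectors-1 of `device` (MBR plus embedded Stage 1.5) to `out_path`.

    Assumes the first partition starts at sector 63, so sectors 0-62 hold only
    the MBR and the otherwise unused gap. Returns the number of bytes written.
    """
    wanted = sectors * SECTOR_SIZE
    with open(device, "rb") as src:
        data = src.read(wanted)
    if len(data) != wanted:
        raise OSError(f"short read from {device}: got {len(data)} of {wanted} bytes")
    with open(out_path, "wb") as dst:
        dst.write(data)
    return len(data)


if __name__ == "__main__" and len(sys.argv) == 3:
    # Usage (as root): python backup_track.py /dev/sda sda-track0.bin
    print(backup_boot_track(sys.argv[1], sys.argv[2]), "bytes saved")
```

Restoring such an image writes sector 0 back as well, partition table included, so it should only go back onto the same disk with the same partition layout.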
>>>> title Fedora (2.6.25.4-10.fc8)
>>>> root (hd0,6)
>>>> kernel /vmlinuz-2.6.25.4-10.fc8 ro root=/dev/VolGroup00/LogVol00 rhgb
>>>>
>>> Does this kernel actually exist? How about the device? Does the latter
>>> exist before your auto-dev-creation daemons come on line?
>>>
>> Don't know what you mean here, but (hd0) is my hard disk and as such
>> should be visible.
>>
>
> I meant the device specified as 'root' in your GRUB stanza -
> '/dev/VolGroup00/LogVol00'.

It does exist: one of my logical partitions is split into 4 volumes (LogVol00 holds the root filesystem in this case), and the kernel resolves this automatically. Very nice! I set it up this way because I can shrink or expand my 'volumes' on the fly, without having to shrink or expand my partitions and reboot after each operation.
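Ben's first question, whether the kernel named in the stanza actually exists, can be checked mechanically against a listing of the /boot partition such as boot_list.txt. A rough sketch, assuming a plain-text menu.lst and a set of paths relative to the /boot partition; `parse_stanzas` and `missing_files` are hypothetical helpers, and the check says nothing about whether the `root=` device exists at boot time.

```python
# Hypothetical checker (not part of GRUB): parse GRUB Legacy menu.lst stanzas
# and report kernel/initrd images missing from a listing of the /boot partition.
# Paths are relative to the partition named by "root", so when /boot is a
# separate partition, "/vmlinuz-..." is the file /boot/vmlinuz-... under Linux.


def parse_stanzas(text: str) -> list[dict]:
    """Split menu.lst text into stanzas like {'title': ..., 'root': ..., 'kernel': ...}."""
    stanzas = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        parts = line.split(None, 1)
        keyword = parts[0]
        rest = parts[1].strip() if len(parts) > 1 else ""
        if keyword == "title":
            stanzas.append({"title": rest})
        elif stanzas and keyword in ("root", "kernel", "initrd"):
            stanzas[-1][keyword] = rest
    return stanzas


def missing_files(text: str, boot_files: set[str]) -> list[tuple[str, str]]:
    """Return (title, path) for each kernel/initrd image not found in boot_files."""
    problems = []
    for stanza in parse_stanzas(text):
        for key in ("kernel", "initrd"):
            words = stanza.get(key, "").split()
            # The first word is the image path; the rest are kernel arguments.
            if words and words[0] not in boot_files:
                problems.append((stanza["title"], words[0]))
    return problems


MENU = """\
title Fedora (2.6.25.4-10.fc8)
    root (hd0,6)
    kernel /vmlinuz-2.6.25.4-10.fc8 ro root=/dev/VolGroup00/LogVol00 rhgb
"""

if __name__ == "__main__":
    print(missing_files(MENU, {"/vmlinuz-2.6.25.4-10.fc8"}))  # []
```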
>> Anyway, when I type this (at the grub> prompt) it
>> works, so I presume there is nothing wrong with it. Still don't understand
>> why grub is treating what I type (and which subsequently works!)
>> differently compared to when I press return after the option which
>> contains the same statements is selected?
>>
>
> That's precisely why I asked that question. This is somewhat unlikely,
> but - what would happen if that device 1) didn't normally exist and 2)
> was created by GRUB loading some device-creation module/daemon? It seems
> to me that you'd see exactly the kind of behavior that you're
> describing.

Indeed, at first I thought that something might be wrong with the disk itself, but after fixing the boot-up problem I tried again, sadly with the same result (Error 25 etc.). I still think this is due to stage1_5 missing from the CD, but I am not sure. I'm still baffled, though, and have absolutely no explanation: how is it possible for me to boot when I type everything manually, but get an error when I select a menu option that contains exactly the same commands? A mystery indeed!
One last thing: I tried a dozen different variations of the ISO image (and wasted about 14 CDs!): with and without device.map (included in files.zip), with menu.lst plus grub.conf, with grub.conf alone (also in files.zip), and so on ad nauseam. None of them worked. I even replaced the 'kernel /vmlinuz-xx-xx-xx-xx....' line with 'kernel (hd0)/vmlinuz-xx....' - nada, same result: Error 25! Hitting a brick wall comes to mind!
>> P.S. I just noticed a few interesting things when I looked at my boot
>> sector files with a hex viewer (Windows) - the boot sector of the /boot
>> partition is all zeroed (both in the old - working - and new/current -
>> non-working versions).
>>
>
> By "boot sector", I assume you mean /boot/grub/stage1, right? That is
> pretty odd.

No! I meant the first sector of my /boot partition; stage1 is placed in the MBR in modified form (see notes *2 and *3 above). After fixing my boot-up problem, I made a new backup of the MBR to compare with the one I had before: there are 6 differences (see the files 'hda-mbr-nopart' and 'hda-mbr-nopart_old' in the attached files.zip), so something must have gone south!
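Comparing two MBR dumps by eye in a hex viewer is error-prone; a few lines of code (or `cmp -l old new` on Linux) list every differing byte. A sketch, assuming the two dump files are the same size; `diff_bytes` and `diff_files` are hypothetical helpers, and reading 'nopart' as "the 446 bytes of boot code before the partition table" is an assumption based only on the file names.

```python
# Hypothetical comparison helpers: list the byte offsets at which two dumps
# differ, e.g. 'hda-mbr-nopart_old' versus 'hda-mbr-nopart'.


def diff_bytes(a: bytes, b: bytes) -> list[tuple[int, int, int]]:
    """Return (offset, old_byte, new_byte) for every differing position."""
    if len(a) != len(b):
        raise ValueError(f"length mismatch: {len(a)} vs {len(b)} bytes")
    return [(i, x, y) for i, (x, y) in enumerate(zip(a, b)) if x != y]


def diff_files(old_path: str, new_path: str) -> list[tuple[int, int, int]]:
    """Read two dump files and compare them byte by byte."""
    with open(old_path, "rb") as f_old, open(new_path, "rb") as f_new:
        return diff_bytes(f_old.read(), f_new.read())


if __name__ == "__main__":
    old = bytes(446)          # stand-in for a 446-byte boot-code dump
    new = bytearray(old)
    new[0x40] = 0x80          # pretend one byte of boot code changed
    for off, x, y in diff_bytes(old, bytes(new)):
        print(f"0x{off:03x}: {x:02x} -> {y:02x}")  # 0x040: 00 -> 80
```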
>> Also, in the boot CD the file I thought I copied as
>> /BOOT/GRUB/GRUB.CONF is saved as /BOOT/GRUB/GRUB.CON ('F' is missing).
>>
>
> I suspect that it has to do with the DOS 8+3 file naming scheme. Yep -
> we're still stuck with that, given that the whole mess with booting CDs
> still has those roots... In any case, there's usually some sort of a map
> file that keeps track of the actual file names.

You are right - I looked at the CD from Windows.
Tomorrow I will tackle my floppy-boot-up to USB drive problem/challenge. I'll let you all know how it goes.
George
Kat Tanaka Okopnik [kat at linuxgazette.net]
[Hi - an editorial comment, here. Please don't send .html e-mail to TAG. There's a little button or tab on your gmail screen that will send "plain text" e-mail, instead. Sent to the list as a reminder to everyone. -- Kat]
On Sun, Jul 06, 2008 at 02:00:12AM +0100, Mr Dash Four wrote:
[a bunch of stuff elided]
Thanks!
--
Kat Tanaka Okopnik
Linux Gazette Mailbag Editor
kat@linuxgazette.net
Ben Okopnik [ben at linuxgazette.net]
On Sun, Jul 06, 2008 at 02:00:12AM +0100, Mr Dash Four wrote:
>
> That's it! I have to say that by issuing the last 'setup' command I
> finally realised where the problem was. This is the output I've got from
> grub:
>
> Checking if "/boot/grub/stage1" exists...no
> Checking if "/grub/stage1" exists...yes
> Checking if "/grub/stage2" exists...yes
> Checking if "/grub/e2fs_stage1_5" exists...yes <--*1
> Running "embed /grub/e2fs_stage1_5 (hd0)"...24 sectors are embedded
> succeeded. <--*2
> Running "install /grub/stage1 (hd0) (hd0)1+24 p (hd0,6)/grub/stage2
> /grub/grub.conf"...succeeded <--*3
> Done.
>
> *1 - As soon as I saw that the light bulb was well and truly on! Stage1_5
> embeds after the mbr and uses area of the disk, which is ...
> well...unused and places additional code of the boot-up routine there
> (this is hd0, NOT the boot-up sector of my /boot partition - depending on
> the disk geometry the first side, apart from sector 0 is never used!
Well done! I often find, with complex multi-stage problems like this one, that I need to "stew in it" for a while: fiddle with stuff (always keeping careful backups) and see how it breaks when I change things. After a while, I get an idea of how the thing actually works, which is often completely unrelated to the way the docs describe it and sometimes even contrary to (or at least different from) the intent of the programmer who wrote it.
Unfortunately, it's rather hard to document that type of insight or mindset - and it's easy to forget, at least for a given system. I had to do this with, e.g., LILO a couple of times. On the other hand, you always gain something from the process - knowledge about that general type of thing, understanding of better troubleshooting methods, practice in rational thinking - so it's never a total loss.
> After I've done all that and rebooted ... voila - got the GRUB menu and
> booted up as normal. The difference this time (compared to previous
> boot-ups from the HDD) was that the boot-up process is now more verbose -
> I see more messages about stage1_5 loading and when the kernel and initrd
> files are loaded - none of which I have seen before. I think this is
> because the stage1_5 and stage2 files differ (in size as well as
> functionality) with different kernel updates.
You do have the 'quiet' option specified on the appropriate 'kernel' lines in menu.lst, right?
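Ben's suggestion can be applied mechanically. A minimal sketch, assuming a GRUB Legacy menu.lst in which each boot entry has a single `kernel` line; `add_quiet` is a hypothetical helper. Note that `quiet` is a kernel command-line option, so it only affects the kernel's own messages, not anything GRUB prints before the kernel starts.

```python
# Hypothetical helper (not part of GRUB): append the kernel option "quiet" to
# every GRUB Legacy "kernel" line that lacks it, leaving other lines untouched.


def add_quiet(menu_text: str) -> str:
    out = []
    for line in menu_text.splitlines(keepends=True):
        words = line.split()
        # words[0] is "kernel", words[1] the image; options follow from words[2]
        if words and words[0] == "kernel" and "quiet" not in words[2:]:
            body = line.rstrip("\r\n")
            ending = line[len(body):]  # preserve the original line ending
            line = f"{body} quiet{ending}"
        out.append(line)
    return "".join(out)


if __name__ == "__main__":
    stanza = "kernel /vmlinuz-2.6.25.4-10.fc8 ro root=/dev/VolGroup00/LogVol00 rhgb\n"
    print(add_quiet(stanza), end="")
    # kernel /vmlinuz-2.6.25.4-10.fc8 ro root=/dev/VolGroup00/LogVol00 rhgb quiet
```

Running it twice is harmless: lines that already carry `quiet` are left alone.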
> In conclusion - grub-install is tricky to work with (to put it mildly) -
> it can seriously mess up your hard disk (it must come with a MASSIVE, RED
> health warning!!!).
Frankly, I've always approached it the way you have: with extreme caution. I understand the boot process quite well, and I have a good general understanding of what GRUB does to mediate it, but the "how" of it is still a bit of a dark mystery. I suspect that I've just been lucky, and one day soon I'll have to dive into it and explore the details.
> Indeed, I first thought that it may be something wrong with the disk
> itself, but when I fixed the boot-up problem I tried again - sadly with
> the same effect (Error 25 etc). I still think this is due to the stage1_5
> missing from the CD, but I am not sure. Still baffled though - I have
> absolutely no explanation - how is it possible for me to boot up when I
> type everything manually, but get an error when I press an option, which
> contains exactly the same line of statements? A mystery indeed!
"Timing" is the only answer I can think of. Something that needs to happen before you can boot hasn't yet happened when GRUB tries to walk through the process - and has happened by the time a human (you) gets to it. Seems strange, but I'm not coming up with anything else that makes sense.
Unfortunately, my solution - since whatever it is happens deep in the guts of GRUB - would be to do a backup and tell GRUB to start from scratch.
> Also, in the boot CD the file I thought I copied as
> /BOOT/GRUB/GRUB.CONF is saved as /BOOT/GRUB/GRUB.CON ('F' is missing).
>
> I suspect that it has to do with the DOS 8+3 file naming scheme. Yep -
> we're still stuck with that, given that the whole mess with booting CDs
> still has those roots... In any case, there's usually some sort of a map
> file that keeps track of the actual file names.
>
> You are right - I looked at the CD when in Windows.
Given that a number of the CD standards (e.g., Yellow Book) support the "DOS + mapfile" filename scheme, it doesn't really matter what OS you used to look at it - anything that allows you to look at the actual FS on the CD will show you those 8+3 names.
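The 8+3 effect is easy to see in miniature. A rough sketch of ISO 9660 Level 1 naming (uppercase, at most 8 characters of name plus 3 of extension, drawn from A-Z, 0-9 and _), assuming no Rock Ridge or Joliet extensions and no TRANS.TBL map file (mkisofs adds these with -R, -J and -T respectively); `iso9660_level1` is hypothetical and ignores the name-collision handling and `;1` version suffix that real mastering tools apply.

```python
import re


def _d_chars(s: str) -> str:
    """Uppercase and replace anything outside ISO 9660 d-characters (A-Z, 0-9, _)."""
    return re.sub(r"[^A-Z0-9_]", "_", s.upper())


def iso9660_level1(name: str) -> str:
    """Approximate an ISO 9660 Level 1 (8.3) file identifier, without the ';1' suffix."""
    stem, dot, ext = name.rpartition(".")
    if not dot:  # no extension: rpartition put the whole name in `ext`
        stem, ext = name, ""
    stem, ext = _d_chars(stem)[:8], _d_chars(ext)[:3]
    return f"{stem}.{ext}" if ext else stem


if __name__ == "__main__":
    print(iso9660_level1("grub.conf"))  # GRUB.CON
```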
--
* Ben Okopnik * Editor-in-Chief, Linux Gazette * https://LinuxGazette.NET *