[WBEL-users] problem with grub and recent kernel

Ed ekg@tricity.wsu.edu
Mon, 09 Aug 2004 13:50:14 -0700


Jeff Forbes wrote:
> See below
> 
> At 06:48 PM 8/6/2004, you wrote:
> 
>> Jeff Forbes wrote:
>>
>>> Hi All,
>>> I am having a very odd problem.
>>> I just rebooted an AMD machine which just had the
>>> kernel automatically upgraded to 2.4.21-15.0.3-EL
>>> The reboot failed with the console displaying the Grub prompt.
>>> Using the cat command to display the config files yields a screen
>>> full of question marks.
>>
>>
>> This is the builtin grub cat, right?
>>
>> Try catting other files.  See if they are full of question marks, too.

Have you tried this?  What happened?

>>
>>> I was able to get it to boot an earlier version of the kernel by 
>>> editing the config file after booting with a rescue disk. When I 
>>> rebooted to the
>>
>>
>> menu.lst?  Maybe your disk has problems; you could try running 
>> badblocks to see.

Have you tried this?  What happened?

>>
>>> kernel menu
>>> and selected, an error indicating an incorrect executable file format 
>>> was printed.
>>
>>
>> This is for the new kernel file?  Try running file on it to see what 
>> it thinks it is.  If the kernel is corrupted, it's probably your 
>> hardware, since rpm packages have a checksum.  Try running memtest for 
>> a day, or make your heatsink bigger, or un-overclock your system if 
>> it's overclocked.  Maybe your memory is running on too high a bus 
>> speed, it could be anything.

Have you tested your hardware since you assembled the computer? 
Sometimes things come loose.
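
If you want to check whether the installed kernel file really is 
corrupted, one option is to compare it byte-for-byte with a known-good 
copy (from the rescue disk or another machine).  Here is a rough Python 
sketch of that idea.  The function names and the example paths are just 
ones I made up for illustration, so adjust them for your system:

```python
import hashlib


def file_digest(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def same_contents(path_a, path_b):
    """True if both files have identical contents (by digest)."""
    return file_digest(path_a) == file_digest(path_b)


# Hypothetical usage; both paths are assumptions, not taken from your setup:
#   same_contents("/boot/vmlinuz-2.4.21-15.0.4.EL",
#                 "/mnt/goodcopy/vmlinuz-2.4.21-15.0.4.EL")
```

A mismatch only tells you that the two copies differ.  It doesn't say 
whether the disk, the memory, or something else is to blame.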

>>
>>> If I remember correctly, error 13 was printed. I was still able to 
>>> reboot to the older version.
>>> I then tried the latest kernel 2.4.21-15.0.4-EL with the same results.
>>> I also determined that whenever the grub config file was edited under 
>>> the kernels on the harddrive,
>>> grub was unable to read it.
>>
>>
>> I can't parse this.  When are you editing the file?  When can't you 
>> read it?  How, exactly, do you reproduce this?
> 
> 
> 
> 1. On running linux system edit grub.conf. Reboot. Grub unable to read 
> config file for some reason so it goes to the prompt.

See if you can do a tab-complete and check that the (hd0,0), (hd0,1), 
etc. numbers are correct.  Keep in mind that grub counts from zero: 
(hd0,0) in grub is /dev/hda1 in linux!
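
In case the off-by-one is confusing, here is a small Python sketch of 
the mapping.  It assumes IDE disks named /dev/hdX and that the BIOS 
drive order matches the Linux order, which isn't guaranteed (check 
/boot/grub/device.map).  The function name is my own invention:

```python
import re


def grub_to_linux(grub_dev, prefix="/dev/hd"):
    """Translate a GRUB legacy device like '(hd0,1)' to a Linux name.

    Assumption: IDE-style naming and identical BIOS/Linux disk order.
    """
    m = re.fullmatch(r"\(hd(\d+)(?:,(\d+))?\)", grub_dev.strip())
    if m is None:
        raise ValueError("not a GRUB legacy device: %r" % grub_dev)
    disk = int(m.group(1))
    if disk > 25:
        raise ValueError("disk number too large for a single letter")
    # GRUB disk 0 -> 'a', disk 1 -> 'b', and so on.
    name = prefix + chr(ord("a") + disk)
    if m.group(2) is not None:
        # GRUB partitions start at 0, Linux partitions start at 1.
        name += str(int(m.group(2)) + 1)
    return name


print(grub_to_linux("(hd0,0)"))  # /dev/hda1
```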

> 2. Using cat to display the config files shows many "?"

Did you use cat from linux, or at the grub prompt?  Try booting linux 
manually.

> 3. Boot rescue floppy. Grub config file looks fine. Open and save. 
> Reboot. Grub can now read file and boot the older kernels.
> goto 1

Try this:  copy the DISTRIBUTION kernel from the rescue disk to your 
hard drive, and boot up under the distribution kernel.  Once in the 
distribution kernel, modify grub.conf.  Now, run grub from the command 
prompt and see if it will read the file.  Consider re-installing grub 
with "root (hd0,N)" followed by "setup (hd0)", where N is the number of 
your /boot partition minus one.  Then try rebooting.  Do you still have 
the same problem?  Are there any messages in /var/log/messages about 
seek errors or something?
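
If the log is big, something like this rough Python sketch can pull out 
the suspicious lines.  The patterns are only my guesses at what IDE or 
disk trouble tends to look like in a kernel log, so treat them as 
assumptions and add your own:

```python
import re

# Assumed patterns, not an exhaustive or authoritative list.
DISK_ERROR_PATTERNS = [
    r"I/O error",
    r"SeekComplete Error",
    r"UncorrectableError",
    r"seek error",
]


def find_disk_errors(lines, patterns=DISK_ERROR_PATTERNS):
    """Return the log lines that match any of the given patterns."""
    rx = re.compile("|".join(patterns), re.IGNORECASE)
    return [line.rstrip("\n") for line in lines if rx.search(line)]


# Hypothetical usage (reading the log usually needs root):
#   with open("/var/log/messages", errors="replace") as f:
#       for hit in find_disk_errors(f):
#           print(hit)
```

An empty result doesn't prove the disk is healthy.  It only means none 
of these particular patterns showed up.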

> 
> I assemble my own systems and thoroughly burn them in.
> This machine was assembled several months ago and burned in with no 
> problems.

It sounds like you know what you are doing.  But since you are asking 
for advice:

> Software installed and ran OK.
> Only indication of any problem was the above reboot problem this week.

This can't be the only problem if I understand the situation correctly. 
  Something is wrong with your kernel, your computer, your copy of grub, 
or your filesystem.  Until you can get something else to fail, there's 
no way to know.

As you said, it's an unusual problem.  It may be more cost-effective to 
get rid of your computer entirely and replace it with another one. 
Otherwise, you have to test everything, because anything or combination 
of things could be the problem.  Consider reformatting and reinstalling. 
  You probably won't find any new information by running the same tests 
over and over.

Corruption problems are annoying and difficult to deal with, and I 
understand your frustration.  Sometimes it's better to swallow the ego 
and buy prebuilt systems from a good linux hardware company, because 
then all the time-consuming boring problems are theirs instead of ours.

Also, if you're only replying to me instead of the list because you 
think that no one else has had this problem, you'd be surprised.  I'm no 
expert, and there are lots of people on the list who have decades of 
experience with this stuff and may have useful suggestions that I won't 
think of.  Also, what if I get confused and give you bad advice?  If 
it's on the list someone can correct me before something bad happens.

Feeling your pain,

   Ed

> 
> 
> 
> 
>>> I would then have to read the file under the rescue disk and then 
>>> grub could
>>> read it.
>>
>>
>> This is the same exact copy of grub, or a different one?  Is grub 
>> running standalone, or under a linux shell?
>>
>>> I am stumped. Does anyone have any suggestions?
>>
>>
>> Make backups of any important data on your computer, and then make 
>> sure you can read the backups on a different computer.  Test your 
>> machine thoroughly for hardware problems.  Run fsck on your 
>> filesystems *after you make backups*, even if they're marked clean.  
>> If it's corrupt, try a different hard drive, but keep in mind that it 
>> could be a memory problem.
>> Consider giving your computer to a Windows user who will think that 
>> massive data corruption is no big deal, and get a new, working, 
>> machine from laclinux.com (or any competent linux company -- check 
>> their customer reviews).
>>
>> Good Luck!!
>>
>>
>>>
>>>
>>> Jeff Forbes
>>> _______________________________________________
>>> Whitebox-users mailing list
>>> Whitebox-users@beau.org
>>> http://beau.org/mailman/listinfo/whitebox-users