[WBEL-users] Another filesystem failure

khaqq khaqq at free.fr
Fri Jul 8 10:39:39 CDT 2005


On Wed, 06 Jul 2005 08:10:04 +0100
Francies Moore <liz at indract.freeserve.co.uk> wrote:

> But ... yesterday my main White Box/Lotus Domino server failed at 5.02 
> am.  It took an engineer all day to retrieve Domino and get it back 
> running.  It has two Maxstor SATA drives and he has managed to get the 
> whole system back up on the drive that still works..  Unfortunately I do 
> not have copies of the error messages at home here, but basically it was 
> coming up with Sense Key errors, Drive_Seek errors and ext3 fs errors.  
> Maxstor diagnostics when run said the disk (the disk with all the 
> data!!!) was bad.  However, later on Maxstor helpline said that it was 
> an OS/fs error and the disk needed to be low level formatted and then 
> would be OK.

Drive_Seek errors? Your disk looks dead to me.
Back up your data (I suppose that's already done anyway), then run
smartctl -a /dev/hde (or wherever your disk is) and look at the error and
self-test logs. You can also run smartctl -t long /dev/hde (same device)
and re-run -a once the test has finished. The long test checks the whole
drive for defects; it can take quite a while, and smartctl tells you the
expected duration when it starts. It is non-destructive, but who knows when
the drive will decide to die.
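
For the smartctl step, here is a minimal sketch of pulling the failed
entries out of the self-test log. It assumes the log layout and status
strings ("Completed: read failure", "Fatal or unknown error") printed by
the smartmontools versions I know, so check them against your own output.
failed_selftests is a made-up helper name, and /dev/hde is just the example
device from above.

```shell
# Read `smartctl -l selftest <device>` output on stdin and print only the
# log entries whose status indicates a failed self-test.
# Assumption: entries start with "# <n>" and failures are reported as
# "Completed: <kind> failure" or "Fatal or unknown error".
failed_selftests() {
    grep -E '^# *[0-9]+ ' | grep -E 'Completed: |Fatal or unknown error'
}

# Typical use on a real drive (needs root):
#   smartctl -t long /dev/hde          # start the extended self-test
#   smartctl -l selftest /dev/hde | failed_selftests
```

The function exits non-zero when nothing matches, so it can be used
directly in an if statement.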

You can also use badblocks (badblocks -c 8192 -p 5 -w -s <device>). The -w
write test overwrites everything on the device, so this is data-destructive.
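
If you want something to show the vendor afterwards, badblocks can write
the list of bad blocks to a file with -o. A rough sketch of checking that
file follows; report_bad_blocks and /tmp/hde.bad are made-up names for
illustration, and it assumes the usual one-block-number-per-line format.

```shell
# Summarise a bad-block list as written by `badblocks -o FILE`
# (assumed format: one block number per line, empty file = nothing found).
report_bad_blocks() {
    list=$1
    if [ ! -e "$list" ]; then
        echo "no such file: $list"
        return 2
    fi
    if [ -s "$list" ]; then
        # Non-empty file: count the non-blank lines.
        echo "bad blocks found: $(grep -c . "$list")"
        return 1
    fi
    echo "no bad blocks found"
}

# Typical use (DESTROYS all data on the device; needs root):
#   badblocks -c 8192 -p 5 -w -s -o /tmp/hde.bad /dev/hde
#   report_bad_blocks /tmp/hde.bad
```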

If badblocks finds bad sectors, you have to send the drive back to Maxtor.
Same if smartctl reports errors during the surface test.
BTW, I'm sick of vendors asking "what OS are you running?", hearing "Linux",
and answering "it's an OS problem".

> For myself I never trust a disk again but always replace it.  However, I 
> do not have the final say in this instance.

If a drive passes 5 badblocks tests and 2 smartctl -t long tests (one
before, one after), it's as good as new, as far as I'm concerned.

> I am seriously concerned as I thought Linux was stable - the issues I 
> have had with my home machines have been hardware (dead HD, bad RAM) 
> except one time when a dual boot Win98/WBEL system found ext2 fs errors 
> on startup, fixed by fsck, but we are beginning to wonder if we should 
> go back to Windows on our production servers. I don't want to!!

Hmm. Linux is certainly at least as stable as Windows, fs-wise. Personally,
I haven't had any problems with ext3, JFS, or ReiserFS lately.

> Any suggestion as to what is happening to my servers would be appreciated.

I'd change the disk. If you value reliability, some vendors are better than
Maxtor, IMHO. Maybe your PSU is acting up, too.

> PS I have been running RedHat 8 for over 2 years with Samba and Apache 
> on a box built out of bits and it just runs and runs....

I have similar stories. However, I haven't been able to reproduce that with
Windows 2000 Server. *HINT*.

khaqq

