<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 3.2//EN">
<HTML>
<HEAD>
<META HTTP-EQUIV="Content-Type" CONTENT="text/html; charset=Windows-1252">
<META NAME="Generator" CONTENT="MS Exchange Server version 6.0.6603.0">
<TITLE>RE: Server hard disk failure (Ed Lauzier)</TITLE>
</HEAD>
<BODY>
<!-- Converted from text/plain format -->
<P><FONT SIZE=2>Well, I thought I'd let a few in on a horror story I had a while ago<BR>
with WBEL Release 3 Respin 1. I was getting ext3 file system corruptions<BR>
on a daily basis and could not pin down what was causing it. It got<BR>
so bad that once I had to pull an entire directory structure one by one<BR>
out of lost+found. At least I had the data and there was no "real"<BR>
data loss.<BR>
<BR>
I then started to look into why this could be happening. I was getting<BR>
kernel panics on one box and file system corruption on another. A user<BR>
pointed out to me that it may be a Red Hat kernel bug that may or may not<BR>
have been fixed. I was testing some of our software and took advantage<BR>
of the servers going down to test our failover scenarios. After I was<BR>
finished testing, I turned off our software.<BR>
<BR>
When I turned off our software, the problems stopped. This told me that<BR>
the crux of the problem could be the kernel drivers for NFS and ext3<BR>
and how they interact.<BR>
The machine with the corrupted filesystem(s) was an NFS server.<BR>
The box with the kernel panics was an NFS client running WBEL 3R1.<BR>
Strange, because our software runs fine on all other platforms: it does not<BR>
cause these problems when the shared area is on a NetApp, EMC box, or<BR>
Solaris box.<BR>
<BR>
Conclusions for WBEL 3R1:<BR>
NFS and ext3 kernel drivers may cause some problems on some hardware types.<BR>
On the problem platforms, I'm using the ASUS A7V motherboard. I also have<BR>
IBM BladeCenter servers running in a similar configuration and have not<BR>
had problems (yet). No problems either with a Sun box running Solaris 8<BR>
and sharing out an area for NFS. The user who informed me that there was<BR>
a possible kernel bug causing the problems also suggested using an ext2<BR>
filesystem, which I have not switched to. (I forget the thread...)<BR>
I'd rather see the problem identified and fixed, and move forward...<BR>
<BR>
Hope this helps...<BR>
<BR>
Ed<BR>
<BR>
<BR>
-----Original Message-----<BR>
From: whitebox-users-bounces@beau.org on behalf of whitebox-users-request@beau.org<BR>
Sent: Thu 3/31/2005 12:04 AM<BR>
To: whitebox-users@beau.org<BR>
Cc: <BR>
Subject: Whitebox-users Digest, Vol 3, Issue 43<BR>
<BR>
>Message: 1<BR>
>Date: Wed, 30 Mar 2005 22:40:55 +0100<BR>
>From: Francies Moore &lt;liz@indract.freeserve.co.uk&gt;<BR>
>Subject: [WBEL-users] Server hard disk failure<BR>
>To: whitebox-users@beau.org<BR>
>Message-ID: &lt;424B1CE7.4020209@indract.freeserve.co.uk&gt;<BR>
>Content-Type: text/plain; charset=ISO-8859-1; format=flowed<BR>
<BR>
>Hi everyone<BR>
<BR>
>One of my WBEL servers has crashed due to the failure of one of its hard<BR>
>disks. My hardware support technician (a recent Linux convert) says it<BR>
>could be a filesystem failure. Is such a thing possible on a machine<BR>
>which was doing nothing over Easter? I thought Linux was more stable<BR>
>than "that other system" in this regard.<BR>
<BR>
>Whatever happened to it, it cannot reboot as the ext3 journal cannot get<BR>
>its head around the situation.<BR>
<BR>
>How do I recover what is left on the surviving hard disk (which contains<BR>
>the operating system and some user files)? Do I revert to ext2 by<BR>
>deleting the journal and changing the fstab? If so, where do I find the<BR>
>journal file, and what is it called?<BR>
<BR>
>Can I go back to ext3 when a new HD is fitted?<BR>
<BR>
>Thanks.<BR>
<BR>
>Francies<BR>
<BR>
</FONT>
</P>
</BODY>
</HTML>