[WBEL-users] Investigating a kernel panic

Kirby C. Bohling kbohling at birddog.com
Tue Apr 12 16:49:42 CDT 2005


On Tue, Apr 12, 2005 at 11:38:12AM +0200, Simone wrote:
> Hi list,
> I periodically experience kernel panics on my wbel3 box. This server is 
> running as a front end mail filter for exchange, with a typical 
> MailScanner - sendmail - clamav setup. I also have mailwatch for 
> mailscanner running, which is a LAMP kind setup. Every 3-4 weeks I have 
> a crash, with keyboard blinking lights, and I don't understand the 
> reason for it. Recently installed a new server same configuration, 2 x 
> 18Gb scsi disks raid1, thinking it was possibly a hardware problem, but 
> this morning after a month running fine, I had the first crash. Could 
> you please tell me where to look for possible indications on what could 
> have caused the panic? Checked log/messages but it looks like no useful 
> info is in there.
> 
> Thanks for your suggestions
> Have a fine day

Simone,

	Well, obviously this isn't terribly proactive, but one thing you
might try is setting up netdump.  I've never set it up, but I'm
intending to sooner rather than later.  We have a number of
machines that kernel panic while screen blanking is on, so nothing
is visible on the screen and we get no information.

	In theory netdump will dump core over the network, so that you
capture the state of the machine at the time of the crash and can
debug it afterwards.  Okay, I'm guessing you can't debug a kernel
core yourself, otherwise you wouldn't need this suggestion (I can't
either).  However, you might be able to use the symbols and
backtrace information from the oops to track down the cause via
googling.  The oops at least gives you a starting point.
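
	As a rough sketch of turning an oops into search terms: the
Python below pulls the symbol names out of the backtrace lines.  The
pattern is an assumption about the general "[<address>] symbol"
layout, which varies between kernel versions, and the function names
(oops_symbols, search_query) are made up for illustration.

```python
import re

# Matches a trace entry such as "[<c0123e24>] some_func+0x2f4/0x300" or
# "[<c0123e24>] some_func [module] 0x2f4" and captures the symbol name.
# Assumption: the oops uses the "[<hex address>] symbol" layout.
TRACE_ENTRY = re.compile(r"\[<[0-9a-fA-F]+>\]\s+([A-Za-z_][A-Za-z0-9_.]*)")


def oops_symbols(text):
    """Return the distinct symbol names in the order they first appear."""
    seen = []
    for name in TRACE_ENTRY.findall(text):
        if name not in seen:
            seen.append(name)
    return seen


def search_query(text, limit=4):
    """Join the first few symbols into a string to paste into a search engine."""
    return " ".join(oops_symbols(text)[:limit])
```

	Feed it the text of the oops (from the netdump log or copied off
the console) and search for the result along with your kernel
version.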

	In order to do this, you'll need another machine running Linux
with at least as much free disk space as the RAM + swap of the
machines you are dumping from, if I remember correctly.
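
	To put a number on that, here is a small sketch that adds up
MemTotal and SwapTotal from the contents of /proc/meminfo on the
machine that will be dumping.  The RAM + swap figure is only the
rule of thumb I recalled above, not something taken from the netdump
documentation, and the function names are made up for illustration.

```python
def meminfo_kb(text, field):
    """Return the kB value of a 'Field:   12345 kB' line from /proc/meminfo."""
    for line in text.splitlines():
        if line.startswith(field + ":"):
            return int(line.split()[1])
    raise KeyError(field)


def dump_space_kb(text):
    """RAM plus swap in kB (a recalled rule of thumb, not a documented figure)."""
    return meminfo_kb(text, "MemTotal") + meminfo_kb(text, "SwapTotal")


# Usage on the dumping machine:
#   with open("/proc/meminfo") as f:
#       print(dump_space_kb(f.read()), "kB")
```

	Run it on each machine you plan to dump from and add up the
results if one server is going to collect dumps for all of them.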

	If I were you, I'd put memtest86 in the machine and let it run
for a couple of days.  It sure sounds like memory corruption or
overheating.  I've had machines with memory corruption that ran for
weeks or months at a time.  It wasn't until something critical got
stored where the bad bits were that it crashed.

	Kirby


