[WBEL-users] Investigating a kernel panic

Wed Apr 13 04:13:10 CDT 2005

Interesting, I will set up netdump then; it looks like a very good way to 
learn more about my system and Linux internals in general :). 
We actually have two servers running mail filtering, so each one can dump 
to the other. This morning I moved one hard disk to a 
different position for better heat dissipation, and it looks like I 
gained about 10 °C, which is not bad. Now the warm one is at 50 °C vs 
the >59 °C it was at before. I will also run memtest for some time and 
see if I get more info (I did check the memory before the install). 
Actually, the memory is the only thing I carried over from the first 
installation, so it would make sense.
I'll let you know. Thanks, and have a nice day.

Simone

Kirby C. Bohling wrote:

>On Tue, Apr 12, 2005 at 11:38:12AM +0200, Simone wrote:
>
>>Hi list,
>>I periodically experience kernel panics on my WBEL 3 box. This server is 
>>running as a front-end mail filter for Exchange, with a typical 
>>MailScanner - sendmail - ClamAV setup. I also have MailWatch for 
>>MailScanner running, which is a LAMP-style setup. Every 3-4 weeks I have 
>>a crash, with blinking keyboard lights, and I don't understand the 
>>reason for it. I recently installed a new server with the same 
>>configuration (2 x 18 GB SCSI disks in RAID 1), thinking it was possibly 
>>a hardware problem, but this morning, after a month of running fine, I 
>>had the first crash. Could you please tell me where to look for 
>>indications of what could have caused the panic? I checked log/messages, 
>>but it looks like there is no useful info in there.
>>
>>Thanks for your suggestions
>>Have a fine day
>
>Simone,
>
>	Well, obviously this isn't terribly proactive, but one thing you
>might try is setting up netdump.  I've never set it up, but I'm
>intending to sooner rather than later.  Essentially, we have a
>number of machines that kernel panic while screen blanking is on,
>so we don't get any information.
>
>	In theory netdump will dump core over a network so that you
>capture the state of the machine at the time of the crash and can do
>the debugging.  Okay, I'm guessing you can't do that yourself, otherwise
>you wouldn't need this suggestion (I can't either).  However, you might
>be able to use the symbols and backtrace information from the oops to
>track down the cause via googling.  With an oops, you at least have
>a starting point.
>
>	In order to do this, you'll need another machine running linux,
>with at least as much free disk space as the machines you are dumping
>from have RAM + swap, if I remember correctly.
>
>	If I were you, I'd put memtest86 in the machine and let it run
>for a couple of days.  It sure sounds like memory corruption or
>overheating.  I've had machines with memory corruption that ran for
>weeks or months at a time.  It wasn't until something critical got
>stored where the bad bits were that it crashed.
>
>	Kirby
>
>
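
As a rough way to size the disk on the machine receiving the dumps, the
"RAM + swap" rule of thumb quoted above can be sketched as follows. This
is a minimal sketch, assuming a Linux client that exposes /proc/meminfo
with MemTotal and SwapTotal lines in kB; the function name is made up for
illustration, and the RAM + swap figure is only the estimate recalled in
the message above, not a documented netdump requirement.

```python
def dump_space_needed_kib(meminfo_text):
    """Return MemTotal + SwapTotal (in KiB) from /proc/meminfo-style text.

    Hypothetical helper: the RAM + swap estimate comes from the thread,
    not from netdump documentation.
    """
    totals = {}
    for line in meminfo_text.splitlines():
        key, _, rest = line.partition(":")
        if key in ("MemTotal", "SwapTotal"):
            # The value looks like "  1048576 kB"; keep the number only.
            totals[key] = int(rest.split()[0])
    return totals["MemTotal"] + totals["SwapTotal"]


if __name__ == "__main__":
    # Run this on each machine that will send dumps.
    with open("/proc/meminfo") as f:
        kib = dump_space_needed_kib(f.read())
    print("Reserve roughly %.1f GiB for this client" % (kib / 1024.0 / 1024.0))
```

With two servers dumping to each other, each one would need free space
for the other's total, so it is worth running this on both.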
