[WBEL-users] Investigating a kernel panic
Simone
simone72 at email.it
Wed Apr 13 04:13:10 CDT 2005
Interesting. I will set up netdump then; it looks like a very good way to
learn more about my system and Linux internals in general :).
We actually have 2 servers running mail filtering, so each one can dump
to the other. This morning I moved one HD to a different position for
better heat dissipation, and it looks like I gained about 10 °C, which is
not bad: the warmer disk is now at 50 °C versus the >59 °C it was before.
I will also run memtest for some time and see if I get more info (I
checked the memory before the install). The memory is actually the only
thing I carried over from the first installation, so it would make sense
if it were the culprit.
I'll let you know. Thanks, and have a nice day.
Simone
Kirby C. Bohling wrote:
>On Tue, Apr 12, 2005 at 11:38:12AM +0200, Simone wrote:
>
>
>>Hi list,
>>I periodically experience kernel panics on my wbel3 box. This server
>>runs as a front-end mail filter for Exchange, with a typical
>>MailScanner - sendmail - ClamAV setup. I also have MailWatch for
>>MailScanner running, which is a LAMP-style setup. Every 3-4 weeks I get
>>a crash, with the keyboard lights blinking, and I don't understand the
>>reason for it. Thinking it was possibly a hardware problem, I recently
>>installed a new server with the same configuration (2 x 18 GB SCSI
>>disks in RAID 1), but this morning, after a month of running fine, it
>>had its first crash. Could you please tell me where to look for
>>indications of what could have caused the panic? I checked log/messages
>>but there seems to be no useful info in there.
>>
>>Thanks for your suggestions
>>Have a fine day
>>
>>
>
>Simone,
>
> Well, obviously this isn't terribly proactive, but one thing you
>might try is setting up netdump. I've never set it up, but I'm
>intending to sooner rather than later. Essentially, we have a
>number of machines that kernel panic while screen blanking is on,
>so you don't get any information.
>
> In theory netdump will dump core over the network, so that you
>capture the state of the machine at the time of the crash and can do
>the debugging. Okay, I'm guessing you can't debug a kernel core
>yourself, otherwise you wouldn't need this suggestion (I can't
>either). However, you might be able to use the symbols and backtrace
>information from the oops to track down the cause via googling. An
>oops at least gives you a starting point.
>
> In order to do this, you'll need another machine running Linux,
>with at least as much free disk space as the RAM + swap of the
>machines you are dumping from, if I remember correctly.
>
> If I were you, I'd put memtest86 in the machine and let it run
>for a couple of days. It sure sounds like memory corruption or
>overheating. I've had machines with memory corruption that ran for
>weeks or months at a time. It wasn't until something critical got
>stored where the bad bits were that they crashed.
>
> Kirby
>
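To put a number on the "RAM + swap" rule of thumb quoted above, here is a minimal Python sketch that adds up MemTotal and SwapTotal from /proc/meminfo on the machine that will be dumping. It only illustrates the arithmetic; it is not part of netdump, the function names are made up for this example, and the rule itself is quoted from memory above, so treat the result as a rough lower bound and leave some headroom.

```python
def parse_meminfo(text):
    """Parse /proc/meminfo-style text into a {field_name: kilobytes} dict."""
    fields = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        parts = rest.split()
        # Keep only well-formed "Name:   12345 kB" lines.
        if sep and parts and parts[0].isdigit():
            fields[name.strip()] = int(parts[0])
    return fields


def dump_space_estimate_kb(meminfo_text):
    """Rough free space (in kB) wanted on the dump server: RAM + swap."""
    info = parse_meminfo(meminfo_text)
    return info["MemTotal"] + info["SwapTotal"]


def estimate_for_this_host(path="/proc/meminfo"):
    """Same estimate, read from the local machine (Linux only)."""
    with open(path) as f:
        return dump_space_estimate_kb(f.read())
```

Running `estimate_for_this_host()` on each mail filter and comparing the result with the free space reported by `df` on the other one would show whether the two boxes can really dump to each other.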