[WBEL-users] Disks failures
Simone
simone72@email.it
Mon, 23 Aug 2004 12:59:18 +0200
Thanks for your tips. Thanks also to Steve Kuervers for his suggestions.
You're right, my power supply is not a "good" one, just a cheap one, in
a cheap case. But the server is an "enterprise" one (100 users). I know
it's not wise to run it on such cheap hardware, but it's what I have for
now. I am running a script that runs smartctl every 20 minutes and emails
me if the disks don't pass the test, and another one that checks the
temperature (hddtemp). At least the server is in the server room, and the
disk temperature is never over 40°C.
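For anyone who wants to set up a similar check, here is a rough sketch of the idea in Python. It is not the script I am actually running. The function names, the 40°C limit, and the choice to look only at attribute 194 and the "No Errors Logged" line are assumptions made for illustration, and the mailing step is left out.

```python
import subprocess

TEMP_LIMIT_C = 40  # assumed threshold, taken from the "never over 40°C" remark


def read_smart_report(device):
    """Return the text printed by `smartctl -a DEVICE` (usually needs root)."""
    result = subprocess.run(
        ["smartctl", "-a", device],
        capture_output=True, text=True, check=False,
    )
    return result.stdout


def parse_temperature(report):
    """Return the raw value of attribute 194 (Temperature_Celsius), or None."""
    for line in report.splitlines():
        fields = line.split()
        # Attribute rows: ID NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW
        if len(fields) >= 10 and fields[0] == "194" and fields[9].isdigit():
            return int(fields[9])
    return None


def problems(report, limit=TEMP_LIMIT_C):
    """Return a list of warnings found in a smartctl report (empty if none)."""
    found = []
    temp = parse_temperature(report)
    if temp is not None and temp > limit:
        found.append(f"temperature {temp} C exceeds {limit} C")
    if "No Errors Logged" not in report:
        found.append("SMART error log is not empty (or could not be read)")
    return found
```

Calling problems(read_smart_report("/dev/hda")) from a cron entry with a schedule such as */20 * * * * would give the 20-minute interval; whatever it returns can then be mailed.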
As suggested, I ran the PowerMax utility. It found an error and corrected
it, and after a low-level format the disk now passes all the tests. So I
repartitioned it and re-added it to the RAID on Thursday, and so far
everything looks fine. I think the cause could be the power supply, but
the server is now on a UPS (we often have blackouts, and I believe the
mains service we get is very unstable), so hopefully I won't see any of
this again... optimistic?
Thanks for your help, have a nice day
Simone
John Morris wrote:
>On Tue, 17 Aug 2004, Simone wrote:
>
>
>
>>I'm using WBL with the latest kernel and latest samba package (but the
>>first failure occurred while I was on an earlier kernel and samba
>>package), two disks in mirroring, each a Maxtor 6Y200P0 200GB IDE 133, 4
>>primary partitions (/boot, /, /samba, swap), each disk on a different
>>IDE controller, ECS motherboard K7VTA3 (I know it's not the best...).
>>I'm now reading output from badblocks that confirms the hd is broken
>>(I/O error), so I'm just wondering if I should be very, very worried or
>>just moderately worried. One more detail: the hard disks were bought in
>>two different shops, but both disks broke on the secondary IDE controller.
>>
>>
>
>If you are using an ECS mobo I'm going to guess that it isn't in an
>enterprise grade server case. So I'd suspect temp or power. Get smartd
>configured so you can monitor the drive temp and see if the drives
>connected to the secondary controller are running hotter than the primary
>drives.
>
>Here are a couple of interesting lines from my buildhost's output when I
>do smartctl -a /dev/hda
>
>=== START OF INFORMATION SECTION ===
>Device Model: Maxtor 6Y200P0
>Serial Number: Y61TKF0E
>Firmware Version: YAR41BW0
> .....
>194 Temperature_Celsius 0x0032 253 253 000 Old_age Always - 34
> .....
>SMART Error Log Version: 1
>No Errors Logged
>
>If it isn't the temp, make sure you have a good power supply. (Hint: if
>it came with the case it probably isn't 'good'.)
>
>
>