[WBEL-users] Disks failures

Simone simone72@email.it
Mon, 23 Aug 2004 12:59:18 +0200



Thanks for your tips, and thanks also to Steve Kuervers for his suggestions.
You're right, my power supply is not a "good" one, just a cheap one in
a cheap case. But the server is an "enterprise" one (100 users). I know
it is not wise to run it on such cheap hardware, but it's what I have for
now. I am running a script that runs smartctl every 20 minutes and emails
me if the disks don't pass the test, and another one that checks the
temperature (hddtemp). The server is in the server room (at least
that...), and the disk temperature is never over 40°C.
As suggested, I ran the PowerMax utility and it found an error, which was
corrected. After a low-level format the disk now passes all the
tests, so I repartitioned it and re-added it to the RAID on Thursday, and
so far everything looks fine. I think the cause could be the power
supply, but the server is now on a UPS (we often have blackouts, and I
believe the power service we get is very unstable), so hopefully I won't
see any of this again... optimistic?
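
For anyone who wants something similar, the smartctl check described above could be sketched roughly like this. It is only a minimal sketch, not the actual script: the device names (/dev/hda, /dev/hdc), the recipient address and the use of a local SMTP server are all assumptions, and the parsing assumes the ATA-style result line that `smartctl -H` prints.

```python
import smtplib
import subprocess
from email.message import EmailMessage


def smart_health_passed(smartctl_output: str) -> bool:
    """Return True if `smartctl -H` output reports an overall PASSED result."""
    for line in smartctl_output.splitlines():
        if "overall-health self-assessment test result" in line:
            return line.strip().endswith("PASSED")
    # No result line found: treat it as a failure worth an alert.
    return False


def check_disk(device: str) -> bool:
    """Run `smartctl -H` on one device and report whether it passed."""
    result = subprocess.run(
        ["smartctl", "-H", device],
        capture_output=True, text=True, check=False,
    )
    return smart_health_passed(result.stdout)


def failing_disks(devices, check=check_disk):
    """Return the devices whose health check did not pass."""
    return [dev for dev in devices if not check(dev)]


def main(devices=("/dev/hda", "/dev/hdc"), recipient="root@localhost"):
    """Email a warning if any disk fails (devices and address are assumed)."""
    bad = failing_disks(devices)
    if bad:
        msg = EmailMessage()
        msg["Subject"] = "SMART health check failed: " + ", ".join(bad)
        msg["From"] = recipient
        msg["To"] = recipient
        msg.set_content("smartctl -H did not report PASSED for: " + ", ".join(bad))
        # Assumes a mail server is listening on localhost.
        with smtplib.SMTP("localhost") as smtp:
            smtp.send_message(msg)
```

Calling main() from a cron entry with a schedule such as `*/20 * * * *` would give the 20-minute interval; smartctl normally needs root to read the drives.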

Thanks for your help, have a nice day

Simone
 

John Morris wrote:

>On Tue, 17 Aug 2004, Simone wrote:
>
>>I'm using WBEL with the latest kernel and the latest samba package (the
>>first failure occurred on WBEL while I was on an earlier kernel and samba
>>package), two disks in a mirror, each a Maxtor 6Y200P0 200 GB IDE 133, 4
>>primary partitions (/boot, /, /samba, swap), each disk on a different
>>IDE controller, ECS motherboard K7VTA3 (I know it's not the best...).
>>I'm now reading output from badblocks that confirms the HD is broken
>>(I/O error), so I'm wondering if I should be very, very worried or
>>just averagely worried. One more detail: the hard disks were bought in
>>two different shops, but both disks broke on the secondary IDE controller.
>
>If you are using an ECS mobo I'm going to guess that it isn't in an
>enterprise grade server case.  So I'd suspect temp or power.  Get smartd
>configured so you can monitor the drive temp and see if the drives
>connected to the secondary controller are running hotter than the primary
>drives.
>
>Here are a couple of interesting lines from my buildhost's output when I
>do smartctl -a /dev/hda
>
>=== START OF INFORMATION SECTION ===
>Device Model:     Maxtor 6Y200P0
>Serial Number:    Y61TKF0E
>Firmware Version: YAR41BW0
>  .....
>194 Temperature_Celsius     0x0032   253   253   000    Old_age Always  -   34
>  .....
>SMART Error Log Version: 1
>No Errors Logged
>
>If it isn't the temp, make sure you have a good power supply.  (Hint: if 
>it came with the case it probably isn't 'good'.)
>
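
The temperature comparison John suggests (secondary controller versus primary) could be done by pulling attribute 194 out of the `smartctl -a` output quoted above. The following is a rough sketch under assumptions: the function names are made up for illustration, and it assumes the raw value is the tenth whitespace-separated field, as it is in the sample line above.

```python
from typing import Optional


def temperature_celsius(smartctl_output: str) -> Optional[int]:
    """Extract the raw value of SMART attribute 194 (Temperature_Celsius)."""
    for line in smartctl_output.splitlines():
        fields = line.split()
        # Attribute rows: ID, name, flag, value, worst, thresh, type,
        # updated, when_failed, raw value.
        if len(fields) >= 10 and fields[0] == "194" and fields[1] == "Temperature_Celsius":
            return int(fields[9])
    return None


def secondary_minus_primary(primary_output: str, secondary_output: str) -> Optional[int]:
    """Temperature difference in degrees C, or None if either reading is missing."""
    primary = temperature_celsius(primary_output)
    secondary = temperature_celsius(secondary_output)
    if primary is None or secondary is None:
        return None
    return secondary - primary
```

A positive result would mean the drive on the secondary controller is running hotter than the one on the primary.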


 
 