[TriLUG] smartmontools - selftest fails, Health status passed. ???

lfwelty at nc.rr.com
Thu Jan 8 14:21:31 EST 2004


I may be able to answer my own question: a Google search suggests
that people recommend replacing the hdd in this situation.
http://216.239.41.104/search?q=cache:OpA21QVKE7wJ:lists.debian.org/debian-powerpc/2003/debian-powerpc-200310/msg00413.html+smartctl+PASSED+extended+read+failure&hl=en&ie=UTF-8

Can anyone confirm or provide more information?
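
One possible explanation for the apparent conflict, offered as a sketch
rather than a definitive answer: the overall-health result is generally
understood to reflect only whether an attribute's normalized VALUE has
fallen to or below its THRESH, while a self-test actually reads the disk
surface. The Python below applies that rule to the Pre-fail attributes
from the smartctl output quoted below. The helper name failing_attributes
and the exact rule (VALUE <= THRESH, with a THRESH of 0 never failing) are
assumptions for illustration, not taken from the smartctl source.

```python
# Normalized (VALUE, THRESH) pairs for the Pre-fail attributes, copied from
# the smartctl -a output quoted in this message.
PREFAIL_ATTRIBUTES = {
    "Raw_Read_Error_Rate": (100, 20),
    "Spin_Up_Time": (68, 20),
    "Reallocated_Sector_Ct": (99, 20),
    "Seek_Error_Rate": (100, 23),
    "Calibration_Retry_Count": (100, 20),
    "Read_Soft_Error_Rate": (100, 23),
}


def failing_attributes(attributes):
    """Return the names whose normalized value is at or below the threshold.

    Hypothetical helper. Assumed rule: an attribute fails when
    VALUE <= THRESH, and a THRESH of 0 means it can never fail.
    """
    return [
        name
        for name, (value, thresh) in attributes.items()
        if thresh > 0 and value <= thresh
    ]


failing = failing_attributes(PREFAIL_ATTRIBUTES)
print("failing attributes:", failing)
print("overall health (threshold check only):", "FAILED" if failing else "PASSED")
```

None of these values is near its threshold, so a check of this kind
reports PASSED even though Current_Pending_Sector has a raw value of 3
and both self-tests stop at a read failure.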

Thanks again,

F.

lfwelty at nc.rr.com wrote:
> Hi y'all,
> 
> I have a hdd that is showing some seemingly (to me at least)
> conflicting information. smartctl's health status shows the
> hdd as PASSED, but it's failing the short and long selftests
> at the same place.
> 
> - relevant smartctl output below.
> 
> If the health status were FAILED and I was seeing the errors
> I would definitely replace the hdd. But since they're conflicting,
> I'm not sure if I need to replace it.
> 
> Thanks for the help,
> 
> - Frank.
> 
> tiresias|ROOT:lfwelty-2# smartctl -a /dev/hda
> smartctl version 5.1-18 Copyright (C) 2002-3 Bruce Allen
> Home page is http://smartmontools.sourceforge.net/
> 
> === START OF INFORMATION SECTION ===
> Device Model:     MAXTOR 6L080J4
> Serial Number:    664204750210
> Firmware Version: A93.0500
> Device is:        In smartctl database [for details use: -P show]
> ATA Version is:   5
> ATA Standard is:  ATA/ATAPI-5 T13 1321D revision 1
> Local Time is:    Thu Jan  8 13:54:32 2004 EST
> SMART support is: Available - device has SMART capability.
> SMART support is: Enabled
> 
> === START OF READ SMART DATA SECTION ===
> SMART overall-health self-assessment test result: PASSED
> 
> General SMART Values:
> Off-line data collection status: (0x00) Offline data collection activity was
>                                         never started.
>                                         Auto Off-line Data Collection: Disabled.
> Self-test execution status:      ( 112) The previous self-test completed having
>                                         the read element of the test failed.
> Total time to complete off-line
> data collection:                 (  35) seconds.
> Offline data collection
> capabilities:                    (0x1b) SMART execute Offline immediate.
>                                         Automatic timer ON/OFF support.
>                                         Suspend Offline collection upon new
>                                         command.
>                                         Offline surface scan supported.
>                                         Self-test supported.
>                                         No Conveyance Self-test supported.
>                                         No Selective Self-test supported.
> SMART capabilities:            (0x0003) Saves SMART data before entering
>                                         power-saving mode.
>                                         Supports SMART auto save timer.
> Error logging capability:        (0x01) Error logging supported.
>                                         No General Purpose Logging support.
> Short self-test routine
> recommended polling time:        (   2) minutes.
> Extended self-test routine
> recommended polling time:        (  40) minutes.
> 
> SMART Attributes Data Structure revision number: 11
> Vendor Specific SMART Attributes with Thresholds:
> ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
>   1 Raw_Read_Error_Rate     0x0029   100   253   020    Pre-fail  Offline      -       0
>   3 Spin_Up_Time            0x0027   068   065   020    Pre-fail  Always       -       4092
>   4 Start_Stop_Count        0x0032   100   100   008    Old_age   Always       -       162
>   5 Reallocated_Sector_Ct   0x0033   099   099   020    Pre-fail  Always       -       5
>   7 Seek_Error_Rate         0x000b   100   100   023    Pre-fail  Always       -       0
>   9 Power_On_Hours          0x0012   079   079   001    Old_age   Always       -       14083
>  10 Spin_Retry_Count        0x0026   100   100   000    Old_age   Always       -       0
>  11 Calibration_Retry_Count 0x0013   100   100   020    Pre-fail  Always       -       0
>  12 Power_Cycle_Count       0x0032   100   100   008    Old_age   Always       -       62
>  13 Read_Soft_Error_Rate    0x000b   100   093   023    Pre-fail  Always       -       0
> 194 Temperature_Celsius     0x0022   086   082   042    Old_age   Always       -       37
> 195 Hardware_ECC_Recovered  0x001a   100   001   000    Old_age   Always       - 99292106
> 196 Reallocated_Event_Count 0x0010   100   100   020    Old_age   Offline      -       0
> 197 Current_Pending_Sector  0x0032   100   100   020    Old_age   Always       -       3
> 198 Offline_Uncorrectable   0x0010   100   253   000    Old_age   Offline      -       0
> 199 UDMA_CRC_Error_Count    0x001a   200   200   000    Old_age   Always       -       0
> 
> SMART Error Log Version: 1
> ATA Error Count: 39 (device log contains only the most recent five errors)
>         CR = Command Register [HEX]
>         FR = Features Register [HEX]
>         SC = Sector Count Register [HEX]
>         SN = Sector Number Register [HEX]
>         CL = Cylinder Low Register [HEX]
>         CH = Cylinder High Register [HEX]
>         DH = Device/Head Register [HEX]
>         DC = Device Command Register [HEX]
>         ER = Error register [HEX]
>         ST = Status register [HEX]
> Timestamp = decimal seconds since the previous disk power-on.
> Note: timestamp "wraps" after 2^32 msec = 49.710 days.
> 
> Error 39 occurred at disk power-on lifetime: 10518 hours
>   When the command that caused the error occurred, the device was in an unknown state.
> 
>   After command completion occurred, registers were:
>   ER ST SC SN CL CH DH
>   -- -- -- -- -- -- --
>   10 59 06 e9 01 0b e0
> 
>   Commands leading to the command that caused the error were:
>   CR FR SC SN CL CH DH DC   Timestamp  Command/Feature_Name
>   -- -- -- -- -- -- -- --   ---------  --------------------
>   c8 00 08 e7 01 0b e0 0b     217.381  READ DMA
>   c8 00 08 0f 02 0b e0 0b     217.381  READ DMA
>   c8 00 08 07 02 0b e0 0b     217.380  READ DMA
>   c8 00 08 47 1a 0b e0 00     217.380  READ DMA
>   c8 00 08 ff 01 0b e0 0b     217.365  READ DMA
> 
> Error 38 occurred at disk power-on lifetime: 10335 hours
>   When the command that caused the error occurred, the device was in an unknown state.
> 
>   After command completion occurred, registers were:
>   ER ST SC SN CL CH DH
>   -- -- -- -- -- -- --
>   40 59 06 e9 01 0b e0
> 
>   Commands leading to the command that caused the error were:
>   CR FR SC SN CL CH DH DC   Timestamp  Command/Feature_Name
>   -- -- -- -- -- -- -- --   ---------  --------------------
>   c8 00 08 e7 01 0b e0 0b      85.432  READ DMA
>   c8 00 08 0f 02 0b e0 0b      85.432  READ DMA
>   c8 00 08 07 02 0b e0 0b      85.431  READ DMA
>   c8 00 08 47 1a 0b e0 00      85.431  READ DMA
>   c8 00 08 ff 01 0b e0 0b      85.423  READ DMA
> 
> Error 37 occurred at disk power-on lifetime: 10215 hours
>   When the command that caused the error occurred, the device was in an unknown state.
> 
>   After command completion occurred, registers were:
>   ER ST SC SN CL CH DH
>   -- -- -- -- -- -- --
>   40 59 06 e9 01 0b e0
> 
>   Commands leading to the command that caused the error were:
>   CR FR SC SN CL CH DH DC   Timestamp  Command/Feature_Name
>   -- -- -- -- -- -- -- --   ---------  --------------------
>   c8 00 08 e7 01 0b e0 0b      59.635  READ DMA
>   ca 00 10 cf 00 60 e0 60      59.635  WRITE DMA
>   c8 00 08 0f 02 0b e0 00      59.634  READ DMA
>   ca 00 10 af 00 60 e0 60      59.634  WRITE DMA
>   c8 00 08 07 02 0b e0 00      59.633  READ DMA
> 
> Error 36 occurred at disk power-on lifetime: 9762 hours
>   When the command that caused the error occurred, the device was in an unknown state.
> 
>   After command completion occurred, registers were:
>   ER ST SC SN CL CH DH
>   -- -- -- -- -- -- --
>   10 59 06 e9 01 0b e0
> 
>   Commands leading to the command that caused the error were:
>   CR FR SC SN CL CH DH DC   Timestamp  Command/Feature_Name
>   -- -- -- -- -- -- -- --   ---------  --------------------
>   c8 00 08 e7 01 0b e0 0b     139.350  READ DMA
>   c8 00 08 0f 02 0b e0 0b     139.350  READ DMA
>   c8 00 08 07 02 0b e0 0b     139.350  READ DMA
>   c8 00 08 47 1a 0b e0 00     139.349  READ DMA
>   c8 00 08 ff 01 0b e0 0b     139.333  READ DMA
> 
> Error 35 occurred at disk power-on lifetime: 9403 hours
>   When the command that caused the error occurred, the device was in an unknown state.
> 
>   After command completion occurred, registers were:
>   ER ST SC SN CL CH DH
>   -- -- -- -- -- -- --
>   40 59 06 e9 01 0b e0
> 
>   Commands leading to the command that caused the error were:
>   CR FR SC SN CL CH DH DC   Timestamp  Command/Feature_Name
>   -- -- -- -- -- -- -- --   ---------  --------------------
>   c8 00 08 e7 01 0b e0 0b     291.066  READ DMA
>   c8 00 08 0f 02 0b e0 0b     291.059  READ DMA
>   c8 00 08 07 02 0b e0 0b     291.058  READ DMA
>   c8 00 08 47 1a 0b e0 00     291.058  READ DMA
>   c8 00 08 ff 01 0b e0 0b     291.042  READ DMA
> 
> SMART Self-test log structure revision number 1
> Num  Test_Description    Status                  Remaining  LifeTime(hours) LBA_of_first_error
> # 1  Extended off-line   Completed: read failure       90%     14063         0x000aea3d
> # 2  Short off-line      Completed: read failure       40%     14063         0x000aea3d
> 
> 
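
The registers in the error log entries quoted above can be turned into a
sector address for comparison with the LBA_of_first_error column of the
self-test log. A minimal sketch, assuming 28-bit LBA addressing (DH = e0
has the LBA bit, 0x40, set); the helper name lba28_from_registers is made
up for illustration.

```python
def lba28_from_registers(sn, cl, ch, dh):
    """Combine ATA task-file registers into a 28-bit LBA (hypothetical helper).

    In LBA mode the address is split as: SN = bits 0-7, CL = bits 8-15,
    CH = bits 16-23, low nibble of DH = bits 24-27.
    """
    if not dh & 0x40:
        raise ValueError("DH bit 6 is clear: registers hold CHS, not LBA")
    return ((dh & 0x0F) << 24) | (ch << 16) | (cl << 8) | sn


# Registers after command completion, identical in errors 35 through 39:
# SN=e9 CL=01 CH=0b DH=e0
error_lba = lba28_from_registers(sn=0xE9, cl=0x01, ch=0x0B, dh=0xE0)

# LBA_of_first_error reported by both the short and the extended self-test.
selftest_lba = 0x000AEA3D

print(f"error log LBA: {error_lba} (0x{error_lba:08x})")
print(f"self-test LBA: {selftest_lba} (0x{selftest_lba:08x})")
```

With the values from the log this gives 721385 (0x000b01e9) for the
logged READ DMA errors and 715325 (0x000aea3d) for the self-test
failures, so the two logs do not point at the same sector.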

-- 
----------------------------------------------------------------------
   Frank Welty         |   Earth is a beta site, I just wish that damn
   lfwelty at nc.rr.com   |   pink elephant would give me my mouse back.
----------------------------------------------------------------------



