1. Weak power supply
2. Temperature and heat
3. Vibration
The Maxtor 160 GB drives, however, do not show such problems at all. The
errors themselves can be cleared, though. From the SMART data, get the bad
sectors and remap them by writing to the raw device. Those pending sectors
will then get remapped. However, the underlying problems will persist with
these drives. In our boxes, the operating temperature is about 55 C ...
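
To make the remap step concrete, here is a minimal sketch that only builds
(and prints) a dd command line for overwriting one sector on the raw device.
It assumes 512-byte sectors, takes /dev/hdb and LBA 44763517 from the kernel
log in the quoted message, and the helper name remap_command is hypothetical.
Running the printed command destroys whatever was stored in that sector, so
check the LBA carefully first.

```python
import shlex

SECTOR_SIZE = 512  # assumption: 512-byte sectors on these ATA drives


def remap_command(device: str, lba: int) -> str:
    """Build a dd command line that overwrites a single sector with zeros.

    The command is returned as a string and is NOT executed here.
    Writing to a pending sector is what lets the drive remap it.
    """
    if lba < 0:
        raise ValueError("LBA must be non-negative")
    args = [
        "dd",
        "if=/dev/zero",
        f"of={device}",        # raw device, not a partition
        f"bs={SECTOR_SIZE}",   # one block == one sector
        "count=1",             # write exactly one sector
        f"seek={lba}",         # skip to the failing LBA on the output
    ]
    return " ".join(shlex.quote(a) for a in args)


if __name__ == "__main__":
    # LBAsect value reported by the kernel for hdb in the quoted message.
    print(remap_command("/dev/hdb", 44763517))
```

Re-reading the SMART data afterwards should show whether the pending sector
count dropped.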
-----Original Message-----
From: Steven Timm [mailto:timm@fnal.gov]
Sent: Tuesday, November 19, 2002 1:37 PM
To: linux-kernel@vger.kernel.org
Subject: AMD 760MPX dma_intr: error=0x40 { UncorrectableError }
I have recently observed a large frequency of this error on
a bunch of compute servers with brand new disks.
Nov 15 01:42:52 fnd0172 kernel: hdb: dma_intr: status=0x51 { DriveReady SeekComplete Error }
Nov 15 01:42:52 fnd0172 kernel: hdb: dma_intr: error=0x40 { UncorrectableError }, LBAsect=44763517, sector=11235856
Nov 15 01:42:52 fnd0172 kernel: end_request: I/O error, dev 03:42 (hdb), sector 11235856
Configuration is the following:
Tyan 2466 motherboard which has AMD760MPX chipset, dual Athlon MP2000+
processors  (supports UltraATA100)
hda=Seagate ST340016A 40 GB drive, ext2 FS
hdb=Seagate ST380021A 80 GB drive, ext2 FS.
There are many entries in this mailing list saying that
the above error is a sign of a bad disk.  Seagate diagnostics
say so too.  It is just hard to believe that 30 hard drives could
go bad in less than a month.
I know errors of this type were common on machines with Serverworks
OSB4 chipsets.  Has anyone else heard of this error happening on
non-serverworks chipsets such as VIA or AMD?  And is the drive
really bad or will a low level format clear the bad blocks
and let the drive operate again?
Steve Timm
------------------------------------------------------------------
SMART shows the following error structure:
SMART Error Log:
SMART Error Logging Version: 1
Error Log Data Structure Pointer: 03
ATA Error Count: 13
Non-Fatal Count: 0
Error Log Structure 1:
DCR   FR   SC   SN   CL   SH   D/H   CR   Timestamp
 00   00   08   57   09   ab    f2   c8     40315
 00   00   08   5f   09   ab    f2   c8     40315
 00   00   08   67   09   ab    f2   c8     40315
 00   00   08   6f   09   ab    f2   c8     40315
 00   00   08   77   09   ab    f2   c8     40315
 00   40   00   7d   09   ab    f2   51     922746
Error condition:  33    Error State:       3
Number of Hours in Drive Life: 1021 (life of the drive in hours)
Error Log Structure 2:
DCR   FR   SC   SN   CL   SH   D/H   CR   Timestamp
 00   00   08   07   d5   55    f1   ca     40320
 00   00   08   3f   00   5c    f1   ca     40320
 00   00   08   97   33   5d    f1   ca     40320
 00   00   08   87   97   0f    f2   ca     40320
 00   00   08   77   09   ab    f2   c8     40320
 00   40   00   7d   09   ab    f2   51     922746
Error condition:  33    Error State:       3
Number of Hours in Drive Life: 1021 (life of the drive in hours)
Error Log Structure 3:
DCR   FR   SC   SN   CL   SH   D/H   CR   Timestamp
 00   00   28   bf   8f   52    f1   c8     23662
 00   00   98   e7   8f   52    f1   c8     23662
 00   00   68   ff   9a   52    f1   c8     23662
 00   00   d8   67   9b   52    f1   c8     23662
 00   00   28   07   a3   52    f1   c8     23662
 00   40   00   25   a3   52    f1   51     1124073
Error condition: 161    Error State:       3
Number of Hours in Drive Life: 1040 (life of the drive in hours)
Error Log Structure 4:
DCR   FR   SC   SN   CL   SH   D/H   CR   Timestamp
 00   00   e0   4f   09   ab    f2   c8     40280
 00   00   d8   57   09   ab    f2   c8     40285
 00   00   d0   5f   09   ab    f2   c8     40290
 00   00   c8   67   09   ab    f2   c8     40296
 00   00   c0   6f   09   ab    f2   c8     40301
 00   40   00   7d   09   ab    f2   51     922746
Error condition:  33    Error State:       3
Number of Hours in Drive Life: 1021 (life of the drive in hours)
Error Log Structure 5:
DCR   FR   SC   SN   CL   SH   D/H   CR   Timestamp
 00   00   d8   57   09   ab    f2   c8     40285
 00   00   d0   5f   09   ab    f2   c8     40290
 00   00   c8   67   09   ab    f2   c8     40296
 00   00   c0   6f   09   ab    f2   c8     40301
 00   00   b8   77   09   ab    f2   c8     40306
 00   40   00   7d   09   ab    f2   51     922746
Error condition:  33    Error State:       3
Number of Hours in Drive Life: 1021 (life of the drive in hours)
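
The failing sector can be read straight out of the error rows above. A
minimal sketch, assuming the SN, CL, SH and D/H columns are the ATA
task-file registers in 28-bit LBA mode (SN = LBA bits 0-7, CL = bits 8-15,
SH = bits 16-23, low nibble of D/H = bits 24-27); the function names are
hypothetical:

```python
def lba28(sn: int, cl: int, sh: int, dh: int) -> int:
    """Combine ATA task-file registers into a 28-bit LBA.

    Assumes LBA addressing; only the low nibble of the device/head
    register carries address bits (LBA 24-27).
    """
    return ((dh & 0x0F) << 24) | (sh << 16) | (cl << 8) | sn


def lba_from_row(row: str) -> int:
    """Parse one row of the SMART error log table shown above.

    Column order: DCR FR SC SN CL SH D/H CR Timestamp.
    The register columns are hexadecimal.
    """
    fields = row.split()
    sn, cl, sh, dh = (int(x, 16) for x in fields[3:7])
    return lba28(sn, cl, sh, dh)


if __name__ == "__main__":
    # Last row (the error state, CR = 51) of Error Log Structure 1.
    print(lba_from_row(" 00   40   00   7d   09   ab    f2   51     922746"))
    # Last row of Error Log Structure 3.
    print(lba_from_row(" 00   40   00   25   a3   52    f1   51     1124073"))
```

For Structure 1 this gives 44763517, the same value the kernel reported as
LBAsect, so the SMART log and the dma_intr message point at the same sector.
Structure 3 decodes to 22192933, a second bad location on the drive.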
Steven C. Timm (630) 840-8525  timm@fnal.gov  http://home.fnal.gov/~timm/
Fermilab Computing Division/Operating Systems Support
Scientific Computing Support Group--Computing Farms Operations
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/