Re: Is this kernel related (signal 11)?

Russell King (rmk@arm.linux.org.uk)
Mon, 22 Jan 2001 21:47:58 +0000 (GMT)


Rogier Wolff writes:
> Hardware problems are normally not reproducible. Can you attach a
> debugger to your X server, and catch it when things go bad? (And
> give the XFree86 people a backtrace)

Bad RAM can be extremely reproducible though, and can certainly produce
SEGVs.

Evidence: I recently had a bad 128MB SDRAM which *always* failed at byte
address 0x220068, which was the middle of the mem_map array. All I
needed to do was 'dd if=/dev/hda of=/dev/null' and the machine would
die within 5 minutes due to an invalid buffer_head pointer.

The SDRAM naturally passed each and every single memory test I could
throw at it. However, a new SDRAM fixed the problem.

It is quite common for SDRAMs to fail in this way - think about the
failure mode. Some of the silicon in the SDRAM is damaged. This isn't
going to move about, so it's going to be in a fixed position. A fixed
position means a specific set of transistors and gates, and therefore
a specific memory location.
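
To make that failure mode concrete, here is a toy model of it in Python.
It is only a sketch under stated assumptions, not a description of the
actual faulty part: the FaultyRAM class, the failing_addresses helper and
the choice of bit 3 are all made up for illustration. Only the byte
address 0x220068 comes from the case above. The model assumes a single
cell stuck at 0, which is one of several ways damaged silicon can behave.

```python
# Toy model of a stuck-at fault; this does not touch real hardware.
FAULT_ADDR = 0x220068   # byte address quoted in the message
FAULT_BIT = 3           # hypothetical: the message does not say which bit failed


class FaultyRAM:
    """Byte-addressable memory with one bit permanently stuck at 0."""

    def __init__(self, size):
        self.cells = bytearray(size)

    def write(self, addr, value):
        self.cells[addr] = value & 0xFF

    def read(self, addr):
        value = self.cells[addr]
        if addr == FAULT_ADDR:
            # The damaged cell always reads back 0, whatever was stored.
            value &= ~(1 << FAULT_BIT) & 0xFF
        return value


def failing_addresses(ram, start, end, pattern):
    """Write `pattern` to [start, end), read it back, return mismatching addresses."""
    for addr in range(start, end):
        ram.write(addr, pattern)
    return [addr for addr in range(start, end) if ram.read(addr) != pattern]


ram = FaultyRAM(0x230000)
window = (0x220000, 0x220100)

# Every pass that stores a 1 in the damaged bit fails at the same address.
for _ in range(3):
    print([hex(a) for a in failing_addresses(ram, *window, 0xFF)])

# Data that happens to store a 0 in that bit reads back correctly.
print(failing_addresses(ram, *window, 0x00))
```

In this model the fault never moves, so every pass reports the same
address, and it stays invisible for as long as the data stored there
happens to have a 0 in the damaged bit.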

In answer to the original poster's question, the first step would be
to grab a copy of memtest86 (IIRC it's a program that is run from floppy
disk) and run that on your system. That /should/ (and I stress should
there) detect any RAM problems you have.

--
Russell King (rmk@arm.linux.org.uk)                The developer of ARM Linux
             http://www.arm.linux.org.uk/personal/aboutme.html
