Re: pre6 oom killer oops

Jeff Garzik (jgarzik@mandrakesoft.com)
Wed, 31 Oct 2001 09:27:45 -0500


Alexander Viro wrote:
>
> On Wed, 31 Oct 2001, Jeff Garzik wrote:
>
> > further comments #2:
> >
> > when rebooting, there was some disk corruption in the ext2 filesystem.
> >
> > It is my guess that this is due to the large number of buffers in the vmstat
> > output, which I believe are dirty buffers that never got written out
>
> Judging by your log it's not an OOM - page table corruption got caught by
> do_wp_page(), which means that handle_mm_fault() fails (surprise, surprise),
> which kills the process.
>
> Looks like a massive memory corruption - later it fscked you in pte_alloc()
> and then it screwed buffer cache lists.

I'm reinstalling now, with a bad blocks check, to make sure random disk
crap isn't affecting things. Disk is good AFAIK. Vaguely recent ATA-33
drive. I'll switch to the other alpha to see if I can see similar
symptoms.

Unfortunately I don't know of a good alpha memory tester like
memtest86. SRM firmware tests memory ok, but that probably doesn't mean
much.

Final comment before leaving this machine. Restarting the rpm-rebuilder
(post reboot and fsck), I still see a very large number of buffs. Since
there is zero swapped, this may or may not be normal. The cache value
appears sane, FWIW.

   procs                         memory    swap          io      system         cpu
 r  b  w   swpd    free    buff   cache  si  so    bi    bo    in    cs  us  sy  id
 2  0  0      0   35024   13016  267256   0   0     3   491  1046   136  84  12   4
 1  0  0      0  106776   14864  191984   0   0   380     0  1903   696  13  20  67
 3  0  0      0   85448   16424  210216   0   0   239    11  1066  1305  58  25  17
 1  0  0      0   83624   16560  210712   0   0   107    21  1068   550  54  25  21
 1  0  0      0   81952   16600  211224   0   0     3     0  1030    96  93   6   0
 1  0  0      0   78720   16704  212080   0   0   144    11  1045   118  79  12   9
 2  0  0      0   79608   16728  212400   0   0    30     0  1037   108  86   8   5
 1  0  0      0   73288   16792  212736   0   0    35    11  1039    84  91   5   4
 1  0  0      0   71784   16792  212736   0   0     0    11  1038    25  99   1   0
 1  0  0      0   81560   16944  214840   0   0   446     0  1049   259  70  19  11
 1  0  0      0   76920   16984  215480   0   0     4     0  1031   487  75  24   1
 1  0  0      0   79384   17016  215600   0   0     0     0  1029    23  94   6   0
 0  1  1      0   79432   17056  215976   0   0    12  1396  1132   188  48  12  40
 0  1  1      0   79384   17056  216008   0   0    11  4542  1107    14   0   2  98
 1  0  0      0   74400   17128  216984   0   0   331   217  1090   269  25  10  65
 2  0  0      0   70456   17136  217064   0   0     0   288  1030   119  96   4   0
 1  0  0      0   72200   17152  217176   0   0     0     0  1050    96  97   3   0
 1  0  0      0   67232   17152  217264   0   0     0   107  1041    93  98   2   0
 1  0  0      0   70072   17160  217416   0   0     0     0  1029    91  97   3   0
 1  0  0      0   68272   17168  217496   0   0     0    75  1035   122  96   4   0
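
For anyone who wants to eyeball the buff column without squinting at wrapped mail, here is a rough sketch of a parser. It is not something I ran on the alpha. The helper name parse_vmstat, the FIELDS list, and the two sample rows (copied from the output above) are illustrative assumptions, as is the guess that the memory columns are in KB. It relies only on every vmstat row having 16 integer fields, so it tolerates rows that a mailer has split across lines.

```python
# Hypothetical helper, not from the original post: reassemble and parse
# vmstat rows that a mail client has wrapped across several lines.
FIELDS = ["r", "b", "w", "swpd", "free", "buff", "cache",
          "si", "so", "bi", "bo", "in", "cs", "us", "sy", "id"]


def parse_vmstat(text):
    """Return one dict per vmstat row, ignoring line breaks and header words."""
    # Keep only the integer tokens; header words such as "procs" are skipped.
    tokens = [int(t) for t in text.split() if t.isdigit()]
    width = len(FIELDS)
    if len(tokens) % width:
        raise ValueError("token count is not a multiple of 16; a row is incomplete")
    return [dict(zip(FIELDS, tokens[i:i + width]))
            for i in range(0, len(tokens), width)]


# Two rows copied from the output above, still wrapped as the mailer left them.
SAMPLE = """\
2 0 0 0 35024 13016 267256 0 0 3 491 1046 136 84
12 4
0 1 1 0 79384 17056 216008 0 0 11 4542 1107 14
0 2 98
"""

rows = parse_vmstat(SAMPLE)
for row in rows:
    # Assumption: free, buff and cache are all reported in KB.
    total = row["free"] + row["buff"] + row["cache"]
    share = 100 * row["buff"] / total
    print(f"buff={row['buff']} KB ({share:.1f}% of free+buff+cache), "
          f"swpd={row['swpd']}, bo={row['bo']}")
```

This only summarizes the numbers. It says nothing about whether the buffers are dirty, which is the real question.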

-- 
Jeff Garzik      | Only so many songs can be sung
Building 1024    | with two lips, two lungs, and one tongue.
MandrakeSoft     |         - nomeansno
