Re: oops on 2.4.13-pre5 in prune_dcache

Linus Torvalds (torvalds@transmeta.com)
Tue, 30 Oct 2001 18:49:23 -0800 (PST)


On Wed, 31 Oct 2001, Andrea Arcangeli wrote:
>
> Dunno why, but usually all bitflips triggers during heavy list walking,
> so it's not too much surprisingly. I recall the most frequent bitflips
> were happening walking the buffer header lists in 2.2, but I recall
> dcache walks also oopsing due bitflips in 2.2.

I suspect it's just that the list walking is (a) the operation that
touches the most uncached memory and (b) also is inherently the one that,
through pointer following, ends up showing the effects of flipped bits the
most as oopses.

For example, most of the actual _memory_ is obviously in the data caches
or user space pages, but if those are corrupt you'd never see an oops.
You'd see filesystem corruption (and see the reports of changing md5sums
about how this does happen), or you'd see strange SIGSEGV's in user space.

In contrast, a kernel oops is almost always accompanied by pointer
dereferencing, and the thing that dereferences the most pointers is
obviously pointer chasing - ie list walking.

So I don't think that this has anything to do with "certain lists are more
likely to get into trouble", but more of a "certain lists have long chains
of pointers, and as such they are more likely to show up in oopses".

Self-selection, in short - the bane of all statistics gathering.

Linus

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/