Re: Wow! Is memory ever cheap!

Malcolm Beattie (mbeattie@sable.ox.ac.uk)
Wed, 9 May 2001 10:42:02 +0100


Larry McVoy writes:
> On Wed, May 09, 2001 at 12:24:25AM -0400, Marty Leisner wrote:
> > My understanding is suns big machines stopped using ecc and they
>
> The SUN problem was a cache problem and there is no way that I believe
> that SUN would turn of ECC in the cache. There are good reasons for
> not doing so. If you think through the end to end argument, you will
> see that you have no way to do checks on the data path into/out of the
> processor. If that part of the datapath is not checked then no amount
> of checking elsewhere does any good, the processor can be corrupting
> your data and never know it. If SUN was so stupid as to remove this,
> then it is a dramatically different place. I heard that there was a
> bug in the cache controller, I never heard that they had removed ECC.

There are issues with error detection/correction/recovery with
different designs of L1 and L2 caches. There's a good paper:

IBM S/390 storage hierarchy - G5 and G6 performance considerations
IBM Journal of Research and Development
Vol 43 No. 5/6
available at
http://www.research.ibm.com/journal/rd/435/jackson.html

which covers IBM's choice of L1 and L2 design for S/390. The section on
"S/390 reliability and performance implications" is relevant here. In
particular, they use a solution which isn't best from the performance
point of view but ensures you don't discover "too late" about an error.
I heard a rumour (now I get to the unsubstantiated part :-) that Sun
chose a higher-performing design for their cache subsystem but which has
a nastier failure mode in the case of cache errors.

--Malcolm

-- 
Malcolm Beattie <mbeattie@sable.ox.ac.uk>
Unix Systems Programmer
Oxford University Computing Services
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/