Re: RAID question

Lionel Bouton (Lionel.Bouton@free.fr)
Wed, 07 Nov 2001 23:09:16 +0100


Bene, Martin wrote:

>Hi Roy,
>
>>raid5: measuring checksumming speed
>> 8regs : 1480.800 MB/sec
>> 32regs : 711.200 MB/sec
>> pIII_sse : 1570.400 MB/sec
>> pII_mmx : 1787.200 MB/sec
>> p5_mmx : 1904.000 MB/sec
>>raid5: using function: pIII_sse (1570.400 MB/sec)
>>
>>Why is raid5 using function pIII_sse when p5_MMX is way faster?
>>
>
>The sse version is preferred over the others and gets used regardless of
>speed if it's available:
>
>/* We force the use of the SSE xor block because it can write around L2.
> We may also be able to load into the L1 only depending on how the cpu
> deals with a load to a line that is being prefetched. */
>
In the RAID5 case your CPU(s) work on the data (computing parity) before
it is sent to the devices. In the general case, cached memory writes
evict an amount of L2 cache roughly equal to the amount of data written.
So you'll push your running applications' data out of L2, which slows
them down.
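To make the eviction argument concrete, here is a toy model in C. It
assumes a tiny fully associative cache with LRU replacement (real L2
caches are set-associative and their replacement policies vary), and the
names and sizes (`struct toy_cache`, `CACHE_LINES`,
`app_misses_after_raid_write`) are hypothetical. An application touches
its working set, the RAID code writes some fresh lines, and the model
counts how many of the application's lines miss when it touches them
again. The `bypass` flag models stores that write around the cache.

```c
#include <stdbool.h>
#include <stddef.h>

#define CACHE_LINES 64  /* toy cache size in lines (hypothetical) */

struct toy_cache {
    long tag[CACHE_LINES];
    unsigned long last_use[CACHE_LINES];
    size_t used;
    unsigned long clock;
};

/* Returns true on a hit; on a miss the line is allocated, evicting the
   least recently used line once the cache is full. */
static bool cache_access(struct toy_cache *c, long tag)
{
    size_t victim = 0;
    c->clock++;
    for (size_t i = 0; i < c->used; i++) {
        if (c->tag[i] == tag) {
            c->last_use[i] = c->clock;
            return true;
        }
    }
    if (c->used < CACHE_LINES) {
        victim = c->used++;            /* free slot available */
    } else {
        for (size_t i = 1; i < CACHE_LINES; i++)
            if (c->last_use[i] < c->last_use[victim])
                victim = i;            /* oldest line is evicted */
    }
    c->tag[victim] = tag;
    c->last_use[victim] = c->clock;
    return false;
}

/* The application touches app_lines lines, the RAID code writes raid_lines
   fresh lines (through the cache unless bypass is set), then the
   application touches its lines again. Returns misses in that last pass. */
static int app_misses_after_raid_write(int app_lines, int raid_lines,
                                       bool bypass)
{
    struct toy_cache c = {0};
    int misses = 0;
    for (long t = 0; t < app_lines; t++)
        cache_access(&c, t);
    if (!bypass)
        for (long t = 0; t < raid_lines; t++)
            cache_access(&c, 1000000 + t);  /* distinct RAID buffer lines */
    for (long t = 0; t < app_lines; t++)
        if (!cache_access(&c, t))
            misses++;
    return misses;
}
```

With 64 lines of cache, a 32-line working set survives a 32-line RAID
write. A 64-line write evicts all of it, and every one of the 32
re-accesses misses. With `bypass` set, none do.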

Imagine you want to write a block to your 3-drive RAID5 array with a 4k
chunk size.
For the smallest full-stripe write, you'll read 8k from memory and write
12k back. That's 20k of L2 cache involved. The 8k read may already be in
cache (you write to disk what you worked on earlier), but the 12k written
is newly generated, so it surely won't be.
On x86, 12-20k is a substantial amount of L2 cache. And that's the
smallest write you can imagine: with a 64k chunk size and 5 drives it
would be 64k x 5 = 320k to 64k x 9 = 576k, which invalidates the whole
L2 on most modern x86 CPUs.
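The arithmetic above can be written out as a small C helper. It assumes,
as the figures imply, that the CPU reads one chunk per data drive
(drives - 1 chunks) and writes one chunk per drive (the copied data plus
the parity chunk). The struct and function names are made up for
illustration.

```c
/* L2 traffic, in KB, for one full-stripe RAID5 write under the
   accounting used in the text (hypothetical helper). */
struct stripe_footprint {
    unsigned read_kb;   /* data chunks read: one per data drive */
    unsigned write_kb;  /* copied data chunks plus one parity chunk */
    unsigned total_kb;  /* everything that passes through the cache */
};

static struct stripe_footprint full_stripe_footprint(unsigned chunk_kb,
                                                     unsigned drives)
{
    struct stripe_footprint f;
    f.read_kb = chunk_kb * (drives - 1);
    f.write_kb = chunk_kb * drives;
    f.total_kb = f.read_kb + f.write_kb;
    return f;
}
```

`full_stripe_footprint(4, 3)` gives 8k read, 12k written, 20k total.
`full_stripe_footprint(64, 5)` gives 256k, 320k and 576k, the upper end
being larger than the whole L2 of most x86 CPUs of the time.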

What does the cache give you? Fast access to recently used memory, and
write-back.
Using the cache in this case is *bad* because you won't reuse the cached
data. Write-back won't buy much either (you stream whole chunks down the
memory bus anyway).
You *may* slow down disk writes slightly (only if your RAID has something
near 2GB/s of disk bandwidth, i.e. close to the checksumming speeds
measured above...), but you'll *surely* slow down your applications a lot
by evicting their data from your L2 cache(s).
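"Writing around L2" is done with non-temporal (streaming) stores. The
sketch below XORs two chunks into a parity buffer using SSE2 intrinsics
(`_mm_stream_si128` followed by `_mm_sfence`). This is a swapped-in
illustration, not the kernel's actual pIII_sse routine, which is
hand-written inline assembly with its own cache-hint strategy. It assumes
16-byte aligned buffers whose length is a multiple of 16, and falls back
to a plain byte loop when SSE2 is not available. `xor_2_nt` is a
hypothetical name.

```c
#include <stddef.h>
#include <stdint.h>
#if defined(__SSE2__)
#include <emmintrin.h>  /* SSE2 intrinsics; also provides _mm_sfence */
#endif

/* dst = a ^ b over len bytes. Assumes all three pointers are 16-byte
   aligned and len is a multiple of 16 (not checked). */
static void xor_2_nt(uint8_t *dst, const uint8_t *a, const uint8_t *b,
                     size_t len)
{
#if defined(__SSE2__)
    for (size_t i = 0; i < len; i += 16) {
        __m128i va = _mm_load_si128((const __m128i *)(a + i));
        __m128i vb = _mm_load_si128((const __m128i *)(b + i));
        /* Non-temporal store: hints the CPU to write the result around
           the caches instead of allocating cache lines for it. */
        _mm_stream_si128((__m128i *)(dst + i), _mm_xor_si128(va, vb));
    }
    /* Streaming stores are weakly ordered; fence before later stores. */
    _mm_sfence();
#else
    /* Portable fallback: ordinary (cached) stores. */
    for (size_t i = 0; i < len; i++)
        dst[i] = a[i] ^ b[i];
#endif
}
```

Because XOR is its own inverse, running the same routine on the parity
and one data chunk recovers the other chunk, which is how RAID5 rebuilds
a missing block.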

So the RAID code doesn't use the method that's optimal for disk
throughput; it uses the one that's optimal for overall system
performance...
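The selection policy discussed in this thread (benchmark every routine,
then override the winner with SSE when the CPU has it) can be sketched
as follows. The struct, the function and the hard-coded "pIII_sse" lookup
are hypothetical stand-ins, not the kernel's own code. The speeds in the
test are the ones from Roy's boot log.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for an entry in the table of XOR routines. */
struct xor_template {
    const char *name;
    double mb_per_sec;  /* result of the boot-time benchmark */
};

/* Pick the fastest measured routine, except that the SSE routine is
   forced whenever the CPU supports it, because it spares the L2 cache. */
static const struct xor_template *
pick_xor_template(const struct xor_template *t, size_t n, bool cpu_has_sse)
{
    const struct xor_template *fastest = &t[0];
    for (size_t i = 1; i < n; i++)
        if (t[i].mb_per_sec > fastest->mb_per_sec)
            fastest = &t[i];

    if (cpu_has_sse)
        for (size_t i = 0; i < n; i++)
            if (strcmp(t[i].name, "pIII_sse") == 0)
                return &t[i];  /* override the benchmark winner */
    return fastest;
}
```

With the boot-log table, passing `true` returns pIII_sse even though
p5_mmx measured faster; passing `false` returns p5_mmx.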

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/