Re: O_DIRECT! or O_DIRECT?

Stephen C. Tweedie (sct@redhat.com)
Wed, 4 Jul 2001 18:52:30 +0100


Hi,

On Wed, Jul 04, 2001 at 12:34:35AM +0400, Samium Gromoff wrote:
>
> This is interesting, because one real advantage
> of O_DIRECT are these greased weasel fast 15-20 Mb/s
> file copies, which ones makes windoze users to look
> on us as on lesser beings.

Not true.

O_DIRECT does not speed up sequential file accesses. If anything, it
may well slow them down, especially for writes. What O_DIRECT does is
twofold --- it guarantees physical IO to the disk (so that you know
for sure that the data is on disk for writes, or that the data on disk
is readable for reads); and it avoids the memory and CPU overhead of
keeping any cached copy of the data.
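
To make that concrete, here is a minimal sketch of opening a file with
O_DIRECT and pushing one block through it. The helper names
(demo_open_direct, demo_write_block) and the 4096-byte alignment are
assumptions for illustration only; the real alignment requirement
depends on the device and filesystem, and some filesystems reject
O_DIRECT outright, so the sketch falls back to a buffered open.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE            /* exposes O_DIRECT in <fcntl.h> on Linux/glibc */
#endif
#include <errno.h>
#include <fcntl.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

#ifndef O_DIRECT
#define O_DIRECT 0             /* assumption: degrade to buffered IO if unavailable */
#endif

#define DEMO_ALIGN 4096        /* assumed alignment, not a universal constant */

/* Hypothetical helper: open `path` for writing with O_DIRECT.
 * If the filesystem rejects O_DIRECT (EINVAL), retry as a plain
 * buffered open. *is_direct is 1 only if the O_DIRECT open succeeded. */
int demo_open_direct(const char *path, int *is_direct)
{
    int fd = open(path, O_WRONLY | O_CREAT | O_TRUNC | O_DIRECT, 0644);

    *is_direct = (O_DIRECT != 0 && fd >= 0);
    if (fd < 0 && errno == EINVAL)
        fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0644);
    return fd;
}

/* Hypothetical helper: write one block of `len` bytes from an aligned
 * buffer. `len` is assumed to be a multiple of DEMO_ALIGN.
 * Returns the number of bytes written, or -1 on error. */
ssize_t demo_write_block(int fd, size_t len)
{
    void *buf = NULL;
    ssize_t n;

    if (posix_memalign(&buf, DEMO_ALIGN, len) != 0)
        return -1;
    memset(buf, 'x', len);
    n = write(fd, buf, len);   /* no page-cache copy is kept when fd is O_DIRECT */
    free(buf);
    return n;
}
```

Note that the current open(2) man page says O_DIRECT on its own does
not give the guarantees of O_SYNC for data plus metadata, so a caller
that needs that guarantee would add O_SYNC or call fsync().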

But because O_DIRECT is completely synchronous, it's not possible for
the kernel to implement its normal readahead and writebehind IO
clustering for direct IO. If you use the normal approach of writing
4k at a time to an O_DIRECT file, things may well be *massively*
slower than usual because the kernel is sending individual 4k IOs to
the disk, and because it is waiting for each IO to complete before the
application provides the next one.
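
The effect of the chunk size can be sketched as follows. This is only
an illustration under stated assumptions: demo_chunked_write is a
made-up helper, both sizes are assumed to be multiples of 4096 with
`total` a multiple of `chunk`, and it falls back to buffered IO where
O_DIRECT is not supported. Each write() on an O_DIRECT descriptor
returns only once that chunk's IO is done, so the number of calls is
also the number of times the application sits waiting on the disk.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE            /* exposes O_DIRECT in <fcntl.h> on Linux/glibc */
#endif
#include <errno.h>
#include <fcntl.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

#ifndef O_DIRECT
#define O_DIRECT 0             /* assumption: degrade to buffered IO if unavailable */
#endif

/* Hypothetical helper: write `total` bytes to `path` using `chunk`-sized
 * write() calls. Returns the number of write() calls issued, or -1 on
 * error. Assumes chunk is a non-zero multiple of 4096 and total is a
 * multiple of chunk. */
long demo_chunked_write(const char *path, size_t total, size_t chunk)
{
    void *buf = NULL;
    long calls = 0;
    size_t done;
    int fd;

    if (chunk == 0)
        return -1;

    fd = open(path, O_WRONLY | O_CREAT | O_TRUNC | O_DIRECT, 0644);
    if (fd < 0 && errno == EINVAL)   /* no O_DIRECT here: use buffered IO */
        fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0644);
    if (fd < 0)
        return -1;

    if (posix_memalign(&buf, 4096, chunk) != 0) {
        close(fd);
        return -1;
    }
    memset(buf, 0, chunk);

    for (done = 0; done < total; done += chunk) {
        /* One synchronous round trip per chunk when fd is O_DIRECT. */
        if (write(fd, buf, chunk) != (ssize_t)chunk) {
            calls = -1;
            break;
        }
        calls++;
    }

    free(buf);
    close(fd);
    return calls;
}
```

Writing 64k this way takes 16 calls with a 4k chunk and a single call
with a 64k chunk, which is why an O_DIRECT application generally wants
to hand the kernel large buffers itself.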

By contrast, buffered writes allow the kernel to batch those 4k
writes into large disk IOs, perhaps 100k or more; and the kernel can
maintain a queue of more than one such IO, so that once the first IO
completes the next one is immediately ready to be sent out.
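
As a back-of-the-envelope illustration of the batching, the request
count is just a rounded-up division. The function name is made up for
this sketch, and "100k" is taken to mean 100 * 1024 bytes.

```c
#include <stddef.h>

/* Hypothetical helper: number of disk requests needed to move `total`
 * bytes when each request carries at most `per_io` bytes (rounded up).
 * Returns 0 if per_io is 0. */
size_t demo_io_count(size_t total, size_t per_io)
{
    if (per_io == 0)
        return 0;
    return (total + per_io - 1) / per_io;
}
```

For 1 MB (1048576 bytes) of sequential writes, that is 256 requests at
4k each versus 11 requests at 100k each.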

For these reasons, buffered IO is often faster than O_DIRECT for pure
sequential access. The downside is its greater CPU cost and the fact
that it pollutes the cache (which, in turn, causes even _more_ CPU
overhead when the VM is forced to start reclaiming old cache data to
make room for new blocks).

O_DIRECT is great for cases like multimedia (where you want to
maximise CPU available to the application and where you know in
advance that the data is unlikely to fit in cache) and databases
(where the application is caching things already and extra copies in
memory are just a waste of memory). It is not an automatic win for
all applications.

Cheers,
Stephen