Re: O_DIRECT read and holes in 2.5.26

Andrew Morton (akpm@zip.com.au)
Sun, 21 Jul 2002 21:30:06 -0700


Anton Altaparmakov wrote:
>
> Hi Andrew,
>
> At 03:26 22/07/02, Andrew Morton wrote:
> >Stephen Lord wrote:
> > > Did you realize that the new O_DIRECT code in 2.5 cannot read over holes
> > > in a file?
> >
> >Well that was intentional, although I confess to not having
> >put a lot of thought into the decision. The user wants
> >O_DIRECT and we cannot do that. The CPU has to clear the
> >memory by hand. Bad.
> >
> >Obviously it's easy enough to put in the code to clear the
> >memory out. Do you think that should be done?
>
> I would vote for yes because whether a file is sparse or not cannot be
> trivially controlled by the file creator (unless you actually write the
> whole file with non-zero content) but is dependent on the underlying file
> system...

O_DIRECT tends to be used in specialised applications. They're
being silly if they're reading from holes.

> And on the grounds that a memset() is going to be a lot faster
> than a read from device I don't see why it shouldn't be done.

It's actually far more expensive in CPU terms - wiping the page by
hand goes against the whole reason for using O_DIRECT, which is to
keep the CPU from touching the data.

But it is the expected behaviour of the read() system call
so yeah, I'll do it.
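
A minimal userspace sketch of the zero-fill behaviour discussed above,
not the kernel code. The block map, the HOLE marker, the block size and
the function name are all assumptions made for illustration; memcpy()
stands in for the device transfer that would normally be DMA.

```c
#include <stddef.h>
#include <string.h>

#define SIM_BLOCK_SIZE 512          /* assumed block size for the sketch */
#define HOLE ((long)-1)             /* assumed marker: no disk block mapped */

/*
 * Simulated direct read of nblocks file blocks into buf.
 * map[i] is the disk block backing file block i, or HOLE.
 * Returns how many blocks had to be cleared by the CPU.
 */
size_t direct_read_sim(const unsigned char *disk, const long *map,
                       size_t nblocks, unsigned char *buf)
{
    size_t i, zeroed = 0;

    for (i = 0; i < nblocks; i++) {
        unsigned char *dst = buf + i * SIM_BLOCK_SIZE;

        if (map[i] == HOLE) {
            /* No device block: the CPU has to write every byte itself. */
            memset(dst, 0, SIM_BLOCK_SIZE);
            zeroed++;
        } else {
            /* Stand-in for the device transferring into the user buffer. */
            memcpy(dst, disk + (size_t)map[i] * SIM_BLOCK_SIZE,
                   SIM_BLOCK_SIZE);
        }
    }
    return zeroed;
}
```

The return value makes the cost visible: every block counted there was
filled by the CPU rather than by the device, which is the overhead
O_DIRECT users are trying to avoid, but the caller still sees the zeros
that read() is expected to return for a hole.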

> ...
> [ get_blocks() stuff ]
>

This is going to be fairly involved. Probably the top-level
IO code gets ripped up and redone to expect a get_blocks()
interface. A default implementation of get_blocks() would
be provided for naive filesystems - it just calls get_block()
a lot. Quite possibly, we say to heck with purity, and get_block()
and get_blocks() become address_space_operations (a_ops), too.
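
The "default implementation for naive filesystems" could look roughly
like the userspace sketch below. None of these types or signatures are
the kernel's: sector_no, get_block_fn, naive_get_blocks() and the toy
filesystem are hypothetical names, and the return convention (number of
blocks mapped, or the error if nothing was mapped) is an assumption.

```c
#include <stddef.h>

/* Hypothetical stand-ins for illustration; not the kernel's types. */
typedef unsigned long sector_no;
#define NO_BLOCK 0UL    /* assumed marker for a hole (nothing mapped) */

/* Per-block mapper in the spirit of get_block(): returns 0 on success. */
typedef int (*get_block_fn)(void *fs_private, sector_no logical,
                            sector_no *phys);

/*
 * Naive batch mapper: one get_block() call per block.
 * Returns the number of blocks mapped, or the (negative) error
 * from get_block() if the very first block could not be mapped.
 */
long naive_get_blocks(void *fs_private, get_block_fn get_block,
                      sector_no start, size_t count, sector_no *phys_out)
{
    size_t i;

    for (i = 0; i < count; i++) {
        int err = get_block(fs_private, start + i, &phys_out[i]);

        if (err)
            return i ? (long)i : (long)err;
    }
    return (long)count;
}

/*
 * Toy filesystem for the example: block n lives at 1000 + n,
 * block 2 is a hole, and blocks 8 and up are past the end.
 */
int toy_get_block(void *fs_private, sector_no logical, sector_no *phys)
{
    (void)fs_private;
    if (logical >= 8)
        return -1;
    *phys = (logical == 2) ? NO_BLOCK : 1000 + logical;
    return 0;
}
```

A filesystem that can map a run of blocks in one lookup would supply
its own batch mapper instead of this loop, which is where any gain
from batching would have to come from.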

I'd really like to see a solid reason for doing all this.
That means numbers ;) Even good old ext2 would be able to show
benefit from batching up the get_block() work, but not a lot.

You won't buy much on writes, I expect - 8k write requests
will probably remain the ideal size.
