Re: O_DIRECT foolish question

Andrew Morton (akpm@digeo.com)
Wed, 12 Feb 2003 21:12:21 -0800


Bruno Diniz de Paula <diniz@cs.rutgers.edu> wrote:
>
> On Wed, 2003-02-12 at 19:13, Chris Wedgwood wrote:
> > If I had to guess, write should work more or less the same as reads
> > (ie. I should be able to write aligned-but-smaller-than-page-sized
> > blocks to the end of files).
> >
> > Testing this however shows this is *not* the case.
>
> This is not the case, I have also tested here and the file written has
> n*block_size always. The problem with writing is that we can't sign to
> the kernel that the actual data has finished and from that point on it
> should zero-fill the bytes. And what is worse, the information about the
> actual size is lost, since the write syscall will store what is passed
> on the 3rd argument in the inode (field st_size of stat). This means
> that after writing using O_DIRECT we can't read data correctly anymore.
> The exception is when we write together with the data information about
> the actual size and process disregarding information from stat, for
> instance.
>
> Well, I am sure I am completely wrong because this doesn't make any
> sense for me. Someone that has already dealt with this and can bring a
> light to the discussion?
>

For writes, I don't think it is reasonable for the kernel to be have to
handle byte-granular appends. O_DIRECT is different. For this case the
application should ftruncate the file back to the desired size prior to
closing it.

For the short reads at EOF, the 2.4 kernel refuses to read anything, and
returns zero. The 2.5 kernel will return -EINVAL, which is better behaviour
(shouldn't make it just look like the file is shorter than it really is).

The ideal behaviour is that which I mistakenly described previously: we
should fill with zeroes and return the partial result. I'll look at
converting 2.5 to do that. As long as the changes are small - the direct-io
code does a ton of stuff, is complex, is not tested a lot and breakage tends
to be subtle.
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/