Re: [RFC] adding aio_readv/writev

Shailabh Nagar (nagar@watson.ibm.com)
Mon, 23 Sep 2002 10:30:05 -0400


Stephen Hemminger wrote:

>Why not batch up multiple requests with one io_submit? It has the same
>effect, except there would be multiple responses.
>
Even though the multiple iocb's enter the kernel together, they still
get processed individually so a fair amount of unnecessary data
transmission and function invocation are still occurring in the submit
code path.
Depending on how long it takes for io_submit_one to return, there might
be a reduced probability for merging of io requests at the i/o scheduler.
Finally, the multiple responses need to be handled as you mentioned. I
suppose the application could wait for the last request (in the
io_submit list) and that would most probably ensure that the preceding
ones were complete as well but its not a guarantee offered by the aio
API, right ?
Besides, the application needs the data (represented by multiple
requests) at one go so partial completion isn't likely to be useful and
will only be an overhead.

While a quantitative assessment of the above tradeoffs is possible, it
will be difficult to make a good comparison before "true" aio
functionality is in place for 2.5. Such an assessment is unlikely to
happen before the feature freeze takes effect. So I'm making a case for
putting in async vector I/O interfaces in for the following three reasons:
- the synchronous API does provide separate entry points for vector I/O.
Extending the same to the async interfaces, especially when it doesn't
even involve creating new syscalls, seems natural for completeness.
- underlying in-kernel infrastructure already supports it, so no major
changes are needed.
- there exists atleast one major application class (databases) that uses
vectored I/O heavily and benefits from async I/O. Hence async vectored
I/O is also likely to be useful. Can anyone else with experience on
other OS's comment on this ?

Comments, reasons for not doing async readv/writev directly welcome.

- Shailabh

>
>
>On Fri, 2002-09-20 at 13:39, Shailabh Nagar wrote:
>
>>Ben,
>>
>>Currently there is no way to initiate an aio readv/writev in 2.5. There
>>were no aio_readv/writev calls in 2.4 either - I'm wondering if there
>>was any particular reason for excluding readv/writev operations from aio ?
>>
>>The read/readv paths have anyway been merged for raw/O_DIRECT and
>>regular file read/writes. So why not expose the vector read/write to the
>>user by adding the IOCB_CMD_PREADV/IOCB_CMD_READV and
>>IOCB_CMD_PWRITEV/IOCB_CMD_WRITEV commands to the aio set. Without that,
>>raw/O_DIRECT readv users would need to unnecessarily cycle through their
>>iovecs at a library level submitting them individually.
>>For larger iovecs, user/library code would needlessly deal with multiple
>>completions. While I'm not sure of the performance impact of the absence
>>of aio_readv/writev, it seems easy enough to provide.
>>Most of the functions are already in place. We would only
>>need a way to pass the iovec through the iocb.
>>
>>I was thinking of something like this:
>>
>>struct iocb {
>>
>>+union {
>> __u64 aio_buf
>>+ __u64 aio_iovp
>>+}
>>+union {
>> __u64 aio_nbytes
>>+ __u64 aio_nsegs
>>+}
>>
>>allowing the iovec * & nsegs to be passed into sys_io_submit. Some code
>>would be added (within case handling of IOCB_CMD_READV within
>>io_submit_one) to copy & verify the iovec pointers and then call
>>aio_readv/aio_writev (if its defined for the fs).
>>
>>What do you think ? I wanted to get some feedback before trying to code
>>this up.
>>
>>While we are on the topic of expanding aio operations, what about
>>providing IOCB_CMD_READ/WRITE, distinct from their pread/pwrite
>>counterparts ? Do you think thats needed ?
>>
>>- Shailabh
>>

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/