Re: aio-core why not using SuS? [Re: [rfc] aio-core for 2.5.29 (Re: async-io API registration for 2.5.29)]

Andrea Arcangeli (andrea@suse.de)
Fri, 16 Aug 2002 04:32:10 +0200


On Thu, Aug 15, 2002 at 10:00:54PM -0400, Benjamin LaHaise wrote:
> On Fri, Aug 16, 2002 at 03:57:17AM +0200, Andrea Arcangeli wrote:
> > you're saying you prefer glibc to wrap the aio_read/write/fsync and to
> > redirect all them to lio_listio after converting the iocb from user API to
> > kernel API, right? still I don't see why should we have different iocb,
> > I would understsand if you say we should simply overwrite aio_lio_opcode
> > inside the aio_read(3) inside glibc and to pass it over to kernel with a
> > single syscalls if it's low cost to just set the lio_opcode, but having
> > different data structures doesn't sounds the best still. I mean, it
> > would be nicer if things would be more consistent.
>
> The iocb is as minimally different from the posix aio api as possible. The
> main reason for the difference is that struct sigevent is unreasonably huge.
> A lightweight posix aio implementation on top of the kernel API shares the
> fields between the kernel iocb and the posix aiocb.

/* extra parameters */
__u64 aio_reserved2; /* TODO: use this for a (struct sigevent *) */
__u64 aio_reserved3;

so you want the conversion to only store the pointer (if any) for the
sigevent in the iocb, rather than the whole sigevent, right? This is an
argument that has technical sense and that I can happily buy for having
a different iocb. However your argument also depends having I/O
completion notification via signal is the common case or not. I guess in
theory it should be the common case for software designed for best
performance.

>
> > I don't see how the flushing flood is related to this, this is a normal
> > syscall, any issue that applies to these aio_read/write/fsync should
> > apply to all other syscalls too. Also the 4G starvation will be more
> > likely fixed by x86-64 or in software by using a softpagesize larger
> > than 4k so that the mem_map array doesn't load all the zone_normal.
>
> A 4G/4G split flushes the TLB on every syscall.

sure, that's why it's so slow. This applies to
read/writes/exceptions/interrupts and everything else kernel side.

Andrea
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/