Re: 2.5.46-mm2

Jens Axboe (axboe@suse.de)
Mon, 11 Nov 2002 08:04:00 +0100


On Sun, Nov 10 2002, Andrew Morton wrote:
> Jens Axboe wrote:
> >
> > Complete diff against 2.5.46-BK current attached.
> >
> > [rbtree IO scheduler.. ]
> >
>
> Well it's nice to see 250 megabytes under writeback to a single
> queue, but this could become a bit of a problem:
>
> blkdev_requests: 5760KB 5767KB 99.86
>
> (gdb) p sizeof(struct request)
> $1 = 136
>
> On a Pentium 4, that will be rounded up to 256 bytes, so it'd be
> up around 10 megabytes. Now imagine 200 disks....

Yes, I'm well aware of this problem...
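
To spell out the arithmetic being worried about (rough numbers,
assuming the 128-byte cache line the P4 slab code aligns to):
SLAB_HWCACHE_ALIGN pads the 136-byte request out to 256 bytes, so 1024
requests per queue is 256KB of slab, and 200 such queues would pin
something like 50MB in blkdev_requests alone.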

> Putting struct request on a diet would help - get it under 128
> bytes. Also, I'd suggest that SLAB_HWCACHE_ALIGN not be used in
> the blkdev_requests slab.

I can probably trim it to under 128 bytes.
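
Dropping the align flag when the slab is set up is the easy part;
roughly this (a sketch, not the actual ll_rw_blk.c hunk):

	/*
	 * Sketch only: with struct request trimmed below 128 bytes and
	 * SLAB_HWCACHE_ALIGN dropped, objects pack at their natural
	 * size instead of being padded out to a cache line multiple.
	 */
	request_cachep = kmem_cache_create("blkdev_requests",
					   sizeof(struct request), 0,
					   0 /* no SLAB_HWCACHE_ALIGN */,
					   NULL, NULL);
	if (!request_cachep)
		panic("Can't create request slab cache\n");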

> But we'll still have a problem with hundreds of disks.
>
> Of course, I'm assuming that there's a significant benefit
> in having 1024 requests. Perhaps there isn't, so we don't have
> a problem. But if it _is_ good (and for some things it will be),
> we need to do something sterner.
>
> It is hard to justify 1024 read requests, so the read pool and the
> write pool could be split up. 128 read requests, 1024 write requests.

Agreed, at least with the idea of giving reads and writes differently
sized pools. The exact split is harder to pin down.
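
Something like this is what I have in mind for the split (names and
numbers are illustrative only, not a final interface):

	#define QUEUE_NR_READ_REQUESTS	128	/* small: protect reads */
	#define QUEUE_NR_WRITE_REQUESTS	512	/* deep: streaming writeback */

	struct request_list {
		int count[2];			/* free slots, [READ]/[WRITE] */
		struct list_head free[2];	/* per-direction free lists */
	};

	/* a request in direction rw can only be taken while its own
	 * pool still has room, so a full write pool never blocks reads */
	static inline int rq_pool_has_room(struct request_list *rl, int rw)
	{
		return rl->count[rw] > 0;
	}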

> Still not enough.
>
> Now, 1024 requests is enough to put up to 512 megabytes of memory
> under I/O. Which is cool, but it is unlikely that anyone would want
> to put 512 megs under IO against 200 disks at the same time.
>
> Which means that we need a common pool. Maybe a fixed pool of 128
> requests per queue, then any additional requests should be dynamically
> allocated on a best-effort basis. A mempool-per-queue would do that
> quite neatly. (Using GFP_ATOMIC & ~__GFP_HIGH). The throttling code
> would need some rework.
>
> All of which is a bit of a hassle. I'll do an mm3 later today which
> actually has the damn code in it and let's get in and find out whether
> the huge queue is worth pursuing.

I've already done exactly this (mempool per queue, global slab). I'll
share it later today.
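
In rough terms it looks like this (a sketch of the approach; field and
helper names are mine, not necessarily what the patch ends up with):

	static kmem_cache_t *request_cachep;	/* one slab shared by all queues */

	#define QUEUE_MIN_REQUESTS	128	/* guaranteed reserve per queue */

	static void *request_pool_alloc(int gfp_mask, void *data)
	{
		return kmem_cache_alloc(data, gfp_mask);
	}

	static void request_pool_free(void *element, void *data)
	{
		kmem_cache_free(data, element);
	}

	int blk_init_free_list(request_queue_t *q)
	{
		/*
		 * The mempool guarantees this queue can always get
		 * QUEUE_MIN_REQUESTS requests; anything above that
		 * reserve comes from the shared slab. The allocation
		 * path would then use GFP_ATOMIC & ~__GFP_HIGH for the
		 * best-effort attempts, as Andrew suggests.
		 */
		q->rq_pool = mempool_create(QUEUE_MIN_REQUESTS,
					    request_pool_alloc,
					    request_pool_free,
					    request_cachep);
		if (!q->rq_pool)
			return -ENOMEM;

		return 0;
	}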

But yes, let's see some numbers on huge queues first. Otherwise we can
just fall back to a sensible 128/512 split for reads/writes, or
whatever split turns out to work well.

-- 
Jens Axboe
