Re: 2.5.2-pre1 dbench 32 hangs in vmstat "b" state

Jens Axboe (axboe@suse.de)
Sat, 29 Dec 2001 18:48:37 +0100


On Sat, Dec 29 2001, Jens Axboe wrote:
> On Sat, Dec 29 2001, rwhron@earthlink.net wrote:
> > > > Kernel panic: Out of memory and no killable processes...
> > >
> > > Someone else did report a similar case. Very strange, doesn't look bio
> >
> > Al Viro posted a fix:
> > http://marc.theaimsgroup.com/?l=linux-kernel&m=100959128922157&w=2
> >
> > I used Al's patch and 2.5.2-pre3 boots with reiserfs root_fs
> > and no panic.
> >
> > Below is the trace on 2.5.2-pre3 after dbench 32 livelocked.
>
> Thanks, could you try with this patch? It's not a fix (haven't found the
> bug yet), but I think we are looking at list corruption so please check
> if this patch at least alters when it hangs etc.

Ah, I think I got it -- it appears to be down to not rechecking for an
empty queue after potentially dropping queue_lock (busy I/O, no requests
left so get_request returns NULL, then we drop the lock and run
get_request_wait). This explains the get_request_wait deadlock.
Compiling right now...
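
For readers who want the pattern in isolation, here is a minimal
user-space sketch of it. It is not kernel code: toy_queue, toy_submit,
the toy_* helpers and the "recheck" flag are all hypothetical names made
up for illustration, and the lock is just a boolean checked with
assert(). The model assumes that while the lock is dropped all pending
I/O completes, so the queue is empty by the time the lock is retaken.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical single-threaded model of a request queue; not kernel code. */
struct toy_queue {
	bool locked;     /* stands in for q->queue_lock */
	int  pending;    /* requests sitting on the queue */
	int  free_slots; /* free request structures */
};

enum toy_path { TOY_PLUG, TOY_MERGE };

static void toy_lock(struct toy_queue *q)   { assert(!q->locked); q->locked = true; }
static void toy_unlock(struct toy_queue *q) { assert(q->locked);  q->locked = false; }

/* Allocation is only valid with the lock held. */
static bool toy_get_request(struct toy_queue *q)
{
	assert(q->locked);
	if (q->free_slots == 0)
		return false;
	q->free_slots--;
	return true;
}

/*
 * Stand-in for a blocking allocation, entered with the lock dropped.
 * While we "sleep", all pending I/O completes (assumption of this model),
 * which empties the queue and frees slots; we then take one slot.
 * Only call this when q->pending > 0.
 */
static void toy_get_request_wait(struct toy_queue *q)
{
	bool got;

	toy_lock(q);
	q->free_slots += q->pending;
	q->pending = 0;
	got = toy_get_request(q);
	toy_unlock(q);
	assert(got);
}

/*
 * recheck == true  models the "again:" label placed before the empty check.
 * recheck == false models the label placed after it, so a retry skips it.
 */
static enum toy_path toy_submit(struct toy_queue *q, bool recheck)
{
	bool have_slot = false, first_pass = true;
	enum toy_path path;

	toy_lock(q);
again:
	if ((first_pass || recheck) && q->pending == 0) {
		path = TOY_PLUG;        /* queue is empty: plug and allocate */
		goto get_rq;
	}
	path = TOY_MERGE;               /* would scan the queue for a merge */
get_rq:
	if (!have_slot && !toy_get_request(q)) {
		toy_unlock(q);          /* queue state may change from here... */
		toy_get_request_wait(q);
		toy_lock(q);            /* ...until here */
		have_slot = true;
		first_pass = false;
		goto again;
	}
	q->pending++;                   /* queue the new request */
	toy_unlock(q);
	return path;
}
```

Starting from { .pending = 3, .free_slots = 0 }, toy_submit(&q, true)
ends up on the plug path because the empty check runs again after the
lock is retaken, while toy_submit(&q, false) continues down the merge
path against a queue that has since become empty.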

--- /opt/kernel/linux-2.5.2-pre3/drivers/block/ll_rw_blk.c Sat Dec 29 12:17:53 2001
+++ drivers/block/ll_rw_blk.c Sat Dec 29 12:45:04 2001
@@ -881,7 +881,9 @@

 	BUG_ON(rw != READ && rw != WRITE);

+	spin_lock_irq(q->queue_lock);
 	rq = get_request(q, rw);
+	spin_unlock_irq(q->queue_lock);

 	if (!rq && (gfp_mask & __GFP_WAIT))
 		rq = get_request_wait(q, rw);
@@ -1081,7 +1083,7 @@
 {
 	struct request *req, *freereq = NULL;
 	int el_ret, latency = 0, rw, nr_sectors, cur_nr_sectors, barrier;
-	struct list_head *insert_here = &q->queue_head;
+	struct list_head *insert_here;
 	elevator_t *elevator = &q->elevator;
 	sector_t sector;

@@ -1103,15 +1105,14 @@
 	barrier = test_bit(BIO_RW_BARRIER, &bio->bi_rw);

 	spin_lock_irq(q->queue_lock);
+again:
+	req = NULL;
+	insert_here = q->queue_head.prev;

 	if (blk_queue_empty(q) || barrier) {
 		blk_plug_device(q);
 		goto get_rq;
 	}
-
-again:
-	req = NULL;
-	insert_here = q->queue_head.prev;

 	el_ret = elevator->elevator_merge_fn(q, &req, bio);
 	switch (el_ret) {

-- 
Jens Axboe
