Re: Is Swapping on software RAID1 possible in linux 2.4 ?

Andrew Morton (andrewm@uow.edu.au)
Thu, 12 Jul 2001 14:53:11 +1000


Neil Brown wrote:
>
> --- drivers/md/raid1.c 2001/07/12 02:00:35 1.1
> +++ drivers/md/raid1.c 2001/07/12 02:01:42
> @@ -83,6 +83,7 @@
> cnt--;
> } else {
> PRINTK("raid1: waiting for %d bh\n", cnt);
> + run_task_queue(&tq_disk);
> wait_event(conf->wait_buffer, conf->freebh_cnt >= cnt);
> }
> }
> @@ -170,6 +171,7 @@
> memset(r1_bh, 0, sizeof(*r1_bh));
> return r1_bh;
> }
> + run_task_queue(&tq_disk);
> wait_event(conf->wait_buffer, conf->freer1);
> } while (1);
> }
>
> This is needed anyway to be "correct", as you should always unplug
> the queues before waiting for IO to complete.

The problem with this approach is the waitqueue - you get several
tasks on the waitqueue, and bdflush loses the race - some other
thread steals the r1bh and bdflush goes back to sleep.

Replacing the wait_event() with a special raid1_wait_event()
which unplugs *each time* the caller is woken does help - but
it is still easy to deadlock the system.

Clearly this approach is racy: it assumes that the reserved buffers have
actually been submitted when we unplug - they may not yet have been.
But the lockup is too easy to trigger for that to be a satisfactory
explanation.

The most effective, aggressive, successful and grotty fix for this
problem is to remove the wait_event altogether and replace it with:

run_task_queue(tq_disk);
current->policy |= SCHED_YIELD;
__set_current_state(TASK_RUNNING);
schedule();

This can still deadlock in bad OOM situations, but I think we're
dead anyway. A combination of this approach plus the PF_FLUSH
reservations would work even better, but I found the PF_FLUSH
stuff was sufficient.

> Mind you, if I was really serious about being
> gentle on the memory allocation, I would use
> kmem_cache_alloc(bh_cachep,SLAB_whatever)
> instead of
> kmalloc(sizeof(struct buffer_head), GFP_whatever)

get/put_unused_buffer_head() should be exported API functions.

-
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/