Re: what's the semaphore in requests for?

Peter T. Breuer (ptb@it.uc3m.es)
Tue, 31 Jul 2001 20:45:53 +0200 (MET DST)


"ptb wrote:"
> "Jens Axboe wrote:"
> > > > The reason I ask is that I've been chasing an smp bug in a block driver
> > > > code under 2.2.18) and only with smp ("nosmp" squashes it). It only
> > > 2 processors + 1 userspace helper daemon on device = no bug
> > > 2 processors + 2 userspace helper daemon on device = bug (lockup)
> > > 1 processors + 1 userspace helper daemon on device = no bug
> > > 1 processors + 2 userspace helper daemon on device = no bug
> > And I'll restate here what I said then too -- SHOW THE CODE! Or send me
> > a crystal ball and I'll be happy to solve your races for you.

Let me try this question:

Can the device request function be called from an interrupt?

(and is this newish?). I'm talking about when the plug is released.

All would be explained if the private spinlock were taken by the
request function on an interrupt when it was already held by the
ioctl functions that the userspace daemons use to transfer data
from the private queue up to userspace and back.
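
To make the suspected sequence concrete, here is a minimal userspace sketch of it. It is only a model, not driver or kernel code: every name in it (model_lock, model_cpu, model_request_fn, model_ioctl_plain) is made up for illustration, and the "interrupt" is just a function call made while the lock is marked as held. It assumes the hypothesis above, namely that the request function can be entered on the same CPU that is already inside the ioctl's critical section.

```c
#include <assert.h>
#include <stdbool.h>

/* Userspace model only; none of these names exist in the driver or kernel. */
struct model_lock {
    bool held;               /* non-recursive: a second taker must wait */
};

struct model_cpu {
    bool would_spin_forever; /* set if the handler finds the lock held */
    int  requests_moved;     /* work completed by the request function */
};

/* Stands in for do_request() entered from interrupt context. */
static void model_request_fn(struct model_cpu *cpu, struct model_lock *lock)
{
    if (lock->held) {
        /*
         * A real spinlock would spin here. The holder is the code this
         * "interrupt" preempted, and it cannot resume until we return,
         * so the spin would never end. The model just records that.
         */
        cpu->would_spin_forever = true;
        return;
    }
    lock->held = true;
    cpu->requests_moved++;
    lock->held = false;
}

/* Stands in for get_req() taking the lock with interrupts still enabled. */
static void model_ioctl_plain(struct model_cpu *cpu, struct model_lock *lock)
{
    assert(!lock->held);
    lock->held = true;
    model_request_fn(cpu, lock);   /* interrupt lands inside the section */
    lock->held = false;
}
```

Calling model_ioctl_plain() on a fresh model_cpu and model_lock leaves would_spin_forever set and requests_moved at zero, which is the lockup described above reduced to a single CPU.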

I had thought the request function ran only as requests were added to
the queue, driven by pressure from the block layer.

> do_request(request_queue_t * q)
> {
>     struct dev_request *req;
>
>     while (!QUEUE_EMPTY) {
>         struct mydevice *dev;
>
>         req = CURRENT;
>         dev = &dev_array[MINOR(req->rq_dev) >> SHIFT];
>         blkdev_dequeue_request(req);
>         write_lock(&dev->queue_spinlock);
>         // transfer req to the private queue
>         list_add(&req->queue, &dev->queue);
>         write_unlock(&dev->queue_spinlock);
>         // notify listeners
>         wake_up_interruptible(&dev->wq);
>     }
> }

I now think the request function runs with interrupts disabled locally,
so taking the spinlock without the irq variants is OK here. But perhaps
it is not OK in the ioctl functions? ...

> int
> get_req (struct slot *slot, char *buffer)
> {
>     struct dev_request request;
>     struct request *req;
>     int result = 0;
>     unsigned start_time = jiffies;
>     struct mydevice *dev = slot->dev;
>     unsigned timeout = dev->req_timeo * HZ;
>     extern struct timezone sys_tz;
>
>     down (&dev->queue_lock);
>     read_lock(&dev->queue_spinlock);
      ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

since the request function might run in interrupt context on the same
CPU while this spinlock is held, which would deadlock that CPU?

But then why doesn't the problem show itself with just one daemon
running on a 2-CPU machine?

I've run more tests, and using the irqsave variants on the private
spinlock everywhere seems to cure all ills. But I'm still very hazy as
to what is going on.
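
For what it's worth, here is a small userspace sketch of why the irqsave form would help, if the interrupt theory is right. Again this is a model and not kernel code: model_lock_irqsave() and model_unlock_irqrestore() are hypothetical stand-ins that imitate what the kernel's read_lock_irqsave()/read_unlock_irqrestore() pair does to the local interrupt flag, and irq_handler() stands in for the request function. The assumption is a single CPU with one pending-interrupt slot.

```c
#include <assert.h>
#include <stdbool.h>

/* Userspace model of one CPU; all names are illustrative stand-ins. */
static bool irqs_on = true;   /* models the local interrupt-enable flag */
static bool irq_pending;      /* an interrupt arrived while masked */
static bool lock_held;        /* models the private queue spinlock */
static int  handler_runs;     /* handler ran and got the lock */
static int  handler_blocked;  /* handler found the lock already held */

/* Stands in for the request function entered from an interrupt. */
static void irq_handler(void)
{
    if (lock_held)
        handler_blocked++;    /* a real spinlock would spin forever here */
    else
        handler_runs++;
}

/* Deliver an interrupt now, or latch it if interrupts are masked. */
static void raise_irq(void)
{
    if (irqs_on)
        irq_handler();
    else
        irq_pending = true;
}

/* Model of the irqsave step: remember the flag, mask, then lock. */
static void model_lock_irqsave(unsigned long *flags)
{
    *flags = irqs_on;
    irqs_on = false;
    assert(!lock_held);
    lock_held = true;
}

/* Model of the irqrestore step: unlock, restore, run what was latched. */
static void model_unlock_irqrestore(unsigned long flags)
{
    lock_held = false;
    irqs_on = (flags != 0);
    if (irqs_on && irq_pending) {
        irq_pending = false;
        irq_handler();
    }
}

/* Stands in for an ioctl such as get_req() using the irqsave form. */
static void ioctl_with_irqsave(void)
{
    unsigned long flags;

    model_lock_irqsave(&flags);
    raise_irq();              /* arrives mid-section, so it is deferred */
    model_unlock_irqrestore(flags);
}
```

In this model the handler only runs after the lock has been dropped, so it never meets a lock held by the code it interrupted. The model covers only the same-CPU case and says nothing about the one-daemon question above.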

Peter