Re: [PATCH] 2.4.18 raid1 - fix SMP locking/interrupt errors, fix

Paul Clements (kernel@steeleye.com)
Mon, 25 Mar 2002 13:52:39 -0500 (EST)


Neil,

Thanks for your feedback. Replies below...

--
Paul Clements
SteelEye Technology
Paul.Clements@SteelEye.com

On Mon, 25 Mar 2002, Neil Brown wrote:

> On Friday March 22, Paul.Clements@SteelEye.com wrote:
> >
> > The problems are, briefly:
> >
> > 1) overuse of device_lock spin lock
> >
> > The device_lock was being used for two separate, unrelated purposes.
> > This was causing too much contention and caused a deadlock in one case.
> > The device_lock will be split into two separate locks by introducing a
> > new spin lock, "memory_lock".
>
> I can believe that there could be extra contention because of the dual
> use of this spin lock. Do you have lockmeter numbers at all?

No, I'm not familiar with that. How do I get those? Is it fairly simple?

I wasn't so much concerned about the extra contention as about the (in my mind) logical separation of these two different tasks, and the fact that the lack of separation had led to a deadlock.
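To make the intent concrete, the shape of the change is roughly the
sketch below (illustrative only, not the literal patch; memory_lock is
the new lock, and the other field names just follow the 2.4 raid1
conventions as I remember them):

#include <linux/spinlock.h>
#include <linux/fs.h>		/* struct buffer_head (2.4) */

/*
 * Sketch only: conf gains a second lock so that buffer-pool
 * bookkeeping no longer shares a lock with device/resync state.
 */
struct conf_sketch {
	spinlock_t device_lock;		/* device and resync state, as before */
	spinlock_t memory_lock;		/* NEW: guards only the free-buffer lists */
	struct buffer_head *freebh;
	int freebh_cnt;
};

/* The memory (de)allocation paths then take only memory_lock... */
static struct buffer_head *alloc_bh_sketch(struct conf_sketch *conf)
{
	struct buffer_head *bh = NULL;

	spin_lock_irq(&conf->memory_lock);
	if (conf->freebh) {
		bh = conf->freebh;
		conf->freebh = bh->b_next;
		conf->freebh_cnt--;
	}
	spin_unlock_irq(&conf->memory_lock);
	return bh;
}
/* ...while raid1_diskop() and friends keep using device_lock. */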

> However I cannot see how it would cause a deadlock. Could you please
> give details?

raid1_diskop() calls close_sync() -- close_sync() schedules itself out to wait for pending I/O to quiesce so that the resync can end... meanwhile, as many tasks as there are CPUs (in my case, 2) enter the memory (de)allocation routines and spin on the device_lock forever...
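Boiled down, the pattern is the following (an illustration of the
problem, not the actual raid1.c code -- the struct and function names
below are stand-ins):

#include <linux/spinlock.h>
#include <linux/wait.h>

struct conf_sketch {
	spinlock_t device_lock;
	int cnt_pending;		/* outstanding resync-era I/O */
	wait_queue_head_t wait_ready;
};

/* Context A: ending the resync.  The bug pattern is sleeping while
 * device_lock is still held. */
static void end_resync_sketch(struct conf_sketch *conf)
{
	spin_lock_irq(&conf->device_lock);
	wait_event(conf->wait_ready, conf->cnt_pending == 0);	/* sleeps here */
	spin_unlock_irq(&conf->device_lock);
}

/* Contexts B..N: every buffer alloc/free and completion path wants the
 * same lock, so they spin (with interrupts off) forever; cnt_pending
 * never drops, A never wakes, and the box is dead. */
static void buffer_path_sketch(struct conf_sketch *conf)
{
	spin_lock_irq(&conf->device_lock);
	if (--conf->cnt_pending == 0)
		wake_up(&conf->wait_ready);
	spin_unlock_irq(&conf->device_lock);
}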

> >
> > 2) non-atomic memory allocation
> >
> > Due to use of GFP_KERNEL rather than GFP_ATOMIC, certain threads of the
> > raid1 driver were scheduled out while holding a spin lock, causing a
> > deadlock to occur. Memory allocation during critical sections where a
> > spin lock is held will be changed to atomic allocations.
>
> You are definitely right that we should not be calling kmalloc with a
> spinlock held - my bad.
> However I don't think your fix is ideal. The relevant code is
> "raid1_grow_buffers" which allocates a bunch of buffers and attaches
> them to the device structure.
> The lock is only really needed for the attachment. A better fix would
> be to build a separate list, and then just claim the lock while
> attaching that list to the structure.

Unfortunately, this won't work, because the segment_lock is also held while this code is executing (see raid1_sync_request).
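For anyone following along, the pattern Neil is describing is the usual
one sketched below -- do the GFP_KERNEL allocations with no lock held,
then take the lock only long enough to splice the new buffers in.  The
problem is that in 2.4.18 the caller (raid1_sync_request) already holds
segment_lock at this point, so the allocations would still happen under
a spinlock.  (Illustrative names only, not the raid1.c structures.)

#include <linux/slab.h>
#include <linux/spinlock.h>

struct buf_sketch { struct buf_sketch *next; };

struct pool_sketch {
	spinlock_t lock;
	struct buf_sketch *free;
	int cnt;
};

static int grow_pool_sketch(struct pool_sketch *pool, int want)
{
	struct buf_sketch *head = NULL, *b;
	int i;

	/* build a private list first; GFP_KERNEL may sleep, no locks held */
	for (i = 0; i < want; i++) {
		b = kmalloc(sizeof(*b), GFP_KERNEL);
		if (!b)
			break;
		b->next = head;
		head = b;
	}

	/* take the lock only to attach the new list to the pool */
	spin_lock_irq(&pool->lock);
	while (head) {
		b = head;
		head = head->next;
		b->next = pool->free;
		pool->free = b;
		pool->cnt++;
	}
	spin_unlock_irq(&pool->lock);
	return i;
}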

> >
> > 3) incorrect enabling/disabling of interrupts during locking
> >
> > In several cases, the wrong spin_lock* macros were being used. There
> > were a few cases where the irqsave/irqrestore versions of the macros
> > were needed, but were not used. The symptom of these problems was that
> > interrupts were enabled or disabled at inappropriate times, resulting in
> > deadlocks.
>
> I don't believe that this is true.
> The save/restore versions are only needed if the code might be called
> from interrupt context. However the routines where you made this
> change: raid1_grow_buffers, raid1_shrink_buffers, close_sync,
> are only ever called from process context, with interrupts enabled.
> Or am I missing something?

Please see my other e-mail reply to Andrew Morton regarding this...
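(For reference, the distinction at issue is the generic one below --
nothing raid1-specific, just the semantics of the two lock flavors:)

#include <linux/spinlock.h>

static spinlock_t demo_lock = SPIN_LOCK_UNLOCKED;	/* 2.4-style initializer */

/* spin_lock_irq()/spin_unlock_irq() unconditionally disable and then
 * re-enable interrupts, so they are only correct when the caller is
 * known to run with interrupts enabled. */
static void assumes_irqs_were_enabled(void)
{
	spin_lock_irq(&demo_lock);
	/* ... critical section ... */
	spin_unlock_irq(&demo_lock);	/* always re-enables interrupts */
}

/* The irqsave/irqrestore variants save and restore the caller's
 * interrupt state instead, so they are safe regardless of whether
 * interrupts were already disabled on entry. */
static void safe_from_any_irq_state(void)
{
	unsigned long flags;

	spin_lock_irqsave(&demo_lock, flags);
	/* ... critical section ... */
	spin_unlock_irqrestore(&demo_lock, flags);
}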

> >
> > 4) incorrect setting of conf->cnt_future and conf->phase resync counters
> >
> > The symptoms of this problem were that, if I/O was occurring when a
> > resync ended (or was aborted), the resync would hang and never complete.
> > This eventually would cause all I/O to the md device to hang.
>
> I'll have to look at this one a bit more closely. I'll let you know
> what I think of it.

OK. If you come up with something better, please let me know.
