Re: generic rwsem [Re: Alpha "process table hang"]

D.W.Howells (dhowells@astarte.free-online.co.uk)
Tue, 17 Apr 2001 22:48:02 +0100


> I am sure ppc couldn't race (at least unless up_read/up_write were excuted
> from irq/softnet context and that never happens in 2.4.4pre3, see below ;).

This is not actually using the rwsem code I wrote at the moment.

> And incidentally the above is what (I guess Richard) did on the alpha and
> that should really go into common code instead of having
> asm-i386/rwsem-xadd.h asm-alpha/rwsem.h etcc.etc... just implement
> atomic_inc_return using xadd in asm-i386/atomic.h, that's much better
> design IMHO.

I disagree... you want such primitives to be as efficient as possible. The
whole point of having asm/xxxx.h files is that you can stuff them full of
dirty tricks specific to certain architectures.

> That can obviously be done for example with C code like this:
> count = atomic_inc_return(&sem->count);
> if (__builtin_expect(count == 0, 0))
> slow_path()
>
> The above is the perfect C implementation IMHO

But not so efficient since it _has_ to take a jump unless the compiler can
emit code in alternative text sections when it seems appropriate. Plus, you
have to test count's value. XADD sets EFLAGS based on the result in memory,
something that allows all but one fastpath to be two instructions in length
(the one that isn't is three).

But, yes, there should probably be two generic cases: one implemented with a
spinlock in the rwsem struct (as I supply) and one implemented using
atomic_add_return(). Note, however! atomic_add_return() is not necessarily
implemented efficiently: if it involves a cmpxchg loop (as many seem to),
then that is really quite inefficient, and may be better done as a spinlock
anyway.

I've had a look at your implementation... It seems to hold the spinlocks for
an awfully long time... specifically around the local variable initialisation
in the 'failed' functions. Don't forget that the compiler can't reorder these
because they're inside the spinlock. I would, if I were you, fold the
'failed' functions into the main ones to avoid this problem.

Your rw_semaphore structure is also rather large: 46 bytes without debugging
stuff (16 bytes apiece for the waitqueues and 12 bytes for the rest).
Contrast that with mine: generic is 24 bytes and the i386-xadd optimised is
20.

Admittedly, though, yours is extremely simple and easy to follow, but I don't
think it's going to be very fast.

Of course, I still prefer mine... smaller, faster, more efficient:-)

David
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/