spinlocks() are severely broken in 2.2.X and 2.4.X for modules

Jeff V. Merkey (jmerkey@timpanogas.com)
Fri, 30 Jun 2000 16:31:02 -0600


I have spent the past five and one half weeks chasing down a severe
memory corruption problem in the NWFS LRU. The problem I am reporting
is what has caused ALL the bugs folks have seen over the past couple of
months in NWFS with memory corruption. It's what's held up our next
release by over two months. Despite screaming customers, I have held
off on posting the next release until I understood clearly what was
going on -- now I know -- Linux spinlocks() don't work when used by
kernel modules in linux that include spinlock.h.

NWFS is fully multi-threaded and uses fine grained locking. Much of
Linux is not fine grained, which would explain why I may be one of the
few folks to first see this problem. Initially, I was under the
assumption that the spinlocks() in Linux worked, and because of this,
focused my attention on my own code rather than scrutinize the code in
Linux. I rewrote large sections of NWFS, put in traps, and checks, list
consistency routines, etc. It was not until I put in a check for
multiple users being inside the same lock that I located the problem.
Well, I have completed a very thorough analysis of generated IA32 code,
and I have discovered something rather shocking, which is that the
spinlock code in Linux is severely broken, and this is not due to a
coding error, but a problem with the GCC compiler apparently generating
garbage. There's also several issues with using "lock bts" instead of
"xchg eax, 1", which is the recommended method for implementing
spinlocks() on IA32 intel systems.

What's really scary here is that a great deal of new code has been
written that depends on this spinlock code, and once the spinlock code
gets fixed properly, we may see tons of deadlocks and lockups all over
the place since this code has been there for a very long time.

All the NWFS code that manifests this problem was built with the
following GCC compiler and linker switches:

Compiler Switches
-----------------

-D__SMP__ -DMODVERSIONS -DMODULE -D__KERNEL__ -Wall -Wstrict-prototypes
-O2 -fomit-frame-pointer -fno-strength-reduce -pipe -m386 -DCPU=386
-I/usr/src/linux/include -c

Linker Switches
---------------

-m elf_i386 -Map nwfs.map -r -o

The code fragment in /usr/src/linux/include/asm/spinlock.h 2.2.16 that
generates the spin_lock() code is as follows:

#define spin_lock_string \
"\n1:\t" \
"lock ; btsl $0,%0\n\t" \
"jc 2f\n" \
".section .text.lock,\"ax\"\n" \ // THIS CODING CONSTRUCT IS NOT
WORKING WHEN INCLUDED BY A KERNEL MODULE
"2:\t" \
"testb $1,%0\n\t" \
"jne 2b\n\t" \
"jmp 1b\n" \
".previous"

This macro is clearly trying to generate a code section that a failing
lock attempt will jump to if the carry flag was set because the lock was
already engaged. Unfortunately, for kernel modules that include the
file spinlock.h this is clearly not what happens. The following code
fragment is a call to spin_lock_irqsave() within a function called
NWLockLRU(). As you can see from the code fragment, the lock, upon
seeing the carry flag set because someone else held the lock, jumps into
a completely different function's code and returns as though the user
acquired the lock.

With regard to 2.4.X kernels, the code fragment in
/usr/src/linux/include/asm/spinlock.h 2.4.0-test2 that generates the
spinlock() function code is as follows:

#define spin_lock_string \
"\n1:\t" \
"lock ; decb %0\n\t" \
"js 2f\n" \
".section .text.lock,\"ax\"\n" \ // THIS CODING CONSTRUCT IS NOT
WORKING WHEN INCLUDED BY A KERNEL MODULE
"2:\t" \
"cmpb $0,%0\n\t" \
"rep;nop\n\t" \
"jle 2b\n\t" \
"jmp 1b\n" \
".previous"

We have an identical situation on 2.4.X here in that the spinlock, upon
failing, jumps into an entirely different code section. The function
NWLockLRU() is coded as follows in NWFS:

void NWLockLRU(LRU_HANDLE *lru_handle)
{
spinlock_irqsave(&lru_handle->LRU_spinlock, lru_handle->LRU_flags);
return;
}

what gets generated from spinlock.h and GCC in the 2.4.X case is as
follows:

<NWLockLRU>
0x45a0 mov 0x4(%esp, 1),%edx
0x45a4 pushf
0s45a5 pop %eax
0x45a6 cli
0x45a7 mov %eax,0x80(%edx)
0x45ad lock decb 0x7C(%edx)
0x45b1 js 0x4639 <ReleaseWaiters+49> // if the lock fails it
jumps to this address is in the function ReleaseWaiters() ????
0x45b7 ret

<ReleaseWaiters>
0x4608 push %esi
0x4609 push %ebx
0x460a mov 0xc(%esp,1),%ebx
0x460e cmpl $0x0,0x484(%ebx)
0x4615 je 0x463d <ReleaseWaiters+53>
0x4517 lea 0x4c(%ebx),%esi
0x461a lea 0x0(%esi),%esi
0x4620 push %esi
0x4621 call 0x4622 <ReleaseWaiters+26>
0x4626 mov 0x484(%ebx),%edx
0x462c lea 0xFFFFFFFF(%eax),%edx
0x462f mov %edx,0x484(%ebx)
0x4635 add $0x4,%esp
0x4638 cmp $0x1,%eax
0x463b jne 0x4620 <ReleaseWaiters+24> // the call to
spinlock_irqsave() ends up jumping here, which will exit and allow two
0x463d pop %ebx // processors to acquire
the lock at the same time.
0x463e pop %esi
0x463f ret

This is clearly a severe problem. It is uncertain, however, whether it
is occurring in the kernel proper, or if it only manifests itself when
kernel modules, like file system drivers, include spinlock.h. It's very
possible that a large number of drivers could be broken with this, since
it is gneerated form a macro rather than a kernel function proper that
is linked when a module loads. It appears that any drivers that have
been built using these constructs may see this problem. What the code
is trying to do is not what the compiler is actually outputing into the
executable code.

MANOS/NETWARE SPINLOCK IMPLEMENTATION
--------------------------------------

This is the spinlock code I wrote for MANOS (an open source os like
NetWare) and also the code I wrote for the spinlocks that are used in
NetWare today. I have tested a version of this code with GCC and NWFS
to get around the spinlock() problem in Linux, and it seems to work
correctly. It's provided as an example that uses the [xchg] instruction
instead of "lock bts".

_TEXT SEGMENT PUBLIC USE32 'CODE'

ASSUME cs:_TEXT, ds:_DATA, es:_DATA, fs:_DATA

;
; void spin_lock(ULONG *lock);
;

public spin_lock
spin_lock:
mov ecx, [esp + 4]
xor edx, edx

lock_xchg:
mov eax, 1
xchg eax, dword ptr [ecx]
or eax, eax
jz lock_acquired

spin_test:
inc edx
cmp edx, 1FFFFFFFh
jg deadlock_detected

cmp dword ptr [ecx], 0
jnz spin_test
jmp lock_xchg

lock_acquired:
ret

deadlock_detected:
push OFFSET spin_lock_deadlock
call panic
add esp, 4
xor eax, eax
ret
;
; void spin_unlock(ULONG *lock);
;

public spin_unlock
spin_unlock:
mov eax, dword ptr [esp + 4]
mov dword ptr [eax], 0

spin_unlock_exit:
ret
;
; void spin_try_lock(ULONG *lock);
;

public spin_try_lock
spin_try_lock:
mov ecx, [esp + 4]

cmp dword ptr [ecx], 0
jnz lock_busy

mov eax, 1
xchg eax, dword ptr [ecx]
or eax, eax
jnz lock_busy

lock_obtained:
mov eax, 1
ret

lock_busy:
xor eax, eax
ret

I am at present now using this code (sorry about the MASM'ish structure
-- gcc has problems with it, but TASM does not and can output COFF and
ELF format) since the spinlock code in Linux does not seem to work
properly with external kernel modules. I have not reviewed kernel
binaries images to determine if the kernel proper has this problem, but
would strongly advise someone go and look at it and see.

Please advise.

Jeff Merkey
CEO, TRG

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.rutgers.edu
Please read the FAQ at http://www.tux.org/lkml/