I would like to report an NFS oops on one of our machines, possibly
involving lockd.  It is a stock Red Hat 7.3 installation running the 2.4.18-2
kernel as shipped.  The machine exports user home directories to between 50
and 100 Unix clients.  The clients are a mix of Digital Unix boxes (v4.0d or
later) and various Linux versions (Red Hat 6.1/7.1/7.2/7.3, Slackware 8.0).
Initially things work fine, but after some time (or perhaps after a certain
number of connections and/or NFS operations) the following oops occurs:
Jun 24 15:22:02 mercury kernel: Unable to handle kernel paging request at virtual address 938e38a4
Jun 24 15:22:02 mercury kernel:  printing eip:
Jun 24 15:22:02 mercury kernel: d8911bf2
Jun 24 15:22:02 mercury kernel: *pde = 00000000
Jun 24 15:22:02 mercury kernel: Oops: 0002
Jun 24 15:22:02 mercury kernel: via82cxxx_audio uart401 ac97_codec sound soundcore binfmt_misc autofs nfs nfsd
Jun 24 15:22:02 mercury kernel: CPU:    0
Jun 24 15:22:02 mercury kernel: EIP:    0010:[<d8911bf2>]    Not tainted
Jun 24 15:22:02 mercury kernel: EFLAGS: 00010212
Jun 24 15:22:02 mercury kernel: 
Jun 24 15:22:02 mercury kernel: EIP is at xdr_encode_netobj_R29c6f164 [sunrpc] 0x12 (2.4.18-3)
Jun 24 15:22:02 mercury kernel: eax: 306db20d   ebx: d4e8306c   ecx: d1d77070   edx: d4e83064
Jun 24 15:22:02 mercury kernel: esi: d4e83008   edi: d4e83008   ebp: d1d77070   esp: d790fda8
Jun 24 15:22:02 mercury kernel: ds: 0018   es: 0018   ss: 0018
Jun 24 15:22:02 mercury kernel: Process lockd (pid: 667, stackpage=d790f000)
Jun 24 15:22:02 mercury kernel: Stack: d4e83008 d4e83008 d8922f7b d1d77070 d4e83064 d1d7701c c6a740c0 d4e8306c 
Jun 24 15:22:02 mercury kernel:        c6a740c0 d8923920 c6a74cc0 d890e879 c89d805c c4fc2280 d8923920 c6a74cc0 
Jun 24 15:22:02 mercury kernel:        d8923934 d1d7705c d4e83008 c89d805c d89098e0 c89d805c d1d7705c d4e83008 
Jun 24 15:22:02 mercury kernel: Call Trace: [<d8922f7b>] nlm4_encode_testres [lockd] 0x8b 
Jun 24 15:22:02 mercury kernel: [<d8923920>] nlm4clt_encode_testres [lockd] 0x0 
Jun 24 15:22:02 mercury kernel: [<d890e879>] rpcauth_marshcred [sunrpc] 0x49 
Jun 24 15:22:02 mercury kernel: [<d8923920>] nlm4clt_encode_testres [lockd] 0x0 
Jun 24 15:22:02 mercury kernel: [<d8923934>] nlm4clt_encode_testres [lockd] 0x14 
Jun 24 15:22:02 mercury kernel: [<d89098e0>] call_encode [sunrpc] 0xd0 
Jun 24 15:22:02 mercury kernel: [<d890cf79>] __rpc_execute [sunrpc] 0xa9 
Jun 24 15:22:02 mercury kernel: [<d8909566>] rpc_call_setup_R0816cf16 [sunrpc] 0x46 
Jun 24 15:22:02 mercury kernel: [<d89094f7>] rpc_call_async_Rf292dde9 [sunrpc] 0x77 
Jun 24 15:22:02 mercury kernel: [<d891db9a>] nlmsvc_async_call [lockd] 0x7a 
Jun 24 15:22:02 mercury kernel: [<d8924330>] nlm4svc_callback_exit [lockd] 0x0 
Jun 24 15:22:02 mercury kernel: [<d8924303>] nlm4svc_callback [lockd] 0x73 
Jun 24 15:22:02 mercury kernel: [<d8924330>] nlm4svc_callback_exit [lockd] 0x0 
Jun 24 15:22:02 mercury kernel: [<d8923e44>] nlm4svc_proc_test_msg [lockd] 0x44 
Jun 24 15:22:02 mercury kernel: [<c014a5c1>] posix_lock_file [kernel] 0x551 
Jun 24 15:22:02 mercury kernel: [<c014a5c1>] posix_lock_file [kernel] 0x551 
Jun 24 15:22:02 mercury kernel: [<c01c86ac>] skb_checksum [kernel] 0x4c 
Jun 24 15:22:02 mercury kernel: [<d8922ca7>] nlm4_decode_lock [lockd] 0x47 
Jun 24 15:22:02 mercury kernel: [<d8922cbc>] nlm4_decode_lock [lockd] 0x5c 
Jun 24 15:22:02 mercury kernel: [<d892318f>] nlm4svc_decode_testargs [lockd] 0x2f 
Jun 24 15:22:02 mercury kernel: [<d892a494>] nlmsvc_procedures4 [lockd] 0xc0 
Jun 24 15:22:02 mercury kernel: [<d890f606>] svc_process_Re3483a09 [sunrpc] 0x2c6 
Jun 24 15:22:02 mercury kernel: [<d8929bc0>] nlmsvc_version4 [lockd] 0x0 
Jun 24 15:22:02 mercury kernel: [<d8929be4>] nlmsvc_program [lockd] 0x0 
Jun 24 15:22:02 mercury kernel: [<d891eccd>] lockd [lockd] 0x19d 
Jun 24 15:22:02 mercury kernel: [<c0107136>] kernel_thread [kernel] 0x26 
Jun 24 15:22:02 mercury kernel: [<d891eb30>] lockd [lockd] 0x0 
Jun 24 15:22:02 mercury kernel: 
Jun 24 15:22:02 mercury kernel: 
Jun 24 15:22:02 mercury kernel: Code: c7 04 81 00 00 00 00 8b 4c 24 0c 83 44 24 0c 04 8b 02 0f c8 

The output from lsmod is:
Module                  Size  Used by    Not tainted
via82cxxx_audio        20480   0 (autoclean)
uart401                 7936   0 (autoclean) [via82cxxx_audio]
ac97_codec             11936   0 (autoclean) [via82cxxx_audio]
sound                  71916   0 (autoclean) [via82cxxx_audio uart401]
soundcore               6692   4 (autoclean) [via82cxxx_audio sound]
binfmt_misc             7556   1
autofs                 12132   1 (autoclean)
nfs                    86172   3 (autoclean)
nfsd                   76192   8 (autoclean)
lockd                  56768   1 (autoclean) [nfs nfsd]
sunrpc                 75764   1 (autoclean) [nfs nfsd lockd]
dmfe                   15420   1
ide-cd                 30272   0 (autoclean)
cdrom                  32224   0 (autoclean) [ide-cd]
st                     29140   0 (unused)
usb-uhci               24452   0 (unused)
usbcore                73216   1 [usb-uhci]
qlogicisp              44512   0 (unused)
sd_mod                 12864   0 (unused)
scsi_mod              108576   3 [st qlogicisp sd_mod]

A few weeks ago there was a similar report on linux-kernel involving XFS as
the filesystem.  We are not using XFS, just ext2.

The oops appears to kill lockd (it no longer appears in the process table
after the oops has occurred), and all nfsd processes remain stuck in the D
state indefinitely.  The machine must be rebooted to restore operation.  It
goes without saying that in this state, all clients hang until the server
has been rebooted.

If you have any further questions, I'm happy to try to answer them.  This
was going to be our new production server, but until this problem is
fixed we will have to fall back to our original DEC machine.  We can't
afford to have 100 people unable to work for days on end.

Please CC me on any replies, since I do not subscribe to either list (nfs or
linux-kernel).

Best regards
  jonathan
-- 
* Jonathan Woithe    jwoithe@physics.adelaide.edu.au *
* http://www.physics.adelaide.edu.au/~jwoithe *
***-----------------------------------------------------------------------***
** "Time is an illusion; lunchtime doubly so" **
* "...you wouldn't recognize a subtle plan if it painted itself purple and *
* danced naked on a harpsichord singing 'subtle plans are here again'" *