Re: 2.4.19+trond and diskless locking problems

Christian Reis (kiko@async.com.br)
Wed, 4 Dec 2002 12:20:15 -0200


Hey there. Another update on the wierd hangs. After rebooting the server
(and cleaning up /var/lib/nfs/sm consequently) we've had a couple of
days with no hangs. I've been keeping an eye on that directory closely
the past days to see if we have leftover entries there, and today one
showed up:

anthem:~$ date
Wed Dec 4 11:59:31 BRDT 2002
anthem:~$ sudo ls -l /var/lib/nfs/sm
total 0
-rw------- 1 root root 0 Dec 4 08:47 192.168.99.7

So I have a file there that has been untouched for over 3 hours now.
Having a look at the logfile, I have:

[... bootup messages culminating in ntpd startup ]

Dec 4 08:44:16 canario ntpd[317]: kernel time discipline status 0040
Dec 4 08:44:16 canario ntpd[317]: frequency initialized -97.153
from /var/lib/ntp/driftfile.canario
Dec 4 08:47:22 canario rpc.statd[101]: Received erroneous SM_UNMON
request from canario for 192.168.99.4

[ same time as file above, see ]

Dec 4 08:48:47 canario ntpd[317]: kernel time discipline status change 41
Dec 4 08:50:01 canario /USR/SBIN/CRON[425]: (smmsp) CMD (test -x
/usr/share/sendmail/sendmail && /usr/share/sendmail/sendmail cron-msp)

[... more bootup messages]

The reboot log from the night before shows a series of 8 erroneous
SM_UNMON messenges for a period of about an hour before the actual
reboot. The machine was shut down unclean the time before that (well,
as unclean as an NFS-mounted box can be).

And, sure enough, when shutting down this box now, we get the
stalls/pauses again. nlm_debug outputs for both the client and the
server (neat 30s intervals show up :-) - I've selected specific seconds
of activity (i.e. there was a set of entries at 12:06:32 before):

CLIENT:

Dec 4 12:07:02 canario kernel: lockd: nlm_bind_host(c0a86304)
Dec 4 12:07:02 canario kernel: lockd: next rebind in 6000 jiffies
Dec 4 12:07:02 canario kernel: lockd: get host 192.168.99.4
Dec 4 12:07:02 canario kernel: lockd: release host 192.168.99.4
Dec 4 12:07:02 canario kernel: lockd: release host 192.168.99.4
Dec 4 12:07:02 canario kernel: lockd: nlm_lookup_host(c0a86304, p=17, v=4)
Dec 4 12:07:02 canario kernel: lockd: host garbage collection
Dec 4 12:07:02 canario kernel: lockd: nlmsvc_mark_resources
Dec 4 12:07:02 canario kernel: lockd: get host 192.168.99.4
Dec 4 12:07:02 canario kernel: lockd: nlm_bind_host(c0a86304)
Dec 4 12:07:02 canario kernel: lockd: release host 192.168.99.4
Dec 4 12:07:02 canario kernel: lockd: nlm_lookup_host(c0a86304, p=17, v=4)
Dec 4 12:07:02 canario kernel: lockd: get host 192.168.99.4
Dec 4 12:07:02 canario kernel: lockd: nlm_bind_host(c0a86304)
Dec 4 12:07:32 canario kernel: lockd: nlm_bind_host(c0a86304)
Dec 4 12:07:32 canario kernel: lockd: get host 192.168.99.4
Dec 4 12:07:32 canario kernel: lockd: release host 192.168.99.4
Dec 4 12:07:32 canario kernel: lockd: release host 192.168.99.4
Dec 4 12:07:32 canario kernel: lockd: nlm_lookup_host(c0a86304, p=17, v=4)
Dec 4 12:07:32 canario kernel: lockd: get host 192.168.99.4
Dec 4 12:07:32 canario kernel: lockd: nlm_bind_host(c0a86304)
Dec 4 12:07:32 canario kernel: lockd: release host 192.168.99.4
Dec 4 12:07:32 canario kernel: lockd: nlm_lookup_host(c0a86304, p=17, v=4)
Dec 4 12:07:32 canario kernel: lockd: get host 192.168.99.4
Dec 4 12:07:32 canario kernel: lockd: nlm_bind_host(c0a86304)

SERVER:

Dec 4 12:07:02 anthem kernel: lockd: request from c0a86307
Dec 4 12:07:02 anthem kernel: lockd: nlm_lookup_host(c0a86307, p=17, v=4)
Dec 4 12:07:02 anthem kernel: lockd: get host 192.168.99.7
Dec 4 12:07:02 anthem kernel: lockd: nlm_file_lookup(06000001 0b000900 00000002 0000fd3a 0000fd16 000108a8)
Dec 4 12:07:02 anthem kernel: lockd: found file c1d1f380 (count 0)
Dec 4 12:07:02 anthem kernel: lockd: nlmsvc_cancel(090b/64826, pi=1, 0-9223372036854775807)
Dec 4 12:07:02 anthem kernel: lockd: nlmsvc_lookup_block f=c1d1f380 pd=1 0-9223372036854775807 ty=1
Dec 4 12:07:02 anthem kernel: lockd: check f=c1d1f380 pd=1 0-9223372036854775807 ty=1 cookie=1393
Dec 4 12:07:02 anthem kernel: lockd: deleting block d12e9000...
Dec 4 12:07:02 anthem kernel: lockd: unblocking blocked lock
Dec 4 12:07:02 anthem kernel: lockd: release host 192.168.99.7
Dec 4 12:07:02 anthem kernel: lockd: release host 192.168.99.7
Dec 4 12:07:02 anthem kernel: lockd: nlm_release_file(c1d1f380, ct = 1)
Dec 4 12:07:02 anthem kernel: nlmsvc_retry_blocked(00000000, when=0)
Dec 4 12:07:02 anthem kernel: nlmsvc_retry_blocked(00000000, when=0)
Dec 4 12:07:02 anthem kernel: lockd: request from c0a86307
Dec 4 12:07:02 anthem kernel: lockd: nlm_lookup_host(c0a86307, p=17, v=4)
Dec 4 12:07:02 anthem kernel: lockd: get host 192.168.99.7
Dec 4 12:07:02 anthem kernel: lockd: nlm_file_lookup(06000001 0b000900 00000002 0000fd3a 0000fd16 000108a8)
Dec 4 12:07:02 anthem kernel: lockd: found file c1d1f380 (count 0)
Dec 4 12:07:02 anthem kernel: lockd: nlmsvc_unlock(090b/64826, pi=1, 0-9223372036854775807)
Dec 4 12:07:02 anthem kernel: lockd: nlmsvc_cancel(090b/64826, pi=1, 0-9223372036854775807)
Dec 4 12:07:02 anthem kernel: lockd: nlmsvc_lookup_block f=c1d1f380 pd=1 0-9223372036854775807 ty=2
Dec 4 12:07:02 anthem kernel: lockd: release host 192.168.99.7
Dec 4 12:07:02 anthem kernel: lockd: nlm_release_file(c1d1f380, ct = 1)
Dec 4 12:07:02 anthem kernel: nlmsvc_retry_blocked(00000000, when=0)
Dec 4 12:07:02 anthem kernel: nlmsvc_retry_blocked(00000000, when=0)
Dec 4 12:07:02 anthem kernel: lockd: request from c0a86307
Dec 4 12:07:02 anthem kernel: lockd: nlm_lookup_host(c0a86307, p=17, v=4)
Dec 4 12:07:02 anthem kernel: lockd: get host 192.168.99.7
Dec 4 12:07:02 anthem kernel: lockd: nlm_file_lookup(06000001 0b000900 00000002 0000fd3a 0000fd16 000108a8)
Dec 4 12:07:02 anthem kernel: lockd: found file c1d1f380 (count 0)
Dec 4 12:07:02 anthem kernel: lockd: nlmsvc_lock(090b/64826, ty=1, pi=1, 0-9223372036854775807, bl=1)
Dec 4 12:07:02 anthem kernel: lockd: nlmsvc_lookup_block f=c1d1f380 pd=1 0-9223372036854775807 ty=1
Dec 4 12:07:02 anthem kernel: lockd: blocking on this lock (allocating).
Dec 4 12:07:02 anthem kernel: lockd: nlm_lookup_host(c0a86307, p=17, v=4)
Dec 4 12:07:02 anthem kernel: lockd: get host 192.168.99.7
Dec 4 12:07:02 anthem kernel: lockd: created block d12e9000...
Dec 4 12:07:02 anthem kernel: lockd: nlmsvc_insert_block(d12e9000, -1)
Dec 4 12:07:02 anthem kernel: lockd: blocking on this lock.
Dec 4 12:07:02 anthem kernel: lockd: release host 192.168.99.7
Dec 4 12:07:02 anthem kernel: lockd: nlm_release_file(c1d1f380, ct = 1)
Dec 4 12:07:02 anthem kernel: nlmsvc_retry_blocked(d12e9000, when=-1)
Dec 4 12:07:02 anthem kernel: nlmsvc_retry_blocked(d12e9000, when=-1)
Dec 4 12:07:32 anthem kernel: lockd: request from c0a86307
Dec 4 12:07:32 anthem kernel: lockd: nlm_lookup_host(c0a86307, p=17, v=4)
Dec 4 12:07:32 anthem kernel: lockd: host garbage collection
Dec 4 12:07:32 anthem kernel: lockd: nlmsvc_mark_resources
Dec 4 12:07:32 anthem kernel: lockd: get host 192.168.99.7
Dec 4 12:07:32 anthem kernel: lockd: nlm_file_lookup(06000001 0b000900 00000002 0000fd3a 0000fd16 000108a8)
Dec 4 12:07:32 anthem kernel: lockd: found file c1d1f380 (count 0)
Dec 4 12:07:32 anthem kernel: lockd: nlmsvc_cancel(090b/64826, pi=1, 0-9223372036854775807)
Dec 4 12:07:32 anthem kernel: lockd: nlmsvc_lookup_block f=c1d1f380 pd=1 0-9223372036854775807 ty=1
Dec 4 12:07:32 anthem kernel: lockd: check f=c1d1f380 pd=1 0-9223372036854775807 ty=1 cookie=1396
Dec 4 12:07:32 anthem kernel: lockd: deleting block d12e9000...
Dec 4 12:07:32 anthem kernel: lockd: unblocking blocked lock
Dec 4 12:07:32 anthem kernel: lockd: release host 192.168.99.7
Dec 4 12:07:32 anthem kernel: lockd: release host 192.168.99.7
Dec 4 12:07:32 anthem kernel: lockd: nlm_release_file(c1d1f380, ct = 1)
Dec 4 12:07:32 anthem kernel: nlmsvc_retry_blocked(00000000, when=0)
Dec 4 12:07:32 anthem kernel: nlmsvc_retry_blocked(00000000, when=0)
Dec 4 12:07:32 anthem kernel: lockd: request from c0a86307
Dec 4 12:07:32 anthem kernel: lockd: nlm_lookup_host(c0a86307, p=17, v=4)
Dec 4 12:07:32 anthem kernel: lockd: get host 192.168.99.7
Dec 4 12:07:32 anthem kernel: lockd: nlm_file_lookup(06000001 0b000900 00000002 0000fd3a 0000fd16 000108a8)
Dec 4 12:07:32 anthem kernel: lockd: found file c1d1f380 (count 0)
Dec 4 12:07:32 anthem kernel: lockd: nlmsvc_unlock(090b/64826, pi=1, 0-9223372036854775807)
Dec 4 12:07:32 anthem kernel: lockd: nlmsvc_cancel(090b/64826, pi=1, 0-9223372036854775807)
Dec 4 12:07:32 anthem kernel: lockd: nlmsvc_lookup_block f=c1d1f380 pd=1 0-9223372036854775 807 ty=2
Dec 4 12:07:32 anthem kernel: lockd: release host 192.168.99.7
Dec 4 12:07:32 anthem kernel: lockd: nlm_release_file(c1d1f380, ct = 1)
Dec 4 12:07:32 anthem kernel: nlmsvc_retry_blocked(00000000, when=0)
Dec 4 12:07:32 anthem kernel: nlmsvc_retry_blocked(00000000, when=0)
Dec 4 12:07:32 anthem kernel: lockd: request from c0a86307
Dec 4 12:07:32 anthem kernel: lockd: nlm_lookup_host(c0a86307, p=17, v=4)
Dec 4 12:07:32 anthem kernel: lockd: get host 192.168.99.7
Dec 4 12:07:32 anthem kernel: lockd: nlm_file_lookup(06000001 0b000900 00000002 0000fd3a 0000fd16 000108a8)
Dec 4 12:07:32 anthem kernel: lockd: found file c1d1f380 (count 0)
Dec 4 12:07:32 anthem kernel: lockd: nlmsvc_lock(090b/64826, ty=1, pi=1, 0-9223372036854775 807, bl=1)
Dec 4 12:07:32 anthem kernel: lockd: nlmsvc_lookup_block f=c1d1f380 pd=1 0-9223372036854775 807 ty=1
Dec 4 12:07:32 anthem kernel: lockd: blocking on this lock (allocating).
Dec 4 12:07:32 anthem kernel: lockd: nlm_lookup_host(c0a86307, p=17, v=4)
Dec 4 12:07:32 anthem kernel: lockd: get host 192.168.99.7
Dec 4 12:07:32 anthem kernel: lockd: created block d12e9000...
Dec 4 12:07:32 anthem kernel: lockd: nlmsvc_insert_block(d12e9000, -1)
Dec 4 12:07:32 anthem kernel: lockd: blocking on this lock.
Dec 4 12:07:32 anthem kernel: lockd: release host 192.168.99.7
Dec 4 12:07:32 anthem kernel: lockd: nlm_release_file(c1d1f380, ct = 1)
Dec 4 12:07:32 anthem kernel: nlmsvc_retry_blocked(d12e9000, when=-1)
Dec 4 12:07:32 anthem kernel: nlmsvc_retry_blocked(d12e9000, when=-1)

HTH. Yes, we're all sick of this subject :-/

Take care,

--
Christian Reis, Senior Engineer, Async Open Source, Brazil.
http://async.com.br/~kiko/ | [+55 16] 261 2331 | NMFL
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/