FS corruption with NFS and SSH

Narendra Sankar (naren@xstreamlogic.com)
Mon, 23 Aug 1999 18:24:26 +0000


Hi,
Forgive me for sending this to both the kernel and the SMP lists, but I
do not know where the root cause of this problem lies.
In short, I am having file system corruption problems on my NFS file
server when I have SSH enabled and use SSH to log into the server to
install software.

In detail (sorry for the long mail):

I am running stock Red Hat 6.0 with the updates from the Red Hat FTP
site. I haven't recompiled the kernel or anything else. The server is a
dual P-II 400 Dell PowerEdge 4300 with 256 MB of RAM and an 18 GB LVD
SCSI hard disk. It exports /usr/local to my client workstations, all
running Red Hat 6.0 with the same updates. I have SSH 2.0.13 installed
on the server and on the clients. The clients NFS-mount the server with
rsize=8192,wsize=8192.
Since I installed SSH I have seen the following messages in my server
log (parana is the server and indus is the client):

Aug 16 11:53:46 parana sshd2[12611]: User root, coming from indus,
authenticated.
Aug 16 13:40:34 parana kernel: : no inode in rename or err: -16.
Aug 16 13:40:34 parana kernel: nfsd: non-standard errno: 16
Aug 16 13:40:35 parana kernel: : no inode in rename or err: -16.
Aug 16 13:40:35 parana kernel: nfsd: non-standard errno: 16
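For reference, the numeric error codes in these messages can be looked up with a short Python sketch. It assumes it is run on a Linux host, where errno numbering matches the kernel's; `describe_errno` is a hypothetical helper name, not something from the kernel or nfsd. On Linux, 16 is EBUSY and 13 is EACCES.

```python
import errno
import os


def describe_errno(code: int) -> str:
    """Return 'NAME (message)' for a positive or negative errno value."""
    # The kernel logs errors as negative numbers (e.g. "err: -16"),
    # so normalise the sign before looking the value up.
    code = abs(code)
    name = errno.errorcode.get(code, "UNKNOWN")
    return f"{name} ({os.strerror(code)})"


# Codes taken from the log excerpts: -16 from the rename message,
# 13 from the fh_verify permission failure.
for code in (-16, 13):
    print(code, describe_errno(code))
```

The lookup only names the error; it does not explain why nfsd received it.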

Then, when I logged into the server and tried to install software,
I got the following errors. In this case I logged into the server and
installed Applix. Applix did install correctly and runs fine, but these
messages were worrisome.

Aug 16 17:30:35 parana sshd2[13386]: User root, coming from indus
authenticated.
Aug 16 17:34:22 parana kernel: fh_verify: applix/axdata permission
failure, acc=13, error=13
Aug 16 19:17:10 parana kernel: EXT2-fs warning (device sd(8,6)):
ext2_free_blocks: bit already cleared for block 5079464
Aug 16 19:17:10 parana last message repeated 3 times
Aug 16 19:17:10 parana kernel: EXT2-fs warning (device sd(8,6)):
ext2_free_inode: bit already cleared for inode 1269832
Aug 16 19:17:10 parana kernel: EXT2-fs warning (device sd(8,6)):
ext2_free_blocks: bit already cleared for block 5079463
Aug 16 19:17:10 parana kernel: EXT2-fs warning (device sd(8,6)):
ext2_free_inode: bit already cleared for inode 1269831
Aug 16 19:17:10 parana kernel: find_fh_dentry: 08:06/1269832 dir/1269831
not found!
Aug 18 00:26:50 parana kernel: find_fh_dentry: _var/19990818_0.gz.tmp
lookup mismatch!
Aug 18 00:26:50 parana kernel: find_fh_dentry: 08:06/376911 dir/376882
not found!
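The device in these messages appears as sd(8,6) in the EXT2-fs warnings and as 08:06 in the find_fh_dentry lines. As a minimal sketch, the major/minor pair can be mapped to a device node using the standard Linux numbering for SCSI disks (major 8, 16 minors per disk). `scsi_disk_name` is a hypothetical helper, and the sketch assumes the default /dev naming.

```python
def scsi_disk_name(major: int, minor: int) -> str:
    """Map a (major, minor) pair for SCSI block major 8 to a /dev name."""
    if major != 8:
        raise ValueError("this sketch only handles SCSI disk major 8")
    if not 0 <= minor <= 255:
        raise ValueError("minor must be in the range 0..255")
    # Each disk owns 16 minors: 0 is the whole disk, 1..15 are partitions.
    disk, partition = divmod(minor, 16)
    name = "sd" + chr(ord("a") + disk)
    return f"/dev/{name}{partition}" if partition else f"/dev/{name}"


# Device taken from the log lines above: sd(8,6)
print(scsi_disk_name(8, 6))
```

Under those assumptions the affected file system is the sixth partition of the first SCSI disk.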

I then installed a lot of software, including StarOffice and other big
applications, probably 300-400 MB in all. After that my file server
locked up and refused to boot. I managed to boot single user and this is
what I found in the logs:

Aug 20 19:23:24 parana nfs: rpc.mountd startup succeeded
Aug 20 19:23:25 parana kernel: find_fh_dentry: 08:06/868435 dir/868423
not found!
Aug 20 19:23:25 parana last message repeated 88 times
Aug 20 19:23:25 parana nfs: rpc.nfsd startup succeeded
Aug 20 19:23:25 parana kernel: find_fh_dentry: 08:06/868435 dir/868423
not found!
Aug 20 19:23:25 parana last message repeated 220 times
Aug 20 19:23:25 parana yppasswdd: rpc.yppasswdd startup succeeded
Aug 20 19:23:25 parana kernel: find_fh_dentry: 08:06/868435 dir/868423
not found!
Aug 20 19:23:25 parana last message repeated 82 times

These messages eventually flooded memory and locked up the machine, so
that I had to power-cycle it. This seemed to happen as soon as
I exported my NFS mounts, and no matter what I did I could not boot. By
trial and error, I managed to narrow it down to NFS and SSH. I disabled
both and voila, my system came up. However, even then, the moment
I exported /usr/local the machine would die. So I had to delete
/usr/local and rebuild it from backup to get a clean file system before
I could export it. I finally have everything up and running, with SSH
disabled and a brand new file system.

The above was with 2.2.5-22smp from Red Hat.

I tried upgrading to 2.2.11 by downloading and recompiling the kernel,
and the error message changed to the following:

Aug 20 20:15:11 parana nfs: rpc.mountd startup succeeded
Aug 20 20:15:11 parana kernel: find_fh_dentry: 08:06, 868423/868435 not
found -- need full search!
Aug 20 20:15:12 parana last message repeated 86 times
Aug 20 20:15:12 parana nfs: rpc.nfsd startup succeeded
Aug 20 20:15:12 parana kernel: find_fh_dentry: 08:06, 868423/868435 not
found -- need full search!
Aug 20 20:15:43 parana last message repeated 33819 times
Aug 20 20:16:06 parana last message repeated 24425 times
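To get a feel for the volume, the excerpt above can be tallied with a short Python sketch that folds syslog's "last message repeated N times" lines back into the message they refer to. It assumes the traditional "Mon DD HH:MM:SS host message" layout and that wrapped log lines have been rejoined; `count_messages` and `SAMPLE` are hypothetical names used only for illustration.

```python
import re
from collections import Counter

REPEAT = re.compile(r"last message repeated (\d+) times")


def count_messages(lines):
    """Count syslog messages, expanding 'last message repeated N times'."""
    counts = Counter()
    last = None
    for line in lines:
        match = REPEAT.search(line)
        if match:
            # Credit the repeats to the most recent real message.
            if last is not None:
                counts[last] += int(match.group(1))
            continue
        # Drop the "Mon DD HH:MM:SS host" prefix, keep the message text.
        parts = line.split(None, 4)
        if len(parts) < 5:
            continue
        last = parts[4].strip()
        counts[last] += 1
    return counts


# The 2.2.11 excerpt, with the wrapped lines rejoined.
SAMPLE = [
    "Aug 20 20:15:11 parana nfs: rpc.mountd startup succeeded",
    "Aug 20 20:15:11 parana kernel: find_fh_dentry: 08:06, 868423/868435 not found -- need full search!",
    "Aug 20 20:15:12 parana last message repeated 86 times",
    "Aug 20 20:15:12 parana nfs: rpc.nfsd startup succeeded",
    "Aug 20 20:15:12 parana kernel: find_fh_dentry: 08:06, 868423/868435 not found -- need full search!",
    "Aug 20 20:15:43 parana last message repeated 33819 times",
    "Aug 20 20:16:06 parana last message repeated 24425 times",
]

for message, total in count_messages(SAMPLE).most_common():
    print(total, message)
```

On this excerpt the find_fh_dentry message accounts for 58332 entries in under a minute.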

I hope this gives someone a clue.

I wonder if there is a race condition or something similar. I do not
know whether this is related to SMP, NFS or SSH. Unfortunately this
server is running our entire company and I cannot afford to experiment
on it. Any help or information would be great. Email me if you need any
more information. I can recreate this scenario pretty easily.

Thanks
Naren
