Re: nfs MAP_SHARED corruption fix

Trond Myklebust (trond.myklebust@fys.uio.no)
Thu, 10 May 2001 12:11:35 +0200


>>>>> " " == Marcelo Tosatti <marcelo@conectiva.com.br> writes:

> I suggested the removal of I_DIRTY_PAGES check because the
> current behaviour of munmap seems to be synchronous (1), so I
> guess you _always_ want it to be synchronous.

Yes. The NFS concept of close-to-open cache consistency requires you
to flush out all writes upon file close(). In this case, the usual
nfs_wb_file() + error reporting in nfs_release() won't catch anything
that's been put out using writepage(), because the latter can't mark
the requests using the struct file. This means we have to do something
special for mmap...

>> Another thing (completly unrelated to the above issues) that I
>> noticed while looking over this nfs code is that the
>> __sync_one() for example called by generic_file_write(O_SYNC)
>> will recall fdatasync but no nfs_wb_all is put before the
>> fdatawait, and I'm not sure that the nfs_sync_page called by
>> the fdatawait is enough to rapidly flush the writepaged stuff
>> to the nfs server. nfs_sync_page apparently only cares about
>> speculative reads, not at all about committing writebacks. It
>> would look much saner to me if nfs_sync_page also does a
>> nfs_wb_all() on the inode, so that the ->sync_page callback
>> gets the same semantics it has for the real filesystems.

> Looks sane and will probably makes things faster.

This should normally not be needed. Firstly there is logic in the
write code to send off a request as soon as we have scheduled wsize
bytes worth of data. Secondly there is the 'flushd' daemon that exists
in order to time out requests, and to flush them out if the first
logic fails to do so.

That said, the __sync_one() thing is of interest to me. You'll notice
that our lack of a write_inode() means that we don't currently support
the sync() system call. Furthermore, we do O_SYNC through the slower
method of actually making our commit_write() method make a synchronous
call.
I have a patch that implements write_inode() and removes our current
O_SYNC code on

http://www.fys.uio.no/~trondmy/src/linux-2.4.4-write_inode.dif

It's been running for a month or two on my systems, but I've been wary
of sending it to Linus as it's not, strictly speaking, a bugfix.

Cheers,
Trond
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/