RE: O_SYNC option doesn't work (2.4.18-3)

Gregory Giguashvili (Gregoryg@ParadigmGeo.com)
Thu, 8 Aug 2002 16:57:59 +0200

Messages sorted by: [ date ][ thread ][ subject ][ author ]
Next message: David S. Miller: "Re: forcing the tg3 into full duplex"
Previous message: Ingo Molnar: "Re: bad: schedule() with irqs disabled! (+ ksymoops)"

Dear All,

I wanted to thank you All for your patience and help. As I have triggered
this discussion, I would like to summarize it by listing my findings, so it
can help others having same/similar questions.

I want to mention that I was looking for a way to make the best out of NFS
with regard to reading/writing files while not breaking the consistency of
data. I wasn't implying to get rid of locking as some of you suggested (I
fully understand the stateless nature of NFS protocol), instead I was
looking for a mix (both application and OS options) that will make Linux
systems behave similarly to other UN*X colleagues. In my opinion, any
standard behavior would greatly affect the strategic decision of large
companies to move to Linux with all the benefits from this to all of us. I'm
sorry if I wasn't clear enough.

Most of the current UN*X systems provide you with the mentioned
functionality by using the strictest NFS mount options as default (sync,
noac, etc.). Linux, on the contrary, strives to achieve maximum performance
on the expense of data consistency, which is not the most expected behavior
(BTW, the performance is great anyway!).

OS options:
===========
- The directories should be exported with "sync" option, which is not
default.
- The directories should be NFS mounted using "sync" and "noac" options,
which are not default.

Application code (per read/write transaction):
==============================================
- lock_file (doesn't have to be a lockd lock)
- open_file (O_SYNC)
- read/write file
- close_file
- unlock_file

The above sequence works OK, but it's still problematic since it doesn't
ensure that the data is coming from the server instead of NFS cache. In one
of my previous discussions on this list, I was told to use the following
technique to flush NFS buffers. It turns out that acquiring write lock using
fcntl (F_SETLK) interface acts as NFS write barrier and flushes all the file
NFS buffers. So, the above sequence would result in the following one:

- lock_file (doesn't have to be a lockd lock)
- open_file (O_SYNC)
- if (read) fcntl (F_SETLK) (doing lock/unlock here to flush NFS buffers)
- read/write file
- if (write) fcntl (F_SETLK) (doing lock/unlock here to flush NFS buffers)
- close_file
- unlock_file

FCNTL Unexpected behavior (Possible BUG?):
==========================================
There is a catch when using fcntl system call. It appears not to flush NFS
buffers when _FILE_OFFSET_BITS=64 is defined. This is because F_SETLK is
redefined for 64 bit version to be equal to 13, which makes fcntl call
behave differently. Passing hard-coded 6 to the call would produce the
expected results (one can see all this in strace).

Summary:
========
Using the fcntl dirty trick I finally achieved the expected behavior. BTW, I
don't understand why open/close calls do not flush NFS buffers - this seems
like a natural thing to do.
Anyway, I'm looking forward to your comments on this. I might have arrived
to some wrong conclusions, which I hope you'll help me to correct.

Best regards,
Giga

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/

Next message: David S. Miller: "Re: forcing the tg3 into full duplex"
Previous message: Ingo Molnar: "Re: bad: schedule() with irqs disabled! (+ ksymoops)"