Re: Race in open(O_CREAT|O_EXCL) and network filesystem

Andreas Dilger (adilger@clusterfs.com)
Mon, 29 Jul 2002 09:02:11 -0600


On Jul 29, 2002 21:50 +1000, Neil Brown wrote:
> On Sunday July 28, bulb@ucw.cz wrote:
> > maybe I'm blind, but I think there is a race featuring
> > open(O_CREAT|O_EXCL) and nfs or any other network fs.
> >
> > What may happen is:
> >
> > client A: open_namei looks up the inode
> > driver queries server and gets ENOENT
> > client B: open_namei looks up the inode
> > driver queries server and gets ENOENT
> > client A: open_namei calls create method
> > driver requests file to be created and is successful
> > client B: open_namei calls create method
> > dirver requests file to be created and since it does not know,
> > cant specify exclusion, thus is succesful
> > client A: open_namei does no more checks and thus open succeeds
> > client B: open succeeds too here - and it shouldn't
> >
> > Since many applications rely on this working correctly (especialy
> > mailboxes are locked using exclusive creates and mounting them over NFS
> > is quite common).
> >
> > So, can someone please answer:
> >
> > 1) Is there some reason this can't happen that I overlooked?
>
> No. You are correct.
>
> > 2) If it is a problem (comment in NFS suggest so), I can see two ways of
> > handling this. Either pass the flags to the create method, or restart
> > the open when create returns EEXISTS. Which one would be prefered?
>
> Well.. at OLS in the Lustre talk Peter Braam talked about something
> that could be used. Unfortunately it doesn't seem to be included in
> the paper in the proceedings but the idea was to include some "intend"
> in the lookup request. e.g. "lookup with intent to create" or
> "lookup with intent to delete" or maybe "lookup with intent to open
> for exclusive write access". The filesystem could then, at it's
> option, carry out the intended operation (possibly only partially) as
> part of the lookup. A simple filesystem wouldn't bothe and the VFS
> would continue with the normal process. A networked filesystem could
> do that whole operation including intent atomically.

The intent-based lookup code is available as part of the Lustre CVS.
See lustre/patches/patch-2.4.18 at the SF lustre project. There are
a couple of other changes in the patch that are unrelated to intents,
but those are fairly obvious (i.e. ext3/jbd changes, some exports, etc).

Cheers, Andreas

--
Andreas Dilger
http://www-mddsp.enel.ucalgary.ca/People/adilger/
http://sourceforge.net/projects/ext2resize/

- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/