Re: ext3-2.4-0.9.4

Jan Harkes (jaharkes@cs.cmu.edu)
Fri, 3 Aug 2001 13:24:57 -0400


On Fri, Aug 03, 2001 at 06:54:12PM +0200, Daniel Phillips wrote:
> On Friday 03 August 2001 18:16, Jan Harkes wrote:
> > On Fri, Aug 03, 2001 at 05:47:17PM +0200, Daniel Phillips wrote:
> > > On Friday 03 August 2001 17:18, Jan Harkes wrote:
> > > > Working on a distributed filesystem with somewhat weaker than UNIX
> > > > semantics might have skewed my vision. In Coda not every client will be
> > > > able to figure out all of the possible paths that can lead to
> > > > a file object. And although we currently try our best to block
> > > > hardlinked directories they could possibly exist, making the problems
> > > > even worse.
> > >
> > > We don't need all the paths, and not any specific path, just a path.
> >
> > Even if that path leads to a name that got removed, thereby forcing the
> > object into lost+found? I thought the MTA did something like,
>
> We'd better get confirmation from the MTA expert in the thread.
>
> > fd = open(tmp/file)
> > write(fd)
> > fsync(fd)
> > link(tmp/file, new/file)
> > fsync(fd) *1
> > unlink(tmp/file)
> >
> > *1 Suppose this fsync only syncs the path leading to tmp/file, and the
> > unlink of tmp/file is written back to disk, which is likely because
> > we're only creating/syncing stuff in tmp. Then, until new/file is
> > written, there is no path information leading to the file anymore,
> > which makes this as 'safe' as not syncing path name information at all.
>
> Nice clear example! Yes, in essence we would have synced the original
> path twice. If this is what the MTA is really doing I'm willing to join
> the "MTA is broken" camp.

Here is the relevant mail:

On Mon, Jul 30, 2001 at 01:11:32PM -0400, Lawrence Greenfield wrote:
} BSD softupdates allows you to call fsync() on the file, and this will
} sync the directories all the way up to the root if necessary.
}
} Thus BSD fsync() actually guarantees that when it returns, the file
} (and all of its filenames) will survive a reboot.
}
} Sendmail does:
} fd = open(tmp)
} write(fd)
} fsync(fd)
} rename(tmp, final)
} fsync(fd)
}
} Cyrus IMAP does:
} fd = open(tmp)
} write(fd)
} fsync(fd)
} link(tmp, final1)
} link(tmp, final2)
} link(tmp, final3)
} fsync(fd)
} close(fd)
} unlink(tmp)
}
} The idea that Linux fsync() doesn't actually make the file survive
} reboots is pretty ridiculous.

As you can see, the 'sync a path leading to the file' semantics from SuS
don't work in the Cyrus IMAP case, as it specifically requires _all_
paths to be committed to disk before fsync returns.
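
For illustration only, here is a minimal sketch of how an application
could avoid depending on fsync(fd) flushing every name: it fsyncs each
directory that received a new link before dropping the temporary name.
It is written in Python because the os module mirrors the syscalls
above. The helper names fsync_dir and deliver are hypothetical, and
whether a directory fsync really commits the new entry depends on the
filesystem.

```python
import os


def fsync_dir(path):
    """fsync a directory so that entries created in it are flushed (assumed
    to be honoured by the filesystem; this is not guaranteed everywhere)."""
    dfd = os.open(path, os.O_RDONLY)
    try:
        os.fsync(dfd)
    finally:
        os.close(dfd)


def deliver(data, tmp_path, final_paths):
    """Hypothetical helper: write data under tmp_path, link it to every
    name in final_paths, sync the affected directories, then unlink tmp."""
    fd = os.open(tmp_path, os.O_WRONLY | os.O_CREAT | os.O_EXCL, 0o600)
    try:
        view = memoryview(data)
        while len(view) > 0:          # os.write may write fewer bytes
            written = os.write(fd, view)
            view = view[written:]
        os.fsync(fd)                  # file contents are on disk
    finally:
        os.close(fd)

    for final in final_paths:
        os.link(tmp_path, final)      # fails if the name already exists

    # Sync every directory that gained a name, once each.
    for directory in {os.path.dirname(os.path.abspath(p)) for p in final_paths}:
        fsync_dir(directory)

    os.unlink(tmp_path)               # tmp name is no longer needed
```

The point of the ordering is that the tmp name is only removed after
all of the final names have been explicitly synced.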

On Fri, Aug 03, 2001 at 06:54:12PM +0200, Daniel Phillips wrote:
> On Friday 03 August 2001 18:16, Jan Harkes wrote:
> > Now if the application would use the directory sync, it can actually
> > tell the kernel that that new/file name is the interesting one to keep
> > and that tmp doesn't even need to be written to disk at all.
>
> Yep. Or if it used rename, which updates the dcache entries, instead
> of link/unlink.

MTAs that want to do reliable deliveries using multiple processes (or
on a networked filesystem) tend not to use rename, because it implicitly
unlinks the target if it already exists, and this could lead to the loss
of mail that was already considered delivered.
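
A small sketch of the difference, again in Python since its os module
mirrors the syscalls. The helper name publish_no_clobber is
hypothetical; it only shows that link() refuses to replace an existing
name where rename() would silently do so.

```python
import os


def publish_no_clobber(tmp_path, final_path):
    """Hypothetical helper: give tmp_path the name final_path without ever
    replacing an existing file. Returns True on success, False if
    final_path was already taken (tmp_path is then left in place)."""
    try:
        os.link(tmp_path, final_path)   # fails with EEXIST if target exists
    except FileExistsError:
        return False
    os.unlink(tmp_path)
    return True
```

With os.rename(tmp_path, final_path) in place of the link/unlink pair,
the second of two racing deliveries would replace the first one's
message instead of being told that the name is taken.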

Jan
