Re: [Bug 417] New: htree much slower than regular ext3

Daniel Phillips (phillips@arcor.de)
Mon, 10 Mar 2003 18:58:16 +0100


On Sun 09 Mar 03 08:08, Alex Tomas wrote:
> >>>>> Daniel Phillips (DP) writes:
>
> DP> On Fri 07 Mar 03 16:46, Alex Tomas wrote:
> DP> The problem I see with your approach is that the traversal is no
> DP> longer in hash order, so a leaf split in the middle of a
> DP> directory traversal could result in a lot of duplicate dirents.
> DP> I'm not sure there's a way around that.
>
> 1) As far as I understand, duplicates are possible even in classic ext2
> w/o sortdir/index. See the diagram:
>
> Process 1 Process 2
>
> getdents(2) returns
> dentry1 (file1 -> Inode1)
> dentry2 (file2 -> Inode2)
>
> context switch -->
> unlink(file1), empty dentry1
> creat(file3), Inode3, use
> dentry1 creat(file1), Inode1, use dentry3
>
> context switch -->
>
> getdents(2) returns
> dentry3(file1 -> Inode1)
>
>
> Am I right?
>
>
> 2) Why do not use hash order for traversal like ext3_dx_readdir() does?
> Upon reading several dentries within some hash set readdir() sorts them
> in inode order and returns to an user.
>
>
> with best regards, Alex

You're right, but you still don't win the stuffed poodle.

I put forth the same argument at last year's kernel workshop and was
overruled on the grounds that, let me see:

- Duplicates as a result of unlinks are one thing, duplicates as a
result of creates are another, worse thing.

- Creating one duplicate as a result of one unlink is one thing,
creating lots of duplicates with one operation is another, worse
thing.

- The rules are written to allow this particular duplicate behavior
in UFS (design inherited by Ext2/3) because at the time, UFS was
the only game in town.

The phrase "broken by design" comes to mind. Ted and Stephen would no doubt
be happy to elaborate. Anyway, there's another reason we need to return
results in hash order, and that is telldir/seekdir, which expects a stable
enumeration of a directory with no duplicates, even with concurrent
operations going on, and even if the server is rebooted in the middle of a
directory traversal. Not just broken by design, but smashed to pieces by
design. And we still have to make it work, for some definition of "work".

Anyway,
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/