Re: [Ext2-devel] Re: Shrinking ext3 directories

Andreas Dilger (adilger@clusterfs.com)
Thu, 20 Jun 2002 04:18:12 -0600


On Jun 20, 2002 10:34 +0100, Stephen C. Tweedie wrote:
> On Thu, Jun 20, 2002 at 12:49:57AM +0200, Daniel Phillips wrote:
> > I don't have code, but let me remind you of this post:
> >
> > http://marc.theaimsgroup.com/?l=ext2-devel&m=102132142032096&w=2
> >
> > A sketch of the coalescing design is at the end. I'll formalize that.
>
> One question --- just how stable will this be if we boot into a kernel
> that doesn't have the coalescing enabled, and start modifying
> directories? We _could_ just teach the current code to clear those
> top 8 bits in the parent any time we touch a leaf node, but that's
> unnecessarily expensive, so we'd really need to have some way of
> either recreating the hint fields from scratch every so often, or of
> spotting when they have become badly out-of-date.

Three notes:
1) Coalescing isn't necessarily the same as just discarding empty
blocks. We can do the latter much more easily, and without the
hint bits at all, but it won't work unless a block is totally
empty, so you could still approach 0% fullness with huge directories.

2) The hint bits are meant to be intentionally vague (i.e. only a hint)
so there is no need to keep them 100% up-to-date. If it turns out
that you modify a directory with a kernel that does not understand
coalescing it is fairly benign. The worst that would happen is that
you get an empty leaf block (assuming you don't even have the simple
support for dropping empty blocks), or you try to coalesce with such
a block and find it too full to do the merge, so you update the hint
again with the correct value. Over the normal course of operations
you would be updating the hints for each block anyways, so invalid
hints would be cleaned out from the index.

To avoid extra overhead from writing out the parent each time you delete
an entry from the leaf, you could update the values all of the time
(you had to have read the parent to find the correct leaf block), but
not mark the block dirty, so the updated hints are only written to disk
if there is another reason to write out the block (e.g. split/coalesce
of a leaf block).

Having a large number of bits of hint info would not necessarily be
useful. In Daniel's "1-bit hint" example, the actual worst-case fullness
could _approach_ 25%, but you would always drop 100% empty blocks
immediately, so it would never quite get there.

With 2 bits of hint, you would probably only merge if the sum of the two
neighbours was <= 3 (i.e. 75% fullness for a single block), because you
don't necessarily want to be merging blocks to be almost 100% full and
then splitting them again. This would give a worst-case fullness between
37.5% and 75% at any time, which isn't really so bad given the performance
implications of repeated merge+truncate+allocate+split operations.

Remember also that each leaf block merge will incur a copy from the tail
block (which may need to be read from disk) and then a truncate to drop
that block. We _could_ leave some number of empty dir blocks at the end
of the directory file if we had some sort of dir prealloc scheme happening.
There would be some amount of hysteresis there to avoid the repeated
alloc/free overhead (i.e. keep no more than 8 free blocks, but allocate
8 blocks at a time if you need more).

Cheers, Andreas

--
Andreas Dilger
http://www-mddsp.enel.ucalgary.ca/People/adilger/
http://sourceforge.net/projects/ext2resize/

- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/