The problem is this: my integrity-preserving algorithm requires that dirty
blocks be recorded to disk in a certain order. However, bdflush doesn't know
anything about this nicety and is happy to force any random dirty block to
disk at any time. Likewise, sync_buffers is unsympathetic and will blindly force
all dirty blocks to disk in the event of a panic. This is exactly what I
don't want in a panic: I've already carefully arranged for the filesystem image
to be consistent at the point of any interruption; writing out more dirty
buffers won't improve things at all and could cause damage.

Ideally, what I'd like to do is to schedule all transfers to disk myself and not
let bdflush or sync_buffers get involved, at least not to the extent they do
now. The big problem with this is that I'd either have to take over the buffer
system completely or go outside my filesystem and start making patches to vfs,
neither of which is appealing.

As a workaround I was able to patch up my algorithm to be immune to the
undesirable effects of random block flushes. The cost is some extra
complexity and a slight degradation of performance - not too horribly bad if it
gets me up and running, and it buys some time to assess just what could be
done to the buffer system to make it more friendly to this type of filesystem.

In the end I think that the buffer system ought to be virtualized in the same
way as the rest of vfs, with some as-yet unidentified set of operations that can
be taken as defaults or overridden as required. This would allow the buffer
behaviour to be tuned on a per-filesystem/partition basis, which would be
right and proper.
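
To make the idea a little more concrete, here is a rough user-space sketch of
what such an overridable operations table might look like. It is not kernel
code and does not correspond to anything in the existing buffer system: every
name in it (struct buf, struct buf_ops, may_flush, written, sync_on_panic,
flush_pass, next_order) is invented purely for illustration, and the ordering
rule is a stand-in, not my actual algorithm.

```c
#include <stddef.h>

struct buf;

/* Hypothetical per-filesystem buffer operations (illustration only). */
struct buf_ops {
    int  (*may_flush)(const struct buf *b); /* may the flusher write b now? */
    void (*written)(struct buf *b);         /* called after b has been written */
    int  (*sync_on_panic)(void);            /* write dirty buffers on panic? */
};

/* Simplified stand-in for a buffer; not the real struct buffer_head. */
struct buf {
    int block;                 /* block number */
    int dirty;                 /* nonzero if the buffer needs writing */
    int order;                 /* position in the filesystem's write order */
    const struct buf_ops *ops; /* NULL means "use the defaults" */
};

/* Defaults: flush anything at any time, sync everything on panic. */
static int  default_may_flush(const struct buf *b) { (void)b; return 1; }
static void default_written(struct buf *b) { (void)b; }
static int  default_sync_on_panic(void) { return 1; }

const struct buf_ops default_ops = {
    default_may_flush, default_written, default_sync_on_panic
};

/* Overrides for a filesystem that needs ordered writes. */
int next_order = 0; /* next position in the order that may go to disk */

static int  ordered_may_flush(const struct buf *b) { return b->order == next_order; }
static void ordered_written(struct buf *b) { (void)b; next_order++; }
static int  ordered_sync_on_panic(void) { return 0; } /* image is already consistent */

const struct buf_ops ordered_ops = {
    ordered_may_flush, ordered_written, ordered_sync_on_panic
};

/* Generic flusher: writes each dirty buffer its ops allow; returns the count. */
int flush_pass(struct buf *bufs, size_t n)
{
    int count = 0;
    for (size_t i = 0; i < n; i++) {
        struct buf *b = &bufs[i];
        const struct buf_ops *ops = b->ops ? b->ops : &default_ops;
        if (b->dirty && ops->may_flush(b)) {
            b->dirty = 0;      /* pretend the block went to disk */
            ops->written(b);
            count++;
        }
    }
    return count;
}
```

In this sketch, a buffer belonging to the ordered filesystem is simply skipped
by the generic flusher until its turn comes, while buffers with no ops of their
own are flushed exactly as before.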

I'd appreciate any suggestions on what else I can do to tame the buffer
system while working within the existing code. I'd also appreciate
any pointers to documentation on the buffer system.
-- Daniel