The main motivation for that code was to sidestep the computational cost
of the buffer layer by doing a complete end-around. That problem
was later solved by fixing the buffer layer itself.
Yes, there are still reasons for delayed allocation, the space reservation
API, etc. But they are not compelling. They are certainly not compelling
when writeback continues to use the (randomly-ordered) mapping->dirty_pages
walk.
With radix-tree enhancements to permit a pgoff_t-order walk of the
dirty pages, then yeah: order-of-magnitude gains in the tiobench
random write pass.
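To illustrate the point, here is a minimal userspace sketch (not kernel code; all names are invented for illustration) of why a pgoff_t-ordered walk helps: the dirty set is modelled as a bitmap indexed by page offset, and writeback visits it in ascending pgoff_t order, so the block layer sees sequential IO no matter how randomly the pages were dirtied. A dirtying-order list walk, by contrast, replays the application's random pattern.

```c
#include <assert.h>

#define NPAGES 64
typedef unsigned long pgoff_t;

static unsigned char page_dirty[NPAGES];   /* 1 = page is dirty */

static void set_page_dirty(pgoff_t idx)
{
    page_dirty[idx] = 1;
}

/* Walk the dirty set in ascending pgoff_t order, collecting the
 * offsets to write back and clearing the dirty bits as we go. */
static int writeback_in_order(pgoff_t *out)
{
    int n = 0;

    for (pgoff_t i = 0; i < NPAGES; i++) {
        if (page_dirty[i]) {
            out[n++] = i;
            page_dirty[i] = 0;
        }
    }
    return n;
}
```

However the dirty set is represented (here a bitmap, in a real implementation a radix tree keyed by pgoff_t), the property that matters is that iteration yields ascending file offsets.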
> > Also it may be used in the NFS server for storing credential
> > information.
>
> The NFS server is still a deep, black hole in the kernel from my point of
> view and I'd like that situation to end as soon as possible, so it might
> as well start ending now. Can you provide me a pointer to go start
> digging at that specific question?
NFS client. Me too.
> (And strongly agreed about the invalidate_inode_pages(2) issue: at some
> point it would behoove VM and NFS developers to reach a mutual
> understanding of what that interface is supposed to do, because it is
> growing new warts and tentacles at an alarming rate, and still seems to
> be, at best, a heuristic. I currently have the impression that the
> intent is to make files sort-of coherent between clients, for some
> slippery definition of sort-of.)
I've been discussing that with Chuck. I'd prefer that the NFS client
use a flavour of vmtruncate(), with its strong guarantees. But we
won't know how horrid that is from a locking perspective until Trond
returns.
> ...
> In general, I think we'd be better off if page->buffers was not opaque,
Disagree. There is zero computational cost to the current setup,
it's a little cleaner, and it permits things such as removing
->private from the pageframes altogether and finding it via a hash
lookup instead.
And there is plenty of precedent for putting fs-private hooks into
core VFS data structures.
> and that it should remain non-opaque until we are definitely ready to
> get rid of them.
There is nothing wrong with buffers, except the name. They no longer
buffer anything.
They _used_ to be the buffering entity, and an IO container, and
the abstraction of a disk block.
They are now just the abstraction of a disk block. s/buffer_head/block/g
should make things clearer.
And there is no way in which the kernel can get along without
some structure which represents a disk block. It does one thing,
and it does it well.
The page is the buffering entity.
The buffer_head is a disk block.
The BIO is the IO container.
Sometimes, for efficiency, we bypass the "block" part and go direct
page-into-BIO. That's a conceptually-wrong performance hack.
Yes, one could try to graft the "block" abstraction up into struct
page, or down into struct BIO. But one would be mistaken, I expect.
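The three-role split above can be sketched in a few toy userspace structures; the names and fields here are illustrative only, not the real kernel definitions. The point is the separation of concerns: the page caches data, the block maps a piece of it to a location on disk, and the BIO carries an in-flight transfer.

```c
#include <assert.h>
#include <stddef.h>

typedef unsigned long sector_t;

struct block_model;

/* The buffering entity: holds file data in memory. */
struct page_model {
    void *data;
    struct block_model *blocks;   /* this page's on-disk mapping */
};

/* The abstraction of a disk block: one chunk of the page, its
 * location on disk, and its state. */
struct block_model {
    sector_t blocknr;
    int uptodate;
    int dirty;
    struct block_model *next;     /* next block of the same page */
};

/* The IO container: one in-flight transfer. */
struct bio_model {
    sector_t start_sector;
    struct page_model *page;
};

/* Attach a block mapping to a page: "this part of the page lives
 * at that sector". */
static void map_block(struct page_model *page, struct block_model *b,
                      sector_t blocknr)
{
    b->blocknr = blocknr;
    b->uptodate = 0;
    b->dirty = 0;
    b->next = page->blocks;
    page->blocks = b;
}
```

Note that nothing forces a page's blocks to be contiguous on disk, which is exactly why the "block" abstraction can't be folded into the page or the BIO without losing information.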
> Doing otherwise will just allow various applications
> to start growing tendrils into the field, making it that much harder
> to get rid of when the time comes.
>
> So the question is, does anyone *really* need (void *) page->private
> instead of page->buffers?
Don't know. But leaving it as-is tells the world that this is
per-fs metadata which the VM/VFS supports. This has no cost.
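A quick sketch of that point, with invented names: ->private as an opaque per-fs slot means one filesystem can hang its block list off it while another stores something entirely different (credential information, per the NFS-server example earlier in the thread), and the VM never interprets the pointer either way.

```c
#include <assert.h>
#include <stddef.h>

struct page_model {
    void *private;                /* opaque fs-owned metadata */
};

/* One fs stores a list of disk blocks... */
struct block_list {
    unsigned long blocknr;
    struct block_list *next;
};

/* ...another stores, say, credential information. */
struct cred_info {
    unsigned int uid;
    unsigned int gid;
};

static void set_page_private(struct page_model *page, void *data)
{
    page->private = data;
}

static void *page_private(struct page_model *page)
{
    return page->private;
}
```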