Re: [RFC] using writepage to start io

Daniel Phillips (phillips@bonn-fries.net)
Mon, 6 Aug 2001 07:39:47 +0200


On Monday 06 August 2001 01:32, Chris Mason wrote:
> On Monday, August 06, 2001 12:38:01 AM +0200 Daniel Phillips
>
> <phillips@bonn-fries.net> wrote:
> > On Sunday 05 August 2001 20:34, Chris Mason wrote:
> >> I wrote:
> >> > Note that the fact that buffers dirtied by ->writepage are
> >> > ordered by time-dirtied means that the dirty_buffers list really
> >> > does have indirect knowledge of page aging. There may well be
> >> > benefits to your approach but I doubt this is one of them.
> >>
> >> A problem is that under memory pressure, we'll flush a buffer that
> >> has been dirty for a long time, even if we are constantly
> >> redirtying it and have it more or less pinned. This might not be
> >> common enough to cause problems, but it still isn't optimal. Yes,
> >> it is a good idea to flush that page at some time, but under memory
> >> pressure we want to do the least amount of work that will lead to a
> >> freeable page.
> >
> > But we don't have a choice. The user has set an explicit limit on
> > how long a dirty buffer can hang around before being flushed. The
> > old-buffer rule trumps the need to allocate new memory. As you
> > noted, it doesn't cost a lot because if the system is that heavily
> > loaded then the rate of dirty buffer production is naturally
> > throttled.
>
> there are at least 3 reasons to write buffers to disk
>
> 1) they are too old
> 2) the percentage of dirty buffers is too high
> 3) you need to reclaim them due to memory pressure
>
> These are 3 completely different things; there's no trumping of
> priorities.

There is. If your heavily loaded machine goes down and you lose edits
from half an hour ago even though your bdflush parameters specify a 30
second update cycle, you'll call the system broken, whereas if it runs
5% slower under heavy write+swap load that's just life.
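To make the priority argument concrete, here is a minimal sketch of a
flush-decision policy where the old-buffer rule trumps memory pressure.
All names (`struct buf`, `should_flush`, the reason codes) are invented
for illustration and are not actual kernel interfaces:

```c
/* Illustrative only: a buffer past its bdflush age limit is flushed
 * unconditionally; memory pressure merely prefers a cheap victim
 * among the buffers that are not yet old. */
struct buf {
    long dirtied_at;      /* time (e.g. jiffies) when first dirtied */
    int  likely_freeable; /* heuristic: will flushing free the page? */
};

enum flush_reason { NO_FLUSH, FLUSH_TOO_OLD, FLUSH_MEM_PRESSURE };

static enum flush_reason
should_flush(const struct buf *b, long now, long age_limit, int mem_pressure)
{
    if (now - b->dirtied_at >= age_limit)
        return FLUSH_TOO_OLD;       /* old-buffer rule trumps everything */
    if (mem_pressure && b->likely_freeable)
        return FLUSH_MEM_PRESSURE;  /* cheapest path to a freeable page */
    return NO_FLUSH;
}
```

The point is only the ordering of the two tests: age is checked first,
so an old buffer goes out even when pressure would rather pick a
cheaper victim.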

> Under memory pressure you write buffers you have a high
> chance of freeing, during write throttling you write buffers that
> won't get dirty again right away, and when writing old buffers you
> write the oldest first.
>
> This doesn't mean you can always make the right decision on all 3
> cases, or that making the right decision is worth the effort ;-)

If we need to do write throttling we should do it at the point where we
still know it's a write, i.e., somewhere in sys_write. Some time after
writes are throttled (as specified by the bdflush parameters) all the
old write buffers will have worked their way through to the drives and
your case (3) gets all the bandwidth. I don't see a conflict, except
that we don't have such an upstream write throttling mechanism yet. We
sort of have one, in that a writer will busy itself trying to help out
with lru scanning when it can't get a free page for the page cache.
This has the ugly result that we get bunches of processes spinning on
the lru lock, and we have no idea what the queue scanning rates really
are. We can do something much more intelligent and predictable there,
and then we'll be a lot closer to being able to balance intelligently
between your cases.
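The shape of the upstream mechanism might look something like the
sketch below: the write path checks a dirty-buffer count against a hard
limit and blocks (rather than spinning on the lru lock) until the
flusher has made progress. Everything here (`struct throttle`, the
function names, the limits) is hypothetical, just to show the
accounting:

```c
/* Illustrative sketch of throttling in the write path itself. */
struct throttle {
    long nr_dirty;    /* dirty buffers currently outstanding */
    long hard_limit;  /* derived from the bdflush parameters */
};

/* Called from the write path before dirtying a buffer.  Returns 1 if
 * the writer may proceed, 0 if it should sleep until woken by the
 * flusher (modeled here as a simple refusal). */
static int write_may_proceed(struct throttle *t)
{
    if (t->nr_dirty >= t->hard_limit)
        return 0;        /* throttle here, where we know it's a write */
    t->nr_dirty++;
    return 1;
}

/* Called by the flusher as buffers reach the drive; in a real
 * implementation this would also wake sleeping writers. */
static void buffer_cleaned(struct throttle *t)
{
    if (t->nr_dirty > 0)
        t->nr_dirty--;
}
```

Because writers block on a counter the flusher decrements, the rate of
dirty buffer production becomes predictable instead of depending on how
fast a crowd of writers can spin through lru scanning.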

By the way, I think you should combine (2) and (3) using an and, which
gets us back to the "kupdate thing" vs the "bdflush thing".

--
Daniel
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/