Re: Bug with shared memory.

Andrew Morton (akpm@zip.com.au)
Sun, 19 May 2002 22:21:18 -0700


Andrea Arcangeli wrote:
>
> ...
> As next thing I'll go ahead on the inode/highmem imbalance reported by
> Alexey over the weekend.

(Some background: the problem is that ZONE_NORMAL gets clogged with
inodes which are unfreeable because they have attached pagecache
pages. There is no VM pressure against ZONE_HIGHMEM and there is
nothing which causes the pagecache pages to be freed. The machine
dies due to ZONE_NORMAL exhaustion. Workload is (presumably) the
reading or creation of a very large number of small files on a machine
which has much highmem.

I expect Andrea is looking at a solution which releases pages against
unused inodes when there are many unused inodes but prune_icache()
is failing to free enough of them).
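
To illustrate the idea only (this is a hand-waving sketch against
2.4-ish interfaces, not Andrea's fix, and prune_one_unused_inode() is
a name I just made up): before giving up on an unused inode which
still has pagecache attached, try to invalidate those pages so that
the inode becomes freeable on a later pass.

	#include <linux/fs.h>
	#include <linux/mm.h>

	/*
	 * Sketch only.  Called against an inode taken from the tail of
	 * inode_unused, so LRU order is preserved.
	 */
	static void prune_one_unused_inode(struct inode *inode)
	{
		if (atomic_read(&inode->i_count))
			return;		/* not actually unused */
		if (inode->i_data.nrpages)
			/* drops only clean, unused pagecache pages */
			invalidate_inode_pages(inode);
		/* if nrpages fell to zero, prune_icache() can dispose
		 * of the inode as usual */
	}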

I'll be interested to see your solution ;) We need to be careful not
to over-shrink the pagecache. Also we need to be doubly careful to
ensure that LRU order is maintained on unused_list.

I expect this is a compound bug: sure, ZONE_NORMAL is clogged with
inodes. But I bet it's also clogged with buffer_heads. If the
buffer_head problem were fixed then the inode problem would occur
much later - it would take more highmem and more/smaller files to
trigger.

> Then the only pending thing before next -aa is
> the integration of the 2.5 scheduler like in 2.4-ac. I will also soon
> automate the porting of all the not-yet merged stuff into 2.5, so we
> don't risk forgetting anything.

That's good.

> I mostly care about mainline but I would
> also suggest that Alan merge into -ac the very same VM as -aa (so we
> also increase the testing before it gets merged into mainline, as is
> happening bit by bit over time). Alan, I'm looking at it a bit and what
> you're shipping is a hybrid between the old 2.4 vm and the current one,
> plus using the rmap design for unmapping pagetables. The issue is not
> the rmap design, the issue is that the hybrid gratuitously reintroduces
> various bugs like the kswapd infinite loop and misses all the recent
> fixes (problems like the kswapd infinite loop are reproducible only
> after the machine has been up for months, other things like unfreed bhs
> under normal-zone shortage, fixed in recent -aa, are reproducible only
> on big iron, numa/discontigmem stuff, the definitive fix for google,
> etc. etc.).

I've seen a vague report from the IBM team which indicates that the -aa
VM does not solve the ZONE_NORMAL-full-of-buffer_heads lockup. The
workload was specweb on a 16 gig machine, I believe.

I did a quickie patch early this month which simply strips buffers
away from pages at each and every opportunity - that fixed it of
course. At the end of the run, ZONE_NORMAL had 250 MB free, as
opposed to zero with 2.4.18-stock.
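
Very roughly, and only as an illustration of the idea (this is not the
actual patch - strip_buffers() is a made-up helper), it amounts to
doing something like this wherever I/O against a pagecache page has
just completed:

	#include <linux/fs.h>
	#include <linux/mm.h>

	/*
	 * Illustration only; caller holds the page lock.  Drop the
	 * page's buffer_heads so they stop pinning ZONE_NORMAL.  In
	 * 2.4, try_to_free_buffers() refuses to touch dirty or locked
	 * buffers, so this is a no-op while the buffers are still
	 * needed for writeback; the page itself stays in pagecache and
	 * the block mapping can be rebuilt from the inode/indirects.
	 */
	static void strip_buffers(struct page *page)
	{
		if (page->buffers)
			try_to_free_buffers(page, 0);
	}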

I think this is a better approach than memclass_related_bhs(), because
it proactively makes memory available in ZONE_NORMAL for blockdev
pages. If we minimise the amount of memory which is consumed by
buffer_heads then we maximise the amount of memory which is available
to inodes and indirects. Sure, the buffers cache the disk mapping.
But the inodes and indirects cache the same information with *much*
better packing density. Sure, there's some extra CPU cost in
reestablishing the mapping, but in the no-buffer_heads testing I'm
doing on 2.5 it is unmeasurable.
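
(Rough numbers, mine, just to put a scale on "packing density": a 2.4
buffer_head costs on the order of a hundred bytes of ZONE_NORMAL per
block mapped, whereas an ext2 indirect block with 4k blocks maps 1024
blocks using 32-bit block numbers:

	4096 bytes / 1024 entries = 4 bytes per mapped block

i.e. the same mapping information packed roughly 25x more densely.)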

Currently, we're effectively allowing buffer_heads to evict useful
pagecache data. We have to perform additional I/O to get that back.

In other words: once we've finished I/O, just get rid of the
damn things. Bit radical for 2.4 maybe.

> ...
>
> About the rmap design, I would very much appreciate it if Rik could
> make a version of his patch that implements rmap on top of current -aa

That would be good to see. I believe Rik is offline for a week
or three, so he may not see your words.
