Re: ZONE_NORMAL exhaustion (dcache slab)

Andrew Morton (akpm@digeo.com)
Mon, 21 Oct 2002 21:20:26 -0700


"Martin J. Bligh" wrote:
>
> > I cannot make it happen here, either. 2.5.43-mm2 or current devel
> > stuff. Heisenbug; maybe something broke dcache-rcu? Or the math
> > overflow (unlikely).
>
> Dipankar is going to give me some debug code once he's slept for
> a while ... that should help see if dcache-rcu went wacko.

Well if it doesn't happen again...

> >> So it looks as though it's actually ext2_inode cache that's first against the wall.
> >
> > Well that's to be expected. Each ext2 directory inode has highmem
> > pagecache attached to it, which pins the inode. There's no highmem
> > eviction pressure so your normal zone gets stuffed full of inodes.
> >
> > There's a fix for this in Andrea's tree, although that's perhaps a
> > bit heavy on inode_lock for 2.5 purposes. It's a matter of running
> > invalidate_inode_pages() against the inodes as they come off the
> > unused_list (sketched below this exchange). I haven't got around to
> > it yet.
>
> Thanks; no urgent problem (though we did seem to have a customer hitting
> a very similar situation very easily in 2.4 ... we'll see if Andrea's
> fixes that, then I'll try to reproduce their problem on current 2.5).
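
The fix mentioned in the quote above would take roughly this shape; a
sketch only, from memory of the 2.4-era interfaces, and not Andrea's
actual patch:

/*
 * Illustrative only: when an unused inode is pulled off the unused
 * list for reaping, toss its (highmem) pagecache first so those
 * pages stop pinning the inode in ZONE_NORMAL.
 */
static void reap_one_unused_inode(struct inode *inode)
{
	list_del_init(&inode->i_list);	/* off inode_unused */
	invalidate_inode_pages(inode);	/* drop clean pagecache */
	/* ...then the usual dispose/destroy path... */
}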

Oh it's reproducible OK. Just run

make-teeny-files 7 7

against a few filesystems and watch the fun.

http://www.zip.com.au/~akpm/linux/patches/stuff/make-teeny-files.c
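
If you don't want to grab the source, the general idea is just to
build a deep tree of tiny files so the dentry and inode caches blow
up. A rough user-space sketch (illustrative only; I'm assuming the two
arguments mean tree depth and per-directory fanout, which may not be
exactly what make-teeny-files does):

#include <stdio.h>
#include <stdlib.h>
#include <fcntl.h>
#include <unistd.h>
#include <sys/types.h>
#include <sys/stat.h>

static void populate(int depth, int fanout)
{
	char name[32];
	int i, fd;

	for (i = 0; i < fanout; i++) {
		snprintf(name, sizeof(name), "t%d", i);
		if (depth > 0) {
			/* interior level: make a subdirectory, recurse */
			mkdir(name, 0755);
			if (chdir(name) == 0) {
				populate(depth - 1, fanout);
				chdir("..");
			}
		} else {
			/* leaf level: one teeny file per slot */
			fd = open(name, O_CREAT | O_WRONLY, 0644);
			if (fd >= 0) {
				write(fd, "x", 1);
				close(fd);
			}
		}
	}
}

int main(int argc, char *argv[])
{
	if (argc != 3) {
		fprintf(stderr, "usage: %s depth fanout\n", argv[0]);
		return 1;
	}
	populate(atoi(argv[1]), atoi(argv[2]));
	return 0;
}

With depth 7 and fanout 7 that's 7^7 = 823543 leaf files plus the
directories above them, which is the right order of magnitude for the
slab counts in the quote below.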

> >> larry:~# egrep '(dentry|inode)' /proc/slabinfo
> >> isofs_inode_cache        0       0  320     0     0  1 :  120  60
> >> ext2_inode_cache    667345  809181  416 89909 89909  1 :  120  60
> >> shmem_inode_cache        3       9  416     1     1  1 :  120  60
> >> sock_inode_cache        16      22  352     2     2  1 :  120  60
> >> proc_inode_cache        12      12  320     1     1  1 :  120  60
> >> inode_cache            385     396  320    33    33  1 :  120  60
> >> dentry_cache       1068289 1131096  160 47129 47129  1 :  248 124
> >
> > OK, so there's reasonable dentry shrinkage there, and the inodes
> > for regular files which have no attached pagecache were reaped.
> > But all the directory inodes are sitting there pinned.
>
> OK, this all makes a lot of sense ... apart from one thing:
> from looking at meminfo:
>
> HighTotal: 15335424 kB
> HighFree: 15066160 kB
>
> Even if every used highmem page is pagecache, that's only
> (15335424 - 15066160) / 4 = 67316 pages by my reckoning (is pagecache
> broken out separately in meminfo? both Buffers and Cached seem too
> large). If I only have 67316 pages of pagecache, how can I have
> 667345 inodes with attached pagecache pages?
> Or am I just missing something obvious and fundamental?

Maybe you didn't cat /dev/sda2 for long enough?

You should end up with very little dcache and tons of icache.
Here's what I get:

ext2_inode_cache:      420248KB  420256KB  99.99
buffer_head:            40422KB   41648KB  97.5
dentry_cache:             667KB   10211KB   6.54
biovec-BIO_MAX_PAGES:     768KB     780KB  98.46

Massive internal fragmentation of the dcache there: 667KB of live
dentries is pinning 10211KB of slab pages. But it takes a long time.

Generally, I feel that the proportional-shrink on slab is applying
too much pressure when there's not much slab and too little when
there's a lot. If you have 400 megs of inodes I don't really think
they are likely to be used again soon.

Perhaps we need to multiply the slab cache scanning pressure by the
slab occupancy. That's simple to do.
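
Something like this, as a sketch only (I'm reading "occupancy" as the
cache's share of total slab memory; the helper and its parameters are
hypothetical, not an existing interface):

/*
 * Sketch only: bias the scan count the VM asks for by this cache's
 * share of all slab memory.  A cache holding most of the slab pages
 * absorbs most of the scanning; a small cache is mostly left alone.
 */
static unsigned long scaled_slab_pressure(unsigned long requested,
					  unsigned long cache_pages,
					  unsigned long total_slab_pages)
{
	if (!total_slab_pages)
		return 0;

	/* integer math: requested * (cache share of slab memory) */
	return requested * cache_pages / total_slab_pages;
}

With the numbers above, the 420MB ext2_inode_cache would then absorb
nearly all of the requested pressure and the 10MB dentry_cache only a
sliver.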