---- On Wed, 4 Apr 2001, Mark Hemment (markhe@veritas.com) wrote:
> 
>   I believe David Miller's latest zero-copy patches might help here.
>   In his patch, the pull-up buffer is now allocated near the top of
> stack (in the sunrpc code), so it can be a blocking allocation.
>   This doesn't fix the core VM problems, but does relieve the pressure
> _slightly_ on the VM (I assume, haven't tried David's patch yet).
> 
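(Aside from me, not Mark's text: as I read "blocking allocation", the
point is simply that the pull-up buffer can now be allocated in process
context with GFP_KERNEL, so the allocator is allowed to sleep and run
reclaim instead of failing outright.  A minimal sketch, with a made-up
error path:)

    /* Illustration only: order-2 = 4 pages = 16K on a 4K-page machine.
     * GFP_KERNEL may sleep; GFP_ATOMIC (interrupt context) may not. */
    unsigned long buf = __get_free_pages(GFP_KERNEL, 2);
    if (!buf)
            return -ENOMEM;         /* hypothetical caller's error path */
    /* ... build the RPC request/reply in buf ... */
    free_pages(buf, 2);
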
>   One of the core problems is that the VM keeps no measure of page
> fragmentation in the free page pool.  The system reaches the state of
> having plenty of free single pages (so kswapd and friends aren't kicked
> - or if they are, they do little or no work), and very few buddied
> pages (which you need for some of the NFS requests).
> 
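(Again an aside: "a measure of fragmentation" could be as simple as
counting free blocks per order in each zone.  The sketch below assumes
a 2.4-style zone_t with free_area[order].free_list; the field names are
from memory and locking is omitted:)

    #include <linux/mm.h>
    #include <linux/list.h>

    /* Count free blocks of a given order in one zone (sketch only). */
    static unsigned long free_blocks(zone_t *zone, unsigned int order)
    {
            struct list_head *curr;
            unsigned long n = 0;

            list_for_each(curr, &zone->free_area[order].free_list)
                    n++;
            return n;
    }
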
>   Unfortunately, even keeping a measure of fragmentation, and ensuring
> work is done when a threshold is reached, doesn't solve the next
> problem.
> 
>   When a large order request comes in, the inactive_clean page list is
> reaped.  As reclaim_page() simply selects the "oldest" page it can,
> with no regard as to whether it will buddy (now, or possibly in the
> near future), this list is quickly shrunk by a large order request -
> far too quickly for a well behaved system.
> 
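(For illustration, the behaviour being described is roughly the snippet
below - a paraphrase from memory of the 2.4.0 reclaim path, not a quote
of it; note that nothing asks whether the page's buddy is free:)

    /* Sketch: take the oldest page off the zone's inactive_clean list,
     * whatever it is, and hand it back to the buddy allocator. */
    page = list_entry(zone->inactive_clean_list.prev, struct page, lru);
    list_del(&page->lru);
    /* ...unmap and free it; no check of whether it will coalesce... */
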
>   An NFS write request, with an 8K block size, needs an order-2 (16K)
> pull-up buffer (we shouldn't really be pulling the header into the
> same buffer as the data - perhaps we aren't any more?).  On a well
> used system, an order-2 _blocking_ allocation ends up populating the
> order-0 and order-1 free lists with quite a few pages from the
> inactive_clean list.
> 
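(Quick arithmetic on why that is order-2: an 8K data block plus the RPC
header no longer fits in 8K, and buddy allocations round up to a
power-of-two number of pages.  The header size below is made up purely
for illustration:)

    get_order(8192)        == 1    /* 8K fits exactly in two 4K pages */
    get_order(8192 + 512)  == 2    /* 8K data + header rounds to 16K  */
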
>   This then triggers another problem. :(
> 
>   As large (non-zero) order requests are always from the NORMAL or DMA
> zones, these zones tend to have a lot of free pages (put there by the
> blind reclaim_page() - well, once you can do a blocking allocation
> they are, or when the fragmentation kicking is working).
>   New allocations of pages for the page-cache often ignore the HIGHMEM
> zone (it reaches a steady state), and so it is passed over by the loop
> at the head of __alloc_pages().
>   However, the NORMAL and DMA zones tend to be above pages_low (for
> the reason above), and so new page-cache pages come from these zones.
> On a HIGHMEM system this leads to thrashing of the NORMAL zone, while
> the HIGHMEM zone stays (relatively) quiet.
>   Note: To make matters even worse under this condition, pulling pages
> out of the NORMAL zone is exactly what you don't want to happen!  It
> would be much better if they could be left alone for a (short) while
> to give them a chance to buddy - Linux (at present) doesn't care about
> the buddying of pages in the HIGHMEM zone (no non-zero order
> allocations come from there).
> 
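(The zonelist walk Mark refers to looks, as far as I remember the 2.4.0
source, roughly like this - a paraphrase, not a quote.  For a highmem
allocation the list runs HIGHMEM -> NORMAL -> DMA, and the first zone
still above pages_low wins, which is how NORMAL ends up feeding the
page-cache once HIGHMEM has settled:)

    zone_t **zone = zonelist->zones;
    for (;;) {
            zone_t *z = *(zone++);
            if (!z)
                    break;
            /* first zone comfortably above its low-water mark wins */
            if (z->free_pages > z->pages_low)
                    return rmqueue(z, order);
    }
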
>   I was working on these problems (and some others) a few months back,
> and will return to them shortly.  Unfortunately, the changes started
> to look too large for 2.4....
>   Also, for NFS, the best solution now might be to give the nfsd
> threads a receive buffer.  With David's patches, the pull-up occurs in
> the context of a thread, making this possible.
>   This doesn't solve the problem for other subsystems which do
> non-zero order page allocations, but (perhaps) they have a low enough
> frequency not to be a real issue.
> 
> 
> Kapish,
> 
>   Note: Ensure you put a "sync" in your /etc/exports - the default
> behaviour was "async" (not legal for a valid SpecFS run).
> 
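(Concretely, that means an export line something like the one below;
the path and client name are made up, the point is the explicit "sync"
option:)

    /export/specfs   testclient(rw,sync)
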
> Mark
> 
> 
> On Wed, 4 Apr 2001, Alan Cox wrote:
> 
> > > 	We have been seeing some problems with running nfs benchmarks
> > > at very high loads and were wondering if somebody could give us
> > > some pointers to where the problem lies.
> > > 	The system is a 2.4.0 kernel on a 6.2 Red Hat distribution (so
> > Use 2.2.19. The 2.4 VM is currently too broken to survive high I/O
> > benchmark tests without going silly
> > 
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/