Cache is the filesystem cache: all your program images and the whole SHM
area used by sap. The swap can be cached too, so the shm memory can
show up as cache, as swap, and as swapcache at the same time. You probably
have plenty of swap, so there's no need to reclaim shm and anonymous
pages from the swap space. This allows you to do zero-I/O-cost swapouts
of clean swapcache pages, for example, which is relevant in a scenario
> later (soon) referenced again then the system is in a need of more
> memory or the VM didn't predict well. The latter case should occur
> infrequently. In the former no clever piece of software would help
> anyway. So, why cache swap?
Primarily because this way, if you don't modify it, the next time you need
to swap it to disk it will be zero I/O cost. Secondly, because for
various consistency reasons (especially with directio) we must be able to
mark swap cache dirty (and let the VM collect it away, like we do
with the non-swap cache). Being able to mark swapcache dirty (rather
than reclaiming it from the swapcache by the time you write to it) also
helps to avoid fragmenting the swap (we allocate the swap space only
once and we keep overwriting in the same place).
After half of the swap is full, the -aa VM stops caching the swap
aggressively, because the priority then becomes not running out of
virtual memory rather than swapping out as fast as we possibly can.
One of the reasons the performance may slow down over time is also swap
fragmentation. The dirty cache will try to avoid it, but it can still
happen and we don't defragment it aggressively. If you had enough memory
for it, it would be interesting to see whether performance recovers after a
swapoff -a/swapon -a (but I think you don't have enough RAM, and the
swapoff would lead to either killing tasks or swapoff failure). However,
you should be able to verify that the performance returns to its peak
after a restart cycle of the app server. This almost guarantees the
kernel is doing fine.
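As a rough way to judge beforehand whether a swapoff -a has any chance of succeeding, the sketch below compares the swap currently in use with the free memory reported in /proc/meminfo. It is only an illustration: the function names are hypothetical, the sample numbers are made up, and the heuristic (MemFree minus swap in use) is an assumption that ignores reclaimable cache, so it errs on the pessimistic side.

```python
import re

# Made-up sample in the "Field:  value kB" layout of /proc/meminfo.
SAMPLE_MEMINFO = """\
MemTotal:      8000000 kB
MemFree:        100000 kB
Cached:        5000000 kB
SwapCached:     200000 kB
SwapTotal:     4000000 kB
SwapFree:      1000000 kB
"""


def parse_meminfo(text):
    """Return a dict mapping /proc/meminfo field names to values in kB."""
    fields = {}
    for line in text.splitlines():
        match = re.match(r"^(\w+):\s+(\d+)\s*kB", line)
        if match:
            fields[match.group(1)] = int(match.group(2))
    return fields


def swapoff_headroom_kb(fields):
    """Free RAM left over if all used swap had to come back into memory.

    A negative result suggests swapoff -a would fail or force tasks
    to be killed. Assumption: only MemFree is counted as available.
    """
    swap_used = fields["SwapTotal"] - fields["SwapFree"]
    return fields["MemFree"] - swap_used


fields = parse_meminfo(SAMPLE_MEMINFO)
headroom = swapoff_headroom_kb(fields)
print("swap in use (kB):", fields["SwapTotal"] - fields["SwapFree"])
print("headroom after swapoff (kB):", headroom)
```

On a live system the same functions could be fed the contents of /proc/meminfo instead of the sample string.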
> elvtune gives:
> /dev/sda elevator ID 2
> read_latency: 128
> write_latency: 512
> max_bomb_segments: 0
> Which seems fine to me. Anyway, with this much swapping (100-800Kpps)
> it won't help. I'll do some testing later with file transfer.
The elvtune suggestion was intended only for the file transfer. Just to
give it a spin, you can try with -r 10 (just to see if you notice any
But really you need to upgrade to pre4aa3, where I improved some bits in
elevator-lowlatency, before testing the file transfer stalling again
> > 2.4.21pre4aa3 also has extreme scalability optimizations that generate
> > three-digit percent improvements on some hardware; however, those won't
> > help latency directly. These optimizations will also partly change when
> > the VM starts swapping, deferring the "swap" point somewhat. This
> > new behaviour (besides the greatly improved scalability) is also
> > beneficial to very shm-userspace-cache intensive apps.
> It is exactly the case here:
> # df /dev/shm
> Filesystem 1k-blocks Used Available Use% Mounted on
> shmfs 16384000 5798364 10585636 36% /dev/shm
Yep I expected a scenario like this ;)
> > You can revert to
> > the non-scalable behaviour (but possibly more desirable on small
> > desktops/laptops) by using echo 1 >/proc/sys/vm/vm_anon_lru. You should
> > also try 'echo 1 >/proc/sys/vm/vm_anon_lru' if you see the VM isn't
> > swapping well enough and that it shrinks too much cache after upgrading
> > to 2.4.21pre4aa3.
> I hope I will be able to give this one a try.
btw, be careful with the vm_mapped_ratio: 1000 may be too much if you
really need to swap a lot to get good performance. It is possible the
default value of 100 is optimal for your workload.
Also remember that if pushed to the maximum, the VM will be forced to run
at the speed of the disk no matter how good the VM is; there is a
hardware disk limit on how fast it can swap and behave. However, a good
VM will run at worst at the speed of the disk during seeks, and never
much slower than what the disk can deliver during some seeking.
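To put the "speed of the disk during seeks" limit into numbers, here is a back-of-the-envelope sketch. All figures (8 ms seek, 4 ms rotational latency, 40 MiB/s transfer rate, 4 KiB pages) are illustrative assumptions, not measurements from this system, and the function names are hypothetical.

```python
def seek_bound_pages_per_sec(seek_ms, rotational_ms,
                             page_kib=4.0, transfer_mib_per_s=40.0):
    """Pages per second when every swapped page costs one full seek."""
    # Time to move one page off the platter once the head is in place.
    transfer_ms = page_kib / 1024.0 / transfer_mib_per_s * 1000.0
    return 1000.0 / (seek_ms + rotational_ms + transfer_ms)


def sequential_pages_per_sec(page_kib=4.0, transfer_mib_per_s=40.0):
    """Pages per second when swap I/O is fully contiguous (no seeks)."""
    return transfer_mib_per_s * 1024.0 / page_kib


worst = seek_bound_pages_per_sec(seek_ms=8.0, rotational_ms=4.0)
best = sequential_pages_per_sec()
print("seek-bound pages/s: %.1f" % worst)
print("sequential pages/s: %.1f" % best)
print("ratio: %.0fx" % (best / worst))
```

Under these assumed figures the gap between contiguous and seek-bound swapping is about two orders of magnitude, which is why fragmented swap hurts so much and why no VM can do better than the disk once every page needs its own seek.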
> > Thanks for the interesting feedback!
> Thank you for your input.
You're very welcome!