Re: [PATCH] Rmap speedup

Andrew Morton (akpm@zip.com.au)
Fri, 02 Aug 2002 22:24:59 -0700


No joy, I'm afraid.

2.5.26:

./daniel.sh 39.78s user 71.72s system 368% cpu 30.260 total
quad:/home/akpm> time ./daniel.sh
./daniel.sh 38.45s user 70.00s system 365% cpu 29.642 total

c0132b0c 328 1.03288 free_page_and_swap_cache
c013074c 334 1.05177 lru_cache_add
c0112a64 428 1.34778 do_page_fault
c0144e90 434 1.36667 link_path_walk
c01388b4 458 1.44225 do_page_cache_readahead
c01263e0 468 1.47374 clear_page_tables
c01319b0 479 1.50838 __free_pages_ok
c01e0700 514 1.61859 radix_tree_lookup
c012a8a8 605 1.90515 find_get_page
c01079f8 640 2.01537 page_fault
c01127d4 649 2.04371 pte_alloc_one
c0131ca0 811 2.55385 rmqueue
c0127cc8 1152 3.62766 do_anonymous_page
c013230c 1421 4.47474 page_cache_release
c0126880 1544 4.86207 zap_pte_range
c012662c 1775 5.58949 copy_page_range
c0127e70 1789 5.63358 do_no_page
c012750c 6860 21.6022 do_wp_page

Stock 2.5.30:

./daniel.sh 36.60s user 88.23s system 366% cpu 34.029 total
quad:/home/akpm> time ./daniel.sh
./daniel.sh 37.22s user 87.88s system 354% cpu 35.288 total

c014fdc4 191 0.872943 __d_lookup
c01310c0 203 0.927788 kmem_cache_alloc_batch
c0114154 227 1.03748 do_page_fault
c0146ea8 243 1.1106 link_path_walk
c0132fd0 257 1.17459 __free_pages_ok
c0134284 279 1.27514 free_page_and_swap_cache
c0131538 309 1.41225 kmem_cache_free
c0107c90 320 1.46252 page_fault
c012ca48 326 1.48995 find_get_page
c012a220 349 1.59506 handle_mm_fault
c0128520 360 1.64534 clear_page_tables
c0113ed0 367 1.67733 pte_alloc_one
c013129c 399 1.82358 kmem_cache_alloc
c01332bc 453 2.07038 rmqueue
c0129df4 557 2.5457 do_anonymous_page
c013392c 689 3.14899 page_cache_release
c0128a60 832 3.80256 zap_pte_range
c0129fa0 893 4.08135 do_no_page
c0128828 1081 4.94059 copy_page_range
c013aa74 1276 5.83181 page_add_rmap
c013ab3c 3094 14.1408 page_remove_rmap
c01296a8 3466 15.841 do_wp_page

2.5.30+pagemap_lru_lock stuff:

quad:/home/akpm> time ./daniel.sh
./daniel.sh 41.01s user 97.15s system 373% cpu 36.996 total
quad:/home/akpm> time ./daniel.sh
./daniel.sh 36.67s user 87.04s system 368% cpu 33.575 total

c0131d60 230 1.08979 kmem_cache_alloc_batch
c0148728 231 1.09453 link_path_walk
c01321d8 238 1.12769 kmem_cache_free
c01142b4 240 1.13717 do_page_fault
c0135624 291 1.37882 free_page_and_swap_cache
c012a8cc 323 1.53044 handle_mm_fault
c0128790 326 1.54466 clear_page_tables
c0107c90 338 1.60152 page_fault
c0131f3c 350 1.65837 kmem_cache_alloc
c0113f20 373 1.76735 pte_alloc_one
c012d2a8 397 1.88107 find_get_page
c013466c 415 1.96636 rmqueue
c0132f74 449 2.12746 __pagevec_release
c012a3bc 532 2.52073 do_anonymous_page
c012a5b0 772 3.6579 do_no_page
c0128da0 854 4.04643 zap_pte_range
c0128b48 1031 4.8851 copy_page_range
c013c054 1244 5.89434 page_add_rmap
c013c11c 3088 14.6316 page_remove_rmap
c0129b58 3206 15.1907 do_wp_page

2.5.30+pagemap_lru_lock+this patch:

quad:/home/akpm> time ./daniel.sh
./daniel.sh 38.78s user 91.56s system 366% cpu 35.534 total
quad:/home/akpm> time ./daniel.sh
./daniel.sh 38.07s user 88.64s system 363% cpu 34.883 total

c0135a90 332 1.30853 free_page_and_swap_cache
c013c57c 332 1.30853 page_add_rmap
c012ad4d 337 1.32824 .text.lock.memory
c0132448 353 1.3913 kmem_cache_free
c0128790 372 1.46618 clear_page_tables
c0107c90 377 1.48589 page_fault
c01142b4 423 1.66719 do_page_fault
c0113f20 432 1.70266 pte_alloc_one
c012d518 438 1.72631 find_get_page
c013c91c 438 1.72631 .text.lock.rmap
c01321ac 443 1.74602 kmem_cache_alloc
c012aafc 453 1.78543 handle_mm_fault
c01349fc 463 1.82485 rmqueue
c012a5ec 655 2.58159 do_anonymous_page
c01331e4 748 2.94813 __pagevec_release
c012a7e0 992 3.90982 do_no_page
c0128e90 1426 5.62037 zap_pte_range
c0128b48 1586 6.25099 copy_page_range
c013c5c8 2324 9.1597 __page_remove_rmap
c0129d88 4028 15.8758 do_wp_page

- page_add_rmap has all but vanished (1244 -> 332 ticks)
- page_remove_rmap has dropped from 3088 to 2324 ticks (it now
  shows up as __page_remove_rmap); 80% of what remains is the
  list walk
- we've moved the cost into the new locking sites, zap_pte_range
  and copy_page_range.

So rmap locking is still a 15% slowdown on my soggy quad, which generally
seems relatively immune to locking costs. PPC will like the change
because spinlocks are better than bitops. ia32 should have liked it
for the same reason but, as I say, this machine doesn't seem to have
the bandwidth*latency to be affected much by these things.

On more modern machines and other architectures this remains
a significant problem for rmap, I expect.

Guess we should instrument it up and make sure that the hashing
and index thing is getting the right locality. I saw UML-for-2.5.30
whizz past, if you have time ;)

Broken out patches are at
http://www.zip.com.au/~akpm/linux/patches/2.5/2.5.30/
Rolled-up patch is at
http://www.zip.com.au/~akpm/linux/patches/2.5/2.5.30/everything.gz