2.5.65-mm2 vs 2.5.65-mm3 (full objrmap)

Martin J. Bligh (mbligh@aracnet.com)
Sat, 22 Mar 2003 08:06:10 -0800


> . Added Hugh's new rmap-without-pte_chains-for-anon-pages patches. Mainly
> for interested parties to test and benchmark at this stage.
>
> It seems to be stable, however it is not clear that this passes the
> benefit-vs-disruption test.

I see very little impact either way. My initial analysis showed that 90%
of the anonymous mappings were singletons (mapped by just one pte), so the
chain manipulation costs are probably very low. A workload that builds up
long anonymous chains and manipulates them a lot might see a benefit.
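
To make the singleton point concrete, here's a rough userspace toy of the
direct-vs-chain bookkeeping (the names and layout are illustrative, not the
real 2.5 structures): the first mapper of an anon page is recorded in the
struct page itself, and pte_chain nodes only get allocated once a second
pte maps the same page.

/*
 * Toy userspace model (not kernel code) of the pte_direct idea:
 * a page mapped by exactly one pte stores that pte pointer in the
 * struct page itself; chain nodes are only allocated once a second
 * mapping shows up.  Field names are illustrative.
 */
#include <stdio.h>
#include <stdlib.h>

typedef unsigned long pte_t;             /* stand-in for a pte slot */

struct pte_chain {                       /* one node per mapper, once chained */
        pte_t *ptep;
        struct pte_chain *next;
};

struct page {
        int direct;                      /* 1 => pte.direct is valid */
        union {
                pte_t *direct;           /* singleton mapping: no allocation */
                struct pte_chain *chain; /* 2+ mappings: linked list */
        } pte;
};

static void page_add_rmap(struct page *page, pte_t *ptep)
{
        if (!page->direct && !page->pte.chain) {
                page->direct = 1;        /* first mapper: no allocation at all */
                page->pte.direct = ptep;
                return;
        }
        if (page->direct) {              /* second mapper: promote to a chain */
                struct pte_chain *first = malloc(sizeof(*first));
                first->ptep = page->pte.direct;
                first->next = NULL;
                page->direct = 0;
                page->pte.chain = first;
        }
        struct pte_chain *node = malloc(sizeof(*node));
        node->ptep = ptep;
        node->next = page->pte.chain;
        page->pte.chain = node;
}

int main(void)
{
        struct page page = { 0 };
        pte_t a = 0, b = 0;

        page_add_rmap(&page, &a);        /* the common case: stays direct */
        printf("one mapper:  direct=%d\n", page.direct);

        page_add_rmap(&page, &b);        /* e.g. after a fork: now a chain */
        printf("two mappers: direct=%d\n", page.direct);
        return 0;
}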

However, I thought there might be some benefit in the fork/exec cycle
(which presumably replaces the direct mapping with a new chain at fork,
only to tear it down again at exec; see the rough accounting toy below)
... but seemingly not. Did you keep the pte_direct optimisation? That
seems worth keeping even with partial objrmap (I think it was removed in
Dave's patch, but would presumably be easy to put back). Or maybe we just
need some more tuning ;-)
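
A crude accounting toy for that fork/exec round trip (standalone from the
sketch above; the counting is deliberately simplistic rather than measured):
the chain only needs to exist while two ptes map the page, so a fork that
is immediately followed by an exec is pure allocate-then-free churn on the
chain allocator.

/*
 * Crude model of the rmap traffic for one singleton anon page across
 * fork+exec.  Chain nodes are only needed while the page has two or
 * more mappers; with a single mapper the direct slot in struct page
 * suffices and no allocation happens at all.
 */
#include <stdio.h>

static int mappers;        /* ptes currently mapping the page */
static int nodes;          /* chain nodes currently allocated */
static int allocs, frees;  /* cumulative allocator traffic    */

static void update_chain(void)
{
        int want = (mappers >= 2) ? mappers : 0;   /* direct needs none */

        while (nodes < want) { nodes++; allocs++; }
        while (nodes > want) { nodes--; frees++; }
}

static void page_add_rmap(void)    { mappers++; update_chain(); }
static void page_remove_rmap(void) { mappers--; update_chain(); }

int main(void)
{
        page_add_rmap();       /* parent faults the page in: stays direct */
        page_add_rmap();       /* fork: child's pte forces a chain        */
        page_remove_rmap();    /* child execs: back to a direct mapping   */

        printf("allocs=%d frees=%d (mappers=%d, nodes=%d)\n",
               allocs, frees, mappers, nodes);
        return 0;
}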

Results from the 16x NUMA-Q are below (that machine seems to have severe
problems with pte_chains, so it's a good testbed for these things...)

Kernbench: (make -j N vmlinux, where N = 2 x num_cpus)
              Elapsed (s)   System (s)    User (s)     CPU (%)
2.5.65-mm2          44.04        80.63      566.83     1469.25
2.5.65-mm3          44.21        80.57      567.61     1466.00

Kernbench: (make -j N vmlinux, where N = 16 x num_cpus)
              Elapsed (s)   System (s)    User (s)     CPU (%)
2.5.65-mm2          44.27        88.56      574.21     1496.75
2.5.65-mm3          44.10        89.24      574.70     1503.75

Kernbench: (make -j vmlinux, maximal tasks)
              Elapsed (s)   System (s)    User (s)     CPU (%)
2.5.65-mm2          44.30        86.09      572.75     1488.25
2.5.65-mm3          44.36        86.86      573.28     1486.25

DISCLAIMER: SPEC(tm) and the benchmark name SDET(tm) are registered
trademarks of the Standard Performance Evaluation Corporation. This
benchmarking was performed for research purposes only, and the run results
are non-compliant and not comparable with any published results.

Results are shown as percentages of the first set displayed
(i.e. 2.5.65-mm2 = 100%).

SDET 1 (see disclaimer)
              Throughput   Std. Dev
2.5.65-mm2        100.0%       2.2%
2.5.65-mm3         98.9%       2.0%

SDET 2 (see disclaimer)
              Throughput   Std. Dev
2.5.65-mm2        100.0%       2.7%
2.5.65-mm3         97.1%       2.3%

SDET 4 (see disclaimer)
              Throughput   Std. Dev
2.5.65-mm2        100.0%       1.3%
2.5.65-mm3        103.5%       1.3%

SDET 8 (see disclaimer)
              Throughput   Std. Dev
2.5.65-mm2        100.0%       1.0%
2.5.65-mm3         98.0%       1.0%

SDET 16 (see disclaimer)
              Throughput   Std. Dev
2.5.65-mm2        100.0%       0.6%
2.5.65-mm3         99.4%       1.0%

SDET 32 (see disclaimer)
              Throughput   Std. Dev
2.5.65-mm2        100.0%       0.2%
2.5.65-mm3        101.7%       0.5%

SDET 64 (see disclaimer)
              Throughput   Std. Dev
2.5.65-mm2        100.0%       0.1%
2.5.65-mm3        101.0%       0.4%

SDET 128 (see disclaimer)
              Throughput   Std. Dev
2.5.65-mm2        100.0%       0.5%
2.5.65-mm3        100.8%       0.6%

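For the profiles that follow, each line is: delta in profile ticks, percent
change relative to -mm2, symbol. A 0.0% entry means the symbol had no ticks
at all in the -mm2 profile, and -100.0% means it vanished in -mm3. Roughly
the per-symbol arithmetic (a sketch only; the real diffprofile is a script,
and the counts in main() here are invented just to show the output shape):

/*
 * Sketch of the per-symbol arithmetic behind a diffprofile listing.
 * old_ticks/new_ticks are readprofile-style counts for one symbol;
 * the inputs in main() are invented purely to show the three cases.
 */
#include <stdio.h>

static void diff_symbol(const char *sym, long old_ticks, long new_ticks)
{
        long delta = new_ticks - old_ticks;    /* + means worse in the new kernel */
        double pct = old_ticks ? 100.0 * delta / old_ticks : 0.0;

        printf("%6ld %7.1f%% %s\n", delta, pct, sym);
}

int main(void)
{
        diff_symbol("slightly_worse_symbol", 100, 117);  /*    17   17.0% */
        diff_symbol("vanished_symbol",        50,   0);  /*   -50 -100.0% */
        diff_symbol("brand_new_symbol",        0,  40);  /*    40    0.0% */
        return 0;
}
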
diffprofile (kernbench, + worse in -mm3)

353 0.0% set_page_dirty
284 0.0% try_to_unmap_one
224 17.1% page_add_rmap
193 5.8% find_get_page
142 1.8% d_lookup
113 10.6% link_path_walk
67 0.0% page_dup_rmap
56 9.4% __wake_up
53 0.0% rmap_get_cpu
46 8.8% find_vma
45 8.6% fd_install
44 0.0% page_turn_rmap
40 14.6% do_lookup
37 9.9% .text.lock.file_table
36 2.1% buffered_rmqueue
35 5.5% copy_process
34 10.5% pgd_ctor
33 97.1% exit_mmap
31 4.9% handle_mm_fault
...
-36 -80.0% profile_exit_mmap
-51 -14.9% pte_alloc_map
-53 -27.5% install_page
-99 -100.0% pte_chain_alloc
-127 -8.4% free_hot_cold_page
-128 -0.9% do_anonymous_page
-158 -100.0% __pte_chain_free
-238 -16.4% do_no_page
-293 -12.0% page_remove_rmap
-330 -0.2% total
-355 -100.0% __set_page_dirty_buffers
-666 -1.4% default_idle

diffprofile (sdet, + worse in -mm3)

2139 0.0% try_to_unmap_one
2032 0.0% page_dup_rmap
1508 0.0% set_page_dirty
1448 0.0% page_turn_rmap
223 9.9% link_path_walk
169 2.8% .text.lock.dcache
168 3.0% .text.lock.namei
158 1.9% find_get_page
139 1.2% d_lookup
104 6.8% .text.lock.attr
98 32.9% exit_mmap
97 5.0% d_alloc
93 21.8% generic_delete_inode
92 89.3% __blk_queue_bounce
90 0.0% rmap_get_cpu
83 0.9% .text.lock.dec_and_lock
70 0.0% dup_rmap
69 0.6% atomic_dec_and_lock
67 24.6% find_group_other
65 8.2% new_inode
59 0.9% path_lookup
50 4.8% prune_dcache
50 2.9% .text.lock.base
...
-51 -22.6% ext2_new_block
-56 -6.1% read_block_bitmap
-60 -3.6% __read_lock_failed
-66 -3.7% current_kernel_time
-67 -4.0% __find_get_block
-78 -21.9% group_reserve_blocks
-83 -2.5% do_anonymous_page
-84 -22.3% truncate_inode_pages
-85 -5.4% real_lookup
-87 -27.6% unlock_page
-106 -59.6% profile_exit_mmap
-107 -100.0% pte_chain_alloc
-134 -41.0% install_page
-158 -100.0% __pte_chain_free
-170 -6.4% kmem_cache_free
-170 -9.9% __wake_up
-202 -41.7% grab_block
-253 -1.7% copy_page_range
-266 -3.9% __copy_to_user_ll
-328 -1.7% zap_pte_range
-626 -6.9% release_pages
-679 -19.0% __down
-730 -23.9% do_no_page
-1492 -100.0% __set_page_dirty_buffers
-2051 -0.6% default_idle
-3399 -22.3% page_remove_rmap
-4052 -54.8% page_add_rmap
-6486 -1.0% total
