System configuration:
  Red Hat 6.2, 2.4.5 kernel, 4-way 500MHz PIII, 2.5GB RAM,
  2MB L2 Cache, 50 disks with 5 ServeRAID 4H controllers
The script to run 10 dds:
dd if=/dev/raw/raw1 of=/dev/null bs=65536 count=2500 &
dd if=/dev/raw/raw4 of=/dev/null bs=65536 count=2500 &
dd if=/dev/raw/raw7 of=/dev/null bs=65536 count=2500 &
dd if=/dev/raw/raw10 of=/dev/null bs=65536 count=2500 &
dd if=/dev/raw/raw13 of=/dev/null bs=65536 count=2500 &
dd if=/dev/raw/raw16 of=/dev/null bs=65536 count=2500 &
dd if=/dev/raw/raw19 of=/dev/null bs=65536 count=2500 &
dd if=/dev/raw/raw22 of=/dev/null bs=65536 count=2500 &
dd if=/dev/raw/raw25 of=/dev/null bs=65536 count=2500 &
dd if=/dev/raw/raw28 of=/dev/null bs=65536 count=2500 &
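Each dd reads 2500 x 64 KiB = ~156 MiB from its raw device, so the ten
parallel dds move ~1.5 GiB in total. A quick sanity check of the aggregate
rates implied by the two run times reported below (38 s base, 22 s patched):

```python
# Aggregate data moved by the 10 parallel dds, and the throughput
# implied by the two measured run times.
BS = 65536      # bs=65536
COUNT = 2500    # count=2500
NDD = 10        # ten dd processes

total_bytes = BS * COUNT * NDD
print(f"total: {total_bytes / 2**20:.1f} MiB")          # 1562.5 MiB
for secs in (38, 22):
    print(f"{secs:2d} s -> {total_bytes / secs / 1e6:.1f} MB/s")
```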
(1) Under 2.4.5 Base (38 seconds)
SPINLOCKS        HOLD           WAIT
  UTIL  CON   MEAN(  MAX )  MEAN(  MAX )(% CPU)    TOTAL NOWAIT SPIN RJECT
emergency_lock
 23.8%    0% 2.8us(  54us)   0us                 3200001  100%    0%    0%
global_bh_lock
 24.6% 97.8%   80us(4385ms)  0us                  114825  2.2%    0% 97.8%
io_request_lock
 27.6% 11.5% 1.4us(  64us) 5.8us( 115us)( 3.4%)  7633079 88.5% 11.5%    0%
rmqueue+0x2c
  6.7% 13.6% 0.8us( 9.5us) 2.0us(  15us)(0.58%)  3200862 86.4% 13.6%    0%
emergency_lock    23.8 % + 0.00% * 4   = 0.24
global_bh_lock    24.6 % + 0.00% * 4   = 0.25
io_request_lock   27.6 % + 3.40% * 4   = 0.41
rmqueue+0x2c       6.7 % + 0.58% * 4   = 0.09
==================================================
                                 Sum   = 0.99 CPUs
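The per-lock figures above charge each lock its hold utilization plus its
spin-wait cost scaled by the 4 CPUs. Reproducing that arithmetic (assuming
this reading of the lockstat columns) recovers the 0.99-CPU total:

```python
# CPUs consumed per lock = HOLD UTIL + WAIT %CPU * (number of CPUs),
# using the base-run numbers from the table above.
NCPUS = 4
locks = {
    "emergency_lock":  (0.238, 0.0),     # (HOLD UTIL, WAIT %CPU)
    "global_bh_lock":  (0.246, 0.0),
    "io_request_lock": (0.276, 0.034),
    "rmqueue+0x2c":    (0.067, 0.0058),
}
total = sum(util + wait * NCPUS for util, wait in locks.values())
print(f"Sum = {total:.2f} CPUs")   # 0.99
```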
(2) Under 2.4.5 + zero-bounce highmem I/O & IPS patches (22 seconds)
SPINLOCKS        HOLD           WAIT
  UTIL  CON   MEAN(  MAX )  MEAN(  MAX )(% CPU)    TOTAL NOWAIT SPIN RJECT
global_bh_lock
 35.8%  8.9%  242us(2968us)   0us                    32543 91.1%    0%  8.9%
io_request_lock
 57.7% 59.4%  1.8us( 118us)  10us( 192us)(47.9%)   6914223 40.6% 59.4%    0%
global_bh_lock    35.8% + 0.00% * 4   = 0.36
io_request_lock   57.7% + 47.9% * 4   = 2.49
==================================================
                                 Sum   = 2.85 CPUs
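The same arithmetic for the patched run shows io_request_lock alone costing
roughly 2.5 of the 4 CPUs:

```python
# CPUs consumed per lock = HOLD UTIL + WAIT %CPU * (number of CPUs),
# using the patched-run numbers (global_bh_lock and io_request_lock).
NCPUS = 4
total = (0.358 + 0.0 * NCPUS) + (0.577 + 0.479 * NCPUS)
print(f"Sum = {total:.2f} CPUs")   # 2.85
```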
     Indeed, io_request_lock becomes very hot once the bounce buffers are
eliminated. Is anyone working on a patch that splits the global
io_request_lock into per-device queue locks? We understand that getting
such a change into 2.4 is unlikely, but it would be nice to have the patch
available on 2.4 for experimental purposes.
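To make the request concrete, here is a minimal, purely illustrative sketch
(not kernel code; all names are hypothetical) of the idea: give each device
queue its own lock so that I/O to different devices no longer serializes on
a single global spinlock.

```python
# Hypothetical user-space sketch of per-device queue locking.
# With one global lock, every submit() contends with every other;
# with a lock per queue, only same-device submissions contend.
import threading

class DeviceQueue:
    def __init__(self, name):
        self.name = name
        self.lock = threading.Lock()   # per-queue lock, replacing a global one
        self.requests = []

    def submit(self, req):
        with self.lock:                # contention limited to this device
            self.requests.append(req)

queues = [DeviceQueue(f"raw{i}") for i in (1, 4, 7)]
for q in queues:
    q.submit("read 64KiB")
print([len(q.requests) for q in queues])   # [1, 1, 1]
```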
Wai Yee Peter Wong
IBM Linux Technology Center, Performance Analysis
email: wpeter@us.ibm.com
Office: (512) 838-9272, T/L 678-9272; Fax: (512) 838-4663