Gigabit Performance 2.4.19-preX - Excessive locks, calls, waits

Jeff V. Merkey (jmerkey@vger.timpanogas.org)
Mon, 4 Mar 2002 00:12:23 -0700


More bottlenecks located during SCI/Gigabit Ethernet testing
and profiling. Configuration is 2.4.19-pre2(3) running SCI
and Intel e1000 gigabit ethernet adapters. In this scenario,
the Gnet adapter is DMA'ing frames from a gigabit segment
directly into reflective memory mapped into an SCI adapter
address space, then immediately triggering an outbound
DMA of the data over an SCI clustering fabric into
the memory of a remote node. In essence, this is a GNET-to-SCI
routing fabric.

Performance throughput numbers are stable for the most part since
we are at the maximum throughput of Intel's GNET adapters at
124 MB/s; however, processor utilization, locking, etc. are far more
excessive than necessary. We are also spending too much time calling
kmalloc/kfree during skb construct/destruct operations. Also,
Intel's adapter by default has the ring buffer size in the driver
set to 256 packets, while the number of free skb headers we keep on
the hot list before discarding them (128) is too low for these GNET
adapters, resulting in intermittent packet overruns.
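
For reference, the hot list in question is the per-CPU cache of free
sk_buff headers in net/core/skbuff.c. Paraphrased from memory (so
treat this as a sketch, not a verbatim quote), the release side looks
roughly like this:

/* Freed sk_buff headers are parked on a per-CPU list until it holds
 * sysctl_hot_list_len entries; anything beyond that goes back to the
 * slab allocator.  skb_head_pool and skbuff_head_cache are defined
 * earlier in skbuff.c.
 */
static __inline__ void skb_head_to_pool(struct sk_buff *skb)
{
	struct sk_buff_head *list = &skb_head_pool[smp_processor_id()].list;

	if (skb_queue_len(list) < sysctl_hot_list_len) {
		unsigned long flags;

		local_irq_save(flags);
		__skb_queue_head(list, skb);
		local_irq_restore(flags);
		return;
	}
	kmem_cache_free(skbuff_head_cache, skb);
}

Note that this only recycles the sk_buff header itself; the skb->data
area is still kmalloc'd in alloc_skb and kfree'd in skb_release_data
on every packet, which is exactly the traffic showing up in the first
profile below.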

Increasing these numbers and using a fixed frame size consistent with
GNET (excluding 9K jumbo frames) instead of kmalloc'ing/kfree'ing the
skb->data portion of these frames on every packet yields lower remote
receipt latency as well as lower processor and bus utilization.
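
The actual alloc_skb_frame change is not included in this mail, but
the idea is roughly as follows (the names and sizes below are
hypothetical, not from the real patch): pre-allocate receive skbs at
the adapter's fixed maximum frame size so the receive hot path never
does a variable-size kmalloc for skb->data.

#include <linux/skbuff.h>
#include <linux/netdevice.h>

/* Hypothetical sketch only -- not the actual alloc_skb_frame patch. */
#define GNET_FRAME_SIZE		1536	/* 1514 rounded up for alignment */
#define GNET_FRAME_POOL_MAX	1024

static struct sk_buff_head gnet_frame_pool;  /* skb_queue_head_init() at load time */

/* Refill outside the hot path (init time or a tasklet), so the
 * fixed-size allocations never happen in the receive interrupt. */
static void gnet_frame_pool_refill(void)
{
	while (skb_queue_len(&gnet_frame_pool) < GNET_FRAME_POOL_MAX) {
		struct sk_buff *skb = dev_alloc_skb(GNET_FRAME_SIZE);
		if (!skb)
			break;
		skb_queue_tail(&gnet_frame_pool, skb);
	}
}

/* Receive fast path: grab a ready-made, constant-size buffer. */
static struct sk_buff *gnet_frame_get(void)
{
	struct sk_buff *skb = skb_dequeue(&gnet_frame_pool);
	return skb ? skb : dev_alloc_skb(GNET_FRAME_SIZE);
}

Anything larger than a single wire frame would still hang off the
fragment list as usual; the point is only that the common 1514-byte
case stops touching kmalloc/kfree in the interrupt path.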

Measured latency of packets coming off the SCI interface on the remote
side of the clustering fabric is 3-4% higher with the unmodified code
than with the modified code.

The modifications made to skbuff.c are extensive, and driver changes
were also required to get around these performance problems. Data is
provided below for review. I recommend, as a minimum change, increasing
sysctl_hot_list_len from 128 to 1024 by default. I have reviewed
(and modified) skbuff.c and all the copy code related to mapping
fragment lists, etc., and this code is quite a mess.

NetWare always created ECBs (Event Control Blocks) at the maximum
frame size of the network adapter rather than trying to allocate
fragment elements on the fly the way it is being done in Linux with
skb's.
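
Purely as an illustration of that approach (this is not NetWare
source), an ECB-style layout bundles the control structure and a
maximum-sized data area into one fixed object, so receiving a frame
never requires a separate data or fragment allocation:

/* Illustrative only.  Control block plus a max-frame data area form
 * one fixed-size object handed out from a simple free list, so there
 * is no per-packet buffer allocation at all. */
#define ADAPTER_MAX_FRAME	1514

struct ecb_like_buffer {
	struct ecb_like_buffer	*next;	/* free-list linkage */
	unsigned int		length;	/* bytes actually received */
	unsigned char		data[ADAPTER_MAX_FRAME];
};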

Bottom line: this is impacting performance and I/O bandwidth and
needs to be corrected. At a minimum, the default hot_list size
should be increased.

/usr/src/linux/net/core/skbuff.c

//int sysctl_hot_list_len = 128;
int sysctl_hot_list_len = 1024; // bump this value up
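
If I remember the sysctl table correctly, the same value is also
exported at runtime as /proc/sys/net/core/hot_list_length, so
"echo 1024 > /proc/sys/net/core/hot_list_length" should let people
experiment with the hot list depth without rebuilding; the one-line
change above just makes the larger value the default.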

alloc_skb with calls to kmalloc/kfree, 2.4.19-pre2 with the code
"as is". Notice the high call rate to kmalloc/kfree and the
correspondingly higher utilization (@ 7%).

36324 total 0.0210
28044 default_idle 584.2500
1117 __rdtsc_delay 34.9062
927 eth_type_trans 4.4567
733 skb_release_data 5.0903
645 kmalloc 2.5195
638 kfree 3.9875
463 __make_request 0.3180
415 __scsi_end_request 1.3651
382 alloc_skb 0.8843
372 tw_interrupt 0.3633
241 kfree_skbmem 1.8828
233 scsi_dispatch_cmd 0.4161
233 __generic_copy_to_user 3.6406
194 __kfree_skb 0.9327
184 scsi_request_fn 0.2396
103 ip_rcv 0.1238
88 __wake_up 0.5000
84 do_anonymous_page 0.4773
72 do_softirq 0.4091

52 processes: 51 sleeping, 1 running, 0 zombie, 0 stopped
CPU states: 0.0% user, 32.8% system, 0.0% nice, 67.1% idle
Mem: 897904K av, 869248K used, 28656K free, 0K shrd, 3724K buff
Swap: 1052216K av, 0K used, 1052216K free 46596K cached

alloc_skb_frame with fixed 1514-byte frames + fragment list
allocations, sysctl_hot_list_len = 1024.

34880 total 0.0202
28581 default_idle 595.4375
1125 __rdtsc_delay 35.1562
1094 eth_type_trans 5.2596
657 skb_release_data 4.5625
378 __make_request 0.2596
335 alloc_skb_frame 1.1020
334 tw_interrupt 0.3262
293 __scsi_end_request 0.9638
208 scsi_dispatch_cmd 0.3714
193 __kfree_skb 0.9279
184 scsi_request_fn 0.2396
160 kfree_skbmem 1.2500
90 __generic_copy_to_user 1.4062
81 ip_rcv 0.0974
68 __wake_up 0.3864
59 do_anonymous_page 0.3352
48 do_softirq 0.2727
43 generic_make_request 0.1493
43 alloc_skb 0.0995

50 processes: 49 sleeping, 1 running, 0 zombie, 0 stopped
CPU states: 0.0% user, 27.5% system, 0.0% nice, 72.4% idle
Mem: 897904K av, 841280K used, 56624K free, 0K shrd, 2220K buff
Swap: 1052216K av, 0K used, 1052216K free 22292K cached

Jeff
