Re: 2.4.13 (and 2.4.5) performance drops under load and does not recover

David Lang (david.lang@digitalinsight.com)
Wed, 31 Oct 2001 15:22:03 -0800 (PST)


This problem is also happening on 2.4.14pre6.

However, there is one thing that I did find:

In this firewall setup I am using ipchains REDIRECT to map port 1433 on
four IP addresses to ports 1434-1437 to reach the proxies.
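
The rules are of roughly this form (a sketch; the addresses shown here
are made-up placeholders, one rule per proxy IP):

  ipchains -A input -p tcp -d 192.168.1.1 1433 -j REDIRECT 1434
  ipchains -A input -p tcp -d 192.168.1.2 1433 -j REDIRECT 1435
  ipchains -A input -p tcp -d 192.168.1.3 1433 -j REDIRECT 1436
  ipchains -A input -p tcp -d 192.168.1.4 1433 -j REDIRECT 1437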

If I run the same test, but instead of having all 5 copies of ab hit
port 1433 I make them connect to the proxy's real port, the slowdown
does not happen (I have run 5 iterations of the load test without any
slowdown).
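
That is, the only difference between the two tests is which port ab
connects to (a sketch; 1434 here stands in for one proxy's real port):

  ab -n 5000 -c 200 firewall:1433/index.html.en   # via REDIRECT: slows down
  ab -n 5000 -c 200 firewall:1434/index.html.en   # direct to proxy: no slowdown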

So this slowdown appears to be in the ipchains emulation code or the
iptables port mapping code.

I think my next step is going to be to try using iptables instead of
ipchains to do the port mapping.
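
The equivalent iptables rule should be something like this (an untested
sketch, using the same placeholder address as above):

  iptables -t nat -A PREROUTING -p tcp -d 192.168.1.1 --dport 1433 \
    -j REDIRECT --to-ports 1434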

David Lang

On Wed, 31 Oct 2001, David Lang wrote:

> Date: Wed, 31 Oct 2001 13:48:43 -0800 (PST)
> From: David Lang <david.lang@digitalinsight.com>
> To: torvalds@transmeta.com
> Cc: linux-kernel@vger.kernel.org
> Subject: 2.4.13 (and 2.4.5) performance drops under load and does not
> recover
>
> Symptoms: when the firewall is freshly booted it supports >200
> connections/sec. When the firewall is hit with lots of simultaneous
> requests it slows down, and after the flood of requests stops and the
> machine sits idle it never recovers back to its original performance. I
> can drive the performance down to <50 connections/sec permanently. More
> and more of the CPU is eaten up by the system, leaving less time for
> userspace.
>
> The single-threaded requests do outperform the massively parallel test
> (as can be expected, since the scheduler isn't optimized for long
> runqueues :-). I am not asking for the large-runqueue case to be
> optimized, just to be able to recover after the long runqueue has
> finished.
>
> Killing and restarting the proxy does not fix the problem. The only way
> I have discovered to restore performance is to reboot the firewall.
>
> test setup:
>
> client---firewall---server
>
> All boxes are 1.2GHz Athlons with 512MB RAM, an IDE drive, a D-Link
> quad ethernet card, and 2GB of swap space (although it never comes
> close to using it).
>
> The ethernet switches are Cisco 3548s.
>
> The firewall is running 4 proxies; each proxy accepts an incoming
> connection, forks to handle that connection, relays the data to/from
> the server, and then exits. (Yes, I know this is not the most efficient
> way to do things, but this _is_ a real-world use.)
>
> I ran vmstat 1 on the firewall during the test, and excerpts from it
> are listed below.
>
> The server is running Apache (MaxClients set to 250 after prior runs
> showed that it ran up against the 150 limit).
>
> The client is running ab (the Apache benchmark tool).
>
> Testing procedure:
>
> 1. Boot the firewall.
>
> 2. Get performance data.
> Run ab -n 400 firewall:port/index.html.en twice (first to get past
> initialization delays, second to get a performance value). This does
> 400 sequential connections and reports the time taken to get the
> responses.
>
> 3. Flood the firewall.
> Run 5 copies of ab -n 5000 -c 200 firewall:port/index.html.en.
> Each copy does 5000 connections, attempting to make up to 200
> connections at a time.
>
> 4. Wait for all the connections on the firewall to clear (at least that
> long, possibly longer if I am in a meeting). The idea here is to return
> everything to a stable state.
>
> 5. Run ab -n 400 firewall:port/index.html and record the connections/sec.
>
> Loop back to step 3.
>
> After 4 iterations of load, reboot the firewall and do two more runs of
> ab -n 400 to make sure that performance is back up and the slowdown is
> not due to large logfiles anywhere. (A rough script for this loop is
> sketched below.)
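>
> The whole loop, as a rough shell sketch (firewall:port is the same
> placeholder used above; the sleep is a stand-in for waiting until the
> connections clear):
>
>   #!/bin/sh
>   ab -n 400 firewall:port/index.html.en   # warm-up run
>   ab -n 400 firewall:port/index.html.en   # baseline connections/sec
>   i=1
>   while [ $i -le 4 ]; do
>       for j in 1 2 3 4 5; do              # flood: 5 concurrent copies of ab
>           ab -n 5000 -c 200 firewall:port/index.html.en &
>       done
>       wait                                # let the flood finish
>       sleep 600                           # wait for connections to clear
>       ab -n 400 firewall:port/index.html.en   # record connections/sec
>       i=`expr $i + 1`
>   done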
>
> At the start of the test, vmstat looks like this:
>
>
> ab -n 200 (219 connections/sec); this is two sequential runs
>  procs             memory             swap      io       system        cpu
>  r  b  w   swpd    free  buff  cache  si  so   bi   bo    in    cs  us  sy  id
>  0  0  0      0  496744   604   8076   0   0    0    0   101    11   0   0 100
>  1  0  1      0  496548   604   8124   0   0   48    0  3814  1759  36  41  24
>  0  0  0      0  496588   604   8124   0   0    0    0  4007  1899  42  34  25
>  0  0  0      0  496576   604   8128   0   0    0    0   103    16   0   0 100
>  1  0  0      0  496452   604   8128   0   0    0    0   924   405  11   6  83
>  2  0  0      0  496360   604   8128   0   0    0    0  4252  2015  41  40  20
>  0  0  0      0  496492   604   8128   0   0    0    0  2628  1240  26  21  53
>  0  0  0      0  496492   604   8128   0   0    0    0   101     7   0   0 100
>
>
> 5xab -n 5000 -c 200 (~210 connections/sec)
>  procs             memory             swap      io       system        cpu
>  r  b  w   swpd    free  buff  cache  si  so   bi   bo    in    cs  us  sy  id
> 19  0  1      0  493192   604   8124   0   0    0    0  5938  1139  55  45   0
> 13  0  1      0  492980   604   8124   0   0    0    0  4956  1059  58  42   0
> 11  0  1      0  492692   604   8124   0   0    0    0  4878  1057  45  55   0
> 17  0  1      0  492900   604   8124   0   0    0    0  4890  1072  57  43   0
> 29  0  1      0  492572   604   8124   0   0    0    0  4648  1049  46  54   0
> 13  0  1      0  492608   604   8124   0   0    0    0  4650  1041  50  50   0
> 11  0  1      0  492484   604   8124   0   0    0    0  4645  1043  38  62   0
> 18  0  1      0  492472   604   8128   0   0    0    0  4779  1029  56  44   0
> 17  0  1      0  492440   604   8128   0   0    0    0  4691  1057  46  54   0
> 18  0  1      0  492476   604   8128   0   0    0    0  4598  1074  54  46   0
> 18  0  1      0  492488   604   8128   0   0    0    0  4625  1051  53  47   0
> 22  0  1      0  492388   604   8128   0   0    0    0  4661  1057  50  50   0
> 10  0  1      0  492448   604   8128   0   0    0    0  4569  1033  56  44   0
> 22  0  1      0  492364   604   8128   0   0    0    0  4589  1036  48  52   0
> 18  0  1      0  492384   604   8128   0   0    0    0  4536  1031  48  52   0
> 15  0  1      0  492236   604   8128   0   0    0    0  4528  1034  43  57   0
> 26  0  1      0  492132   604   8128   0   0    0    0  4554  1037  50  50   0
> 24  0  1      0  492016   604   8128   0   0    0    0  4518  1037  48  52   0
>
> At the end of 4 runs:
>
> 5xab -n 5000 -c 200 (~40 connections/sec)
>  procs             memory             swap      io       system        cpu
>  r  b  w   swpd    free  buff  cache  si  so   bi   bo    in    cs  us  sy  id
> 23  0  1      0  482356   624   8524   0   0    0    0  1053   181   7  93   0
> 29  0  1      0  481724   624   8528   0   0    0    0   960   206   7  93   0
> 23  0  1      0  482300   624   8528   0   0    0    0  1059   176   5  95   0
> 30  0  1      0  482104   624   8528   0   0    0   72  1095   197   9  91   0
> 19  0  1      0  482760   624   8528   0   0    0    0  1014   179   5  95   0
> 31  0  1      0  481892   624   8528   0   0    0    0  1010   198   6  94   0
> 16  0  1      0  482488   624   8524   0   0    0    0  1001   176   8  92   0
> 29  0  1      0  482236   624   8524   0   0    0    0  1037   179   9  91   0
> 12  0  1      0  483008   624   8528   0   0    0    0  1182   188   8  92   0
> 25  0  1      0  482620   624   8528   0   0    0    0   988   173   8  92   0
> 19  0  1      0  482696   624   8528   0   0    0    0   931   173   7  93   0
> 20  0  1      0  482776   624   8528   0   0    0    0   985   171   9  91   0
> 20  0  1      0  482116   624   8528   0   0    0    0  1122   119   5  95   0
> 21  0  1      0  482888   624   8528   0   0    0    0   830   144   3  97   0
> 16  0  1      0  481916   624   8528   0   0    0    0  1054   155   9  91   0
> 18  0  1      0  482892   624   8528   0   0    0    0   875   143  12  88   0
> 20  0  1      0  483100   624   8528   0   0    0    0   875   146   7  93   0
> 27  0  1      0  482428   624   8528   0   0    0    0   859   153   6  94   0
> 26  0  1      0  482688   624   8528   0   0    0    0   874   151   8  92   0
>
> ab -n 400 (~48 connections/sec); this is a single run
>  procs             memory             swap      io       system        cpu
>  r  b  w   swpd    free  buff  cache  si  so   bi   bo    in    cs  us  sy  id
>  0  0  0      0  483552   624   8544   0   0    0    0   103    13   0   0 100
>  2  0  0      0  483388   624   8544   0   0    0    0   592   225  10  38  52
>  1  0  0      0  483392   624   8544   0   0    0    0  1088   488  13  87   0
>  2  0  0      0  483392   624   8544   0   0    0    0  1108   468  10  89   1
>  1  0  0      0  483404   624   8548   0   0    0    0  1081   452   9  89   2
>  1  0  0      0  483508   624   8548   0   0    0    0  1096   486   7  90   3
>  2  0  1      0  483368   624   8548   0   0    0    0  1091   458  11  87   2
>  1  0  0      0  483364   624   8548   0   0    0    0  1085   480  13  85   2
>  1  0  0      0  483356   624   8552   0   0    0    0  1098   478   9  87   4
>  0  0  0      0  483520   624   8548   0   0    0    0   796   330   9  59  32
>  0  0  0      0  483520   624   8548   0   0    0    0   103    13   0   0 100
>
> Please tell me what additional things I should try to identify what the
> kernel is getting tied up doing.
>
> I was able to duplicate this problem with 2.4.5 as well as 2.4.13. The
> performance of 2.4.5 is ~4% lower than that of 2.4.13 in all cases.
>
> David Lang