Fragment flooding in 2.4.x/2.5.x

Trond Myklebust (trond.myklebust@fys.uio.no)
Thu, 27 Jun 2002 17:57:39 +0200



Hi David,

I have a question about the case of non-blocking sends in
ip_build_xmit_slow(). While investigating a problem with the RH7.3 kernel
causing the Netapp filer IP stack to blow up, we've observed that use of the
MSG_DONTWAIT flag causes some pretty nasty behaviour.

Because fragments are queued for sending as soon as they are built, a failure
of sock_alloc_send_skb() partway through building the message means that you
have already sent off a bunch of fragments for which there is not even a
header. Given heavy NFS traffic, this can be a large source of wasted
bandwidth.
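
The failure mode described above can be illustrated with a minimal userspace sketch. This is not kernel code: `struct xmit_result`, `build_and_send_immediately()` and the `fail_at` parameter are hypothetical names introduced only for illustration, with `fail_at` standing in for the step at which sock_alloc_send_skb() fails on a non-blocking send. The sketch assumes, as the text implies, that fragments are built from the highest offset down, so the offset-0 fragment carrying the header is built last.

```c
#include <assert.h>

/* Hypothetical result record for the simulation (not a kernel type). */
struct xmit_result {
    int sent;         /* fragments already handed to the device */
    int header_sent;  /* 1 if the offset-0 fragment was among them */
    int err;          /* 0 on success, -1 if an allocation failed */
};

/*
 * Simulate "send each fragment as soon as it is built".
 * Fragments are built from the last one down to the first (assumption).
 * fail_at is the 0-based build step at which allocation fails;
 * a negative value means no failure.
 */
static struct xmit_result build_and_send_immediately(int nfrags, int fail_at)
{
    struct xmit_result r = {0, 0, 0};
    int step;

    assert(nfrags > 0);
    for (step = 0; step < nfrags; step++) {
        int frag_index = nfrags - 1 - step;  /* last fragment first */

        if (step == fail_at) {
            /* Allocation failed: everything sent so far is wasted. */
            r.err = -1;
            return r;
        }
        r.sent++;                            /* fragment goes out at once */
        if (frag_index == 0)
            r.header_sent = 1;
    }
    return r;
}
```

With six fragments and a failure at step 4, four fragments have already gone out and none of them is the offset-0 fragment, so the receiver can never reassemble the datagram.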

The appended patch, which was originally designed purely to test inverting the
sending order of fragments (on the hypothesis that the receiving devices were
making buffer management assumptions based on ordering), removes this effect
because it queues the fragments and delays sending them until the entire
message has been built.
Would such a patch be acceptable, or is there a better way of doing this?
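
The deferred-send idea can be sketched in userspace C as follows. This is only an illustration under stated assumptions, not the kernel implementation: `struct frag`, `struct frag_queue`, `queue_head()`, `dequeue()` and `build_then_send()` are hypothetical stand-ins for the sk_buff queue and the send loop, and `fail_at` again models an allocation failure. Fragments are assumed to be built from the highest offset down and pushed onto the head of the queue, so dequeuing from the head yields increasing offsets.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for a queued fragment. */
struct frag {
    int offset;
    struct frag *next;
};

struct frag_queue {
    struct frag *head;
};

/* Push a fragment onto the head of the queue. */
static void queue_head(struct frag_queue *q, struct frag *f)
{
    f->next = q->head;
    q->head = f;
}

/* Pop a fragment from the head of the queue, or NULL if empty. */
static struct frag *dequeue(struct frag_queue *q)
{
    struct frag *f = q->head;

    if (f != NULL)
        q->head = f->next;
    return f;
}

/*
 * Build all fragments first, then send. On an allocation failure
 * (simulated at build step fail_at, negative = never) nothing is sent
 * and the queued fragments are simply freed.
 * sent_offsets must have room for nfrags entries.
 * Returns the number of fragments "sent".
 */
static int build_then_send(int nfrags, int fraglen, int fail_at,
                           int *sent_offsets)
{
    struct frag_queue q = { NULL };
    struct frag *f;
    int step, sent = 0, failed = 0;

    assert(nfrags > 0 && sent_offsets != NULL);
    for (step = 0; step < nfrags; step++) {
        f = (step == fail_at) ? NULL : malloc(sizeof(*f));
        if (f == NULL) {
            failed = 1;
            break;
        }
        f->offset = (nfrags - 1 - step) * fraglen;  /* last fragment first */
        queue_head(&q, f);
    }

    /* Drain the queue: transmit on success, discard on failure. */
    while ((f = dequeue(&q)) != NULL) {
        if (!failed)
            sent_offsets[sent++] = f->offset;
        free(f);
    }
    return sent;
}
```

The design trade-off is that all fragments of a message are held in memory until the last allocation succeeds, in exchange for never emitting a partial datagram.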

Cheers,
Trond

--- linux-2.4.19-smp/net/ipv4/ip_output.c.orig	Mon May 13 23:34:37 2002
+++ linux-2.4.19-smp/net/ipv4/ip_output.c	Mon Jun 17 23:13:28 2002
@@ -437,6 +437,8 @@
 		  struct rtable *rt,
 		  int flags)
 {
+	struct sk_buff_head frags;
+	struct sk_buff * skb;
 	unsigned int fraglen, maxfraglen, fragheaderlen;
 	int err;
 	int offset, mf;
@@ -512,10 +514,10 @@
 	 */

 	id = sk->protinfo.af_inet.id++;
+	skb_queue_head_init(&frags);

 	do {
 		char *data;
-		struct sk_buff * skb;

 		/*
 		 * Get the memory we require with some space left for alignment.
@@ -599,7 +601,11 @@
 		fraglen = maxfraglen;

 		nfrags++;
+		__skb_queue_head(&frags, skb);
+	} while (offset >= 0);

+	/* Ensure we send fragments in order of increasing offset */
+	while ((skb = __skb_dequeue(&frags)) != NULL) {
 		err = NF_HOOK(PF_INET, NF_IP_LOCAL_OUT, skb, NULL,
 			      skb->dst->dev, output_maybe_reroute);
 		if (err) {
@@ -608,7 +614,7 @@
 			if (err)
 				goto error;
 		}
-	} while (offset >= 0);
+	}

 	if (nfrags>1)
 		ip_statistics[smp_processor_id()*2 + !in_softirq()].IpFragCreates += nfrags;
@@ -617,6 +623,10 @@

 error:
 	IP_INC_STATS(IpOutDiscards);
+	while ((skb = __skb_dequeue(&frags)) != NULL) {
+		kfree_skb(skb);
+		nfrags--;
+	}
 	if (nfrags>1)
 		ip_statistics[smp_processor_id()*2 + !in_softirq()].IpFragCreates += nfrags;
 	return err;
