Re: What does "Neighbour table overflow" message indicate?

Carlos O'Donell Jr. (carlos@baldric.uwo.ca)
Mon, 30 Jul 2001 08:38:29 -0400


> network module. In this case it only ensures, that the printk message is not
> printed too often. The actual condition why the message is printed is above
> this if.
>
> Greetings
> Bernd
> -

Snyder,

Judging by your @home email address, I'd guess that your
cable modem is connected to an HFC cable network segment.

In general these segments are extremely large and, given the
number of users on them, can carry large amounts of ARP broadcast
traffic during peak times.

The message you are seeing is directly related to your ARP cache
overflowing.

I've seen this message during high traffic hours on my 2.2.x
firewall.

Things to check:

- Is your netmask set correctly?
- How many hosts are on your segment?

======================================================
Why the kernel spat what it spat : blow by blow
======================================================

N.B. Using 2.4.7 Kernel Source.

I think the critical point is:

In route.c:

639: int err = arp_bind_neighbour(&rt->u.dst);
640: if (err) {
... [snip]

Which means that if binding an ARP neighbour fails, we head
down the path towards the printk that has caused us so much
distress.

In arp.c, we look for "arp_bind_neighbour" and find it on line 429:

Right off the bat, we hope that:

434: if (dev == NULL)
435: return -EINVAL;

Isn't the case :)

Unless a neighbour is already bound, we take the next branch...

436: if (n == NULL) {

And the only return that is non-zero is from:

440: n = __neigh_lookup_errno(
441:#ifdef CONFIG_ATM_CLIP
442: dev->type == ARPHRD_ATM ? &clip_tbl :
443:#endif
444: &arp_tbl, &nexthop, dev);
445: if (IS_ERR(n))
446: return PTR_ERR(n);

So __neigh_lookup_errno is the culprit...

In ./include/net/neighbour.h we have the function defined:

266:static inline struct neighbour *
267:__neigh_lookup_errno(struct neigh_table *tbl, const void *pkey,
268:struct net_device *dev)
...
275: return neigh_create(tbl, pkey, dev);

Is the interesting point.. since our table is overflowing, we
need to find the point where the entry is created :)

Off we go to line 288 in ./net/core/neighbour.c:
(I love to trace source!)

296: n = neigh_alloc(tbl);
297: if (n == NULL)
298: return ERR_PTR(-ENOBUFS);

Hrm... -ENOBUFS :)

In neigh_alloc, same file:

235: if (tbl->entries > tbl->gc_thresh3 ||
236: (tbl->entries > tbl->gc_thresh2 &&
237: now - tbl->last_flush > 5*HZ)) {
238: if (neigh_forced_gc(tbl) == 0 &&
239: tbl->entries > tbl->gc_thresh3)
240: return NULL;
241: }

Which leads us to note that if the cache grows faster than the
garbage collector (the ref counting code) can shrink it, so that
we are past gc_thresh3 and a forced GC frees nothing, neigh_alloc
returns NULL and we trigger a table overflow.

Can you make the tables bigger?
What kind of impact would that have?
Should we be asking @Home to make segments smaller?
(Probably not possible)

In ./net/ipv4/arp.c you could change the GC parameters...
I'm not sure how they were tuned.

Line 187:
gc_interval: 30 * HZ,
gc_thresh1: 128,
gc_thresh2: 512,
gc_thresh3: 1024,

Hrm... just pondering.

=================================================================

Cheers,
Carlos O'Donell Jr.