Re: 2.4.16 freezed up with eepro100 module

Andrey Savochkin (saw@saw.sw.com.sg)
Sat, 1 Dec 2001 13:17:59 +0300


Hi,

On Fri, Nov 30, 2001 at 04:17:17PM -0800, Mike Fedyk wrote:
>
> On Fri, Nov 30, 2001 at 04:31:31PM -0600, Nathan Poznick wrote:
> > Thus spake Sven Heinicke:
> > >
> > > I have eepro100's on other systems and never had a problem. They
> > > never have been made to work as hard as the DELLs though. I am
> > > trying the same DELL with a 3C996-T 1000Bt card using the driver from
> > > 3COM (we plan on moving that system to a 1000Bt system but the switch
> > > hasn't arrived yet) and it is running at 100Bt with the same
> > > software. If you don't hear form me assume it surrived. Been up a
> > > day so far, took the DELL like 3 days of heavy use to crash before.
> >
> > Ok, I finally had a chance to work on this, and here's what I know:
> >
> > 1) I found a workload under which I was able to reliably make the
> > network on the machine die (a few hundred of the "eth0: card reports
> > no resources." errors showed up which continued until I took down the
> > network and removed the module). Unfortunately, the workload was with
> > an in-house app, so all I can describe are the conditions associated
> > with it: 2 processes with a total of about 600 threads, 1.5gb of
> > memory, about 500 network connections, and a lot of disk and network
> > I/O.

Do you see "can't fill rx buffer" messages?
If so, then your load is too big, and memory management is incapable of
freeing memory in time.
Right now the kernel doesn't allow to increase atomic allocation
reservation (which is a serious misfeature), so you need to hack and
change the reservation in the kernel.

If the network doesn't come alive when you remove the load, it's a second
problem, a bug in the driver. I've seen such reports, but they aren't
frequent. On my computer, the driver resumes operations well.
Why the driver can't do it for some people needs deep investigations.

> >
> You can run the test against eepro100 with tcpdump redirected to a log file,
> and post that on the web somewhere. That would probably be helpful.

tcpdumps won't help.

Andrey
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/