Re: [RFC] [PATCH] Net device error logging (revised)

Jim Keniston (jkenisto@us.ibm.com)
Tue, 27 May 2003 16:17:48 -0700


Valdis.Kletnieks@vt.edu wrote:
>
> On Fri, 23 May 2003 23:21:24 PDT, Jim Keniston said:
>
> > - With Steve Hemminger's creation of the "net" device class a few days
> > ago, the network device's interface name is now sufficient to find the
> > information about the underlying device in sysfs (even without running
> > ethtool). So these macros no longer log the device's driver name and
> > bus ID.
>
> Is something *else* logging the driver name and bus ID?

Short answer: Not in Linux 2.5.70.

Long answer: Some folks in the LTC (IBM's Linux Technology Center) are
designing an error-log analysis (ELA) system that will have access to sysfs
as well as the event stream coming out
of the kernel. Once the network interface is registered with sysfs -- in
v2.5.70, it's via netdev_register_sysfs, as called from register_netdevice --
you can find the net_device's info in sysfs based on the interface name.
The aforementioned ELA system could then annotate the event record with the
desired additional data out of sysfs (including driver name and bus ID).

Before the net_device is registered (at least), you'd presumably want to log
the driver name and bus ID. One obvious way to do this is to have the probe
function call dev_* macros instead of netdev_* until register_netdev runs.
Another way could be via netdev_*, if we made netdev_* smart enough to log
the driver name and bus ID if netdev->class_dev.class isn't set yet.
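
The second option might look roughly like the sketch below. It is a
user-space mock-up, not the patch: the mock_* structures stand in for
net_device and struct device, the "registered" pointer stands in for
netdev->class_dev.class, and both prefix formats are assumptions chosen
for illustration.

```c
#include <stddef.h>
#include <stdio.h>

/* Simplified stand-ins for the kernel structures; not the real ones. */
struct mock_device {
    const char *driver_name;    /* e.g. "e100" */
    const char *bus_id;         /* e.g. "00:0b.0" */
};

struct mock_net_device {
    const char *name;           /* interface name, e.g. "eth0" */
    const void *registered;     /* non-NULL once registered with sysfs */
    const struct mock_device *dev;  /* underlying device, may be NULL */
};

/*
 * Hypothetical prefix chooser.  Once the interface is registered, the
 * interface name is enough to find everything else in sysfs.  Before
 * that, fall back to driver name and bus ID if they are available.
 * Returns the snprintf() result (length of the full prefix).
 */
static int netdev_log_prefix(const struct mock_net_device *nd,
                             char *buf, size_t len)
{
    if (!nd->registered && nd->dev)
        return snprintf(buf, len, "%s %s: ",
                        nd->dev->driver_name, nd->dev->bus_id);
    return snprintf(buf, len, "%s: ", nd->name);
}
```

With a check like this in one place, a probe function could use the same
netdev_* call before and after registration and still get an identifiable
prefix in both cases.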

There's clearly a difference of opinion among various developers as to whether
logging the interface name alone is sufficient. Either way, I think it's a
win to have the net_device pointer itself (rather than just its name, if
you're lucky) handy when logging info about the net device, and to have the
message format live in one spot (netdevice.h) rather than scattered all over
drivers/net.

>
> Just because an interface is called 'eth2' when the message is logged
> doesn't mean it's still eth2 when you actually *read* the message.
> And no, you *can't* rely on finding the "renaming bus-ID to ethN" message
> in the logs - if the system is unstable the last bit of local logging may
> go bye-bye if not synced, and messages about network hardware problems are
> *very* prone to not making it to the syslog server (I wonder why? ;).
>
> Been there, done that, it's not fun. Almost swapped out the wrong eth1
> a *second* time before realizing what was really going on...
>

Was the name slippage due to the intervention of an administrative utility?
Just curious.

Thanks.
Jim Keniston
IBM Linux Technology Center