SMP bug revealed by networking?

Justin Carlson (carlson@sibyte.com)
Mon, 7 May 2001 15:54:14 -0700


I'm working on a new SMP mips port, and tripped over a strange
bug. Am not quite sure how to attack it.

I'm running 2.4.2 from oss.sgi.com's cvs repositry, booting with
the root filesystem being over NFS with two cores active in
the system. The port is actually fairly stable at the moment, but
just once, while trying to exec init during a boot, it locked up.

Since I'm running in simulation (fun consequence of not porting to something
for which silicon doesn't yet exist), I took a peek and
saw this:

CPU0 was in ip_defrag() trying to aquire the ipq spinlock
CPU1 was idle.

ip_defrag() being an exceedingly unlikely source of bugs at this moment,
especially in terms of that simple locking, I'm thinking I must have a bug
in the SMP handling; however, I don't see any likely candidates there either.
I can't reproduce this bug, either; it seems to have been dependent on
timing from the network.

Has anyone run into anything remotely like this before? Any suggestions
on where to look for bug or how to make this reproducable in order to track it
down?

Thanks,
Justin
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/