<!-- received="Fri Jun 25 12:28:30 1999 EET DST" -->
<!-- sent="Fri, 25 Jun 1999 10:37:28 +0200 (MEST)" -->
<!-- name="Rogier Wolff" -->
<!-- email="R.E.Wolff@BitWizard.nl" -->
<!-- subject="Re: eepro100 frame errors with SMP" -->
<!-- id="199906250837.KAA00641@cave.BitWizard.nl" -->
<!-- inreplyto="19990311215350.8623.qmail@defiant.cqc.com" -->
<title>Linux-kernel mailing list archive 1999-25,: Re: eepro100 frame errors with SMP</title>
<body bgcolor="#FFFFFF"><font face="Arial,Helvetica">
<h1>Re: eepro100 frame errors with SMP</h1>
<b>Rogier Wolff</b> (<a href="mailto:R.E.Wolff@BitWizard.nl"><i>R.E.Wolff@BitWizard.nl</i></a>)<br>
<i>Fri, 25 Jun 1999 10:37:28 +0200 (MEST)</i>
<p>
<ul>
<li> <b>Messages sorted by:</b> <a href="date.html#1147">[ date ]</a><a href="index.html#1147">[ thread ]</a><a href="subject.html#1147">[ subject ]</a><a href="author.html#1147">[ author ]</a>
<!-- next="start" -->
<li> <b>Next message:</b> <a href="1148.html">Janos Farkas: "Re: why is the size of a directory always 1024b ?"</a>
<li> <b>Previous message:</b> <a href="1146.html">Helge Hafting: "Re: Why Linux is doomed"</a>
<!-- nextthread="start" -->
<!-- reply="end" -->
</ul>
<hr>
<!-- body="start" -->
Alan Curry wrote:<br>
<i>&gt; Two different machines, each with an eepro100, are racking up frame errors,</i><br>
<i>&gt; according to the statistics on the cisco switch they are connected to.</i><br>
<i>&gt; Swapping cards between these and another machine, and booting many different</i><br>
<i>&gt; kernels, leads us to believe that the problem only exists when more than one</i><br>
<i>&gt; processor is being used. This smells like a driver bug to me.</i><br>
<i>&gt; </i><br>
<i>&gt; These errors show up as "TX overruns" from ifconfig on the Linux side and</i><br>
<i>&gt; frame/CRC errors from `show interfaces' on the cisco side.</i><br>
<p>
There are RX overruns, and TX underruns. I think you're seeing TX<br>
underruns.<br>
<p>
The eepro100 gets the data to be transmitted from main memory. TX<br>
underrun happens when the main memory can't keep up with the sending<br>
of the data onto the ethernet.<br>
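<p>
A minimal sketch of that condition, under a simplified model that is an<br>
assumption and not the eepro100's actual logic: the card starts sending<br>
once a threshold number of bytes has been fetched, the wire drains the<br>
buffer at a constant rate, and memory refills it at a constant rate.<br>
The function name and all four parameters are hypothetical.<br>

```c
/*
 * Simplified model (an assumption, not the eepro100's real behaviour):
 * transmission starts once threshold_bytes have been fetched; the wire
 * drains at wire_rate and memory fills at dma_rate, both constant and
 * in the same unit (for example Mbyte per second).
 *
 * Returns 1 if the buffer would run dry before the whole frame has
 * been fetched, 0 otherwise.
 */
int tx_would_underrun(double frame_bytes, double threshold_bytes,
                      double wire_rate, double dma_rate)
{
    /*
     * The gap between fetched and sent bytes changes linearly, so it
     * is enough to compare the two at the moment the fetch completes:
     * sent = wire_rate * (frame_bytes - threshold_bytes) / dma_rate.
     */
    return wire_rate * (frame_bytes - threshold_bytes)
           > dma_rate * frame_bytes;
}
```

With a 1500 byte frame, a 256 byte threshold, 12.5 Mbyte per second on<br>
the wire and 10 Mbyte per second from memory, this model reports an<br>
underrun; with memory matching the wire rate it does not.<br>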
<p>
So, hardware-wise, your machine is misconfigured: at times the eepro100<br>
cannot get the roughly 10 Mbyte per second it needs from main memory.<br>
<p>
This is NOT a driver problem. <br>
<p>
You could look into decreasing the "max_lat" value of all other<br>
devices on the PCI bus, and increasing it on the eepro100.<br>
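<p>
To see which values are involved, here is a sketch that pulls the<br>
latency-related fields out of a raw copy of a device's PCI configuration<br>
header. It assumes the standard 64-byte header of a non-bridge device;<br>
the struct and function names are hypothetical. In that header Max_Lat<br>
and Min_Gnt are read-only hints from the device, while the latency timer<br>
is the value that can be written, so it is probably the one to adjust.<br>

```c
#include <stdint.h>

/* Offsets in the standard PCI configuration header (non-bridge device). */
#define PCI_OFF_LATENCY_TIMER 0x0d  /* read/write, in PCI bus clocks */
#define PCI_OFF_MIN_GNT       0x3e  /* read-only, units of 250 ns    */
#define PCI_OFF_MAX_LAT       0x3f  /* read-only, units of 250 ns    */

/* Hypothetical container for the three latency-related fields. */
struct pci_latency {
    uint8_t latency_timer;
    uint8_t min_gnt;
    uint8_t max_lat;
};

/* Hypothetical helper: extract the fields from a raw 64-byte header. */
struct pci_latency pci_latency_from_header(const uint8_t hdr[64])
{
    struct pci_latency l;

    l.latency_timer = hdr[PCI_OFF_LATENCY_TIMER];
    l.min_gnt       = hdr[PCI_OFF_MIN_GNT];
    l.max_lat       = hdr[PCI_OFF_MAX_LAT];
    return l;
}
```

How the 64 bytes are obtained is left open here; the helper only<br>
decodes a buffer that the caller has already filled.<br>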
<p>
<i>&gt; We told cisco, and they're so concerned about their switch they currently</i><br>
<i>&gt; have a team trying to reproduce the problem, but as far as we know they are</i><br>
<i>&gt; still in the "how do we install RedHat?" stage.</i><br>
<p>
I suggest telling them not to worry. You've found the problem, and it<br>
is your machine that is misconfigured....<br>
 <br>
<i>&gt; What can I do to further track down this problem?</i><br>
<p>
<p>
/*  <br>
 *  perform_memcpy.c<br>
 *<br>
 *<br>
 *<br>
 *  written by R.E.Wolff -- <a href="mailto:R.E.Wolff@BitWizard.nl">R.E.Wolff@BitWizard.nl</a><br>
 * <br>
 *<br>
 *              date          by     what<br>
 *  Written:    Apr 23 1997   REW    Initial revision.<br>
 *  changes:<br>
 *<br>
 * $Log: perform_memcpy.c,v $<br>
 * Revision 1.2  1997/11/13 14:56:59  wolff<br>
 * Created RCS Log.<br>
 *<br>
 *<br>
 *<br>
 *  who-is-who:<br>
 *    initials full name                 Email address<br>
 *    REW      Roger E. Wolff            <a href="mailto:R.E.Wolff@BitWizard.nl">R.E.Wolff@BitWizard.nl</a><br>
 *<br>
 * This program allows you to test whether the zoran chip is bothered<br>
 * by a rep; movsl instruction.<br>
 *<br>
 * */<br>
<p>
#include &lt;sys/time.h&gt;<br>
#include &lt;unistd.h&gt;<br>
#include &lt;stdlib.h&gt;<br>
#include &lt;stdio.h&gt;<br>
<p>
#ifndef SIZE <br>
#define SIZE 0x800000<br>
#endif<br>
<p>
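unsigned char *makebuf (int size)<br>
{<br>
  unsigned char *t;<br>
<p>
  t = malloc (size);<br>
  if (t == NULL) {<br>
    perror ("malloc");<br>
    exit (1);<br>
  }<br>
  /* Touch every page so the copy hits real memory, not untouched pages. */<br>
  __builtin_memset (t, 0x55, size);<br>
  return t;<br>
}<br>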
<p>
int main (int argc, char **argv)<br>
{<br>
  unsigned char *p, *q, *r;<br>
  struct timeval start, stop;<br>
  int n=1;<br>
  int count=0;<br>
<p>
  if (argc &gt; 1)<br>
    n = atoi (argv[1]);<br>
<p>
  p = makebuf (SIZE);<br>
  q = makebuf (SIZE);<br>
 <br>
  while (n--) {<br>
    gettimeofday (&amp;start, NULL);<br>
    __builtin_memcpy (p, q, SIZE);<br>
    gettimeofday (&amp;stop, NULL);<br>
    <br>
    printf ("Elapsed time %d: %ld usecs.\n",<br>
            count++,<br>
            (long) (stop.tv_sec  - start.tv_sec) * 1000000 +<br>
            (long) (stop.tv_usec - start.tv_usec));<br>
    r = p;<br>
    p = q;<br>
    q = r;<br>
  }<br>
  exit (0);<br>
}<br>
<p>
------------<br>
<p>
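Note: "SIZE" is not a run-time parameter of the program, because it needs to<br>
be a compile-time constant, for maximum effect. To try another size,<br>
rebuild with a -DSIZE=value option on the compiler command line.<br>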
<p>
I expect that even on single processor systems you will see the<br>
problems. I expect that on dual processor systems you will be able to<br>
halt almost all network activity by running two copies of this...<br>
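<p>
To compare the elapsed times the program prints with what a network<br>
card needs, they can be turned into a throughput figure. A small sketch,<br>
with a hypothetical helper name; it counts only the bytes copied, so the<br>
actual memory traffic (one read plus one write per byte) is about twice<br>
the result.<br>

```c
#include <stddef.h>

/*
 * Hypothetical helper: convert "bytes copied in usecs microseconds"
 * into Mbyte per second (1 Mbyte = 1000000 bytes).
 * Returns 0 for a non-positive interval.
 */
double copy_mbytes_per_sec(size_t bytes, long usecs)
{
    if (usecs <= 0)
        return 0.0;
    /* Bytes per microsecond is the same number as Mbyte per second. */
    return (double) bytes / (double) usecs;
}
```

It would be called with SIZE and the microsecond value from one line<br>
of the program's output.<br>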
<p>
<p>
			Roger.<br>
<p>
<pre>
-- 
** <a href="mailto:R.E.Wolff@BitWizard.nl">R.E.Wolff@BitWizard.nl</a> ** <a href="http://www.BitWizard.nl/">http://www.BitWizard.nl/</a> ** +31-15-2137555 **
*-- BitWizard writes Linux device drivers for any device you may have! --*
------ Microsoft SELLS you Windows, Linux GIVES you the whole house ------
<p>
<p>
</pre>
<!-- body="end" -->
</font></body>
