Indeed. I didn't mean to exclude anything by omission.
> And how does /proc/irq/NR/max_rate solve this?
> I have a feeling you are trying to say that varying /proc/irq/NR/max_rate
> gives opportunity for user processes to execute;
> note, although that is bad logic, you could also modify the high and low
> watermarks for when we have congestion in the backlog queue
> (This is already doable via /proc)
The high and low watermarks are only sufficient if the task the machine is
performing is limited to bh mode operations. What I mean is that user space
can be starved by the cyclic nature of the network queues: they will
eventually be emptied, at which time more interrupts will be permitted.
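To make the watermark cycle concrete, here is a minimal userspace sketch of high/low watermark hysteresis on a backlog queue. It is not the kernel's code: struct backlog and its helpers are hypothetical names, and the thresholds are plain counters rather than the real /proc tunables.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of a backlog queue with high/low watermarks. */
struct backlog {
    unsigned int len;   /* packets currently queued */
    unsigned int high;  /* stop accepting input at or above this length */
    unsigned int low;   /* resume accepting input at or below this length */
    bool throttled;     /* true while input is being refused */
};

static void backlog_init(struct backlog *q, unsigned int low, unsigned int high)
{
    assert(low < high);
    q->len = 0;
    q->low = low;
    q->high = high;
    q->throttled = false;
}

/* Returns true if the packet was queued, false if it was refused. */
static bool backlog_enqueue(struct backlog *q)
{
    if (q->throttled)
        return false;
    q->len++;
    if (q->len >= q->high)
        q->throttled = true;
    return true;
}

/* Remove one packet; lift the throttle once the queue drains to `low`. */
static void backlog_dequeue(struct backlog *q)
{
    if (q->len == 0)
        return;
    q->len--;
    if (q->throttled && q->len <= q->low)
        q->throttled = false;
}
```

Note that the throttle is lifted purely on queue length. Once the queue drains to the low mark, input is accepted again no matter how little time user space has had, which is the cycle I am describing.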
> It is unfair to add any latency to a device that didn't cause or
> contribute to the havoc.
I disagree. When a machine is overloaded, everything gets slower. But a
side effect of delaying interrupts is that more work gets done for each
irq handler that is run and efficiency goes up. The hard part is balancing
the two in an attempt to achieve a steady rate of progress.
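The efficiency argument is just amortization: the fixed cost of taking an interrupt is spread over however many packets the handler finds waiting. A rough model follows, with made-up cycle counts purely for illustration (nothing here is measured, and cycles_per_packet is a hypothetical helper):

```c
#include <assert.h>

/*
 * Average cycles spent per packet when one interrupt services `batch`
 * packets. irq_overhead is the fixed entry/exit cost of the handler and
 * per_packet is the work done for each packet. Both are hypothetical
 * inputs, not measurements.
 */
static double cycles_per_packet(double irq_overhead, double per_packet,
                                unsigned int batch)
{
    assert(batch > 0);
    return per_packet + irq_overhead / batch;
}
```

With an assumed overhead of 1000 cycles and 200 cycles of work per packet, one packet per interrupt costs 1200 cycles each, while ten packets per interrupt costs 300 each. The price is that the first packet in each batch waits longer.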
> I think you missed my point. I am saying there is more than one source of
> interrupt for that same IRQ number that you are indiscriminately shutting
> down in a network device.
You're missing the effect that irq throttling has: it results in a system
that is effectively running in "polled" mode. Information does get
processed and throughput remains high; operations simply see some
additional latency. That is acceptable by definition, as the system is
under extreme load.
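As a minimal sketch of the idea, assuming a fixed-window counter (the actual patch may well account for time differently): allow at most max_rate interrupts per window, then keep the line masked until the window expires. The names irq_limiter and irq_limiter_allow are hypothetical, and time is passed in explicitly so the logic can be run in user space.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical per-IRQ rate limiter state. */
struct irq_limiter {
    uint32_t max_rate;      /* interrupts allowed per window */
    uint64_t window_ns;     /* length of one accounting window */
    uint64_t window_start;  /* start time of the current window */
    uint32_t count;         /* interrupts handled in this window */
    bool masked;            /* true while the line is held off */
};

static void irq_limiter_init(struct irq_limiter *l, uint32_t max_rate,
                             uint64_t window_ns)
{
    assert(max_rate > 0 && window_ns > 0);
    l->max_rate = max_rate;
    l->window_ns = window_ns;
    l->window_start = 0;
    l->count = 0;
    l->masked = false;
}

/*
 * Called for each interrupt arriving at time now_ns. Returns true if the
 * handler should run, false if the line is masked for the rest of the
 * current window.
 */
static bool irq_limiter_allow(struct irq_limiter *l, uint64_t now_ns)
{
    /* A new window starts at the first arrival after the old one expires. */
    if (now_ns - l->window_start >= l->window_ns) {
        l->window_start = now_ns;
        l->count = 0;
        l->masked = false;
    }
    if (l->masked)
        return false;
    if (++l->count >= l->max_rate)
        l->masked = true;   /* budget used up: hold off until next window */
    return true;
}
```

While the line is masked, events pile up in the device and are picked up in one pass once the window expires, which is the "polled" behaviour I mean.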
> So, assuming that tx complete interrupts do actually shut you down
> (although I doubt that very much given the classical Donald Becker tx
> descriptor pruning) pick another interrupt source; let's say MII link
> status; why do you want to kill that when it is not causing any noise but
> is a source of good asynchronous information (that could be used for
> example in HA systems)?
That information will eventually be picked up. I doubt the extra latency
will be significant. If it is, you've got realtime concerns, which it is
not our goal to address at this time.
> and what is this "known safe limit"? ;->
It's system dependent. It's load dependent. Here is a short list of the
factors that you have to include to compute it:
- number of cycles userspace needs to run
- number of cache misses that userspace is forced to incur due to irq
  handlers running
- amount of time to dedicate to the irq handler
- variance due to error path handling
- increased system cpu usage due to higher memory load
- front side bus speed of cpu
- speed of cpu
- length of cpu pipelines
- time spent waiting on io cycles
It is non-trivial to determine a limit. And trying to tune a system
automatically is just as hard: which factor do you choose for the system
to attempt to tune itself with? How does that choice affect users who
want to tune for other loads? What if latency is more important than
throughput?
There are a lot of choices as to how we handle these situations. They
all involve tradeoffs of one kind or another. Personally, I prefer irq
rate limiting: I have measured the tradeoff between latency and
throughput, and putting that control in the hands of the admin means the
choice that is best for the real load of the system is not fixed at
compile time.
If you look at what other operating systems do to schedule interrupts
as tasks and then look at the actual cost, is it really something we
want to do? Linux has made a point of keeping things as simple as
possible, and it has brought us great wins because we do not have the
overhead that other, more complicated systems have chosen. It might
be a loss in a specific case to rate limit interrupts, but if that is
so, just change the rate. What can you say about a dynamic self-tuning
technique that didn't take that particular type of load into account?
Recompiling is not always an option.
> What we are providing is actually a scheme to exactly measure that "known
> safe limit" you are referring to without depending on someone having to
> tell you "here's a good number for that 8 way xeon"
> If there is system capacity available why the fsck is it not being used?
That's a choice for the admin to make. Sometimes having reserves that aren't
used is a safety net that people are willing to pay for. ext2 has, by
default, a reserve that isn't normally used. Do people complain? No. It
buys several useful features (resistance against fragmentation, space for
daemon temporary files when the disk is full, ...) that repay the cost.
Is irq throttling the be-all and end-all? No. Can other techniques work
better? Yes. Always? No. And nothing prevents us from using this and
other techniques together. Please don't dismiss it solely because you
see cases that it doesn't handle.