Re: PATCH 2.5.x disable BAR when sizing

Linus Torvalds (torvalds@transmeta.com)
Sat, 21 Dec 2002 14:44:23 -0800 (PST)


[ Ivan added to the cc, to see if he has any ideas on turning things off ]

On 21 Dec 2002, Eric W. Biederman wrote:
>
> Actually it is not quite as bad as that.
> - We can reasonably assume there are no pci to pci transactions going
> on, so the only accesses to a pci resource are generated by the
> kernel from printk.

Actually, I think it's certainly valid to not allow "printk()" to happen
around the BAR probing, at least at bootup when we control all the CPUs
tightly anyway.

And hotplug devices should be disabled at plug-in, so later BAR probing
should be pretty harmless too (and, as you point out about bridging, they
should be shielded by the hotplug bridge itself).

> - If the large device is behind a pci bridge it should be shielded
> from the chaos.
>
> - If we don't call printk until we restore the old BAR value (which
> is currently the case (drivers/pci/probe.c:60)) there should be no
> transactions on the pci bus, that get a conflicting routing.

The problem, at least in the case I saw, has been that there are devices
that aren't entirely quiescent, often because we haven't even _gotten_ to
them yet, and the boot sequence left them active.

The one I saw was USB, and that's likely to be the worst case, since it's
one of very few devices that tends to "do stuff" even when inactive (i.e. a
USB setup walks the USB command tables in memory continuously, even if
nothing is happening). It's also one of the few classes of devices that
many PCs have SMM support for, so they are still alive even after the
BIOS has otherwise given up control.

> As long as the pci bus is quiet while we are sizing a bar the current
> method is safe.

Well, the thing is, as long as the PCI bus is 100% quiet, it simply
doesn't _matter_ which method we use.

The interesting cases are all "some activity that we don't know about is
going on". That's the thing that breaks disabling the PCI device, but it's
also the thing that can break _not_ disabling the PCI device.

So if we can guarantee a quiescent PCI bus, then I could also accept the
patch that disables MEM/IO resources for BAR probing. At that point it
simply shouldn't matter any more, and then I'd happily drop my concerns
about it.
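
For reference, a minimal sketch of what "disable MEM/IO while sizing a
BAR" amounts to. It runs against a simulated device rather than real
config space: struct fake_dev and the fake_*() helpers are hypothetical
stand-ins for the config accessors, not the code in drivers/pci/probe.c.
Only the offsets and command bits are the standard PCI ones. It assumes a
single 32-bit memory BAR.

```c
#include <stdint.h>

/* Standard PCI config-space offsets and command-register bits. */
#define PCI_COMMAND         0x04
#define PCI_COMMAND_IO      0x1
#define PCI_COMMAND_MEMORY  0x2
#define PCI_BASE_ADDRESS_0  0x10

/* Hypothetical simulated device with one 32-bit memory BAR. */
struct fake_dev {
    uint16_t command;    /* PCI_COMMAND register */
    uint32_t bar0;       /* current BAR value, flag bits in 3:0 */
    uint32_t bar0_size;  /* decoded size, a power of two >= 16 */
};

static uint32_t fake_read_bar0(const struct fake_dev *d)
{
    return d->bar0;
}

static void fake_write_bar0(struct fake_dev *d, uint32_t v)
{
    /* Address bits below the size are hardwired to zero,
     * and the low four flag bits are read-only. */
    d->bar0 = (v & ~(d->bar0_size - 1)) | (d->bar0 & 0xfu);
}

/* Size BAR 0 with IO/MEM decode switched off for the duration. */
uint32_t size_mem_bar_decode_off(struct fake_dev *d)
{
    uint16_t old_cmd = d->command;
    uint32_t old_bar = fake_read_bar0(d);
    uint32_t probe;

    /* Stop the device from claiming cycles while the BAR is bogus. */
    d->command = (uint16_t)(old_cmd & ~(PCI_COMMAND_IO | PCI_COMMAND_MEMORY));

    fake_write_bar0(d, 0xffffffffu);  /* write all ones ... */
    probe = fake_read_bar0(d);        /* ... see which bits stuck */
    fake_write_bar0(d, old_bar);      /* restore the old address */

    d->command = old_cmd;             /* restore decode enables */

    probe &= ~0xfu;                   /* drop the memory BAR flag bits */
    return ~probe + 1;                /* lowest settable bit == size */
}
```

Dropping the two d->command assignments gives the "don't disable"
variant. On a quiet bus both return the same size; they only differ in
what happens when something touches the device in the middle.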

This is why I repeated my "turn the power off at the whole house" analogy,
even if David didn't like it. It's _fine_ to turn the power off if we know
things are quiet, it's just that as things stand now, we don't actually
know that.

If somebody wants to try to follow that method, I can try to dig out the
machine that I had problems on before and test things at least on that
setup to make myself happier about the fact that it really solves the
problem. The solution may be as simple as just making our current
two-phase PCI scanning be a _three_-phase one:

- (new) phase 1 - scan for and turn off all devices
- phase 2 - go back and check the resources (BAR probing etc)
- phase 3 - allocate unassigned resources.
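
As a rough illustration of the ordering only, the three phases could be
sketched like this over an array of simulated devices. Everything here
(struct sim_dev, the phase*() names, the bump allocator) is a
hypothetical model, not existing kernel code, and the "probe" step just
copies a stored size where the real thing would do BAR sizing.

```c
#include <stddef.h>
#include <stdint.h>

/* Command-register bits: IO decode, MEM decode, bus mastering (DMA). */
#define CMD_IO      0x1
#define CMD_MEMORY  0x2
#define CMD_MASTER  0x4

/* Hypothetical simulated device with a single memory resource. */
struct sim_dev {
    uint16_t command;
    uint32_t bar;      /* assigned address, 0 if unassigned */
    uint32_t size;     /* filled in by phase 2 */
    uint32_t hw_size;  /* what BAR sizing would report (power of two) */
};

/* Phase 1: turn off IO/MEM/DMA on every device before probing any. */
void phase1_quiesce(struct sim_dev *devs, size_t n)
{
    for (size_t i = 0; i < n; i++)
        devs[i].command = (uint16_t)(devs[i].command &
                                     ~(CMD_IO | CMD_MEMORY | CMD_MASTER));
}

/* Phase 2: check the resources (stand-in for real BAR probing). */
void phase2_probe(struct sim_dev *devs, size_t n)
{
    for (size_t i = 0; i < n; i++)
        devs[i].size = devs[i].hw_size;
}

/* Phase 3: give unassigned resources a size-aligned address. */
uint32_t phase3_assign(struct sim_dev *devs, size_t n, uint32_t next_free)
{
    for (size_t i = 0; i < n; i++) {
        if (devs[i].bar != 0 || devs[i].size == 0)
            continue;  /* already set up by firmware, or no BAR */
        next_free = (next_free + devs[i].size - 1) & ~(devs[i].size - 1);
        devs[i].bar = next_free;
        next_free += devs[i].size;
    }
    return next_free;
}

void scan_three_phase(struct sim_dev *devs, size_t n, uint32_t window_base)
{
    phase1_quiesce(devs, n);
    phase2_probe(devs, n);
    phase3_assign(devs, n, window_base);
}
```

The point of the split is that phase 2 never starts until phase 1 has
visited every device. The sketch leaves devices disabled afterwards, on
the assumption that whoever claims a device turns it back on.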

One of the problems with turning off devices is that we actually have a
hard time doing so. We can trivially turn off IO/MEM/DMA, but PCI doesn't
have a good way to turn off interrupts (which in turn can become SCI
events).

Which still makes me worry about legacy USB in particular - simply because
I wonder what happens if the USB controller raises an interrupt which
causes an SMM event, which then causes trouble because the SMM handler
will be unhappy when the device isn't there any more.

We've actually had those kinds of problems in real life, see the
quirk_piix3_usb() quirks, for example. So I'm really not trying to be
difficult here, it's just that PC BIOS issues, and SMM in _particular_,
tend to be quite a horrible mess for the early boot sequence.

Linus
