Re: alpha iommu fixes

Gérard Roudier (groudier@club-internet.fr)
Sun, 20 May 2001 16:23:34 +0200 (CEST)


On Sun, 20 May 2001, Ivan Kokshaysky wrote:

> On Sun, May 20, 2001 at 04:40:13AM +0200, Andrea Arcangeli wrote:
> > I was only talking about when you get the "pci_map_sg failed" because
> > you have not 3 but 300 scsi disks connected to your system and you are
> > writing to all them at the same time allocating zillons of pte, and one
> > of your drivers (possibly not even a storage driver) is actually not
> > checking the reval of the pci_map_* functions. You don't need a pte
> > memleak to trigger it, even regardless of the fact I grown the dynamic
> > window to 1G which makes it 8 times harder to trigger than in mainline.
>
> I think you're too pessimistic. Don't mix "disks" and "controllers" --
> SCSI adapter with 10 drives attached is a single DMA agent, not 10 agents.
>
> If you're so concerned about Big Iron, go ahead and implement 64-bit PCI
> support, it would be right long-term solution. I'm pretty sure that
> high-end servers use mostly this kind of hardware.
>
> Oh, well. This doesn't mean that I'm disagreed with what you said. :-)
> Driver writers must realize that pci mappings are limited resources.

The IOMMU code allocation strategy is designed to fail due to
fragmentation as everything that performs contiguous allocations of
variable quantities.

I may add a test of pci_map_* return code in the sym53c8xx driver, but
the driver will panic on failure. It is not acceptable to consider such
kind of failure as a normal situation (returning some ?_BUSY status to
the SCSI driver) for the following reasons:

- IOs may be reordered and break upper layers assumptions.
- Spurious errors and even BUS resets may happen.

For now, driver methods that are requested to queue IOs are not allowed to
wait for resources. Anyway, the pci_map_* interface is unable to wait.

There are obviously ways to deal gracefully with such resource lack, but
the current SCSI layer isn't featured for that. For example, a
freeze/unfreeze mechanism as described in CAM can be implemented in order
not to reorder IOs, and some mechanism (callback, resource wait, etc...)
must be added to restart the operation when resource is likely to be
available.

IMO, the only acceptable fix in the current kernel is to perform IOMMU PTE
allocations of a fixed quantity at a time, as limiting SG entry to fit in
a single PAGE for example.

Gérard.

PS: May-be I should look how *BSD's handles IOMMUs.

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/