Re: Kernel deadlock using nbd over acenic driver

Steven Whitehouse (steve@gw.chygwyn.com)
Thu, 16 May 2002 09:04:31 +0100 (BST)


Hi,

[snip]
>
> I don't see any reason to introduce a second flag to say when a flag
> has been set .. Initial reports are that symptoms go away when
>
> current->flags |= PF_MEMALLOC;
>
> is set in the process about to do networking (and unset afterwards).
>
> There will be more news later today. I believe that this will remove
> deadlock against VM for tcp buffers, but I don't believe it will
> stop deadlocks against "nothing", when we simply are out of buffers.
> The only thing that can do that is reserved memory for the socket.
> Any pointers?
>
> Peter
>

The reason for adding the second flag is that I suspect that nbd_send_req()
can be called by processes which already have PF_MEMALLOC set, in which case
we don't want to alter that. The "priority inversion" that I mentioned occurs
when you get processes without PF_MEMALLOC set calling nbd_send_req() as when
they call through to page_alloc.c:__alloc_pages() they won't use any memory
once the free pages hits the min mark even though there is memory available
(see the code just before and after the rebalance label).

Once one process has started sleeping waiting for memory in nbd_send_req()
thats is, since tx_lock prevents any further writeouts until the sleeping
process has completed. Unfortunately this has to be the case in order to
ensure that nbd's requests are sent atomically.

So rather than reserve memory specifically for sockets, in effect the
min free pages for each zone place a limit on what "normal" allocations
may use as a maximum. This is fine provided allocations in the write out
path are not "normal" as well, but able to use whatever they need. At first
I thought "we only need to set PF_MEMALLOC if we are writing" but in fact
we have to set it for reads too so that reads don't block writes I think.

There is a difference though between preventing the deadlock and adjusting
the system so that we get the maximum performance, so it will be interesting
to see whether we ought to adjust the min free pages figure in order to
get higher performance, or whether its ok as it is.

I'm not sure yet that the PF_MEMALLOC change I described actually fixes the
problem either, although it should make things a lot better. Thats
something else for further investigation.

Steve.

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/