Re: Counter-kludge for 2.5.x hanging when writing to block device

Andrew Morton (akpm@digeo.com)
Tue, 3 Jun 2003 03:00:23 -0700


Jens Axboe <axboe@suse.de> wrote:
>
> On Tue, Jun 03 2003, Adam J. Richter wrote:
> > For at least the past few months, the Linux 2.5 kernels have
> > hung when I try to write a large amount of data to a block device.

Well ytf is this the first time I've heard about it?

> > I most commonly notice this when trying to clear a disk with a command
> > like "dd if=/dev/zero of=/dev/discs/disc1/disc". Sometimes doing
> > an mkfs on a big file system is enough to cause the hang.
> > I wrote a little program that repeatedly writes a 4kB block of zeroes
> > to the device so I could track how far it got, and on a computer with
> > 512MB of RAM it would write 210-215MB of zeroes to the disk before
> > hanging. When these hangs occur, other processes
> > continue to run fine, and I can do syncs, which return, but the
> > hung process never resumes. In the past, I've verified with a
> > printk that it is looping in balance_dirty_pages, repeatedly
> > calling blk_congestion_wait, and never leaving the loop.
> >
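
A minimal tester along those lines, for reference (an illustrative
sketch, not Adam's actual program):

/*
 * Repeatedly write 4kB blocks of zeroes to a block device, printing
 * progress so the point of the hang is visible.
 */
#include <stdio.h>
#include <string.h>
#include <fcntl.h>
#include <unistd.h>

int main(int argc, char **argv)
{
	char buf[4096];
	unsigned long blocks = 0;
	int fd;

	if (argc != 2) {
		fprintf(stderr, "usage: %s <block-device>\n", argv[0]);
		return 1;
	}
	fd = open(argv[1], O_WRONLY);
	if (fd < 0) {
		perror("open");
		return 1;
	}
	memset(buf, 0, sizeof(buf));
	for (;;) {
		if (write(fd, buf, sizeof(buf)) != sizeof(buf)) {
			perror("write");
			break;
		}
		if (++blocks % 256 == 0) {	/* every megabyte */
			printf("%lu MB written\n", blocks / 256);
			fflush(stdout);
		}
	}
	close(fd);
	return 0;
}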

Please debug it further. Something may have gone wrong with the arithmetic
in balance_dirty_pages().
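
For reference, the shape of that loop is roughly this (a simplified
sketch of the idea, not the verbatim 2.5 source; get_dirty_counts() and
start_some_writeback() are illustrative stand-ins for the real
accounting and writeback calls):

for (;;) {
	long nr_dirty, nr_writeback, dirty_thresh;

	get_dirty_counts(&nr_dirty, &nr_writeback, &dirty_thresh);
	if (nr_dirty + nr_writeback <= dirty_thresh)
		break;				/* under the limit: done throttling */
	start_some_writeback();			/* push some dirty pages toward disk */
	blk_congestion_wait(WRITE, HZ/10);	/* then sleep for up to 100ms */
}

If the threshold arithmetic goes wrong, that break never fires and the
caller looks hung, which is exactly what Adam describes.  (Assuming the
default dirty ratio of the era was 40%, the threshold on a 512MB box
works out to roughly 205MB, suspiciously close to where his writer
stalls.)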

> > Here is a counter-kludge that seems to stop the problem.
> > This is certainly not the "right" fix; it just illustrates one way
> > to make the hang go away.
> >
> > By the way, I say "counter-kludge" because I get the impression
> > that blk_congestion_wait is itself a kludge: it calls
> > blk_run_queues and then waits a fixed amount of time (100ms in this
> > case), potentially a big waste of time, rather than waiting on some
> > more accurate criterion.

The sleep in blk_congestion_wait() terminates when a request is returned to
the queue. The timeout is only really there for non-request-based backing
devices.
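
Schematically, the mechanism is (a sketch of what was just described,
not the exact source):

/*
 * congestion_wqh[rw] is woken from the request-completion path when a
 * request is returned to the queue; the timeout is the backstop for
 * backing devices which never deliver that wakeup.
 */
static wait_queue_head_t congestion_wqh[2];

void blk_congestion_wait(int rw, long timeout)
{
	DEFINE_WAIT(wait);

	blk_run_queues();	/* kick any idle queues first */
	prepare_to_wait(&congestion_wqh[rw], &wait, TASK_UNINTERRUPTIBLE);
	io_schedule_timeout(timeout);	/* request freed means wakeup, else timeout */
	finish_wait(&congestion_wqh[rw], &wait);
}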

> Does something like this work? Andrew, what's the point of doing the
> wait if the queue isn't congested?!

We need to wait until the amount of dirty memory in the machine is below
the designated limits. This is unrelated to queue congestion. The way the
logic is now, we can have 256 megs' worth of requests queued on a 32M
machine and everything still throttles and clamps as intended.

There are several things wrong with blk_congestion_wait(), including:

a) it should be called throttle_on_io()

b) it should check that there are still requests in flight after parking
itself on the waitqueue, rather than relying on the timeout (see the
sketch after this list).

c) for memory reclaim we should terminate the sleep when a certain number
of pages become reclaimable, not when write requests are returned or the
timeout expires.

d) network filesystems should be delivering wakeups to throttled
processes rather than relying on the timeout.
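
For a) and b), the idea is roughly this (a hypothetical sketch, not a
proposed patch; requests_in_flight() is an illustrative stand-in, not a
real kernel function):

void throttle_on_io(int rw, long timeout)
{
	DEFINE_WAIT(wait);

	blk_run_queues();
	prepare_to_wait(&congestion_wqh[rw], &wait, TASK_UNINTERRUPTIBLE);
	/*
	 * Re-check *after* parking on the waitqueue, so a wakeup cannot
	 * slip past; if nothing is in flight there is nothing to wait for.
	 */
	if (requests_in_flight(rw))
		io_schedule_timeout(timeout);
	finish_wait(&congestion_wqh[rw], &wait);
}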

But none of these have proven sufficiently problematic to justify futzing
with it. I expect d) will eventually need to be implemented.
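
Concretely, d) would amount to something like this in a network
filesystem's write-completion path (hypothetical names, sketching the
idea only):

static void netfs_write_completed(void)
{
	/* ... normal completion accounting ... */
	wake_up(&congestion_wqh[WRITE]);	/* release throttled writers now */
}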

As for Adam's hang: dunno. I and many others have run mkfs and dd an
unbelievable number of times. He needs to debug it more.
