Re: [patch] 2.4.13-pre3 arm/i386/mips/mips64/s390/s390x/sh die() deadlock

Keith Owens (kaos@ocs.com.au)
Tue, 16 Oct 2001 14:27:55 +1000


On Tue, 16 Oct 2001 12:58:21 +1000,
Keith Owens <kaos@ocs.com.au> wrote:
>On Mon, 15 Oct 2001 19:36:02 -0700 (PDT),
>Linus Torvalds <torvalds@transmeta.com> wrote:
>>I much prefer a dead machine with a partially visible oops over a oops
>>where the original oops has scrolled away due to recursive faults.
>
>IMHO it is unrealistic to expect that all code inside die() will never
>fail. Any unexpected kernel corruption could cause the register or
>backtrace dump to fail. The patch gets the best of both worlds. It
>protects against recursive errors and against concurrent calls to
>die().

Previous message sent too soon.

The patch makes two attempts at dumping registers, one for the original
oops and one if die() fails, then it gives up. The second attempt is
useful for diagnosing why die() is failing, without that data it is
difficult to fix die() itself.

I was aiming to improve error handling in the rare case that die()
failed so we could get better diagnostics in the long term, by fixing
the problems that make die() fail. If you think that this would scroll
away useful data then we can compromise.

if (++die_lock_owner_depth < 2+(CONFIG_DIAGNOSE_RECURSIVE_DIE+0)) {

CONFIG_DIAGNOSE_RECURSIVE_DIE
If this variable is selected then the kernel will attempt to provide
extra diagnostics in the rare cases when the kernel die() routine
itself dies. This may cause useful information from the first
failure to be lost. Unless you want to diagnose the die() and
show_regs() code in the kernel, say N here.

Acceptable?

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/