Re: [NFS] Race in rpc_delete_timer causes crash

Ion Badulescu (ionut@badula.org)
Sun, 9 Mar 2003 01:12:13 -0500


On Sun, 9 Mar 2003 00:03:45 +0100 (MET), Ulrich Weigand <weigand@immd1.informatik.uni-erlangen.de> wrote:
> Hello,
>
> we're seeing a rare and hard to trigger crash on s390 where rpc_run_timer
> calls via an invalid callback pointer.

Myself and Jakob Oestergaard have seen the same race, and the tentative
fix from Trond was similar to yours. I haven't been able to reproduce
the problem after applying that fix.

Perhaps it's time to propagate the patch upstream? Most recent 2.4.x
kernels are affected...

> What appears to happen is that rpc_call_sync allocates a struct rpc_task
> (with its embedded tk_timer) on the stack, and the timer gets set up
> sometime during rpc_execute. However, the timer actually triggers at
> a point in time where the original call to rpc_call_sync has already
> returned, and the stack space overwritten by other data. That data is
> now interpreted as an rpc_task struct holding a tk_timeout_fn pointer by
> rpc_run_timer, which causes the Oops (actually, Aieee).

Yup, that's the race all right.

Ion

-- 
  It is better to keep your mouth shut and be thought a fool,
            than to open it and remove all doubt.
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/