Re: 8-CPU (SMP) #s for lockfree rtcache

Dipankar Sarma (dipankar@in.ibm.com)
Tue, 28 May 2002 22:39:45 +0530


On Tue, May 28, 2002 at 09:55:35PM +0530, Dipankar Sarma wrote:
> Hi Robert,
>
> On Tue, May 28, 2002 at 08:49:58AM -0700, Robert Love wrote:
> >
> > > Well, the last time RCU was discussed, Linus said that he would
> > > like to see someplace where RCU clearly helps.
> >
> > I agree the numbers posted are nice, but I remain skeptical like Linus.
> > Sure, the locking overhead is nearly gone in the profiled function where
> > RCU is used. But the overhead has just been _moved_ to wherever the RCU
> > work is now done. Any benchmark needs to include the damage done there,
> > too.
>
> Have you looked at the rt_rcu patch? Where do you think there
> is overhead compared to what the route cache already does? In my
> profiles, the rcu routines and the kernel mechanisms they use
> don't show up high. If you have any suggestions, then I can
> do an investigation.

Hi Robert,

While we are at it, I think this is a good point to analyze.
So here is a brief analysis of the rt_rcu patch from the overhead
standpoint -

1. The read side has no overhead; we just don't take the per-bucket lock.
2. For just the route cache portion of the code, RCU comes into the picture
only when dst entries are deleted. Two things keep such deletions
infrequent -
a> expiry of dst entries is checked through an infrequent timer, and
b> the lease of recently used dst entries is extended.
So we don't do frequent RCU-based deletion of dst entries.
Periodically a set of dst entries expires and, instead of
freeing them immediately, we just put them in the RCU queue(s)
for freeing after the grace period (call_rcu() in rt_free()).
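
As a rough user-space sketch of the deferred free described in point 2
(this is not code from the rt_rcu patch; every name ending in _sketch,
the rcu_cb structure and run_batch_after_grace_period() are made up
for illustration, and locking and per-cpu queues are left out):

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for an RCU callback head. */
struct rcu_cb {
    struct rcu_cb *next;
    void (*func)(void *arg);
    void *arg;
};

/* Hypothetical stand-in for a route cache (dst) entry. */
struct dst_sketch {
    int key;
    struct rcu_cb rcu;      /* embedded, so queueing needs no allocation */
};

struct rcu_cb *pending_head;                  /* batch awaiting a grace period */
struct rcu_cb **pending_tail = &pending_head;
int freed_count;                              /* bookkeeping for the example only */

/* Models call_rcu(): queue the callback, free nothing yet. */
void call_rcu_sketch(struct rcu_cb *cb, void (*func)(void *), void *arg)
{
    cb->func = func;
    cb->arg = arg;
    cb->next = NULL;
    *pending_tail = cb;
    pending_tail = &cb->next;
}

/* Models the callback, which does nothing but the final free. */
void dst_free_sketch(void *arg)
{
    free(arg);
    freed_count++;
}

/* Models rt_free(): defer the free instead of doing it immediately. */
void rt_free_sketch(struct dst_sketch *d)
{
    call_rcu_sketch(&d->rcu, dst_free_sketch, d);
}

/* Run once the grace period is over; returns the number of callbacks run. */
int run_batch_after_grace_period(void)
{
    struct rcu_cb *cb = pending_head;
    int n = 0;

    pending_head = NULL;
    pending_tail = &pending_head;
    while (cb) {
        struct rcu_cb *next = cb->next;   /* read before the entry is freed */
        cb->func(cb->arg);
        cb = next;
        n++;
    }
    return n;
}
```

The point of the sketch is only that the delete path queues a pointer
and returns; the free itself happens later, in one batch.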

Coming to the RCU mechanism -

1. Grace period detection : Different RCU algorithms do it
differently, however if there is no RCU callback pending *nothing*
is done regarding this. One rcu implementation uses
a 10ms timer to check for grace period completion and another,
rcu_poll, uses a repeating tasklet to poll for it. The grace period
detection is based on a per-cpu context switch counter. I have not seen
significant profile counts for the grace period detection scheme, but
nevertheless I will put up the profile counts for Dave's test
at the LSE website.
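
To make the counter-based detection concrete, here is a minimal
user-space sketch. It assumes one plain counter per CPU and a single
snapshot per batch; the names (ctxsw_count, start_grace_period(),
grace_period_done(), NCPUS_SKETCH) are invented for illustration and
are not taken from either RCU implementation:

```c
#include <assert.h>
#include <stdbool.h>

#define NCPUS_SKETCH 8      /* assumption: same CPU count as the test box */

unsigned long ctxsw_count[NCPUS_SKETCH];     /* bumped on every context switch */
unsigned long ctxsw_snapshot[NCPUS_SKETCH];  /* values when the batch started */
bool rcu_pending;                            /* is any batch waiting? */

/* Stand-in for the scheduler incrementing the per-cpu counter. */
void context_switch(int cpu)
{
    ctxsw_count[cpu]++;
}

/* Called when a batch of callbacks starts waiting for a grace period. */
void start_grace_period(void)
{
    for (int cpu = 0; cpu < NCPUS_SKETCH; cpu++)
        ctxsw_snapshot[cpu] = ctxsw_count[cpu];
    rcu_pending = true;
}

/*
 * Polled from a timer or tasklet. The grace period is over once every
 * CPU has context switched at least once since the snapshot.
 */
bool grace_period_done(void)
{
    if (!rcu_pending)
        return false;               /* nothing pending: nothing to check */
    for (int cpu = 0; cpu < NCPUS_SKETCH; cpu++)
        if (ctxsw_count[cpu] == ctxsw_snapshot[cpu])
            return false;           /* this CPU has not switched yet */
    rcu_pending = false;
    return true;
}
```

With no batch pending the poll returns at the first test, which is
the "nothing is done" case above.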

2. Actual update : RCU processes the batched update callbacks from tasklet
context. The rt_rcu callbacks don't do anything other than
call dst_free(), which would have been called by non-RCU
code under lock in any case. I am not sure whether doing this from
tasklet context adds any overhead, but I suspect that it doesn't.

Comments/suggestions ?

Thanks

-- 
Dipankar Sarma  <dipankar@in.ibm.com> http://lse.sourceforge.net
Linux Technology Center, IBM Software Lab, Bangalore, India.