Re: OOPS: Multipath routing 2.4.17

Julian Anastasov (ja@ssi.bg)
Sat, 2 Mar 2002 18:10:50 +0000 (GMT)


Hello,

On Sat, 2 Mar 2002 kuznet@ms2.inr.ac.ru wrote:

> > What about the new scheduler (for 2.5?), of course, after
> > replacing the wrong write_lock() with spin_lock_bh(&fib_nh_powers) ?
>
> I do not see any reasons not to do this.
> I remember your approach had some inacceptable issues, but 2.5 is exactly
> the place to resolve them. :-)
>
> But actually I would like to see a fix for 2.4 for beginning.

OK, I'll try it soon, I assume you ack about the
new fib_select_multipath scheduler discussed in this thread.

> The failure with orphaned DEADs was hard bug yet. Minute...
> I remember I did some work to make a minimalistic fix...

Yep, I'm wondering, nobody complains about this problem :)
May be it is still not too late to fix it :)

> Aha! That's it. Please, look at this _carefully_. It is going
> to be submitted to 2.4 and mistakes are not allowed here.
> Look especially at the differences of your approach both about
> medium_id and DEAD fault.

I see, very good, you restore the nh_dev on "enable IP"
which is detroyed on "disable IP".

About medium_id, I didn't tested your variant but I see what you mean:
proxy_arp must be enabled for the receiver, distinguish 0/-1.
Sounds good, looks good, only doc changes:

- change "media" to "medium"

- docs in Documentation/filesystems/proc.txt or the net part
is going out from this file?

About the FIB changes, check fib_sync_down because I see a
bad scenario, i.e. change:

if (force && nh->nh_dev) {

to

if (force && nh->nh_dev == dev && nh->nh_flags&RTNH_F_DEAD) {

and move it before endfor_nexthops because we can miss the
following sequence of events:

- enable IP
- link up
...
- link down => nh is DEAD after force=0 but with valid nh_dev!=NULL

and we come here with the new change:

- disable IP/unreg with force=1 => we miss the above event and
don't clear nh_dev

In short, nh can be already DEAD with nh_dev!=NULL and we miss it when
force is 1.

So, the new code must be something like this:

}
+ if (force && nh->nh_dev == dev &&
+ nh->nh_flags&RTNH_F_DEAD) {
+ dev_put(nh->nh_dev);
+ nh->nh_dev = NULL;
+ }
} endfor_nexthops(fi)

There is a second variant: we to keep all DEAD nhs to be with
nh_dev==NULL but I'm not sure about it. At least, you already added
a mechanism the nexthops to bind again to outdev and this can work,
may be. But I prefer the first solution after fixing.

> Alexey

> + two devices attached to different media.

medium

Regards

--
Julian Anastasov <ja@ssi.bg>

- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/