Re: latest linus-2.5 BK broken

Eric W. Biederman (ebiederm@xmission.com)
20 Jun 2002 08:54:42 -0600


Larry McVoy <lm@bitmover.com> writes:

> > I totally agree, mostly I was playing devils advocate. The model
> > actually in my head is when you have multiple kernels but they talk
> > well enough that the applications have to care in areas where it
> > doesn't make a performance difference (There's got to be one of those).
>
> ....
>
> > The compute cluster problem is an interesting one. The big items
> > I see on the todo list are:
> >
> > - Scalable fast distributed file system (Lustre looks like a
> > possibility)
> > - Sub application level checkpointing.
> >
> > Services like a schedulers, already exist.
> >
> > Basically the job of a cluster scheduler gets much easier, and the
> > scheduler more powerful once it gets the ability to suspend jobs.
> > Checkpointing buys three things. The ability to preempt jobs, the
> > ability to migrate processes, and the ability to recover from failed
> > nodes, (assuming the failed hardware didn't corrupt your jobs
> > checkpoint).
> >
> > Once solutions to the cluster problems become well understood I
> > wouldn't be surprised if some of the supporting services started to
> > live in the kernel like nfsd. Parts of the distributed filesystem
> > certainly will.
>
> http://www.bitmover.com/cc-pitch
>
> I've been trying to get Linus to listen to this for years and he keeps
> on flogging the tired SMP horse instead.

Hmm. My impression is that Linux has been doing SMP but mostly because
it hasn't become a nightmare so far. Linus just a moment ago noted that
there are scaleablity limits, to SMP.

As for the cc-SMP stuff.
a) Except dual cpu systems no-one makes affordable SMPs.
b) It doesn't solve anything except your problem with locks.

You have presented your idea, and maybe it will be useful. But at
the moment it is not the place to start. What I need today is process
checkpointing. The rest comes in easy incremental steps from there.

For me the natural place to start is with clusters, they are cheaper
and more accessible than SMPs. And then work on the clustering
software with gradual refinements until it can be managed as one
machine. At that point it should be easy to compare which does a
better job for SMPs.

Eric
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/