Their 99 SOSP paper gives a discussion of there arguments and some results
from their implementation using IRIX.
http://www-flash.stanford.edu/~kinshuk/sosp99.pdf
eric
---------------------------
Eric Lemar
elemar@cs.washington.edu http://www.cs.washington.edu/homes/elemar
On Thu, 20 Jan 2000, Larry McVoy wrote:
> Whatever, can the people who are really interested in high performance
> take a look at
>
> http://www.bitmover.com/llnl/smp.{ps,pdf}
> and then
> http://www.bitmover.com/llnl/labs.{ps,pdf}
>
> I'll briefly summarize here. No justification for these statements are
> here, there are some in the papers.
>
> Premise 1: SMP scaling is a bad idea beyond a very small number processors.
> The reasoning for this is that when you start out threading a kernel,
> it's just a few locks. That quickly evolves into more locks, and
> for a short time, there is a 1:1 mapping between each sort of object
> in the system (file, file system, device, process, etc) and a lock.
> So there can be a lot of locks, but there is only one reader/writer
> lock per object instance. This is a pretty nice place to be - it's
> understandable, explainable, and maintainable.
>
> Then people want more performance. So they thread some more and now
> the locks aren't 1:1 to the objects. What a lock covers starts to
> become fuzzy. Thinks break down quickly after this because what
> happens is that it becomes unclear if you are covered or not and
> it's too much work to figure it out, so each time a thing is added
> to the kernel, it comes with a lock. Before long, your 10 or 20
> locks are 3000 or more like what Solaris has. This is really bad,
> it hurts performance in far reaching ways and it is impossible to
> undo.
>
> Premise 2: most/all locking follows a canonical form of "take a global
> data structure, split it up into N, where N is a function of the
> number of CPUs, and give each CPU or group of CPUS, their own data
> structure. Classic example: global/local run queues.
>
> Premise 3: it is far easier to take a bunch of operating system images
> and make them share the parts they need to share (i.e., the page
> cache), than to take a single image and pry it apart so that it
> runs well on N processors.
>
> All of this leads us to an interesting twist on the clustering idea,
> one that I give credit for mostly to DEC (they have some of this
> implemented already). Suppose you were to take a single big machine
> and run another instance of the OS every N processors where N is chosen
> so that it is well under the knee of the locking curve, i.e., around 4,
> maybe 8, but certainly no more than 8.
>
> Multiple OS's on a single box? Wacky, huh? But if you think about it,
> you've just instantly taken *EVERY* data structure in the kernel and
> multi threaded it. Cool, no? And it cost you nothing but some boot
> code.
>
> That's kinda cute but not very useful because what you really want is to
> be able to have all processors working together on the same data with only
> one copy of the data. In other words, I don't care if I have one OS or
> 1000, I want all processors to be able to mmap /space/damn_big_file and
> poke at it. And I don't want any stinkin' DSM - I want real, hardware
> based coherency. Well, bucky, I'm here to tell ya, praise the lord,
> you can have it :-) You need to make an SMPFS which lets other OS's
> put reference counts on your inodes. The operation is extremely
> similar to what you have to do when you invalidate a page - you shoot
> down the other processor's TLB entries. So we need something like
> that in the reverse.
>
> If you think I'm waving my hands wildly, I am. But this is definitely
> doable, and as hard as it seems, it is easily an order of magnitude easier
> than threading the kernel to get to even 32 processors. I've lived
> through that twice, it's about a 7 year process (in hind sight; before
> hand, everyone said it would be maybe 18 months).
>
> Read the papers. Think. Think again. Let's talk. I can set up a
> perf@bitmover aliase if this becomes too off topic.
>
> --lm
>
> P.S. I call these SMP clusters, to distinguish them from HA or HPC clusters.
> SMP clusters are for the enterprise - these are the clusters that will get
> Linux on big iron running Oracle and kicking serious butt. Fast.
>
> -
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to majordomo@vger.rutgers.edu
> Please read the FAQ at http://www.tux.org/lkml/
>
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.rutgers.edu
Please read the FAQ at http://www.tux.org/lkml/