Re: Why Plan 9 C compilers don't have asm("")

Johan Kullstam (kullstam@ne.mediaone.net)
08 Jul 2001 21:22:46 -0400


Victor Yodaiken <yodaiken@fsmlabs.com> writes:

> On Fri, Jul 06, 2001 at 06:44:31PM +0000, Linus Torvalds wrote:
> > On ia64, you probably end up with function calls costing even more than
> > alpha, because not only does the function call end up being a
> > synchronization point for the compiler, it also means that the compiler
> > cannot expose any parallelism, so you get an added hit from there. At
>
> That seems amazingly dumb. You'd think a new processor design would
> optimize parallel computation over calls, but what do I know?
>
> > Most of these "unconditional branches" are indirect, because rather few
> > 64-bit architectures have a full 64-bit branch. That means that in
>
> This is something I don't get: I never understood why 32bit risc designers
> were so damn obstinate about "every instruction fits in 32 bits"
> and refused to have "call 32 bit immediate given in next word" not
> to mention a "load 32bit immediate given in next word".
> Note, the superior x86 instruction set has a 5 byte call immediate.

the 32-bit MIPS (R3K series, at least) has fixed 32-bit instructions,
and an immediate load carries only a 16-bit immediate (which fits
within the instruction itself). thus loading a 32-bit number takes two
instructions (lui for the upper half, ori for the lower). since the
instructions are all 32 bits and must live on a multiple of 4 bytes,
this is as compact as you can get given the alignment constraint.
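
a rough sketch of that two-instruction load, modelled in plain C rather
than real MIPS assembly. the helper names lui_model, ori_model and
load_imm32 are made up for illustration; they only mimic what the
lui/ori pair does to a register.

```c
#include <assert.h>
#include <stdint.h>

/* hypothetical model of "lui rt, imm": place the 16-bit immediate in
 * the upper half of the register and clear the lower half */
static uint32_t lui_model(uint16_t imm)
{
    return (uint32_t)imm << 16;
}

/* hypothetical model of "ori rt, rs, imm": OR the zero-extended
 * 16-bit immediate into the register */
static uint32_t ori_model(uint32_t rs, uint16_t imm)
{
    return rs | (uint32_t)imm;
}

/* build a full 32-bit constant from two 16-bit immediates, one per
 * instruction: 8 bytes of code in total */
static uint32_t load_imm32(uint32_t value)
{
    uint16_t hi = (uint16_t)(value >> 16);
    uint16_t lo = (uint16_t)(value & 0xFFFFu);
    uint32_t reg = ori_model(lui_model(hi), lo);

    assert(reg == value); /* sanity check on the split and recombine */
    return reg;
}
```

calling load_imm32(0xDEADBEEF) hands back the same value, built from
the halves 0xDEAD and 0xBEEF. a single instruction followed by a
32-bit literal word would also occupy 8 bytes, which is the point
about compactness.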

note that x86 is also fussy about alignment in various cases, e.g.,
double-precision floats.
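
a minimal sketch of how that alignment shows up from C, assuming a C11
compiler. struct sample and the helper names are invented for
illustration, and the actual numbers depend on the ABI in use, so none
are claimed here.

```c
#include <stddef.h>
#include <stdint.h>

/* hypothetical struct: a char followed by a double forces the compiler
 * to insert padding so the double lands on its required boundary */
struct sample {
    char   tag;
    double value;
};

/* byte offset the compiler picked for the double member */
static size_t double_offset(void)
{
    return offsetof(struct sample, value);
}

/* alignment the ABI requires for a double */
static size_t double_alignment(void)
{
    return _Alignof(double);
}

/* true if p sits on a multiple of the given alignment */
static int is_aligned(const void *p, size_t alignment)
{
    return ((uintptr_t)p % alignment) == 0;
}
```

printing double_offset() and double_alignment() on a given target shows
how much padding the compiler added after the one-byte tag.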

> > There are lots of good arguments for function calls: they improve icache
> > when done right, but if you have some non-C-semantics assembler sequence
> > like "cli" or a spinlock that you use a function call for, that would
> > _decrease_ icache effectiveness simply because the call itself is bigger
> > than the instruction (and it breaks up the instruction sequence so you
> > get padding issues).
>
> I think anywhere that you have inner loop or often used operations
> that are short assembler sequences, inline asm is a win - it's easy to
> show for example, that the Linux asm x86 macro semaphore down
> is three times as fast as a called version. I wish, however
> that GCC did not use a horrible overly complex lisplike syntax

lisp syntax is extremely simple. i am not sure what GCC does to make
it complex.
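
for reference, the syntax in question is presumably GCC's extended asm,
with its colon-separated operand lists and constraint strings. a
minimal sketch, assuming gcc or a compatible compiler. the templates
are left empty so no instructions are emitted and nothing here is tied
to x86; the function names are made up.

```c
/* empty template plus a "memory" clobber: emits nothing, but tells the
 * compiler not to cache memory values in registers or reorder memory
 * accesses across this point */
static inline void compiler_barrier(void)
{
    __asm__ __volatile__("" : : : "memory");
}

/* "+r"(x) declares x as a read-write operand held in a general
 * register. the empty template leaves it untouched, but the compiler
 * must assume it may have changed */
static inline int pass_through_asm(int x)
{
    __asm__ __volatile__("" : "+r"(x));
    return x;
}
```

the general shape is asm(template : outputs : inputs : clobbers). a
real sequence such as a semaphore down would put instructions in the
template and name its operands the same way.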

> and
> that there was a way to inline functions written in .S files.
>
> And the feature is way too easy to abuse - same argument here as in
> the threads argument.
> It's a far better thing to not need a semaphore at all than to rely
> on handcoded semaphore down to make your poorly synchronized design
> sort-of perform.
>
>
> -
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at http://www.tux.org/lkml/

-- 
J o h a n  K u l l s t a m
[kullstam@ne.mediaone.net]
Don't Fear the Penguin!