Ahh, only if you do my optimization of sharing trampolines among users.
And you're right, that won't work.
But if you don't do that, you don't need a stack frame. You just reload GP
and jump back to the caller.
And assuming most calls don't need the trampoline (and hey, they really
shouldn't), you're still way ahead. The only thing you lost was the icache
win of re-using the trampoline (and a few cycles for scheduling and the
extra short branch).
Think of it as nothing more than a branch prediction thing - you predict
that you can take a short branch, and emit the long-branch code.
So the code would be roughly (this is not how the compiler would see it,
this is the very last stage of outputting the actual assembly. Nothing
else needs to know):
	bsr $26,trampoline	// linker overflow case
	ldq $27,fn($gp)		// load the full address
	jsr $26,($27)		// branch to it
	ldgp $29,($26)		// reload our GP
	jsr $31,retpoint	// and go back to where we came from.
And the linker can just use the special .rel20 thing to turn the bsr into
a direct call when it can.
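
The check the linker has to make for that is just a range test. A rough C
sketch of it follows. The helper name short_branch_reaches and the
disp_bits parameter are made up for illustration, and this is not taken
from any real linker. It assumes 4-byte instructions and a signed,
word-scaled displacement counted from the instruction after the branch,
which is how Alpha branches are encoded (21 bits of displacement there).

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: can a PC-relative branch at `pc` reach `target`
 * using a signed displacement field of `disp_bits` bits, counted in
 * 4-byte instruction words relative to pc + 4? */
static bool short_branch_reaches(uint64_t pc, uint64_t target,
                                 unsigned disp_bits)
{
    assert(disp_bits >= 2 && disp_bits <= 32);

    /* Signed byte distance from the instruction after the branch. */
    int64_t delta = (int64_t)(target - (pc + 4));
    if (delta % 4 != 0)
        return false;           /* target is not instruction-aligned */

    int64_t words = delta / 4;
    int64_t limit = (int64_t)1 << (disp_bits - 1);

    /* Representable range is [-2^(bits-1), 2^(bits-1) - 1] words. */
    return words >= -limit && words < limit;
}
```

With disp_bits set to 21 that is a reach of about 4MB in either
direction. Anything that fails the test keeps the bsr to the trampoline.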
Overhead when it cannot: one extra "bsr", one extra "jsr" back, and the
lack of scheduling. You lost two cycles and maybe a pipeline stall or
something (branching around is never nice, even if it's unconditional).
But you only lose this on mispredicts. And you can have a pretty high
prediction accuracy, even with just static knowledge.
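
A back-of-the-envelope version of that argument, as a C sketch. The
function name and the numbers you feed it are assumptions for
illustration, not measurements: the average cost per call is just the
mispredict rate times whatever the trampoline detour costs.

```c
#include <assert.h>

/* Hypothetical cost model: average extra cycles per call when a fraction
 * `accuracy` of call sites reach their target with the short branch and
 * the rest pay `mispredict_cycles` for the trampoline detour. */
static double expected_penalty_cycles(double accuracy,
                                      double mispredict_cycles)
{
    assert(accuracy >= 0.0 && accuracy <= 1.0);
    assert(mispredict_cycles >= 0.0);

    return (1.0 - accuracy) * mispredict_cycles;
}
```

At 99% accuracy and a two-cycle detour that averages out to roughly 0.02
extra cycles per call, not counting any pipeline stall.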
And when you _do_ predict right, you're going to win in icache footprint,
code size (and because you can drop the trampoline for non-weak symbols,
the executable size also goes down) and cycles.
That still doesn't look complicated to me. Of course, it clearly does
depend on whether I'm right that you can fairly easily get 99% prediction
accuracy. And I could just be full of sh*t.