Re: [PATCH] task_struct + kernel stack colouring ...

Davide Libenzi (davidel@xmailserver.org)
Wed, 5 Dec 2001 11:23:36 -0800 (PST)


On Wed, 5 Dec 2001, Shuji YAMAMURA wrote:

> Hi,
>
> Your patch achieved very good performance in our benchmark test, but
> I would like to make the following two points about your implementation
> of kernel stack colouring.
>
> 1. Your patch implements a coloured kernel stack, but does not
> reduce cache line conflicts.
> 2. I couldn't see any performance difference whether stack colouring
> was done or not.
>
> [1] get_stack_jitter() in arch/i386/kernel/process.c selects 3 bits
> from the cache line index bits. So cache conflicts still occur at the
> shifted lines.
>
> Suppose the cache is 256KB 4-way: the address distance between the
> data in one block and the data in another block of the same set is a
> multiple of 64KB. This means the lower 16 bits of such addresses are
> always the same.
> Since the patch uses 3 bits, from bit position 13 to 15, the data of a
> set always has the same colour, as described above. From the viewpoint
> of cache miss reduction this colouring has no effect.
>
> The patch which I posted before uses 3 bits, from bit
> position 18 to 20 (1MB 4-way L2-cache), for task_struct colouring.
>
> I suggest the following two ways for stack colouring.
>
> (a) Using bits above the cache index bits (e.g. on a 256KB L2-cache
> system, STACK_SHIFT_BITS should be 16 (11 index bits + 5 offset
> bits)).
>
> (b) Using a modulo operation for colouring.
>
> in get_stack_jitter() (arch/i386/kernel/process.c)
> +#define NUM_COLOUR 9 /* the number of colours (an odd number) */
> static inline unsigned long get_stack_jitter(struct task_struct *p)
> {
> - return ((TSK_TO_KSTACK(p) >> STACK_SHIFT_BITS) & STACK_COLOUR_MASK) << L1_CACHE_SHIFT;
> + return ((TSK_TO_KSTACK(p) >> STACK_SHIFT_BITS) % NUM_COLOUR) << L1_CACHE_SHIFT;
> }
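
For reference, here is a small stand-alone sketch of the arithmetic
described above: two 8KB stacks 64KB apart ( so they alias in the example
256KB 4-way L2 with 32-byte lines ), the same sp offset in both, and the
set index they end up in with no colouring, with the bits 13-15 jitter and
with the suggested modulo variant. The addresses and the sp offset are
made up, and the macro values are filled in to match the bits 13-15
description above, not copied from the actual patch.

/*
 * User-space sketch, not kernel code: map two aliasing 8KB stacks
 * into the sets of a 256KB 4-way L2 with 32-byte lines (2048 sets).
 */
#include <stdio.h>

#define L1_CACHE_SHIFT    5      /* 32-byte cache lines                 */
#define L2_SETS           2048   /* 256KB / (4 ways * 32 bytes)         */
#define STACK_SHIFT_BITS  13     /* 8KB stack alignment                 */
#define STACK_COLOUR_MASK 7      /* 3 bits of colour                    */
#define NUM_COLOUR        9      /* odd colour count from the proposal  */

static unsigned long set_index(unsigned long addr)
{
        return (addr >> L1_CACHE_SHIFT) % L2_SETS;
}

static unsigned long jitter_bits(unsigned long stack)
{
        return ((stack >> STACK_SHIFT_BITS) & STACK_COLOUR_MASK) << L1_CACHE_SHIFT;
}

static unsigned long jitter_mod(unsigned long stack)
{
        return ((stack >> STACK_SHIFT_BITS) % NUM_COLOUR) << L1_CACHE_SHIFT;
}

int main(void)
{
        unsigned long off = 0x1e40;      /* same sp offset for both tasks */
        unsigned long a = 0xc0110000;    /* two 8KB stacks 64KB apart ... */
        unsigned long b = a + 0x10000;   /* ... i.e. in the same L2 set   */

        printf("no colour : %lu vs %lu\n",
               set_index(a + off), set_index(b + off));
        printf("bits 13-15: %lu vs %lu\n",
               set_index(a + off + jitter_bits(a)),
               set_index(b + off + jitter_bits(b)));
        printf("modulo 9  : %lu vs %lu\n",
               set_index(a + off + jitter_mod(a)),
               set_index(b + off + jitter_mod(b)));
        return 0;
}

Built with a plain gcc, it just prints the three pairs of set indices.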

Whatever bits you take, it's a random move within a limited memory address
space and does not change the picture.
The effect of stack colouring becomes evident when you have a fairly large
number of processes waiting for the same event inside the kernel
( accept, ... ), which means they're going to walk the same path on their
way in/out of the kernel.
Simplifying the pattern, with the current implementation and your example
of a 256Kb 4-way associative cache and an 8Kb kernel stack you get
( statistically ):

W0 W1 W2 W3

64Kb -------- -------- -------- --------
| | | | | | | |
| | | | | | | |
| | | | | | | |
sp ->| |->| |->| |->| |
| | | | | | | |
| | | | | | | |
| | | | | | | |
56Kb ........ ........ ........ ........
| | | | | | | |
| | | | | | | |
| | | | | | | |
sp ->| |->| |->| |->| |
| | | | | | | |
| | | | | | | |
| | | | | | | |
48Kb ........ ........ ........ ........

! ! ! ! ! ! ! !

8Kb ........ ........ ........ ........
| | | | | | | |
| | | | | | | |
| | | | | | | |
sp ->| |->| |->| |->| |
| | | | | | | |
| | | | | | | |
| | | | | | | |
0Kb ........ ........ ........ ........

By adding three bits of colouring you're going to cut the collisions to
about 1/8.
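
To put a rough number on that picture, the sketch below simulates a bunch
of tasks sleeping at the same sp offset inside their 8KB stacks and counts
how crowded the sets of a 256KB 4-way L2 get with and without 3 bits of
colour. The colour bits are treated as effectively random per task, as
argued above; the task count, the pool size and the sp offset are made-up
values.

/*
 * User-space sketch: N tasks, 8KB-aligned stacks at random addresses,
 * all sleeping at the same sp offset.  Count the maximum number of
 * tasks landing in a single set of a 256KB 4-way L2 (32-byte lines,
 * 2048 sets), plain vs. with 3 random colour bits per task.
 */
#include <stdio.h>
#include <stdlib.h>

#define L1_CACHE_SHIFT 5
#define L2_SETS        2048
#define NTASKS         512

static unsigned long set_index(unsigned long addr)
{
        return (addr >> L1_CACHE_SHIFT) % L2_SETS;
}

int main(void)
{
        static int plain[L2_SETS], coloured[L2_SETS];
        unsigned long off = 0x1e40;    /* same sp offset for every task */
        int i, max_plain = 0, max_col = 0;

        srand(1);
        for (i = 0; i < NTASKS; i++) {
                /* 8KB-aligned stack somewhere in a 64MB pool */
                unsigned long stack = ((unsigned long) rand() % 8192) << 13;
                /* 3 colour bits, taken as random per task */
                unsigned long jitter = ((unsigned long) rand() & 7) << L1_CACHE_SHIFT;

                plain[set_index(stack + off)]++;
                coloured[set_index(stack + off + jitter)]++;
        }
        for (i = 0; i < L2_SETS; i++) {
                if (plain[i] > max_plain)
                        max_plain = plain[i];
                if (coloured[i] > max_col)
                        max_col = coloured[i];
        }
        printf("max tasks per set, no colour    : %d\n", max_plain);
        printf("max tasks per set, 3 colour bits: %d\n", max_col);
        return 0;
}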

- Davide
