The massive repercussions of the recent CERT IN-2001-01, and CA-2001-02's
recycling of old material in an attempt to coerce admins into updating
their OSes, have led me to think about buffer overrun exploits. I have
gained a new appreciation for them after being rooted twice this month.
I believe there is a solution that can be implemented in the kernel
(Linux, and probably most Unixes) that can prevent this type of exploit,
has no effect on userspace code, and is minimally obtrusive for the
kernel.
I'm making a few assumptions here, and I'm writing so that people can
confirm or deny them:
The virtual address space of a Linux process starts at a low address
(0?) with:
- the executable's code and constant data, mmapped read-only from the
  executable file;
- the executable's static initialized data, mmapped copy-on-write from
  the executable file;
- more of each of the above, but for shared libraries.
Each contiguous address range above is described by a kernel
vm_area_struct. Its pages are mapped on demand by the kernel's fault
handler, which sets the page-table permissions (rwx) that the CPU's
PMMU hardware then enforces.
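The per-area layout described above can be inspected from userspace
through /proc/<pid>/maps, which prints one vm area per line with its
address range and rwx flags. Below is a minimal sketch of parsing the
leading fields of such a line. The struct and function names are
hypothetical, and the sample addresses in the comment are illustrative
rather than taken from a real system.

```c
#include <stdio.h>
#include <string.h>

/* One vm area as printed by a line of /proc/<pid>/maps, e.g. (illustrative)
 * "08048000-0804c000 r-xp 00000000 03:01 12345  /bin/cat" */
struct vma_info {
    unsigned long start;   /* first address of the range */
    unsigned long end;     /* one past the last address */
    char perms[5];         /* "rwxp"-style flags, NUL-terminated */
};

/* Hypothetical helper: parse the "start-end perms" fields of a maps line.
 * Returns 0 on success, -1 if the line does not have that shape. */
int parse_maps_line(const char *line, struct vma_info *out)
{
    if (sscanf(line, "%lx-%lx %4s", &out->start, &out->end, out->perms) != 3)
        return -1;
    return 0;
}

/* Nonzero if the area's permission string includes execute ('x'). */
int vma_is_executable(const struct vma_info *v)
{
    return strlen(v->perms) >= 3 && v->perms[2] == 'x';
}
```

A code area shows up as "r-xp", while a stack area such as
"bfffe000-c0000000 rw-p ..." has no 'x' in its flags. On i386 without a
per-page execute bit, that missing 'x' is advisory only.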
Next, there is a variable amount of unmapped memory, followed by the
stack. The stack's vm_area grows downward, unlike the heap (which grows
upward via the brk() call), and begins at the top of user space; that
address varies, but is 3GB when the top 1GB is reserved for the kernel.
Beyond this there are no vm_areas, and the page tables contain mappings
that are marked supervisor-only (is this right?), and definitely don't
contain user-accessible pages.
Next, gcc doesn't generate any code that would be placed on the stack,
nor does it generate any calls/jumps to the stack area.
Next, buffer overruns are the only source of code that would execute
from the stack, and from what I understand, remote (if not all) buffer
overrun exploits depend on this to "work".
Solution: if the kernel sets up the CPU's memory management unit (PMMU)
so that it won't execute code in the stack address space, the exploits
are foiled.
Problem: on Intel, the page-table permissions are not flexible enough:
when a page is marked read-write for userspace, execute permission is
implied.
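To make the limitation concrete, here is a small model of the flag bits
in a 32-bit, non-PAE i386 page-table entry: bit 0 is Present, bit 1 is
Read/Write, bit 2 is User/Supervisor, and there is no execute bit at all.
The helper names are hypothetical and this is a sketch of the permission
logic, not kernel code.

```c
#include <stdint.h>

/* Low flag bits of a 32-bit (non-PAE) i386 page-table entry. */
#define PTE_PRESENT 0x001u  /* bit 0: page is mapped */
#define PTE_RW      0x002u  /* bit 1: writable */
#define PTE_USER    0x004u  /* bit 2: accessible from ring 3 */

/* Can user-mode code read this page? */
int pte_user_readable(uint32_t pte)
{
    return (pte & PTE_PRESENT) && (pte & PTE_USER);
}

/* Can user-mode code write this page? */
int pte_user_writable(uint32_t pte)
{
    return pte_user_readable(pte) && (pte & PTE_RW);
}

/* No execute bit exists in this format: any page user mode can read,
 * it can also execute, so the answer is the same as for reading. */
int pte_user_executable(uint32_t pte)
{
    return pte_user_readable(pte);
}
```

A user read-write stack page (P, R/W and U/S all set) is therefore
always executable as far as paging is concerned, which is why the
proposal turns to segmentation instead.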
But Intel also has segment descriptors, held in the GDT/LDTs, which
configure a base address and a limit, and a different one can be
selected for each segment register of a process. Under the current
Linux, the code segment (CS) has a descriptor from the GDT that allows
code to be executed and read (but not written) from base address 0 with
a range of 4GB (i.e. the entire linear address space), and the data
segment allows read-write but not execute (its descriptor can't be
loaded into the CS register).
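The 8-byte descriptor packs the limit into bits 0-15 and 48-51, the base
into bits 16-39 and 56-63, the access byte (present, DPL, code/data
type) into bits 40-47, and the granularity flag into bit 55; with G=1
the 20-bit limit counts 4KB pages. The sketch below decodes those
fields. The decoder is hypothetical. The constants in the usage note are
believed to be the 2.4-era user code and data entries of the i386 GDT,
but that is an assumption; check them against arch/i386/kernel/head.S in
your tree.

```c
#include <stdint.h>

/* Decoded view of an i386 code/data segment descriptor. */
struct seg_desc {
    uint32_t base;         /* linear base address */
    uint32_t limit_bytes;  /* highest valid offset, after granularity scaling */
    unsigned dpl;          /* descriptor privilege level, 0..3 */
    int is_code;           /* 1 if it describes a code segment (usable in CS) */
};

struct seg_desc decode_descriptor(uint64_t d)
{
    struct seg_desc s;
    uint32_t limit  = (uint32_t)((d & 0xffffu) | ((d >> 32) & 0xf0000u));
    unsigned access = (unsigned)((d >> 40) & 0xffu);
    int gran        = (int)((d >> 55) & 1u);  /* G bit: limit counts 4KB pages */

    s.base = (uint32_t)(((d >> 16) & 0xffffffu) | ((d >> 32) & 0xff000000u));
    s.limit_bytes = gran ? (limit << 12) | 0xfffu : limit;
    s.dpl = (access >> 5) & 3u;
    /* S bit (bit 4) set means code/data; type bit 3 set means code. */
    s.is_code = (access & 0x10u) && (access & 0x08u);
    return s;
}
```

Decoding 0x00cffa000000ffff gives base 0, limit 0xffffffff, DPL 3 and a
code type, i.e. the flat 4GB user code segment described above, while
0x00cff2000000ffff decodes as a DPL 3 data segment.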
So, if the kernel changed the CS descriptor's limit to track the bottom
of the stack (its lowest address), then any attempt to execute code on
the stack would segfault (or raise another signal, to help track exploit
attempts). The kernel could get the bottom page address from the
vm_area_struct for the stack (is there ever more than one VM_GROWSDOWN
vm area in a process?)
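A sketch of building such a descriptor follows: base 0, DPL 3,
execute/read, 32-bit, with the page-granular limit set so that the last
executable byte sits just below the stack's lowest address. The helper
names are hypothetical, and the sketch assumes the stack bottom is page
aligned and nonzero.

```c
#include <stdint.h>

/* Hypothetical: build a flat-based, DPL 3, execute/read, 32-bit code
 * descriptor whose limit stops just below stack_bottom. With G=1 the limit
 * counts 4KB pages, so offsets 0 .. stack_bottom-1 remain executable.
 * Assumes stack_bottom is page aligned and nonzero. */
uint64_t make_user_cs(uint32_t stack_bottom)
{
    uint32_t pages = (stack_bottom >> 12) - 1;    /* limit field, in pages */
    uint64_t d = 0;

    d |= (uint64_t)(pages & 0xffffu);             /* limit bits 15..0 */
    /* base bits 23..0 (descriptor bits 16..39) stay 0 */
    d |= (uint64_t)0xfa << 40;                    /* P=1, DPL=3, S=1, exec/read */
    d |= (uint64_t)((pages >> 16) & 0xfu) << 48;  /* limit bits 19..16 */
    d |= (uint64_t)0xc << 52;                     /* G=1 (4KB units), D=1 (32-bit) */
    /* base bits 31..24 (descriptor bits 56..63) stay 0 */
    return d;
}

/* Highest offset the CPU will fetch from under such a descriptor
 * (assumes G=1, as set by make_user_cs). */
uint32_t cs_limit_bytes(uint64_t d)
{
    uint32_t pages = (uint32_t)((d & 0xffffu) | ((d >> 32) & 0xf0000u));
    return (pages << 12) | 0xfffu;
}
```

For a stack whose lowest page starts at 0xbfffe000, the resulting limit
is 0xbfffdfff, so an instruction fetch at any stack address exceeds the
CS limit and faults. Because the stack grows downward, the limit would
presumably need recomputing whenever the stack's vm area is extended.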
Currently the CS for all Linux programs gets its descriptor from the
GDT, so it would have to be changed manually at each task switch, and
perhaps there are segment-descriptor and other cache-flushing issues
(maybe just store the CS limit in the per-task data structure, then
update the GDT and reload CS at each task switch).
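The bookkeeping suggested in the parenthesis can be modelled in a few
lines: each task carries its own user CS descriptor, and the switch path
copies it into the shared GDT slot. Everything here is a hypothetical
stand-in (the task struct, the table size, the switch hook). The slot
index 4 is an assumption derived from the user code selector 0x23
(index 4, RPL 3).

```c
#include <stdint.h>

#define GDT_ENTRIES   32  /* hypothetical table size for this sketch */
#define USER_CS_INDEX 4   /* assumed: selector 0x23 >> 3 */

/* Hypothetical per-task state: only the field this idea needs. */
struct task {
    int pid;
    uint64_t user_cs_desc;  /* descriptor carrying this task's CS limit */
};

static uint64_t gdt[GDT_ENTRIES];  /* stand-in for the CPU's GDT */
static int gdt_writes;             /* counts descriptor rewrites */

/* Hypothetical hook run on every switch to 'next': install its CS
 * descriptor in the shared GDT slot, skipping the write if unchanged. */
void switch_user_cs(const struct task *next)
{
    if (gdt[USER_CS_INDEX] != next->user_cs_desc) {
        gdt[USER_CS_INDEX] = next->user_cs_desc;
        gdt_writes++;
    }
    /* As we understand it, the CPU keeps a cached copy of the CS descriptor,
     * so the new limit only applies once CS is reloaded, e.g. by the iret
     * that returns to user mode. */
}
```

Skipping the write when two tasks share the same limit keeps the common
case cheap; whether the CS reload on return to user mode is enough to
avoid the cache-flushing issues is exactly the open question above.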
I realize that the GDT/LDT must be accessible, and that they are in
kernel space (above 3GB), but I don't think accesses to them go through
the CS register's access controls. The DS segment can be left alone.
Other architectures may have separate read/write/execute permissions
per page, or something similar to segment descriptors.
I would appreciate thoughtful comments; if anybody knows why it won't
work, tell me. I haven't got my hopes up for the Nobel prize yet :)
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to email@example.com
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/