> I recently had the following problem: Roxen (a webserver that uses
> threads) was running out of control, eating up more and more memory.
> (btw, I'd rather run Apache but it's not my decision to make).
> The oom killer kicked in and started killing roxen processes.
> Apparently it didn't succeed in killing _all_ threads. So it didn't
> help at all, and the machine had to be rebooted.

this would be a bug in roxen. for the corresponding code in apache, and
our attempt to be robust in the face of such activity, take a look at the
sleep(10) in make_child(). basically apache limits how many children it
will spawn each second, and if it gets any fork error the parent will
cease all activity for 10 seconds. roxen needs code to limit its own
damage; i don't see why the kernel should do it...

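to make the idea concrete, here is a rough sketch of that kind of throttle. this is not the actual apache source: spawn_budget(), backoff_after_fork(), spawn_tick() and the cap of 32 are names and numbers made up for illustration. only the 10 second pause after a failed fork comes from the description above.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

/* hypothetical names and values; not apache's real code */
#define MAX_SPAWN_PER_SEC  32  /* assumed cap on forks per one-second tick */
#define FORK_ERROR_BACKOFF 10  /* seconds the parent goes quiet after a failed fork */

/* how many children we may start in this tick: never more than the cap */
int spawn_budget(int wanted)
{
    assert(wanted >= 0);
    return wanted < MAX_SPAWN_PER_SEC ? wanted : MAX_SPAWN_PER_SEC;
}

/* seconds the parent should sleep given the value fork() returned to it */
unsigned backoff_after_fork(pid_t pid)
{
    return pid < 0 ? FORK_ERROR_BACKOFF : 0;
}

/* one tick of the parent loop: start up to the budget, give up on the
 * first fork error. returns the number of children actually started. */
int spawn_tick(int wanted, void (*child_main)(void))
{
    int started = 0;
    int budget = spawn_budget(wanted);

    while (started < budget) {
        pid_t pid = fork();
        if (pid == 0) {          /* child: run the worker body and exit */
            child_main();
            _exit(0);
        }
        unsigned delay = backoff_after_fork(pid);
        if (delay) {
            sleep(delay);        /* cease all activity, no tight retry loop */
            break;
        }
        started++;
    }
    return started;
}
```

the parent would call spawn_tick() at most once per second, so a runaway condition costs a bounded number of forks per tick, and a fork failure stops the parent for a while instead of having it hammer the kernel.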
> Wouldn't it be a good idea to kill all processes that have the same
> ->mm as the process that was selected to be killed? The patch below
> implements it. I've tested it and it seems to work nicely.

i think it's a good idea but not for the reason you give... it's a good
idea because a multithreaded process using userland mutexes will have an
unpredictable number of locked mutexes in each thread -- and killing a
thread could result in hangs in the remaining threads as they wait for
mutexes which will never be released. (threads are kind of evil ;)

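a small sketch of the hang, assuming POSIX threads with a default (non-robust) mutex. userland can't really kill a single thread the way the oom killer does, so pthread_exit() stands in for the thread dying mid critical section. doomed_worker() and lock_state_after_worker_death() are made-up names.

```c
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stddef.h>

static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;

/* takes the lock and then dies without unlocking, standing in for a
 * thread that was killed in the middle of a critical section */
static void *doomed_worker(void *arg)
{
    (void)arg;
    pthread_mutex_lock(&lock);
    pthread_exit(NULL);          /* no unlock: the mutex stays held */
}

/* returns what pthread_mutex_trylock() reports to a surviving thread */
int lock_state_after_worker_death(void)
{
    pthread_t t;
    int rc = pthread_create(&t, NULL, doomed_worker, NULL);
    assert(rc == 0);
    pthread_join(t, NULL);       /* the worker is gone by now */

    /* a plain pthread_mutex_lock() here would block forever;
     * trylock lets us observe the stuck lock without hanging */
    return pthread_mutex_trylock(&lock);
}
```

the survivor gets EBUSY back: the lock is still owned by a thread that no longer exists, so nobody will ever unlock it. taking down everything that shares the ->mm avoids leaving survivors in that state.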
hey -- is there a way to know when a task was OOM killed as opposed to
other forms of death?