Re: Better fork() (and possibly others) failure diagnostics

jw schultz (jw@pegasys.ws)
Tue, 15 Oct 2002 20:11:45 -0700


On Tue, Oct 15, 2002 at 05:46:21PM +0200, Michal Kara wrote:
> > Take a look at the manpages. It is very clear there that
> > EAGAIN has two meanings: try again because what you request
> > isn't available yet, and request exceeds resource limits (at
> > the moment). Basically POSIX and SUS direct that EAGAIN is
> > the correct error code for resource limit exceedance.
>
> The fork() manpage says:
>
> EAGAIN fork cannot allocate sufficient memory to copy the
> parent's page tables and allocate a task structure
> for the child.
>
> No word about limits. But that may classify as a manpage problem.

I'd say so.

Also, I meant that you should do a survey of the manpages that
cite EAGAIN, not just fork(2). The pattern is clear.

> > I agree it would be nice if rlimit caused its own error code
> > but such a change at this time would break far too many things.
>
> I can think only of some applications retrying when they get EAGAIN...

It is the application that you can't think of that will bite
someone else. Further, it isn't just whether they try again.
Some poorly written apps may test errno for known values and
behave oddly if they get an errno that isn't listed in the
manpages. Also, it is common to work around limits. Many
apps are written to economize if they get EAGAIN when
allocating memory.
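
As a rough illustration of the defensive side of that, here is a
minimal sketch of errno handling that does not fall over on a value
the manpage never listed. The enum and function names are made up for
the example, and the choice of what to do in each case is only an
assumption about a reasonable policy.

```c
#include <errno.h>

/* Hypothetical policy for a failed fork(); names are illustrative only. */
enum fork_action { FORK_RETRY_LATER, FORK_GIVE_UP };

enum fork_action classify_fork_errno(int err)
{
    switch (err) {
    case EAGAIN:    /* resource limit hit or transient shortage */
        return FORK_RETRY_LATER;
    case ENOMEM:    /* kernel could not allocate what it needed */
        return FORK_GIVE_UP;
    default:        /* an errno we did not anticipate: fail safely */
        return FORK_GIVE_UP;
    }
}
```

An app that has only the first two cases and no default is the kind
that would misbehave if a new error code were introduced.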

> > Your alternative of klogging an error is not appropriate
> > either. Hitting an rlimit is not a system error but a user
> > error.
>
> On a workstation or multi-user server, yes. But not on, say, a web server.
> There, hitting the limit is a problem and the administrator should do something
> about it. When your nightly processing job hits a limit (and when you run it
> by hand, it doesn't), "Something wrong" is not much help in solving the
> problem.

Which is why your nightly job or server should be logging
its errors from user space.
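
A minimal sketch of that kind of user-space logging, assuming the job
is happy to send its errors to syslog(3). The helper names
format_failure and log_failure are hypothetical, not anything from an
existing library.

```c
#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <syslog.h>

/* Hypothetical helper: build "<what> failed: <strerror> (errno N)" in buf.
 * Returns what snprintf returns. */
int format_failure(char *buf, size_t len, const char *what, int err)
{
    return snprintf(buf, len, "%s failed: %s (errno %d)",
                    what, strerror(err), err);
}

/* Hypothetical helper: send the formatted failure to syslog at LOG_ERR. */
void log_failure(const char *what, int err)
{
    char msg[256];

    format_failure(msg, sizeof msg, what, err);
    syslog(LOG_ERR, "%s", msg);
}
```

The caller would do something like log_failure("fork", errno) right
after fork() returns -1, before anything else can overwrite errno.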

> But WHICH limit? This is what this is all about. If there were only one,
> it would be OK. And you cannot even display the limit/usage for a running
> process to give you a hint.

That is unfortunately a deductive process. You can call
getrlimit and getrusage and try to guess, but which one
caused the problem may be, I'll admit, an unknown.
In reality it is seldom that opaque.

Most of the time it is not hard to tell what caused it by
the syscall. For fork it will be RLIMIT_NPROC.
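
For the fork case, the deduction could be sketched roughly as below.
This assumes Linux, where RLIMIT_NPROC is available through
getrlimit(2); explain_fork_error is a hypothetical name, and the
result is only a guess at the cause, since EAGAIN from fork can also
mean a memory shortage.

```c
#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <sys/resource.h>

/* Hypothetical helper: write a diagnostic for a failed fork() into buf.
 * Returns 1 if RLIMIT_NPROC is a plausible cause, 0 otherwise. */
int explain_fork_error(int err, char *buf, size_t len)
{
    struct rlimit rl;

    if (err != EAGAIN) {
        snprintf(buf, len, "fork: %s", strerror(err));
        return 0;
    }
    if (getrlimit(RLIMIT_NPROC, &rl) != 0) {
        snprintf(buf, len, "fork: EAGAIN (could not read RLIMIT_NPROC)");
        return 0;
    }
    if (rl.rlim_cur == RLIM_INFINITY) {
        /* No process limit in force, so the limit cannot be the cause. */
        snprintf(buf, len, "fork: EAGAIN with unlimited RLIMIT_NPROC");
        return 0;
    }
    snprintf(buf, len, "fork: EAGAIN; RLIMIT_NPROC soft limit is %llu",
             (unsigned long long)rl.rlim_cur);
    return 1;
}
```

Call it with the saved errno right after fork() fails and log the
resulting string from user space.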

-- 
________________________________________________________________
	J.W. Schultz            Pegasystems Technologies
	email address:		jw@pegasys.ws

		Remember Cernan and Schmitt