Re: [TRIVIAL PATCH 2.4.20] madvise_willneed makes bad limit comparison

Andrew Morton (akpm@digeo.com)
Sun, 08 Dec 2002 22:29:41 -0800

Messages sorted by: [ date ][ thread ][ subject ][ author ]
Next message: Andrew Morton: "Re: [PATCH 3/3] High-res-timers part 3 (posix to hrposix) take 20"
Previous message: Andrew Morton: "Re: [patch] kmalloc_percpu -- 2 of 2"
Maybe in reply to: Kingsley Cheung: "[TRIVIAL PATCH 2.4.20] madvise_willneed makes bad limit comparison"
Next in thread: Andrew Morton: "Re: [TRIVIAL PATCH 2.4.20] madvise_willneed makes bad limit comparison"

Kingsley Cheung wrote:
>
> Hi,
>
> 'madvise_willneed' makes an incorrect rss limit comparison. It
> directly compares rlim[RLIMIT_RSS].rlim_cur to rss. The former is in
> bytes, whereas the latter is in pages. The fix for this is trivial.
>

It's surely a bug, but looking at the code, one does ask "what
on earth is it trying to do"?

1) -EIO is not a recognised (or appropriate) return value.

2) If the MADV_WILLNEED call fails, all the user needs to do is to
use a smaller chunk, and walk across the file using that chunk
size! The only system-protecting limit here is the request queue
size.

3) We don't know that the application will try to map all that readahead
at the same time anyway. And if it does, the rlimits will catch it.

Linus used "half the size of the inactive list" in sys_readahead. That's
probably as good as anything else. I'd suggest that we just share
that bit of code in madvise.

hmm. Also the new readahead code will allocate all that memory up-front
before putting it under I/O. I'll fix up do_page_cache_readahead()
for that.

> [As an aside, one question is whether this limit check is needed at
> all. Most rss limit enforcement implementations that I've seen are
> 'soft', whereas this would give the limit 'hard' semantics. Do we
> really want 'hard' limit semantics?]
>

I agree that failing with an error is inappropriate.

We should limit the readahead according to machine size, disk bandwidth,
free memory availability, shoe size, etc. And once that's done then
it _has_ to return success. Otherwise the application would see
different results depending on system size and activity.

It is just "advice".
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/

Next message: Andrew Morton: "Re: [PATCH 3/3] High-res-timers part 3 (posix to hrposix) take 20"
Previous message: Andrew Morton: "Re: [patch] kmalloc_percpu -- 2 of 2"
Maybe in reply to: Kingsley Cheung: "[TRIVIAL PATCH 2.4.20] madvise_willneed makes bad limit comparison"
Next in thread: Andrew Morton: "Re: [TRIVIAL PATCH 2.4.20] madvise_willneed makes bad limit comparison"