> On Mon, 26 Jun 2000, Rik van Riel wrote:
>
> >Actually that's not the case. You're forgetting that
> >kswapd _frees_ memory...
>
> So you want to do a very different thing: you want to wake up kflushd each
> time you free a page in 2.4.x, or each time you succeed in freeing a buffer
> in 2.2.x. So you want to do the balance_dirty check in the success path
> (not in the fail path!!).
>
> I think that would be overkill: we just wake up kswapd as soon as we find
> a busy buffer, and that should be enough in practice (even more so in
> 2.4.x, where thanks to the LRU we only walk the cache).
>
> BTW, the following code that was added here in 17pre6:
>
> /*
> * Wait for async IO to complete
> * at each 64 buffers
> */
>
> int wait = ((gfp_mask & __GFP_IO)
> && (!(nr_dirty++ % 64)));
>
> doesn't make sense to me.
>
> First, the comment is wrong, since the __GFP_IO check sometimes
> short-circuits the nr_dirty++ (I guess that was not intended). Second, the
> "64" magic value is just a random value.
Yes, the comment is wrong.
Do you suggest anything better than a magic value?
> Then suppose there's one locked buffer in the whole VM and lots of clean,
> freeable buffers (at least more than 64). Why the heck should we wait for
> I/O completion on such an async buffer, generated earlier by `cp` (not
> even `cp` is waiting synchronously for it, because it was a write; maybe
> it's also being written to a slow floppy), while I do a little malloc?
We can wait on I/O if the try_to_free_pages priority is < 1 &&
balance_dirty_state() <= 0.
If we have reached such a priority, we have already scanned a lot of pages
without finding enough freeable ones, which means we should wait on I/O if
memory is partly dirty.
Also, with your balance_dirty_state patch kflushd is woken up correctly,
so processes will have to wait on I/O themselves less often.
> I see that at some point we may have to wait for I/O completion if all of
> the VM happens to be dirty, and I have ideas on how to do that carefully
> (and no, not the way 2.4.x does it either; BTW, 2.4.x has the same buggy
> short-circuit side effect).
>
> But before I even start implementing the above, we first have to account
> the dirty pages in MAP_SHARED segments as __dirty__. That is __strictly__
> necessary to allow the machine to allocate without blocking while there's
> heavy I/O pressure and little memory allocation pressure unrelated to
> write I/O. Fixing that will probably also solve all the reported OOM
> problems (even though, to be completely correct, I think we should also be
> able to wait for dirty memory objects to become clean again, but that's
> another issue).
>
> We must solve that _first_ and _right_ for the MAP_SHARED segments too
> (and the way to solve it is _completely_ different from anything I've seen
> floating around so far). Incidentally, solving it isn't trivial, since it
> involves the page fault path (->nopage and do_wp_page at least) plus
> changes in the way we collect the MAP_SHARED dirty pages, and I believe it
> should be fixed in 2.4.x first.
>
> For 2.2.x I still suggest all the patches I listed in my earlier email to
> the list. Very large MAP_SHARED mappings (as in mmap002) will still not be
> reliable (you will need to buy more RAM), but there won't be side effects
> on the cases that already work right.
Innocent processes (e.g. syslogd, klogd, bash) are getting killed
with these patches.
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.rutgers.edu
Please read the FAQ at http://www.tux.org/lkml/