Re: objrmap and vmtruncate

Andrea Arcangeli (andrea@suse.de)
Sun, 6 Apr 2003 02:14:07 +0200


On Sat, Apr 05, 2003 at 03:57:40PM -0800, Andrew Morton wrote:
> William Lee Irwin III <wli@holomorphy.com> wrote:
> > On Sat, Apr 05, 2003 at 04:06:14AM -0800, Andrew Morton wrote:
> > > In the third test a single task owns 10000 VMA's and walks across them in a
> > > linear pattern:
> > > ./rmap-test -v -l -i 10 -n 10000 -s 7 -t 1 foo
> > > 2.5.66-mm4:
> > > 0.25s user 3.75s system 1% cpu 4:38.44 total
> > > 2.5.66-mm4+objrmap:
> > > 0.28s user 146.45s system 16% cpu 15:14.59 total
> > > 2.4.21-pre5aa2:
> > > 0.32s user 4.83s system 0% cpu 18:25.90 total
> >
> > This doesn't appear to be the kind of issue that would be addressed by
> > the more advanced search structure to replace ->i_mmap and ->i_mmap_shared.
>
> We have 10000 disjoint VMA's and we want to find the one which maps this
> page. If we cannot solve this then we have a problem.
>
> > I'm somewhat surprised the virtualscan does so poorly; from an a priori
> > POV with low sharing and linear access there's no obvious reason in my
> > mind why it would do as poorly as or worse than the objrmap here.
>
> The virtual scan did well in all tests I _think_. What happened in this test
> is that the IO scheduling was crap - the disk sounded like a dentist's drill.
>
> Could be that this is due to the elevator changes which Andrea has made, or

2.4-aa is outperforming 2.5 in almost all tiobench results, so I doubt
the elevator is bad enough to explain such a drop in performance.

I suspect it is something along the lines of the filesystem doing
synchronous I/O for some reason inside writepage, like doing a
wait_on_buffer for every writepage, which would generate the misleading
results above. Note the 0% cpu time: you're not benchmarking the VM
here. In fact I would be interested to see the above repeated on ext2.
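
To make the point concrete, here is a small user-space sketch (mine, not
part of the kernel or of the rmap-test harness; file names and sizes are
arbitrary) that writes the same amount of data once with an fsync per
page and once buffered with a single fsync at the end. The per-page
synchronous run shows exactly the signature above: huge wall-clock time
with user+sys near zero, i.e. you are timing the disk, not the VM.

#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/resource.h>
#include <sys/time.h>
#include <unistd.h>

#define PAGE_SZ 4096
#define NPAGES  2048			/* 8 MB total, adjust to taste */

static double now(void)
{
	struct timeval tv;
	gettimeofday(&tv, NULL);
	return tv.tv_sec + tv.tv_usec / 1e6;
}

static void run(const char *path, int sync_each_page)
{
	char buf[PAGE_SZ];
	struct rusage ru0, ru1;
	double wall, user, sys;
	int fd, i;

	memset(buf, 0, sizeof(buf));
	fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0644);
	if (fd < 0) {
		perror("open");
		exit(1);
	}

	getrusage(RUSAGE_SELF, &ru0);
	wall = now();
	for (i = 0; i < NPAGES; i++) {
		if (write(fd, buf, sizeof(buf)) != sizeof(buf)) {
			perror("write");
			exit(1);
		}
		if (sync_each_page)
			fsync(fd);	/* block on the disk for every page */
	}
	fsync(fd);			/* one final flush either way */
	wall = now() - wall;
	getrusage(RUSAGE_SELF, &ru1);
	close(fd);

	user = (ru1.ru_utime.tv_sec - ru0.ru_utime.tv_sec) +
	       (ru1.ru_utime.tv_usec - ru0.ru_utime.tv_usec) / 1e6;
	sys  = (ru1.ru_stime.tv_sec - ru0.ru_stime.tv_sec) +
	       (ru1.ru_stime.tv_usec - ru0.ru_stime.tv_usec) / 1e6;
	printf("%-14s wall %6.2fs  user %5.2fs  sys %5.2fs\n",
	       sync_each_page ? "sync per page:" : "buffered:",
	       wall, user, sys);
}

int main(void)
{
	run("foo.sync", 1);
	run("foo.buffered", 0);
	return 0;
}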

It's not true that ext3 shares ext2's writepage, as you said in an
earlier email; the ext3 writepage starts like this:

static int ext3_writepage(struct page *page)
{
	struct inode *inode = page->mapping->host;
	struct buffer_head *page_buffers;
	handle_t *handle = NULL;
	int ret = 0, err;
	int needed;
	int order_data;

	J_ASSERT(PageLocked(page));

	/*
	 * We give up here if we're reentered, because it might be
	 * for a different filesystem. One *could* look for a
	 * nested transaction opportunity.
	 */
	lock_kernel();
	if (ext3_journal_current_handle())
		goto out_fail;

	needed = ext3_writepage_trans_blocks(inode);
	if (current->flags & PF_MEMALLOC)
		handle = ext3_journal_try_start(inode, needed);
	else
		handle = ext3_journal_start(inode, needed);

and even the ext2 writepage can be synchronous if it has to call
get_block. In fact I would recommend filling the "foo" file with zeros
rather than leaving holes in it, to avoid that additional synchronous
fs overhead and to be synchronous only in the inode map lookup; see the
sketch below.
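
Something along these lines (again just a sketch of mine, the size is
arbitrary) creates foo with real zero-filled blocks; creating it sparse
with lseek past the end instead leaves holes, and then writepage has to
go through get_block to allocate blocks, which is the extra synchronous
fs overhead mentioned above:

#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

int main(void)
{
	char buf[4096];
	long i, pages = 25600;		/* ~100 MB, adjust to the test size */
	int fd = open("foo", O_WRONLY | O_CREAT | O_TRUNC, 0644);

	if (fd < 0) {
		perror("open");
		return 1;
	}
	memset(buf, 0, sizeof(buf));	/* zero-filled, no holes */
	for (i = 0; i < pages; i++) {
		if (write(fd, buf, sizeof(buf)) != sizeof(buf)) {
			perror("write");
			return 1;
		}
	}
	fsync(fd);
	close(fd);
	return 0;
}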

Andrea