RE: [Bench] New benchmark showing fileserver problem in 2.4.12

M. Edward Borasky (znmeb@aracnet.com)
Wed, 17 Oct 2001 08:12:44 -0700


Have you looked at CPU utilization? Is it abnormally high when the system
slows down?
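
A quick way to check that (not part of the original exchange, just a sketch) is to sample the cumulative counters on the first line of /proc/stat a second apart and print the busy percentage; this assumes the 2.4-era four-field "cpu user nice system idle" format.

/* Hypothetical helper: sample overall CPU utilization from /proc/stat.
 * Fields on the "cpu" line are cumulative jiffies: user nice system idle. */
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

static void read_cpu(unsigned long v[4])
{
    FILE *f = fopen("/proc/stat", "r");
    if (!f) { perror("/proc/stat"); exit(1); }
    if (fscanf(f, "cpu %lu %lu %lu %lu", &v[0], &v[1], &v[2], &v[3]) != 4) {
        fprintf(stderr, "unexpected /proc/stat format\n");
        exit(1);
    }
    fclose(f);
}

int main(void)
{
    unsigned long a[4], b[4];

    for (;;) {
        read_cpu(a);
        sleep(1);
        read_cpu(b);

        /* busy = user + nice + system deltas over the one-second interval */
        unsigned long busy  = (b[0] - a[0]) + (b[1] - a[1]) + (b[2] - a[2]);
        unsigned long total = busy + (b[3] - a[3]);

        printf("cpu busy: %lu%%\n", total ? 100 * busy / total : 0UL);
        fflush(stdout);
    }
}

Watching vmstat's us/sy/id columns during the benchmark gives the same information with less effort.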

--
M. Edward (Ed) Borasky, Chief Scientist, Borasky Research
http://www.borasky-research.net
mailto:znmeb@borasky-research.net
http://groups.yahoo.com/group/BoraskyResearchJournal

Q: How do you tell when a pineapple is ready to eat? A: It picks up its knife and fork.

> -----Original Message-----
> From: linux-kernel-owner@vger.kernel.org
> [mailto:linux-kernel-owner@vger.kernel.org] On Behalf Of Robert Cohen
> Sent: Wednesday, October 17, 2001 6:07 AM
> To: linux-kernel@vger.kernel.org
> Subject: Re: [Bench] New benchmark showing fileserver problem in 2.4.12
>
>
> I have had a chance to do some more testing with the test program I posted yesterday. I have been able to try various combinations of parameters and variations of the programs.
>
> I now have a pretty good idea of which specific activities will see the performance problems I was seeing. But since I'm not a kernel guru, I have no idea why the problem exists or how to fix it.
>
> I am interested in reports from people who can run the test. I would like to confirm my findings (or simply confirm that I'm crazy :-).
>
> The problems appear to happen only in a very specific set of circumstances. It's an incredible coincidence that my original lantest/netatalk testing happened to hit that specific combination of factors. So it looks like I haven't actually found a generic performance problem with Linux as such, but I would still like to get to the bottom of this.
>
> The factors that cause these problems probably won't occur very often in real usage, but they are things that are not obviously silly. So it does indicate a problem in some dark corner of the Linux kernel that probably should be investigated.
>
> I have identified 4 specific factors that contribute to the problem. All 4 have to be present before there is a performance problem.
>
>
> Summary of the factors involved
> ===============================
>
> Factor 1: the performance problems only occur when you are rewriting an existing file in place, that is, writing to an existing file which is opened without O_TRUNC. Equivalently, if you have written a file and then seeked back to the beginning and started writing again. I admit this is something that not many real programs (except databases) do, but it still shouldn't cause any problems.
>
> Factor 2: the performance problems only occur when the part of the file you are rewriting is not already present in the page cache. This tends to happen when you are doing I/O to files larger than memory, or if you are rewriting an existing file which has just been opened.
>
> Factor 3: the performance problem only happens for I/O that is due to network traffic, not I/O that was generated locally. I realise this is extremely strange, and I have no idea how the kernel knows that I/O is due to network traffic, let alone why it cares, but I can assure you that it does make a difference.
>
> Factor 4: the performance problem is only evident with small writes, e.g. write calls with an 8k buffer. Actually, the performance hit is there with larger writes too, just not significant enough to be an issue. It's tempting to say "well, just use larger buffers", but this isn't always possible, and anyway 8k buffers should still work adequately, just not optimally.
>
>
> Experimental evidence
> =====================
>
> Factor 1: the performance problems only occur when you are rewriting an existing file in place, that is, writing to an existing file which is opened without O_TRUNC. Equivalently, if you have written a file and then seeked back to the beginning and started writing again.
>
> Evidence: in the report I posted yesterday, the test I was using involved 5 clients rewriting 30 Meg files on a 128 Meg machine. The symptom was that after about 10 seconds, the throughput as shown by vmstat "bo" drops sharply and we start getting reads occurring, as shown by the "bi" figure. However, with that test the page cache fills up after 10 seconds, which is only just before the end of the files is reached and we start rewriting the files. So it's difficult to see which of those two is causing the problem. Yesterday I attributed the problems to the page cache filling up, but I was apparently wrong.
>
> The new test I am using is 5 copies of
>
> ./send 200 2 | rsh server ./receive 200 2
>
> Here we have 5 clients, each rewriting a 200 Meg file. With this test, the page cache fills up after about 10 seconds, but since we are writing a total of 1 Gig of files, the end of the files is not reached for 2 minutes or so. It is at this point that we start rewriting the files.
>
> When the page cache fills up, there is no drop in performance. However, when the end of the file is reached and we start to rewrite, the throughput drops and the reads start happening. So the problems are clearly due to the rewriting of an existing file, not to the page cache filling.
>
> It doesn't make any difference whether the test seeks back to the start to rewrite or closes the file and reopens it without O_TRUNC.
>
>
> Factor 2: the performance problems only occur when the part of the file that is being rewritten is not already present in the page cache.
>
> Evidence: I modified the "receive" test program to write to a named file and to not delete the file after the run, so I could rewrite existing files with only one pass.
>
> On a machine with 128 Megs of memory, I created 5 large test files. I purged these files from the page cache by writing another file larger than memory and deleting it.
>
> I did a run of 5 copies of ./send 18 1 | rsh server ./receive 18 1 (each one on a different file). I then did a second run of ./send 18 1 | rsh server ./receive 18 1.
>
> With the first run, the files were not present in the page cache and the performance problems were seen. This run took about 95 seconds. Since the total size of the 5 files is smaller than the available page cache, they were all still present after the first run. The second run took about 20 seconds. So the presence of data in the cache makes a significant difference.
>
> It seems natural to say "of course the cache sped things up, that's what caches are for". However, the cache should not have sped this operation up. Only writes are being done, no reads, so there is no reason why the presence of data in the cache, which is going to be overwritten anyway, should speed things up. Also, the cache shouldn't speed writes up, since the program does an fsync to flush the cache on write. And even if the cache does speed up writes, it should have the same effect on both runs.
>
> I had originally thought the problem occurred when the page cache was full. I assumed it was due to the extra work needed to purge something from the page cache to make space for a new write. However, with this test I observed that the performance was bad even when the page cache did not fill memory and there was plenty of free memory. So it seems that the performance problem is purely due to rewriting something which is not present in the page cache. It has nothing to do with the amount of free memory or whether the page cache is filling memory.
>
> In this kind of test, if the collective size of the files is greater than the amount of memory available for page cache, then problems can be observed even on the second run. For example, if you are writing to 120 Megs of files and there is 100 Megs of page cache, then on the second run, even though 100 Megs of the files are present in the page cache, you get no benefit, because each portion of the file will be flushed to make way for new writes before you get around to rewriting that portion. This is the standard LRU performance wall when the working set is bigger than available memory.
>
>
> Factor 3: the problem only happens for I/O that is due to network traffic.
>
> Evidence: the problem does occur when you have a second machine rsh'ing into the Linux server. However, if you run the test entirely on the Linux server with any of the following
>
> ./send 30 10 | ./receive 30 10
> ./send 30 10 | rsh localhost ./receive 30 10
> ./send 30 10 | rsh server ./receive 30 10
>
> then the problem does not occur. Strangely, we also don't see any reads showing up in the vmstat output in these cases. It seems the page cache is able to rewrite existing files without doing any reading first under these conditions.
>
> This is the really strange issue. I have no idea why it would make a difference whether the receive program is taking its standard input from a local source or from an rsh over the network. Why would the behaviour of the page cache differ in these circumstances? If any gurus can clue me in, I would appreciate it.
>
>
> Factor 4: the performance problem only occurs with small writes.
>
> Evidence: the test programs I posted yesterday were doing I/O with 8k buffers (set by a define) because that was what the original benchmark I was emulating did. If I modify "receive" to use a 64k buffer, I get adequate throughput. The anomalous reads are still happening, but don't seem to impact performance too much. The throughput ramps smoothly between 8k and 64k buffers.
>
> One possible response is a variation on the old joke: if you experience problems when you do 8k writes, then don't do 8k writes. However, I would like to understand why we are seeing a problem with 8k writes. It's not as if 8k is *that* small. At worst, small writes should just chew CPU time, but we see lots of CPU idle time during the benchmark, just poor throughput. The evidence suggests some kind of constant overhead for each write.
>
> Modifying the buffer size in send simply reduces the amount of CPU that send uses, which is as you would expect. Doing this doesn't have much effect on the overall throughput.
>
> --
> Robert Cohen
> Unix Support
> TLTSU
> Australian National University
> Ph: 612 58389
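
To make Factor 1 above concrete, here is a minimal sketch of the rewrite-in-place pattern it describes. This is not Robert's actual test code; the file name, the 30 Meg size and the 8k chunk size are only illustrative.

/* Sketch of the Factor 1 access pattern: rewriting an existing file in place. */
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

#define CHUNK (8 * 1024)            /* 8k writes, as in the benchmark */

int main(void)
{
    char buf[CHUNK];
    memset(buf, 'x', sizeof(buf));

    /* O_TRUNC deliberately omitted: the old contents stay on disk and the
     * new writes overwrite them in place, which is the Factor 1 condition. */
    int fd = open("testfile", O_WRONLY | O_CREAT, 0644);
    if (fd < 0) { perror("open"); exit(1); }

    for (int pass = 0; pass < 2; pass++) {
        /* Equivalent trigger: seek back to the start and write again. */
        if (lseek(fd, 0, SEEK_SET) < 0) { perror("lseek"); exit(1); }
        for (int i = 0; i < (30 * 1024 * 1024) / CHUNK; i++) {
            if (write(fd, buf, sizeof(buf)) != sizeof(buf)) {
                perror("write");
                exit(1);
            }
        }
        fsync(fd);                  /* the test fsyncs, so writes reach disk */
    }
    close(fd);
    return 0;
}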
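
The cache-purge step in the Factor 2 experiment ("writing another file larger than memory and deleting it") could look something like the sketch below; the 192 Meg size and the scratch file name are assumptions, chosen only to exceed the 128 Megs of RAM on the test machine.

/* Sketch of a page-cache purge: write a scratch file bigger than RAM so
 * older cached pages are evicted, then delete it. */
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

int main(void)
{
    char buf[64 * 1024];
    memset(buf, 0, sizeof(buf));

    int fd = open("purgefile", O_WRONLY | O_CREAT | O_TRUNC, 0644);
    if (fd < 0) { perror("open"); exit(1); }

    long long target = 192LL * 1024 * 1024;   /* comfortably more than 128 Megs */
    for (long long written = 0; written < target; written += sizeof(buf)) {
        if (write(fd, buf, sizeof(buf)) != sizeof(buf)) {
            perror("write");
            exit(1);
        }
    }
    fsync(fd);
    close(fd);
    unlink("purgefile");    /* its pages have already pushed out the old ones */
    return 0;
}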
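
For readers who did not see the earlier post, the receive side of the ./send N M | rsh server ./receive N M pipeline presumably boils down to a loop like the one below: read standard input in BUFSIZE chunks (8k by default, per Factor 4) and write it to a file that is not truncated on open, with an fsync so the data actually reaches disk. This is only a stand-in sketch under those assumptions; the real send/receive programs were attached to the previous message and take different arguments.

/* Stand-in for a receive-style consumer: copy stdin to a file in BUFSIZE
 * chunks and fsync. BUFSIZE and the output name are guesses at the shape
 * of the real program, which is not reproduced in this mail. */
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

#define BUFSIZE (8 * 1024)      /* bump to 64 * 1024 to see Factor 4 fade */

int main(int argc, char **argv)
{
    char buf[BUFSIZE];
    const char *path = argc > 1 ? argv[1] : "receive.out";

    /* No O_TRUNC: on a second run this rewrites the existing file in
     * place, which is exactly the Factor 1 condition. */
    int fd = open(path, O_WRONLY | O_CREAT, 0644);
    if (fd < 0) { perror(path); exit(1); }

    ssize_t n;
    while ((n = read(STDIN_FILENO, buf, sizeof(buf))) > 0) {
        if (write(fd, buf, n) != n) {
            perror("write");
            exit(1);
        }
    }
    if (n < 0) { perror("read"); exit(1); }

    fsync(fd);      /* the mail notes the program fsyncs its writes */
    close(fd);
    return 0;
}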
