write-caches, I/O stalls: MUST-FIX (was: [PATCH] io stalls)

Matthias Andree (matthias.andree@gmx.de)
Fri, 27 Jun 2003 10:41:25 +0200


On Wed, 25 Jun 2003, Chris Mason wrote:

> I've no preference really. I didn't notice a throughput difference but
> my scsi drives only have 2MB of cache.

You shouldn't be using the drive's write cache in the first place!

The write cache, regardless of ATA or SCSI, can, as far as I know, not
be used safely with any Linux file systems (and my questions whether 2.6
will finally change that went unanswered so far), because the write
reordering the write cache can do can seriously damage file systems,
whether journalling or not.

Please conduct all your tests with write caches turned off because
that's what matters in REAL systems; in that case, these latencies
become a REAL pain in the back because writing is so much slower because
of all the seeks.

Optimizing for write cached behaviour can happen not a single second
before:

1. the file systems know how to queue "ordered tags" in the right places
(write barrier to enforce proper ordering for on-disk consistency,
which I assume will make for a lot of ordered tags for writing to the
journal itself)

2. the device driver knows how to map "ordered tags" to flush or
whatever operations for drives that don't do tagged command queueing
(ATA mostly, or SCSI when TCQ is switched off).

All these "0-bytes in file" problems with XFS, ReiserFS, JFS, ext2 and
ext3 in data=writeback mode happen because the kernel doesn't care about
write ordering, and these broken files are a) occasionally hard to find,
b) another PITA.

I consider proper write ordering and enforcing thereof a MUST-FIX. This
is much more important than getting some extra latencies squished. It
must do the right thing in the first place, and then it can do the right
thing faster.

I am aware that you're not the only person responsible for the state
Linux is in, and I'd like to see the write barriers revived ASAP for at
least ext2/ext3/reiserfs, sym53c8xx, aic7xxx, tmscsim and IDE. I am
sorry not being able to offer any help on that, I'm not acquainted with
the kernel stuff and I can't donate money to anyone for me to do it.

SCNR.

-- 
Matthias Andree
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/