Re: 2.5.59mm5, raid1 resync speed regression.

Mitchell Blank Jr (mitch@sfgoth.com)
Fri, 24 Jan 2003 14:40:45 -0800


David Mansfield wrote:
> decide which half of the mirror is more current. Read blocks from this
> partition, write to other. Periodically update raid-superblock or
> something.

Well, I haven't looked at the code (or the academic paper it's based on),
but this makes some sense based on Andrew's description of the new
algorithm. It sounds like the RAID resync is reading and writing the same
block and (I'm theorizing here) is doing the write-back synchronously, so
the write gets delayed by the anticipatory post-read delay. Does this
sound possible?

(Or does it just blindly write over the entire "old" mirror, in which case
I don't know how that would affect RAID resync? I think this scenario
might be possible under at least some workloads, though; read on...)
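
Just to make the pattern I'm imagining concrete, here is a tiny userspace
model of it (my mental model only, not the actual md resync code; fd_src,
fd_dst and BLK are made up for illustration):

    /* Hypothetical model of the resync I/O pattern, not real md code.
     * Read a block from the up-to-date half, then write it back
     * synchronously before issuing the next read; the point is only
     * that each write is issued right after a read, which is where I
     * suspect the post-read delay bites. */
    #define _XOPEN_SOURCE 500        /* for pread()/pwrite() */
    #include <sys/types.h>
    #include <unistd.h>

    #define BLK 65536

    static int resync_model(int fd_src, int fd_dst, off_t nblocks)
    {
        static char buf[BLK];
        off_t i;

        for (i = 0; i < nblocks; i++) {
            /* read block i from the "good" mirror... */
            if (pread(fd_src, buf, BLK, i * (off_t)BLK) != BLK)
                return -1;
            /* ...and write it back before moving on */
            if (pwrite(fd_dst, buf, BLK, i * (off_t)BLK) != BLK)
                return -1;
        }
        return fsync(fd_dst);
    }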

One idea (assuming it doesn't do this already) would be to cancel the
post-read delay if we get a synchronous (or maybe any) write for the same
(or a very nearby) block. The rationale is that such a write is likely
from the same application, and the short seek will be cheap anyway
(there's a high probability the drive's track buffer will take care of it).
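
A rough sketch of the sort of check I have in mind (purely illustrative;
none of these names exist in the real scheduler, and the "nearby"
threshold is a number I made up):

    /* Illustrative only: hypothetical names, not actual scheduler code.
     * Decide whether an incoming write should cut the post-read
     * anticipation short instead of waiting it out. */
    typedef unsigned long long sector_t;  /* stand-in for the kernel type */

    #define ANTIC_NEARBY_SECTORS 128      /* made-up "close enough" window */

    static int write_breaks_anticipation(sector_t last_read_sector,
                                         sector_t write_sector,
                                         int write_is_sync)
    {
        sector_t delta;

        if (!write_is_sync)
            return 0;   /* or drop this test to break on any write */

        delta = (write_sector > last_read_sector) ?
                write_sector - last_read_sector :
                last_read_sector - write_sector;

        /* A sync write right next to the block we just read is probably
         * from the same application, and the seek is cheap (likely still
         * covered by the drive's track buffer), so stop idling and let
         * it go. */
        return delta <= ANTIC_NEARBY_SECTORS;
    }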

Questions:

* Has anyone tested what happens when an application alternately does
reads and sync-writes to the same large file/partition? (I'm thinking
about databases here.) It sounds like an algorithm like this could slow
them down.

* What are the current heuristics used to prevent write-starvation? Do
they need to be tuned now that reads are getting even more of an
advantage?

* Has anyone looked at other "hints" that the higher levels could give the
I/O scheduler to indicate that a post-read delay is unlikely to be
fruitful (for example, from syscalls like close() or exit())? Plumbing
these hints all the way down to the I/O scheduler would obviously be
tricky, but it might be worth at least thinking about. (A very rough
sketch of what I mean follows this list.)
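
On that last question, this is the flavor of hint I'm picturing (entirely
hypothetical; no such flag or helpers exist anywhere, it's only meant to
show the shape of the idea):

    /* Entirely hypothetical: nothing like this exists.  The idea is that
     * paths like close() or exit() could mark a task's I/O state as
     * "done reading for now", and the scheduler would then skip the
     * post-read delay for that task's requests. */
    #define IOHINT_NO_ANTICIPATE 0x1

    struct io_hints {
        unsigned int flags;
    };

    /* would be called from the close()/exit() paths */
    static inline void io_hint_stop_anticipating(struct io_hints *h)
    {
        h->flags |= IOHINT_NO_ANTICIPATE;
    }

    /* would be checked by the I/O scheduler before it starts to idle */
    static inline int io_hint_worth_anticipating(const struct io_hints *h)
    {
        return !(h->flags & IOHINT_NO_ANTICIPATE);
    }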

Also, would it be possible to get profile data about these post-read delays?
Specifically, it would be good to know:

1. How many expired without another nearby read happening
1a. How many of those had other I/O waiting to go when they expired
2. How many were cut short by another nearby read (i.e. a "success")
3. How many were cut short by some other heuristic (like described above)

That way we could see how much these delays are helping or hurting various
workloads.
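
Even a handful of counters would do. A hypothetical sketch (the struct
and field names are invented, not an existing interface):

    /* Hypothetical counters matching the numbered list above; invented
     * names, not an existing interface. */
    struct antic_stats {
        unsigned long expired;            /* (1)  timed out, no nearby read */
        unsigned long expired_io_waiting; /* (1a) subset of (1): other I/O
                                           *      was queued when it expired */
        unsigned long read_hit;           /* (2)  cut short by a nearby read */
        unsigned long broken_other;       /* (3)  cut short by some other
                                           *      heuristic */
    };

Bumping each counter at the corresponding decision point and exporting the
totals somewhere readable (/proc or similar) would make the comparison easy.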

Sorry if any of this is obvious or already implemented; these are just my
first thoughts after reading the announcement. It sounds like really
interesting work, though.

-Mitch