. -mm3 and -mm4 were not announced - they were sync-up patches as we
worked on the I/O scheduler.
. -mm5 has the first cut of Nick Piggin's anticipatory I/O scheduler.
Here's the scoop:
The problem being addressed here is (mainly) kernel behaviour when there
is a stream of writeout happening, and someone submits a read.
In 2.4.x, the disk queues contain up to 30 megabytes of writes (say, one
seconds's worth). When a read is submitted the 2.4 I/O scheduler will try
to insert that at the right place between the writes. Usually, there is no
right place and the read is appended to the queue. That is: it will be
serviced in one second.
But the problem with reads is that they are dependent - neither the
application nor the kernel can submit read #N until read #N-1 has
completed. So something as simple as
cat /usr/src/linux/kernel/*.c > /dev/null
requires several hundred dependent reads. And in the presence of a
streaming write, each and every one of those reads gets stuck at the end of
the queue, and takes a second to propagate to the head. The `cat' takes
hundreds of seconds.
The celebrated read-latency2 patch recognises the fact that appending a
read to a tail of writes is dumb, and puts the read near the head of the
queue of writes. It provides an improvement of up to 30x. The deadline
I/O scheduler in 2.5 does the same thing: if reads are queued up, promote
them past writes, even if those writes have been waiting longer.
So far so good, but these fixes are still dumb. Because we're solving
the dependent read problem by creating a seek storm. Every time someone
submits a read, we stop writing, seek over and service the read, and then
*immediately* seek back and start servicing writes again.
But in the common case, the application which submitted a read is about
to go and submit another one, closeby on-disk to the first. So whoops, we
have to seek back to service that one as well.
So what anticipatory scheduling does is very simple: if an application
has performed a read, do *nothing at all* for a few milliseconds. Just
return to userspace (or to the filesystem) in the expectation that the
application or filesystem will quickly submit another read which is
closeby.
If the application _does_ submit the read then fine - we service that
quickly. If it does not submit a read then we lose. Time out and go back
to doing writes.
The end result is a large reduction in seeking - decreased read latency,
increased read bandwidth and increased write bandwidth.
The code as-is has rough spots and still needs quite some work. But it
appears to be stable. The test which I have concentrated on is "how long
does my laptop take to compile util-linux when there is a continuous write
happening". On ext2, mounted noatime:
2.4.20: 538 seconds
2.5.59: 400 seconds
2.5.59-mm5: 70 seconds
No streaming write: 48 seconds
A couple of VFS changes were needed as well.
More details on anticipatory scheduling may be found at
http://www.cs.rice.edu/~ssiyer/r/antsched/
Changes since 2.5.59-mm2:
+preempt-locking.patch
Speed up the smp preempt locking.
+ext2-allocation-failure-fix.patch
ext2 ENOSPC crash fix
+ext2_new_block-fixes.patch
ext2 cleanups
+hangcheck-timer.patch
A form of software watchdog
+slab-irq-fix.patch
Fix a BUG() in slab when memory exhaustion happens at a bad time.
+sendfile-security-hooks.patch
Reinstate lost security hooks around sendfile()
+buffer-io-accounting.patch
Fix IO-wait acounting
+aic79xx-linux-2.5.59-20030122.patch
aic7xxx driver update
+topology-remove-underbars.patch
cleanup
+mandlock-oops-fix.patch
file locking fix
+reiserfs_file_write.patch
reworked reiserfs write code.
-exit_mmap-fix2.patch
Dropped
+generic_file_readonly_mmap-fix.patch
Fix MAP_PRIVATE mmaps for filesystems which don't support ->writepage()
+seq_file-page-defn.patch
Compile fix
+exit_mmap-fix-ppc64.patch
+exit_mmap-ia64-fix.patch
Fix the exit_mmap() problem in arch code.
+show_task-fix.patch
Fix oops in show_task()
+scsi-iothread.patch
software suspend fix
+numaq-ioapic-fix2.patch
NUMAQ stuff
+misc.patch
Random fixes
+writeback-sync-cleanup.patch
remove some junk from fs-writeback.c
+dont-wait-on-inode.patch
Fix large delays in the writeback path
+unlink-latency-fix.patch
Fix large delays in unlink()
+anticipatory_io_scheduling-2_5_59-mm3.patch
Anticipatory scheduling implementation
All 65 patches:
kgdb.patch
devfs-fix.patch
deadline-np-42.patch
(undescribed patch)
deadline-np-43.patch
(undescribed patch)
setuid-exec-no-lock_kernel.patch
remove lock_kernel() from exec of setuid apps
buffer-debug.patch
buffer.c debugging
warn-null-wakeup.patch
reiserfs-readpages.patch
reiserfs v3 readpages support
fadvise.patch
implement posix_fadvise64()
ext3-scheduling-storm.patch
ext3: fix scheduling storm and lockups
auto-unplug.patch
self-unplugging request queues
less-unplugging.patch
Remove most of the blk_run_queues() calls
lockless-current_kernel_time.patch
Lockless current_kernel_timer()
scheduler-tunables.patch
scheduler tunables
htlb-2.patch
hugetlb: fix MAP_FIXED handling
kirq.patch
kirq-up-fix.patch
Subject: Re: 2.5.59-mm1
ext3-truncate-ordered-pages.patch
ext3: explicitly free truncated pages
prune-icache-stats.patch
add stats for page reclaim via inode freeing
vma-file-merge.patch
mmap-whitespace.patch
read_cache_pages-cleanup.patch
cleanup in read_cache_pages()
remove-GFP_HIGHIO.patch
remove __GFP_HIGHIO
quota-lockfix.patch
quota locking fix
quota-offsem.patch
quota semaphore fix
oprofile-p4.patch
oprofile_cpu-as-string.patch
oprofile cpu-as-string
preempt-locking.patch
Subject: spinlock efficiency problem [was 2.5.57 IO slowdown with CONFIG_PREEMPT enabled)
wli-11_pgd_ctor.patch
(undescribed patch)
wli-11_pgd_ctor-update.patch
pgd_ctor update
stack-overflow-fix.patch
stack overflow checking fix
ext2-allocation-failure-fix.patch
Subject: [PATCH] ext2 allocation failures
ext2_new_block-fixes.patch
ext2_new_block cleanups and fixes
hangcheck-timer.patch
hangcheck-timer
slab-irq-fix.patch
slab IRQ fix
Richard_Henderson_for_President.patch
Subject: [PATCH] Richard Henderson for President!
parenthesise-pgd_index.patch
Subject: i386 pgd_index() doesn't parenthesize its arg
sendfile-security-hooks.patch
Subject: [RFC][PATCH] Restore LSM hook calls to sendfile
macro-double-eval-fix.patch
Subject: Re: i386 pgd_index() doesn't parenthesize its arg
mmzone-parens.patch
asm-i386/mmzone.h macro paren/eval fixes
blkdev-fixes.patch
blkdev.h fixes
remove-will_become_orphaned_pgrp.patch
remove will_become_orphaned_pgrp()
buffer-io-accounting.patch
correct wait accounting in wait_on_buffer()
aic79xx-linux-2.5.59-20030122.patch
aic7xxx update
MAX_IO_APICS-ifdef.patch
MAX_IO_APICS #ifdef'd wrongly
dac960-error-retry.patch
Subject: [PATCH] linux2.5.56 patch to DAC960 driver for error retry
topology-remove-underbars.patch
Remove __ from topology macros
mandlock-oops-fix.patch
ftruncate/truncate oopses with mandatory locking
put_user-warning-fix.patch
Subject: Re: Linux 2.5.59
reiserfs_file_write.patch
Subject: reiserfs file_write patch
vmlinux-fix.patch
vmlinux fix
smalldevfs.patch
smalldevfs
sound-firmware-load-fix.patch
soundcore.c referenced non-existent errno variable
generic_file_readonly_mmap-fix.patch
Fix generic_file_readonly_mmap()
seq_file-page-defn.patch
Include <asm/page.h> in fs/seq_file.c, as it uses PAGE_SIZE
exit_mmap-fix-ppc64.patch
exit_mmap-ia64-fix.patch
Fix ia64's 64bit->32bit app switching
show_task-fix.patch
Subject: [PATCH] 2.5.59: show_task() oops
scsi-iothread.patch
scsi_eh_* needs to run even during suspend
numaq-ioapic-fix2.patch
NUMAQ io_apic programming fix
misc.patch
misc fixes
writeback-sync-cleanup.patch
dont-wait-on-inode.patch
unlink-latency-fix.patch
anticipatory_io_scheduling-2_5_59-mm3.patch
Subject: [PATCH] 2.5.59-mm3 antic io sched
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/