I just reproduced the test to validate the data.  I'm using 8 kbyte blocks here.
The kernel is 2.4.18, and O_DIRECT is still the slowest.
This machine has 2GB RAM, so it has 1.1GB of RAM in HighMem.
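For reference, the guts of what's being timed look roughly like this.  This is
not the actual test_disk_performance source, just a minimal sketch of an
aligned-buffer O_DIRECT read loop; the bs/blocks values and the device path
simply mirror the command lines below.

/*
 * Minimal sketch of an O_DIRECT sequential-read timing loop - NOT the
 * actual test_disk_performance tool, just an illustration of the pattern
 * being measured: an aligned buffer, an O_DIRECT open, per-read accounting.
 */
#define _GNU_SOURCE             /* needed for O_DIRECT */
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/time.h>
#include <unistd.h>

int main(int argc, char **argv)
{
        const char *dev = argc > 1 ? argv[1] : "/dev/md0";
        const size_t bs = 8192;                  /* bs=8k */
        const long nblocks = 4L * 1024 * 1024;   /* blocks=4M */
        void *buf;
        struct timeval t0, t1;
        long i;
        int fd;
        double secs, mbytes;

        /* O_DIRECT wants a suitably aligned buffer; align to the block size */
        if (posix_memalign(&buf, bs, bs) != 0) {
                perror("posix_memalign");
                return 1;
        }

        fd = open(dev, O_RDONLY | O_DIRECT);
        if (fd < 0) {
                perror("open");
                return 1;
        }

        gettimeofday(&t0, NULL);
        for (i = 0; i < nblocks; i++) {
                if (read(fd, buf, bs) != (ssize_t)bs) {
                        perror("read");
                        break;
                }
        }
        gettimeofday(&t1, NULL);

        secs   = (t1.tv_sec - t0.tv_sec) + (t1.tv_usec - t0.tv_usec) / 1e6;
        mbytes = (double)i * bs / (1024 * 1024);
        printf("read %.0f mbytes in %f seconds (%.2f Mbytes/sec), "
               "%.0fusec mean latency\n",
               mbytes, secs, mbytes / secs, secs * 1e6 / (i ? i : 1));

        close(fd);
        free(buf);
        return 0;
}

The base run would be the same loop minus O_DIRECT in the open() flags; what
the O_NOCOPY hack does internally isn't shown here.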
Booting the kernel with 'profile=2' set, the numbers were as follows:
  - Base performance, /dev/md0 raid-0 8-disk array:
         [root@mel-stglab-host1 src]# readprofile -r; ./test_disk_performance bs=8k blocks=4M /dev/md0
         Completed writing 31250 mbytes in 214.94761 seconds (153.05 Mbytes/sec), 53usec mean latency
  - using /dev/md0 raid-0 8-disk array with O_DIRECT:
         [root@mel-stglab-host1 src]# readprofile -r; ./test_disk_performance bs=8k blocks=4M direct /dev/md0
         Completed reading 31250 mbytes in 1229.830726 seconds (26.64 Mbytes/sec), 306usec mean latency
  - using /dev/md0 raid-0 8-disk array with O_NOCOPY hack:
         [root@mel-stglab-host1 src]# readprofile -r; ./test_disk_performance bs=8k blocks=4M nocopy /dev/md0
         Completed writing 31250 mbytes in 163.602116 seconds (200.29 Mbytes/sec), 39usec mean latency
So O_DIRECT in 2.4.18 still shows up as a large performance hit versus no
O_DIRECT: 26.64 Mbytes/sec against 153.05 Mbytes/sec, i.e. throughput drops by roughly 83%.
Anyone have any clues?
From the kernel profile of the O_DIRECT run, we have:
         [root@mel-stglab-host1 src]# cat /tmp/profile2.txt | sort -n -k3 | tail -20
         c01ceb90 submit_bh                                   270   2.4107
         c01fc8c0 scsi_init_io_vc                             286   0.7772
         c0136ec0 create_bounce                               323   0.9908
         c0139d80 unlock_buffer                               353   4.4125
         c012f7d0 kmem_cache_alloc                            465   1.6146
         c0115a40 __wake_up                                   470   2.4479
         c01fa720 __scsi_end_request                          509   1.7674
         c01fae00 scsi_request_fn                             605   0.7002
         c013cab0 end_buffer_io_kiobuf                        675  10.5469
         c01154e0 schedule                                    849   0.6170
         c0131a40 rmqueue                                     868   1.5069
         c025ede0 raid0_make_request                          871   2.5923
         c0225ee0 qla2x00_done                                973   1.6436
         c013cb60 brw_kiovec                                 1053   1.0446
         c01ce400 __make_request                             1831   1.1110
         c01f30e0 scsi_dispatch_cmd                          1854   2.0692
         c011d010 do_softirq                                 2183   9.7455
         c0136c30 bounce_end_io_read                        13947  39.6222
         c0105230 default_idle                             231472 3616.7500
         00000000 total                                    266665   0.1425
Contrast this with the profile where we're not using O_DIRECT:
         [root@mel-stglab-host1 src]# cat /tmp/profile3_base.txt | sort -n -k3 | tail -20
         c012fdc0 kmem_cache_reap                             369   0.4707
         c013b330 set_bh_page                                 397   4.9625
         c011d010 do_softirq                                  419   1.8705
         c0131a40 rmqueue                                     466   0.8090
         c01fa720 __scsi_end_request                          484   1.6806
         c012fa60 kmem_cache_free                             496   3.8750
         c013bd00 block_read_full_page                        523   0.7783
         c012f7d0 kmem_cache_alloc                            571   1.9826
         c013db39 _text_lock_buffer                           729   0.9812
         c0130ca0 shrink_cache                                747   0.7781
         c01cea70 generic_make_request                        833   2.8924
         c025ede0 raid0_make_request                          930   2.7679
         c013b280 get_unused_buffer_head                      975   5.5398
         c01fc8c0 scsi_init_io_vc                            1003   2.7255
         c013d490 try_to_free_buffers                        1757   4.7745
         c013a9d0 end_buffer_io_async                        2482  14.1023
         c01ce400 __make_request                             2687   1.6305
         c012a6e0 file_read_actor                            6951  27.1523
         c0105230 default_idle                              15227 237.9219
         00000000 total                                     45048   0.0241
The biggest difference here is bounce_end_io_read in the O_DIRECT profile.
With 1.1GB of this box's RAM in HighMem, the O_DIRECT path is evidently going
through bounce buffers (create_bounce also shows up), so every block still gets
copied on completion - exactly the copy O_DIRECT is supposed to avoid.  In the
non-O_DIRECT profile that copy shows up as file_read_actor instead.
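To make that concrete, here's a rough user-space model of the extra copy that
bounce buffering implies: the device DMAs into a low-memory bounce page, and on
completion the data is copied into the real destination buffer.  This is only
an illustration (the struct and function names are made up), not the kernel's
actual bounce code.

/* Toy model of the extra per-block copy done by the bounce path on read
 * completion.  Names are made up for illustration; this is not the kernel's
 * bounce_end_io_read(). */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define BLOCK_SIZE 8192

struct bounced_read {
        char  *orig;    /* buffer the caller actually asked to read into */
        char  *bounce;  /* low-memory page the device was able to DMA to */
        size_t len;
};

/* What completion of a bounced read boils down to: one full memcpy per
 * block, over and above the DMA itself. */
static void bounced_read_complete(struct bounced_read *io)
{
        memcpy(io->orig, io->bounce, io->len);
}

int main(void)
{
        struct bounced_read io;

        io.len    = BLOCK_SIZE;
        io.orig   = malloc(io.len);   /* stands in for the HighMem user page */
        io.bounce = malloc(io.len);   /* stands in for the bounce page */
        if (!io.orig || !io.bounce)
                return 1;

        memset(io.bounce, 0xab, io.len);   /* pretend the device DMA'd data here */
        bounced_read_complete(&io);        /* the copy we were trying to avoid */

        printf("copied %zu bytes from bounce page to destination\n", io.len);
        free(io.orig);
        free(io.bounce);
        return 0;
}

Whether that copy (plus the bounce-page allocation in create_bounce) fully
explains the slowdown is another question, given how much idle time is left in
the profile.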
Given there's still lots of idle time, I'll fire up lockmeter on here and
see if there are any gremlins there.
cheers,
lincoln.