Re: Heuristic readahead for filesystems

Richard B. Johnson (root@chaos.analogic.com)
Mon, 16 Sep 2002 08:52:04 -0400 (EDT)


On Thu, 12 Sep 2002, jdow wrote:

> From: "Richard B. Johnson" <root@chaos.analogic.com>
>
> > Then you are tuning a file-system for a single program
> like `ls`. Most real-world I/O to file-systems is not done
> by `ls` or even `make`. The extra read-ahead overhead is
> just that, 'overhead'. Since disk I/O is expensive,
> > you certainly do not want to read any more than is absolutely
> necessary. There had been a lot of studies about this in the
> > 70's when disks were very, very, slow. The disk-to-RAM speed
> > ratio hasn't changed much even though both are much faster.
> > Therefore, the conclusions of these studies, made by persons
> > from DEC and IBM, should not have changed. From what I recall,
> > all studies showed that read-ahead always reduced performance,
> > but keeping what was already read in RAM always increased
> > performance.
>
> Dick, those studies are simply not meaningful. The speedup for
> general applications that I generated in the mid 80s with a pair
> of SCSI controllers for the Amiga was rather dramatic. At that
> time every PC controller I ran down was reading 512 bytes per
> transaction. They could not read contiguous sectors unless they
> were VERY fast. For these, readahead would generate no benefit.
> (Even some remarkably expensive SCSI controllers for PCs fell
> into that trap and defective mindset.) The controllers I re-
> engineered were capable of reading large blocks of data, multiple
> sectors in size, in a single transaction. I experimented with
> several programs and discovered that a 16k readahead was about
> my optimum compromise between the read time overhead and the
> transaction time overhead. I even found that for the average case
> ONE buffer was sufficient, which boggled me. (As a developer I
> was used to reading multiple files at a time to create object
> files and linked targets.)

Well, they could read contiguous sectors if the sector interleave
was correctly determined and the correct interleave was set
during low-level formatting. Nowadays, interleave is either ignored
or unavailable because there is a sector buffer that can contain
an entire track of data. Some SCSI drives have sector buffers
that can contain a whole cylinder of data.
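
For anybody who never had to pick an interleave by hand, here is a
toy sketch of how an N:1 interleave lays the logical sectors around
a track. The geometry (17 sectors per track) is just the classic
MFM number; the code is illustrative only, not any real formatter:

/* Toy sketch, made-up geometry, not a real formatter: an N:1
 * interleave places consecutively numbered sectors N physical slots
 * apart, so the host gets N-1 sector times between the end of one
 * wanted sector and the start of the next.
 */
#include <stdio.h>

#define SECTORS_PER_TRACK 17            /* classic ST-506/MFM geometry */

static void layout(int interleave, int map[SECTORS_PER_TRACK])
{
    int slot = 0;

    for (int i = 0; i < SECTORS_PER_TRACK; i++)
        map[i] = -1;                    /* mark every physical slot empty */

    for (int logical = 0; logical < SECTORS_PER_TRACK; logical++) {
        while (map[slot] != -1)         /* skip slots already assigned */
            slot = (slot + 1) % SECTORS_PER_TRACK;
        map[slot] = logical;
        slot = (slot + interleave) % SECTORS_PER_TRACK;
    }
}

int main(void)
{
    int map[SECTORS_PER_TRACK];

    for (int il = 1; il <= 3; il++) {
        layout(il, map);
        printf("%d:1 interleave:", il);
        for (int i = 0; i < SECTORS_PER_TRACK; i++)
            printf(" %d", map[i]);
        printf("\n");
    }
    return 0;
}

Run it and the 2:1 line comes out 0 9 1 10 2 11 ..., which is the
sort of ID pattern the low-level format writes into the sector
headers so the host gets a spare sector time between transfers.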

>
> Back in the 70s, did anyone ever read more than a single block
> at a time, other than me, that is? (I had readahead in CP/M back
> in the late 70s. But the evidence for that has evaporated, I am
> afraid. I repeatedly boggled other CP/M users with my 8" floppy
> speeds as a result. I also added blocking and deblocking so I
> could use large sectors for even more speed.)
>

When we were doing direct-to-disk data writing here at
Analogic around 20 years ago, I experimented with changing
the sector interleave to speed up disk I/O. The DEC disks
in use at that time on our systems had ST-506 interfaces.
I had made a similar utility, called Spin-Ok (a put-down
of Spin-Right, which stole my public-domain software and
made a commercial version of it), for the IBM/PC.

I had to get the primitives from Digital so I could write
a disk formatting driver. Our Chairman Bernie Gordon and
DEC's chairman Ken Olson were not exactly buddies, but we
were allowed to obtain information that might not be given
to everybody.

Some DEC engineers were interested in the results of my experiments.
It turned out that DEC's 1:1 interleave was nowhere near
correct for the maximum data transfer rate. I don't remember
the numbers, but I think the best speed was with a 5:1 or,
perhaps, a 3:1 interleave. Anyway, the experiments led to
using DEC file-system disks as well as raw disks.
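
I can't reproduce the old numbers, but a back-of-the-envelope model
shows the shape of the result. Everything below (3600 RPM, 17
sectors per track, a host that needs 1.8 sector times to digest
each sector) is invented for illustration, not measured data:

/* Back-of-the-envelope model with invented numbers: if the host is
 * not ready when the next wanted sector arrives, the controller
 * waits a full revolution for it to come around again.
 */
#include <stdio.h>

#define SECTORS_PER_TRACK 17
#define RPM               3600

int main(void)
{
    double rev_us    = 60.0e6 / RPM;                /* one revolution, in us */
    double sector_us = rev_us / SECTORS_PER_TRACK;  /* one sector time       */
    double host_us   = 1.8 * sector_us;             /* assumed host latency  */

    for (int il = 1; il <= 6; il++) {
        /* Start-to-start gap between consecutive logical sectors. */
        double gap_us = il * sector_us;

        /* Transfer (one sector time) plus host processing must fit
         * in the gap; otherwise add whole revolutions until it does.
         */
        while (gap_us - sector_us < host_us)
            gap_us += rev_us;

        double track_us = SECTORS_PER_TRACK * gap_us;
        printf("%d:1 interleave: ~%5.1f ms per track\n", il, track_us / 1000.0);
    }
    return 0;
}

With those made-up figures a 1:1 format blows nearly a whole
revolution per sector while 3:1 reads the track in three
revolutions, which is why a 1:1 format could be nowhere near the
maximum transfer rate.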

The file-system of the day was RMS. It had a fixed-length allocation
map which was owned by a "file" called BITMAP.SYS. This map was
contiguous and each bit in that map represented an allocation unit
on the physical structure. I proposed to DEC engineering that any
access to this file area result in the entire map being read
into memory. I am quite aware of the "not-invented-here" syndrome
commonplace at Digital. However, one engineer went to great lengths
presenting data and the results of his group's tests on just that
idea, plus the idea of reading ahead in general. It was shown, at
least to my satisfaction 20 years ago, that any read-ahead was
quite counterproductive.

And the numbers were obvious! They didn't require any thought!
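
To make the proposal itself concrete, it was nothing fancier than
this sketch (invented sizes, not RMS internals): read the whole of
BITMAP.SYS into memory once, then do every free/used test as a bit
operation in RAM.

/* Sketch only: invented sizes, not RMS internals.  Each bit in the
 * contiguous allocation map stands for one allocation unit on the
 * physical structure.  The whole map is read into memory once and
 * every subsequent free/used test is done in RAM.
 */
#include <stdio.h>
#include <stdint.h>
#include <string.h>

#define BLOCK_SIZE    512
#define MAP_BLOCKS    16                        /* illustrative map size  */
#define MAP_BYTES     (MAP_BLOCKS * BLOCK_SIZE)
#define ALLOC_UNITS   (MAP_BYTES * 8)           /* one bit per alloc unit */

static uint8_t map[MAP_BYTES];                  /* entire map held in RAM */

/* Stand-in for one multi-block read of BITMAP.SYS from the disk. */
static void load_map(void)
{
    memset(map, 0, sizeof(map));                /* pretend all units free */
}

static int unit_is_free(unsigned u)
{
    return !(map[u / 8] & (1u << (u % 8)));
}

static void mark_used(unsigned u)
{
    map[u / 8] |= 1u << (u % 8);
}

/* Claim the first free allocation unit without touching the disk. */
static long alloc_unit(void)
{
    for (unsigned u = 0; u < ALLOC_UNITS; u++) {
        if (unit_is_free(u)) {
            mark_used(u);
            return (long)u;
        }
    }
    return -1;                                  /* volume full */
}

int main(void)
{
    long a, b, c;

    load_map();
    a = alloc_unit();
    b = alloc_unit();
    c = alloc_unit();
    printf("first three units allocated: %ld %ld %ld\n", a, b, c);
    printf("map covers %d allocation units in %d bytes of RAM\n",
           ALLOC_UNITS, MAP_BYTES);
    return 0;
}

The point is that the map is small and hit constantly, so holding
it resident is cheap compared with re-reading pieces of it on every
allocation.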

Somebody else on this thread asked for a URL! There wasn't any
such thing when this work was going on. We had BBS Systems and
1200 baud modems, eventually up to 9600. However, I did search
for some info. An interesting result is RAPID (Read Ahead for Parallel
Independent Disks)
http://www.unet.univie.ac.at/~a9405327/glossary/node93.html

This was experimental and seems to have been abandoned in the
late 1990s, although it provided a lot of meat for several
master's and PhD theses.

Cheers,
Dick Johnson
Penguin : Linux version 2.4.18 on an i686 machine (797.90 BogoMips).
The US military has given us many words, FUBAR, SNAFU, now ENRON.
Yes, top management were graduates of West Point and Annapolis.
