Database management, Autumn 2000, Exercise 2 (25.-28.9.)

The minimum requirement for active attendance: 3 tasks done

1. A hash file structure contains B = 4 buckets. The hash key is integer-valued and the hash function h is defined by the equation h(v) = v mod B. A disk block can hold three records of the file. Give the contents of the file structure when records with the keys 13, 8, 11, 24, 23, 15, 16, 20, 42, and 4 (in this order) have first been inserted into the file and when the records with keys 8, 15, 16, and 24 have then been deleted from the file.

2. Let us implement the WORKS_ON relation of the Company database as a hash file with ESSN as the hash key. How would you define the hash function (consider 2 to 3 alternatives). Suppose that the SSN is formed as in Finland: DDMMYY-123X (day, month, year, date-based counter which is even for women and uneven for men, check character).

3. What strategy would you use in executing the query

	select ESSN, count(PNO)
	from WORKS_ON
	group by ESSN,
when the structure of the WORKS_ON relation is as in the previous problem? How many disk accesses are needed? How much main memory (buffer blocks) is consumed?

4. Explain how the following SQL statements are executed (in terms of operations on datafile and the two indexes) for a relation emp(ssn, name, salary) given on page 3 of the chapter 3 lecture notes (chapter 3 = luku 3):

a) update emp set name = 'Franck' where name = 'Frank';
b) update emp set name = 'Takala' where name = 'Hakala';
c) update emp set ssn = 9122 where ssn = 1122;

It should be possible to use this example in spite of the Finnish language; there is a sparse (non-dense) index on name (index record: name and page number), and a dense index on ssn (index record: ssn and tuple identifier (tid) containing page number and record number)). A datafile of 7 records is also given.

5. Consider creating a sparse ISAM index (see page 168 in E&N3) for a set of data records with keys given in task 1 (13, 8 etc.). We assume that a disk page (disk block) can hold two data records or three index records. At creation time, we use a 100 % fill-factor, that is, every data page is filled with two records.

a) Show the contents of the structure after it is created. Explain then how the record with key value 23 is deleted, and how records with keys 1, 2, and 3 are inserted.

b) Give an example for a sequence of records inserted into the structure (after the creation phase) with the following property: After the insertions, one additional disk access is needed (on average) when as compared to the structure before these insertions.

6. Let us assume that a file has 1 000 000 records, record length = 200 bytes, and page size 4KB. An ISAM structure is used so that the indexing field is 12 bytes, page number 4 bytes and the index record as a whole 20 bytes (4 bytes for administration). As a basic option for the ISAM structure we have that described on page 168 (E&N3). With these parameters, it requires 50000 pages for the datafile, 250 (50000/200) pages for the level-1 index, 2 (250/2 rounded up) for the level-2 index, and one page for the root (level-3 index).

Estimate the number of pages needed if we change the situation in three ways:

a) the fill-factor in datafile blocks (pages) is 50%,

b) the datafile is implemented as a heap (unordered); then we have a dense ISAM index so that the first level pages are filled up to 50%,

c) as in b (heap ...) but the fill factor is 80 %.

The Finnish lecture notes are available via the Finnish course page ('Luentokalvot') in pdf-format or in a course folder in room A412 (on paper).