Data management I, Autumn 1999, Exercise 4 (18.-21.10.)

The minimum requirement for active attendance: 3 tasks done

Exercise group 4 (Tuesday, 19th Oct, 16-18) will exceptionally be at B450.

On week 44 (Nov 1-5) we have neither lectures nor exercise groups. The first exam will be as indicated: Nov 3rd. The requirements of the exam will be expresses later.


1.  Sketch the structure of the course registration database used at our
department so that you can roughly estimate how much disk space the
registration system needs for registrations of one term.  Suppose that
there are 1500 students, 50 lecture courses with 1-20 exercise groups
each (seminars and lab groups are omitted), 70 final exams, 5000
registrations for courses and 1000 for final exams. 

The system is based on an Oracle database. Could it be a system based on
data stored in the main memory (pros/cons of that)?

2.  Let us implement the EMPLOYEE relation of the Company database as a
heap file.  The size of a disk block is 4 KB.  What kind of an internal
structure would you use for an EMPLOYEE tuple? Estimate the size of the
file (the number of blocks), when there are 10000 employees in all. 

3. a) A file of 80000 blocks is stored optimally on a disk pack having 12
disk surfaces, 2000 tracks per surface and 36 blocks per track.  How
much time is spent scanning through the file when the blocks are read in
the order they are stored on the disk.  The seek time is 1-12 ms (the
average is 6 ms) and block transfer time 0.2 ms. 
 
b) What is the time needed to read all blocks separately (in an
arbitrary order)?

4.  A hash file structure contains B = 4 buckets.  The hash key is
integer-valued and the hash function h is defined by the equation h(v) =
v mod B.  A disk block can hold two records of the file.  Give the
contents of the file structure when records with the keys 9, 10, 11, 31,
32, 41, 42, 43, 44, 45, and 46 (in this order) have first been inserted
into the file and when the records with keys 10, 41, and 42 have then
been deleted from the file. 

5.  Let us implement the WORKS_ON relation of the Company database
as a hash file with ESSN as the hash key. How would you define the hash
function (consider 2 to 3 alternatives). Suppose that the SSN is formed
as in Finland: DDMMYY-123X (day, month, year, date-based counter which
is even for women and uneven for men, check character).

6. What strategy would you use in executing the query

	select ESSN, count(PNO)
	from WORKS_ON
	group by ESSN,

when the structure of the WORKS_ON relation is as in the previous
problem? How many disk accesses are needed? How much main memory
(buffer blocks) is consumed?