Helsingin yliopisto

University of Helsinki Database Management, Practice Session 2, Spring 2003

The database server bodbacka.cs.helsinki.fi stores the database in a 181GB disk unit that has the average random access time of 7.4 ms. The rotation speed of the unit is 7200 rpm (a single rotation takes 8.3 ms). There are 24 surfaces. The database is stored as 4 KB blocks. A track contains 90 blocks, in average. There are about 24000 cylinders. Use this disk for the following calculations.

Table student has columns personal_number (11 c), student_number (15 c), last_name, first_name, previous_name, e_mail_address, phone (15 c), major (3 c), last_update_time (7 c). Names and e-mail address are defined as varchar(80), but their average lengths are last name (9), first name (12), previous name (2 - most are empty), email address (20).
Records are stored as follows: Record header 32 bytes contains record identifier, record length, number of columns, etc, Each column is stored as the length of the value + the actual value in the number of bytes as defined in the length. The length is stored in a single byte, if it is less or equal to 250, and in three bytes if it is greater that 250 (value 255 in the first byte indicates that the two following bytes contain the length).
1. What's the average physical length of a student record?
2. Ponder how the student records behave in updates. Consider the records to be stored as a heap file. How much free space should be left in each block to avoid overflows and chained blocks.
Outline the algorithm to fetch the value of the fifth column of a record when the storage technique of task 1 is used.
Let's assume that each page contains 128 bytes of control data in addition to the records. The student table of task 1 is stored as a heap. There are 16000 students.
1. What's the average search time for a random student by student number.
2. What's the average search time for all computer science students (about 20% of all).
3. What would be the average binary search time for a student by student number if the file were stored as an ordered sequential file by student number?
The key of table registration contains the columns: course_code, term, year, course number that identifies the instances of a course lectured more than one in a term, practice group number and student number. This table stores information of students' registrations on courses. It has data about many terms. Ponder the problems on maintaining the table if it is stored as an ordered sequential file by the key. Consider different orders of key columns as the ordering key. Which order is most problematic and which is least problematic.
It is not worth building an index for a file on some column or column combination, if this index does not cause on essential reduction of access time for some commonly used query. Consider the heap structured student table
1. On wich columns or column combinations there should be an index?
2. Which basic file structures would you use in structuring the indexes (heap, ordered sequential, or hash).
3. How much additional space is needed for the indexes. Let's assume that there is only 6 bytes control data in each index record and a disk address is also 6 bytes.