Helsingin yliopisto

University of Helsinki, Depaertment of Computer Science
Database Management, Practice Session 3, Spring 2003

1. The index key of a multi-column index is build up by catenating the values (as padded to the maximum length) of the columns listed in the definition of the index. Ponder the need of building many indexes based on the same columns but in different order, for example, A,B,C and B,C,A. Would it be useful? What about a column collection and its subset, for example, A,B,C and A,B or B,C. Consider different types of indexes.

2. Table registration was used in task 3 of Practice Session 2. A new row is insertéd in the table when a student registers for some course. The key of the table contains the columns course_code (about 9 bytes), term (1 byte), year (4 bytes), course number (2 bytes), practice group number (2 tavua) ja student number (11 tavua). A row contains also the name of a student (about 25 bytes) and in average 300 bytes of other information. When a row is inserted we must check that the student has not already registered for the course on that term. We must also check how many times the student has taken the course in previous terms. Searching current term registrations by student number is a common operation. Listing of all students registered for a course and all students registered for a group are commonly needed. Student's registration record for a given course is also commonly serched by the initial part of student's name.

There are about 200000 registrations, alltogether, and about 60 courses in a year. A student has about 3 registrations in a term, in average. Let's assume that the page address used in the index takes 10 bytes. In addition the index entry contains only the key. Page size is 4 KB. The basic file structure is heap. Dense indexes structured as ISAM files may be build on top of the heap structure.

What indexes should be build? Why? Consider different key combinations.

3. Estimate the sizes of the indexes you specified in task 2. Estimate also the number of disk accesses for the operations. You may assume that the fill factor of the data pages of the ISAM structure is 70% and there are overflows in about 5% of pages.

4. Consider a B+ -tree of degree 2 (all index nodes have 2 to 4 keys, and all data pages have 2 to 4 records). A sequece of records is inserted to the tree. The keys of the records are: 23, 65, 37, 60, 46, 92, 48, 71, 56, 59, 18, 21, 10, 74, 78, 15, 16, 20 and 24.

a) Draw the result tree and its construction step by step.
b) How does the tree change when keys 24, 23, 10 and 20 are removed?

5. Table student has columns personal_number (11 c), student_number (15 c), last_name, first_name, previous_name, e_mail_address, phone (15 c), major (3 c), last_update_time (7 c). Names and e-mail address are defined as varchar(80), but their average lengths are last name (9), first name (12), previous name (2 - most are empty), email address (20). The length of a page address is 10 bytes. Consider the need of disk space and the number of accesses in the following cases:

a) The table is stored as a B+ -tree with student number as a key
b) The table is stored as a heap and on top of it is has a dense index stored as a B+ -tree with student number as a key.
c) On top of the above structures there is a sexondary index on personal number (in solution (a) is is indirect i.e. contains the primary keys instead of page addresses)