Database management, Autumn 2001, Exercise 1 (18.-21.9.)

The minimum requirement for active attendance: 2 tasks done
Tasks marked as (**) are counted as two tasks.

The purpose of tasks 1 and 2 is simply to repeat some basic features of the SQL query language. For SQL, see the textbook (E&N, Ch. 8), Oracle manual etc. The SQL exercises concern the database described in E&N, page 205.

1. Explain the meaning of the following SQL query.

select x.LNAME, y.LNAME
from EMPLOYEE x, EMPLOYEE y
where x.SUPERSSN = y.SSN
union select x.LNAME, z.LNAME
from EMPLOYEE x, EMPLOYEE y, EMPLOYEE z
where x.SUPERSSN = y.SSN and y.SUPERSSN = z.SSN.
2. Write SQL queries for printing
a) the names and addresses of the employees working for some project of Administration department at least 10 hours per week;
b) employees with no children;
c) the number of employees for every project; projects in alphabetical order.

3. (**) Sketch the structure of the database for students´ studies (grades, registrations) so that you can roughly estimate how much disk space is needed. Suppose that there are 50000 students, 10000 lecture courses, 40 grades (for courses passed) for an average student. A lecture course can be passed 5 times/year. One specific course (a course given at a specific time) has 1-20 exercise groups (with an average of 4. An average student has 10 registrations per year (either for a lecture course or for an exam). The database contains grades of 30 years. Registrations will be stored only for one year.

The system uses a disk based database (Oracle). Would it be possible to construct the system based on data stored in the main memory? (What would be pros/cons of that)?

4. Let us implement the EMPLOYEE relation of the Company database as a heap file (E&N, Ch. 5.7). The size of a disk block is 4 KB. What kind of an internal structure would you use for an EMPLOYEE tuple? (What alternative structures there exist? Give three different structures.) Estimate the size of the file (the number of blocks), when there are 10000 employees in all.

5. (**) a) A file of 120000 blocks is stored optimally on a disk pack having 12 disk surfaces, 4000 tracks per surface and 24 blocks per track. How much time is spent in scanning through the file when the blocks are read in the order they are stored on the disk. The seek time is 1-12 ms (the average is 6 ms) and block transfer time 0.2 ms. (Gaps between sectors need not to be considered.)

b) What is the time needed to read all blocks separately (in an arbitrary order)?

c) With what number of blocks X it is more efficient to scan through the whole file as compared to the series of single requests?

d) Which of the indicated disk parameters have the most effect to these results? (Estimate shortly how the changes in parameter values are visible in results.)