Database management, Autumn 2001, Exercise 4 (9.-12.10.)

The minimum requirement for active attendance: 3 tasks done

The tasks marked with '(**)' are counted as two tasks.
The background of these tasks is EN, Ch. 18 (query processing and optimization)

1. Compare the query trees given in Fig. 18.4 (EN) with the following relation sizes: employee: 10 000, department: 200, and project: 1000 rows. What are the sizes of the intermediate results in these trees? How the attribute values might be distributed in typical (or extreme) situations, and what kind of effects these distributions might have on the selection and join operations?

2. Consider the following query

select n.DEPENDENT_NAME, e.LNAME
from EMPLOYEE e, DEPARTMENT d, DEPENDENT n
where e.SSN = n.ESSN and e.DNO = d.DNUMBER and d.DNAME = 'Research'
and n.RELATIONSHIP='SPOUSE'.
a) What is the meaning of the query?
b) Give the projection-selection-product relational expression directly corresponding to the query. Also give the corresponding query tree. Associate the cartesian products in such a way that presumably will result in the best optimization effect.
c) Optimize the query using heuristics "apply the selections as early as possible". Give the query tree and the corresponding relational expression for the optimized query.
d) Is it possible to optimize the query still further?

3. How would you evaluate the query of the previous problem when there is an index on attribute SSN to relation EMPLOYEE and an index on attribute DNAME to relation DEPARTMENT? Are there differences between indexes of different kind? Is it possible to make the query more efficient using some other index?

4. Suppose the relational database schema (with keys indicated as bolded):

Person(SSN, FirstName, LastName),
Dead(SSN, DeathDate),
MarriedCouple(HusbSSN, WeddingDate, WifeSSN),
Divorced(HusbSSN, WeddingDate, DivorceDate).

a) Write the following query in relational algebra: "The names of Melanie Griffith's husbands and the associated wedding dates."
b) What kind of file and index structures would you use for the database relations? We require that the query (a) be executed efficiently. Explain the principle of executing the query (with indexes).

5. (**) Let us assume the query

select FNAME, LNAME, PNO, HOURS
from EMPLOYEE, WORKS_ON
where ESSN=SSN and HOURS > 20;
Evaluate the number of disk accesses (minimum and maximum values) needed when different file/index structures and different execution plans are used. The relation EMPLOYEE has 10000 rows of 100 characters, and the relation WORKS_ON 30000 rows of 20 charcaters. The page size is 2KB. (Try to find some good and some bad execution plans - not necessarily 'all' of them!)

6. a) Explain two essentially different ways to implement the query

select * from EMPLOYEE where SALARY < 20000 or DNO = 3;

The essential point is that the two conditions are disjunctively combined. Consider the use of indexes.

b) On page 39 (ch. 4) in the lecture notes there exists the query

SELECT *
  FROM accounts
  WHERE custno IN
    (SELECT custno FROM customers);

which is transformed to the query

SELECT accounts.* FROM accounts, customers WHERE accounts.custno = customers.custno;

The former contains a subquery and the latter a join operation. The queries could be interpreted as two alternative execution plans for the same query.
Is this transformation always possible? How about the efficiency of these versions?

The Finnish lecture notes are available via the Finnish course page ('Luentokalvot') in pdf-format or in a course folder in room A412 (on paper).