Database management, Autumn 2000, Exercise 4 (9.-12.10.)

The minimum requirement for active attendance: 3 tasks done

1. Consider the following query

select e.LNAME, s.LNAME
from EMPLOYEE e, EMPLOYEE s, DEPARTMENT d
where e.SUPERSSN = s.SSN and e.DNO = d.DNUMBER and d.DNAME = "Research"
and e.SALARY > 30000.

a) Give the projection-selection-product relational expression directly corresponding to the query. Also give the corresponding query tree. Associate the cartesian products in such a way that presumably will result in the best optimization effect.

b) Optimize the query using heuristics "apply the selections as early as possible". Give the query tree and the corresponding relational expression for the optimized query.

c) Is it possible to optimize the query still further?

2. How would you evaluate the query of the previous problem when there is an index on attribute SSN to relation EMPLOYEE and an index on attribute DNAME to relation DEPARTMENT? Are there differences between indexes of different kind?

3. Suppose the relational database schema (with keys indicated as bolded):

Person(SSN, FirstName, LastName),
Dead(SSN, DeathDate),
MarriedCouple(HusbSSN, WeddingDate, WifeSSN),
Divorced(HusbSSN, WeddingDate, DivorceDate).

a) Write the following query in relational algebra:

The names of Melanie Griffith's husbands and the associated wedding dates.

b) What kind of file and index structures would you use for the database relations? We require that the query (a) be executed efficiently. Explain the principle of executing the query (with indexes).

4. The query

select lname, fname, dep_name, relationship
from employee, dependent
where essn=ssn and lname='Smith';

gives the dependents of all Smith's.

Estimate the number of disk accesses needed when different file and access structures as well as different execution plans are used. Assume that there are 20000 tuples of 100 bytes in relation employee, and 50000 tuples of 50 bytes in relation dependent. The page (block) size is 4 KB. (You need not to evaluate all variations; it is sufficient to find some very efficient and some very inefficient alternatives.)

5. In a nested-loop join it is advantageous to use the file with fewer blocks as the outer-loop file. Likewise, in a single-loop join the file with the high join selection factor should be used in the (outer) join loop. (See E&N, p. 598).

Are these rules always valid? Could you find counterexamples?

6. Consider the evaluation of the relational expression

        P  (r * s * t)     (P = projection, * = natural join)
         BD
when the relations r(AB), s(BC) and t(CD) are as follows.

r = {(i,1) | i = 1,...,n},
s = {(1,j) | j = 1,...,n},
t = {(k,2) | k = 1,...,n}.

What is the size of the result, that is, how many tuples does the value of the expression contain? How large intermediary results are generated, when the expression is evaluated as

a)      P  (r * P  (s * t)),    b)   P  (P  (r * s) * t)
         BD      BD                   BD  BC
Assume that catalog information about the sizes of relations and the sizes of attribute domains are available for the query optimizer. What kind of heuristics could be used to choose the more efficient of the evaluation strategies?

The Finnish lecture notes are available via the Finnish course page ('Luentokalvot') in pdf-format or in a course folder in room A412 (on paper).