The minimum requirement for active attendance: 2 tasks done
The first exam will be as indicated: Nov 3rd at 15:00-18:00 in Porthania I. The requirements of the exam: pages 1-181 in lecture notes and exercises 1-5. (The corresponding parts in E&N are given on the English course page.) On week 44 (Nov 1-5) we have neither lectures nor exercise groups.
1. Consider the following query select e.LNAME, s.LNAME from EMPLOYEE e, EMPLOYEE s, DEPARTMENT d where e.SUPERSSN = s.SSN and e.DNO = d.DNUMBER and d.DNAME = "Research" and e.SALARY > 30000. a) Give the projection-selection-product relational expression directly corresponding to the query. Also give the corresponding query tree. Associate the cartesian products in such a way that presumably will result in the best optimization effect. b) Optimize the query using heuristics "apply the selections as early as possible". Give the query tree and the corresponding relational expression for the optimized query. 2. How would you evaluate the query of the previous problem when there is an index on attribute SSN to relation EMPLOYEE and an index on attribute DNAME to relation DEPARTMENT? 3. In the nested (inner-outer) loop approach of computing the join of relations r and s one may choose r as the outer relation and s as the inner relation, or vice versa. Is it better to choose the larger or the smaller relation as the outer relation? 4. Consider computing the natural join of relations r(A,B) and s(B,C) using the sort-merge join algorithm, when the join attribute $B$ is a unique key of s. Sketch the algorithm of the merge phase of the join. The algorithm uses only three pages of buffer memory. 5. Consider the evaluation of the relational expression P (r * s * t) (P is the projection operator) BD when the relations r(AB), s(BC) and t(CD) are as follows. r = {(i,1) | i = 1,...,n}, s = {(1,j) | j = 1,...,n}, t = {(k,2) | k = 1,...,n}. What is the size of the result, that is, how many tuples does the value of the expression contain? How large intermediary results are generated, when the expression is evaluated as a) P (r * P (s * t)), b) P (P (r * s) * t) BD BD BD BC Assume that catalog information about the sizes of relations and the sizes of attribute domains are available for the query optimizer. What kind of heuristics could be used to choose the more efficient of the evaluation strategies?