The minimum requirement for active attendance: 3 tasks done
The tasks marked with '(**)' are counted as two tasks.
The background of these tasks is EN, Ch. 18 (query processing and
optimization)
1. Compare the query trees given in Fig. 18.4 (EN) with the following relation sizes: employee: 10 000, department: 200, and project: 1000 rows. What are the sizes of the intermediate results in these trees? How the attribute values might be distributed in typical (or extreme) situations, and what kind of effects these distributions might have on the selection and join operations?
2. Consider the following query
select n.DEPENDENT_NAME, e.LNAME from EMPLOYEE e, DEPARTMENT d, DEPENDENT n where e.SSN = n.ESSN and e.DNO = d.DNUMBER and d.DNAME = 'Research' and n.RELATIONSHIP='SPOUSE'.a) What is the meaning of the query?
3. How would you evaluate the query of the previous problem when there is an index on attribute SSN to relation EMPLOYEE and an index on attribute DNAME to relation DEPARTMENT? Are there differences between indexes of different kind? Is it possible to make the query more efficient using some other index?
4. Suppose the relational database schema (with keys indicated as bolded):
Person(SSN, FirstName, LastName),
Dead(SSN, DeathDate),
MarriedCouple(HusbSSN, WeddingDate, WifeSSN),
Divorced(HusbSSN, WeddingDate, DivorceDate).
a) Write the following query in relational algebra: "The names of Melanie Griffith's husbands and the associated
wedding dates."
b) What kind of file and index structures would you use for
the database relations? We require that the query (a)
be executed efficiently. Explain the principle of
executing the query (with indexes).
5. (**) Let us assume the query
select FNAME, LNAME, PNO, HOURS from EMPLOYEE, WORKS_ON where ESSN=SSN and HOURS > 20;Evaluate the number of disk accesses (minimum and maximum values) needed when different file/index structures and different execution plans are used. The relation EMPLOYEE has 10000 rows of 100 characters, and the relation WORKS_ON 30000 rows of 20 charcaters. The page size is 2KB. (Try to find some good and some bad execution plans - not necessarily 'all' of them!)
6. a) Explain two essentially different ways to implement the query
select * from EMPLOYEE where SALARY < 20000 or DNO = 3;
The essential point is that the two conditions are disjunctively combined. Consider the use of indexes.
b) On page 39 (ch. 4) in the lecture notes there exists the query
SELECT * FROM accounts WHERE custno IN (SELECT custno FROM customers);The former contains a subquery and the latter a join operation. The queries could be interpreted as two alternative execution plans for the same query.which is transformed to the query
SELECT accounts.* FROM accounts, customers WHERE accounts.custno = customers.custno;
The Finnish lecture notes are available via the Finnish course page ('Luentokalvot') in pdf-format or in a course folder in room A412 (on paper).