Building queries with Relational Algebra

The fundamental operations of the relational algebra are simple operations involving one or two relations as their operands. A result of an operation may be further used as an operand in another operation. The combined effects of a sequence of operations determine the final result

When building a query the task is to find the operation sequence that will produce a correct result. Since the operations of the relational algebra are quite simple, many intermediate results might have to be produced before the final result is reached. The intermediate results are used as operands in the operations that produce new intermediate results. The structure of the query with its operations and intermediate results may be presented as a diagram, as in the figure above. The rectangles are relations (original or operation results), and the ovals are operations. The operation name is inside each oval, and a condition or other specification may be written inside or beside it. We use these diagrams to clarify the stepwise construction of a query.

For most relational algebra operations, the order of execution does not matter, which means that the same result can be reached by forming and combining intermediate results in different ways. In practice, database queries are pretty far made with the help of operations that resemble the relational algebra operations. The order of executing operations and producing intermediate results is determined by a query optimizer. The order of execution and the intermediate results are important for the efficiency of the query. The query optimizers try to arrange the operations so that the whole execution will be as efficient as possible. In practice, this usually means that they try to minimize the intermediate results as quickly as possible.

When forming a query, we should
• first choose which relations are needed to achieve the answer, and
• then specify the operations and intermediate results that are needed

Let us consider a sample database

Car(RegiNo, Make, ModelYear, Color)
Inspection(RegNo->Car, DateInspected, Period, Evaluation),
Problem((RegNo, DateInspected)->Inspection, ProblemCode)
Driver(RegNo->Car, Name, Accidents)

Example 1:

Information need: Information about cars of year 1996 model, where faults have been found in the inspection for year 1999.

First we deduce ‘information about cars’ to mean the values of all attributes of the relation Car. Information about inspections is stored in table Inspection and if faults are found they are registered in table Problem. Thus we need these three tables to obtain the information we want.

Only cars of year 1996 model are interesting. The model year of a car is represented as the value of attribute ModelYear in table Car. Our first intermediate result consists of tuples representing cars of model year 1996. This is obtained using selection

C1996 =

_{ModelYear=1996} (Car)

The cars we are interested in should have been inspected for the period 1999. Thus we need only the rows that cover that period. We use selection to retrieve them.

In1999 =

_Period=1999 (Inspection)

Now we have the cars and inspections we were interested in. We have to connect the rows. We use the join operation. They should be joined by the common register number. Because register number is the only common column we may use natural join.

CI = C1966 * In1999 =

_{ModelYear=1996} (Car) *

_Period=1999 (Inspection)

To find out whether faults were found in the inspections we need to connect the problem rows with the inspection. We have already connected the inspection rows to the cars and may now connect that result with the Problem table. Join should be based on the common registration number and date inspected. These are the only common columns in the tables so we may use natural join.

CIP = CI * Problem

All the connections have been made. We have cars connected to inspections and problems found in those inspections. Cars that have not been inspected for year 1999 or cars that have been inspected but had no problems are not included in the intermediate result CIP. We wanted only the car data as the result. Thus we have to use project to extract them.

FaultyCar =

RegNo,Make,ModelYear,Color (CIP)

The expression without intermediate results is:

FaultyCar =

RegNo,Make,ModelYear,Color (

(_{ModelYear=1996} (Car) * _Period=1999 (Inspection)) * Problem)

Example 2:

Information need: Driver’s name for the model year 1995 or older cars that have not been inspected for year 2000.

Driver's name is in table Driver. Inspection are described in table Inspection and cars in table Car. Thus we need those three tables.

First, we should find out cars that have not been inspected for year 2000. We cannot solve this by using the table Inspection alone, because it contains data about those inspections that have been done, not about nonexisting inspections. Especially selecting rows where period is not equal to 2000 results to inspections of all other years but year 2000. This problem may be solved by finding out the complement 'cars that are inspected for year 2000'. Actually we need only their registration numbers.

InR2000=

RegNo (

Period=2000 (Inspection))

The cars of model year 1995 and older are retrieved using select on table Car. Actually we need only their regitration numbers.

C1995R =

RegNo (

ModelYear<=1995 (Car))

We may use difference operation to get the regitration numbers of such cars that are not inspected.

CIR = CR1995 - InR2000

To find the divers we need to join the registration numbers obtained and the Driver table and then project out the names. We may use natural join.

Names =

Name (CIR * DRIVER)

The expression without intermediate results is:

Names =

Name (Driver * ( RegNo (_{ModelYear<=1995} Car) ) -

RegNo ( _{ModelYear<=1995} (Car)))

Example 3

Information need: What makes have provided similarly colored cars as models 1999 and 2000?

The color and make of car is stored in table Car. To find out that two cars have the same color we must construct pairs of cars sharing the color. The restriction to model years is done with selection as in the two previous examples. The instances of Car relation are renames as C1 and C2.

Makes= Make (C2:= ModelYear=2000 (Car)

C2.Color=C1.Color

C1:= ModelYear=2000 (Car) )