Join Queries

Contents
Multiple Tables in the Query's FROM part
Complete join conditions
Tables temporarily renamed
An alternate join notation and outer join
Efficiency of queries
Reading material

Elmasri&Navathe: Chapters 8.2,8.3.4 (pages 251-261, 267-268)

Multiple Tables in the Query's FROM part

When several tables are given in the query's FROM part, the output is the (cross) product of these tables. The join, i.e. rows combined according to given conditions, is achieved by giving the join conditions in the query's WHERE part.

In the following example, we want to know into which areas the delivery boys have made their deliveries.  To get this knowledge we have to join the tables DeliveryBoy, Delivery, Ordered and Customer. From table DeliveryBoy  we get the boy's name and identifier. From table Ordered we get the time of the delivery. From table Customer we get the area. The table Delivery connects delivery boys to the deliveries.

select DeliveryBoy.Name, DeliveryBoy.BoyID, Ordered.WhenMade,
  Customer.AreaCode
from Customer,  Ordered, Delivery, DeliveryBoy
where Customer.CustomerId=Ordered.Customer and
  Ordered.OrderId=Delivery.OrderID and
  Delivery.DeliveredBY=DeliveryBoy.BoyID and
  Delivery.WhenDelivered is not null
order by DeliveryBoy.Name, Customer.Areacode;

In this query no columns of the table Delivery are included in the result. This table is included in the query only because without it we would not be able to connect the delivery boys to the orders they have delivered.  This query is depicted in the figure below. In this figure the columns that providing values for the result table are marked as red. THe joins in this query are based on the foreign keys, and the join conditions are 

    Customer.CustomerId=Ordered.Customer 
    Ordered.OrderId=Delivery.OrderID 
    Delivery.DeliveredBY=DeliveryBoy.BoyID

 

 

All tables, from which data is to be taken into the result table, must be listed in the FROM part of the query. This result data is marked as red in the above figure. The FROM part of the query may also contain tables that do not produce any column values in the result. These tables must however be connected to other tables.

The order of tables in the FROM part of the query does not matter as the output of the query is concerned. Similarly, the conditions in the WHERE part can be given in whichever order. The order of tables in the FROM part and the order of conditions in the where part may, however, affect on the efficiency of queries in some systems. So, especially when large tables are in question, it is advisable to find out whether the order of the tables or the conditions is important and how do they effect. 

*****

Complete join conditions

Join conditions restrict some rows of the  cross product of  being included in the result. Thus it is essential that the conditions are correct and complete. A typical error in SQL queries is to omit parts of the join conditions. This causes extra rows to be included in the result 

The following query tries to find out the prices for  items included in the order 3020.

select product.productId, product.ModelId, price
from Ordered, ItemOrdered, Product
where 
  ordered.orderid=3020 and
  ordered.orderId=ItemOrdered.orderId and
  ItemOrdered.productId=product.productID and
  ItemOrdered.modelID= product.modelID

Tables ItemOrdered  and Product are in this query connected by comparing the foreign key in ItemOrdered with the primary key of Product. In this case the foreign key has two columns ProductId and ModelId. Thus we need two elementry conditions 

  ItemOrdered.productId=product.productID and
  ItemOrdered.modelID= product.modelID

in the where part of the query. Suppose that two items P02, BIG and T01,DOUBLE has been ordered in order 3020. Then the result will contain only two rows. But if we omit the condition 

  ItemOrdered.productId=product.productID  

items and products are no more connected by the full foreign key and the item P02,BIG would be connected to all products having the modelID BIG which could be tens of products. Similarly T01 would be connected to all product with modelID DOUBLE: If we omit the condition

ItemOrdered.modelID= product.modelID

then P02,BIG would be connected to all portions of P01 and T01,DOUBLE to all portions of T01. The result has probably less rows that the previous one but more that the correct query.

Use trainer to see the effects.

Connecting rows by comparing the key of one table with the foreign key that refers to it is the most typical way of connecting rows. To be successful the comparison needs as many elementary 'and'-connected conditions as there are columns in the key.

In this example the tables had common column names. Thus we had to use qualifiers in front of the column names.

Tables temporarily renamed

A table can and sometimes must be renamed within a query. The new temporary name is called a correlation name. It is defined in the FROM-part of the query following the tablename and the optional keyword AS. If a correlation name is assigned for the table, all references to that table must be made using the correlation name. Correlation names may be used to substitute long tablenames with shorter ones. See example below

select tyo.lukuvuosi,tyo.lukukausi,tyo.kurssinumero,
   sukunimi, etunimet 
from opettaja, opintojakso as oj, 
     opetustehtava as ot, tyo
where 
  opettaja.opetunnus=tyo.opettaja and
  tyo.kurssikoodi= ot.kurssikoodi and
  tyo.lukukausi=ot.lukukausi and
  tyo.lukuvuosi=ot.lukuvuosi and
  tyo.kurssinumero= ot.kurssinumero and
  tyo.opetusnumero= ot.opetusnumero and
  ot.kurssikoodi=oj.kurssikoodi and
  ot.tyyppi='luento' and
  oj.nimi_suomeksi=
    'Tietokantojen perusteet' 
order by tyo.lukuvuosi,tyo.lukukausi,tyo.kurssinumero

Correlation names must be used, if the same table appears more than once in the query, i.e. we want to connect one row in a table with another row on that same table. Next example produces an English-Italian dictionary of product and material names. The first instance of table ObjectName is temporarily renamed as 'engl' and the second one as 'it'. This example also shows how tho rename the columns of the result.

select engl.objectname english_name,  
       it.objectname italian_name,
from object_name engl, objectname it
where engl.objectID = it.objectID
order by english_name

If we want data from several rows of some table to appear on a single result row, we must include this table in the from part of the query as many times. Thus if we want data from three rows of  table customer to appear in a single result row, we must include table customer in the from part as three instances. The next example fetches the names of customers that have ordered both beer and CocaCola in the same order. Tables objectnamea and orderitem have tho instances in this query.

select distinct customer.name 
from customer, ordered, orderitem beer_order, 
orderitem coke_order, 
     objectname beer, objectname coke
where 
  customer.customerID=ordered.customer and
  ordered.orderID=beer_order.orderId and and
  ordered.orderID= coke_order.orderID and
  beer_order.productID=beer.objectID and
  coke_order.productID=coke.objectID and
  beer.objectName='beer' and beer.language='english' and
  coke.objectName='coca cola' and coke.language='english'

Esimerkin kysely sisältää varsin monimutkaiset liitosehdot. Alla oleva kuva.

 

An alternative join annotation and outer join

SQL-92 standard introduced an alternate way to define a join. This way of presentation also enables the presentation of outer joins. However all the dbms producers, for example Oracle, have not implemented this technique. This techniques makes is possible to use joined tables in the FROM-part of the query. A joined table is presented as follows

 

format table1 join_type JOIN table2 ON join_conditions
example
deliveryBoy INNER JOIN delivery 
ON deliveryBoy.BoyID=delivery.deliveredBy

THe example above specifies a normal join of deliveryBoy and delivery on foreign key deliveredBy.

SQL-92 defines four join types

below is an exaple of an outer join. We want to get the names of delivery boys and the dates  they have made deliveries.. We want to include also delivery boys that have not made  deliveries at all.

select distinct name, whendelivered
from deliveryBoy left outer join ordered 
     on boyID=delivered
order by name, whendelivered

Note: This does not work in Oracle. In Oracle the outer join is presented as below.

select distinct name, whendelivered
from deliveryBoy, ordered 
where boyID(+)=delivered
order by name, whendelivered

On query efficiency

Selections and especially joins are the main factors affecting the efficiency of queries. In a join operation, pairs that fulfil the the conditions must be found for each row. Several strategies can be used for creating a join, and the choice of strategy depends on what kind of technical solutions are used in implementing the tables and on how large the tables are.

A search for rows according to some column value can be enhanced by specifying an index for that column. This can be compared to an index of key words in a book. An index based on key columns makes searches based on the key much quicker. If the table has 10,000 rows, for example, a search for one specific row would demand about 5000 rows to be read  (half of the table) on an average without an index. Often, several table rows fit into one segment of disk storage. Let us assume that in this case, 10 rows fit into one disk segment. Thus without an index, we need to locate and read 500 disk segment to find a row.  With the help of an index, the amount of segments can be cut down to 3 or 4 for such a search.. Searches based on keys are typical when a join is made by comparing a foreign key and the corresponding primary key.. Searches based on a key also have to be made at each insertion, to ensure the uniqueness of the key. Some systems, like Oracle, automatically make an index on the key columns of a table without needing any separate specifications. Indexes can be based on one or several columns. The syntax for index specifications is not included in the SQL standard. However, most of the systems specify an index as follows:

 

format create [unique] index index_name on table_name( column_name 1 [, ...])
example create unique index pr_index on product(productID,modelID)

The specification unique in the index specification means that the same value cannot occur on more than one row in the indexed columns.

Indexes are not within the scope of this course as such, but how to specify them is good to know when working with large tables. The use of an index does not show in any way in the query. Using it is up to the query optimiser, but the specification and creation of it is up to the database designer.