Contents |
|
|||||
Reading material | Elmasri&Navathe: Chapters 8.2,8.3.4 (pages 251-261, 267-268) |
When several tables are given in the query's FROM part, the output is the (cross) product of these tables. The join, i.e. rows combined according to given conditions, is achieved by giving the join conditions in the query's WHERE part.
In the following example, we want to know into which areas the delivery boys have made their deliveries. To get this knowledge we have to join the tables DeliveryBoy, Delivery, Ordered and Customer. From table DeliveryBoy we get the boy's name and identifier. From table Ordered we get the time of the delivery. From table Customer we get the area. The table Delivery connects delivery boys to the deliveries.
select DeliveryBoy.Name, DeliveryBoy.BoyID, Ordered.WhenMade, Customer.AreaCode from Customer, Ordered, Delivery, DeliveryBoy where Customer.CustomerId=Ordered.Customer and Ordered.OrderId=Delivery.OrderID and Delivery.DeliveredBY=DeliveryBoy.BoyID and Delivery.WhenDelivered is not null order by DeliveryBoy.Name, Customer.Areacode; |
In this query no columns of the table Delivery are included in the result. This table is included in the query only because without it we would not be able to connect the delivery boys to the orders they have delivered. This query is depicted in the figure below. In this figure the columns that providing values for the result table are marked as red. THe joins in this query are based on the foreign keys, and the join conditions are
Customer.CustomerId=Ordered.Customer
Ordered.OrderId=Delivery.OrderID
Delivery.DeliveredBY=DeliveryBoy.BoyID
All tables, from which data is to be taken into the result table, must be listed in the FROM part of the query. This result data is marked as red in the above figure. The FROM part of the query may also contain tables that do not produce any column values in the result. These tables must however be connected to other tables.
The order of tables in the FROM part of the query does not matter as the output of the query is concerned. Similarly, the conditions in the WHERE part can be given in whichever order. The order of tables in the FROM part and the order of conditions in the where part may, however, affect on the efficiency of queries in some systems. So, especially when large tables are in question, it is advisable to find out whether the order of the tables or the conditions is important and how do they effect.
*****
Join conditions restrict some rows of the cross product of being included in the result. Thus it is essential that the conditions are correct and complete. A typical error in SQL queries is to omit parts of the join conditions. This causes extra rows to be included in the result
The following query tries to find out the prices for items included in the order 3020.
select product.productId, product.ModelId, price from Ordered, ItemOrdered, Product where ordered.orderid=3020 and ordered.orderId=ItemOrdered.orderId and ItemOrdered.productId=product.productID and ItemOrdered.modelID= product.modelID |
Tables ItemOrdered and Product are in this query connected by comparing the foreign key in ItemOrdered with the primary key of Product. In this case the foreign key has two columns ProductId and ModelId. Thus we need two elementry conditions
ItemOrdered.productId=product.productID and
ItemOrdered.modelID= product.modelID
in the where part of the query. Suppose that two items P02, BIG and T01,DOUBLE has been ordered in order 3020. Then the result will contain only two rows. But if we omit the condition
ItemOrdered.productId=product.productID
items and products are no more connected by the full foreign key and the item P02,BIG would be connected to all products having the modelID BIG which could be tens of products. Similarly T01 would be connected to all product with modelID DOUBLE: If we omit the condition
ItemOrdered.modelID= product.modelID
then P02,BIG would be connected to all portions of P01 and T01,DOUBLE to all portions of T01. The result has probably less rows that the previous one but more that the correct query.
Use trainer to see the effects.
Connecting rows by comparing the key of one table with the foreign key that refers to it is the most typical way of connecting rows. To be successful the comparison needs as many elementary 'and'-connected conditions as there are columns in the key.
In this example the tables had common column names. Thus we had to use qualifiers in front of the column names.
A table can and sometimes must be renamed within a query. The new temporary name is called a correlation name. It is defined in the FROM-part of the query following the tablename and the optional keyword AS. If a correlation name is assigned for the table, all references to that table must be made using the correlation name. Correlation names may be used to substitute long tablenames with shorter ones. See example below
select tyo.lukuvuosi,tyo.lukukausi,tyo.kurssinumero, sukunimi, etunimet from opettaja, opintojakso as oj, opetustehtava as ot, tyo where opettaja.opetunnus=tyo.opettaja and tyo.kurssikoodi= ot.kurssikoodi and tyo.lukukausi=ot.lukukausi and tyo.lukuvuosi=ot.lukuvuosi and tyo.kurssinumero= ot.kurssinumero and tyo.opetusnumero= ot.opetusnumero and ot.kurssikoodi=oj.kurssikoodi and ot.tyyppi='luento' and oj.nimi_suomeksi= 'Tietokantojen perusteet' order by tyo.lukuvuosi,tyo.lukukausi,tyo.kurssinumero |
Correlation names must be used, if the same table appears more than once in the query, i.e. we want to connect one row in a table with another row on that same table. Next example produces an English-Italian dictionary of product and material names. The first instance of table ObjectName is temporarily renamed as 'engl' and the second one as 'it'. This example also shows how tho rename the columns of the result.
select engl.objectname english_name, it.objectname italian_name, from object_name engl, objectname it where engl.objectID = it.objectID order by english_name |
If we want data from several rows of some table to appear on a single result row, we must include this table in the from part of the query as many times. Thus if we want data from three rows of table customer to appear in a single result row, we must include table customer in the from part as three instances. The next example fetches the names of customers that have ordered both beer and CocaCola in the same order. Tables objectnamea and orderitem have tho instances in this query.
select distinct customer.name from customer, ordered, orderitem beer_order, orderitem coke_order, objectname beer, objectname coke where customer.customerID=ordered.customer and ordered.orderID=beer_order.orderId and and ordered.orderID= coke_order.orderID and beer_order.productID=beer.objectID and coke_order.productID=coke.objectID and beer.objectName='beer' and beer.language='english' and coke.objectName='coca cola' and coke.language='english' |
Esimerkin kysely sisältää varsin monimutkaiset liitosehdot. Alla oleva kuva.
SQL-92 standard introduced an alternate way to define a join. This way of presentation also enables the presentation of outer joins. However all the dbms producers, for example Oracle, have not implemented this technique. This techniques makes is possible to use joined tables in the FROM-part of the query. A joined table is presented as follows
format | table1 join_type JOIN table2 ON join_conditions |
example | deliveryBoy INNER JOIN delivery ON deliveryBoy.BoyID=delivery.deliveredBy |
THe example above specifies a normal join of deliveryBoy and delivery on foreign key deliveredBy.
SQL-92 defines four join types
below is an exaple of an outer join. We want to get the names of delivery boys and the dates they have made deliveries.. We want to include also delivery boys that have not made deliveries at all.
select distinct name, whendelivered from deliveryBoy left outer join ordered on boyID=delivered order by name, whendelivered |
Note: This does not work in Oracle. In Oracle the outer join is presented as below.
select distinct name, whendelivered from deliveryBoy, ordered where boyID(+)=delivered order by name, whendelivered |
Selections and especially joins are the main factors affecting the efficiency of queries. In a join operation, pairs that fulfil the the conditions must be found for each row. Several strategies can be used for creating a join, and the choice of strategy depends on what kind of technical solutions are used in implementing the tables and on how large the tables are.
A search for rows according to some column value can be enhanced by specifying an index for that column. This can be compared to an index of key words in a book. An index based on key columns makes searches based on the key much quicker. If the table has 10,000 rows, for example, a search for one specific row would demand about 5000 rows to be read (half of the table) on an average without an index. Often, several table rows fit into one segment of disk storage. Let us assume that in this case, 10 rows fit into one disk segment. Thus without an index, we need to locate and read 500 disk segment to find a row. With the help of an index, the amount of segments can be cut down to 3 or 4 for such a search.. Searches based on keys are typical when a join is made by comparing a foreign key and the corresponding primary key.. Searches based on a key also have to be made at each insertion, to ensure the uniqueness of the key. Some systems, like Oracle, automatically make an index on the key columns of a table without needing any separate specifications. Indexes can be based on one or several columns. The syntax for index specifications is not included in the SQL standard. However, most of the systems specify an index as follows:
format | create [unique] index index_name on table_name( column_name 1 [, ...]) |
example | create unique index pr_index on product(productID,modelID) |
The specification unique in the index specification means that the same value cannot occur on more than one row in the indexed columns.
Indexes are not within the scope of this course as such, but how to specify them is good to know when working with large tables. The use of an index does not show in any way in the query. Using it is up to the query optimiser, but the specification and creation of it is up to the database designer.