Contents | |
Reading material |
Elmasri&Navathe: Chapter 8.2 (pages 251-261) |
In simple queries, one output row is formed for each row (or combined row) that fulfils the conditions given in the WHERE-part of the query. The output row contains the values of those columns that are listed in the result row specification (see figure below).
If the query uses only one table, the selection of rows according to the WHERE-conditions equals to the selection operation in the relational algebra. If the output rows are specified by listing some columns of the target table, the output of the SQL-query resembles the result of the projection operation in the relational algebra. The difference is that duplicate rows are not automatically discarded form the answer of an SQL query, as it is in a projection. Thus, a similar row can appear many times in the result table.
The following query
select AreaCode from Customer; |
gives each areacode as many times as there are customers registered to locate in that area. Supposed there are 5 customers in area A01 and 7 customers in area A02, we would get the value A01 five times and the value A02 seven times.
Although SQL does not, by default, eliminate duplicate rows, it is possible to force it to do so, and thus to obtain a behaviour similar to the projection of the relational algebra. To do this we should include the keyword DISTINCT as the first element of the output row specification.
The above query when modified to
select distinct AreaCode from customer; |
gives each areacode only once.
To consider: Why does SQL not discard duplicate rows automatically? See Explanation |
The same basic elements can be used in specifying output rows and in the conditions of the WHERE-part of a query. These include:
All these elements may also be used as operands in the conditions of the where-part of the query.
The format of the column name does not matter. A qualifier can be attached in front of a column name, to indicate which table the column belongs to. A full column reference has the following format:
syntax | [[[database_name.]schema_name.]table_name.]column_name |
example | test_db.pizza.customer.customerID
product.productID |
How to express constant values (literals) depends on the type of the constant. The matter was discussed in the section on basic data types.
Strings are enclosed in single quotation marks, for example 'THIS IS A STRING'
Numbers are represented without quotation marks, for example 2002.
Date and time representations vary according to the database management system, but ISO Standard date format YYYY-MM-DD is most common. In the following examples we, however, use Oracle default date format DD-MON-YYYY (for example 24-FEB-2001) because our example database runs on Oracle. (unless we change the dbms or we find out how to force Oracle to use ISO-date). Dates are enclosed in single quotation marks, for example '24-FEB-2001'.
Null values are referred as NULL (need not be capital).
The arithmetic expressions of SQL are correspond to the ones in programming languages in general. The operations in use are: + (addition), - (subtraction), * (multiplication), and / (division). Parenthesis may be used to indicate the order of computation.
example | 2.5*(lenght-100)/weight |
Dates and times can be added and subtracted. Subtraction of one date from another gives the number of days between these dates as an interval. Subtracting a timestamp from another gives the interval between the timestamps. Intervals may be added to dates, times and timestamps. In the following example a 10 day interval is added to the date '12-APR-2001' and a 12 interval (0.5 days) is added to the timestamp '12-APR-2001 8.45'. These are expressed in Oracle SQL syntax.
examples | '12-APR-2001' + 10 = '22-APR-2000'
'12-APR-2001 8.45' + 0.5 = '12-APR-2001 20.45' |
If some of the columns used in an arithmetic expression has the value NULL, the entire expression will have the value NULL.
Character strings may be concatenated. The symbol for this operation is ||. In the following example, the value of the column FirstNames is concatenated after the value of the column Surname, comma and a space character.
example | Surname || ', ' || FirstNames |
The treatment of null values in concatenation varies. They may be treated as they are treated in arithmetic expressions, i.e. the result is null if at least one of the operands is null. Oracle treats null values in concatenation differently. When a null valueis concatenated with some real value the result is the real value, only the concatenation of two null values results in a null value.
Standard SQL allows the concatenation also between two binary elements.
SQL offers a variety of scalar functions to process values. The number of functions offered by the SQL standard has been and is still quite restricted, though it is increasing. In the standard of 1986, for example, there were no scalar functions. This is why the producers of database management systems have implemented functions that are somewhat different from each other. Though new functions have been added to the standard, since, the producers (e.g. Oracle) have not changed their products to accord with the standard. This is why the use of scalar functions nearly always brings with it portability problems.
In the table below, some of the main functions are presented in the standard as well as Oracle's format. Column names, literals and expressions can be used as parameters in these functions. There are many more functions in Oracle than in the standard.
Standard | Oracle | Meaning |
abs(x) | abs(x) | absolute value of x |
mod(x,y) | mod(x,y) | modulus of the division x/y |
length(m) | length(m) | length of the string m |
position( p in m) | instr(m,p) | the position of string p in string m. Note that in SQL the first character has the position index 1. |
lower(m) | lower(m) | string m as lowercase |
upper(m) | upper(m) | string m as uppercase |
substring( m from b for s) | substr(m,b,s) | a substring of m starting at position b and having the length s |
cast(x as type) | to_char(x, format)
to_number(x,format) to_date(x,format) |
x as another data type |
case x
when y1 then z1 ... [else e] end |
decode(x,y1,z1,... e) | value of x selects the resulting value |
nullif(x,y) | decode(x,y,NULL,x) | if x=y then NULL otherwise x |
coalesce(x,y) | nvl(x,y) | if x is null then y otherwise x |
extract(fn from t) | to_char(t, format) | the element fn of timevalue t |
current_date
current_time current_timestamp |
sysdate
|
now |
In the following example, customer's first name and the dates of her orders are requested using both the standard and Oracle SQL. Customer's first name is capitalized and the date is expressed according to ISO-8601 standard
Standard |
select upper(substring(name from 1 for position(' ',name)-1)), cast(extract(YEAR,whenMade),VARCHAR)||'-'|| cast(extract(MONTH,whenMade),VARCHAR)||'-'|| cast(extract(DAY,whenMade),VARCHAR)|| from customer, ordered where ordered.customer=customer.customerID order by whenmade desc; |
Oracle |
select upper(substr(name,1,instr(name,' ')-1)), to_char(whenMade,'YYYY-MM-DD') from customer, ordered where ordered.customer=customer.customerID order by whenmade desc; |
There are also summary functions in SQL. They can also be used in specifying output data. Summary functions are discussed in the section on summary queries.
An element in the output row specification defines the contents of a result table column as well as the name of the column. If the element is a column name, the same name is given to the corresponding result column. If the element is something else, the naming principles vary according to the implementation. In Oracle, the name is the definition text of the element. In the example above, the result columns would thus be named as follows.
Names like this are difficult to use. To get better names the columns can be renamed.
format | element [AS] new_name |
example | select upper(substr(name,1,instr(name,' ')-1)) as firstname, to_char(whenMade,'YYYY-MM-DD') as lastdate from customer, ordered where ordered.customer=customer.customerID order by whenmade desc; |
Now the names of the columns would be
Renaming columns is necessary if there would otherwise be several columns with the same name.
The shorthand character * (asterisk) can be used when all the columns of an original table or a combination of tables should be included in the result table. Then, the columns in the result table will have the same names as in the original table. A qualifier can be added in front of the asterisk to determine from which table the columns are to be included. The following example fetches the complete product table.
example | select * from product; |
The where- part of a query is used for expressing the selection and join conditions. The conditions are expressed as a logical expression after the keyword 'where'. In this expression the atomic conditions can be connected with the logical operations 'AND' and 'OR'. Parentheses may be used in the expression to determine the evaluation order of the logical operations. If there are no parentheses, 'AND' is evaluated before 'OR'. A negation is defined using the operation 'NOT'.
There are four types of atomic conditions
For matching two values with each other, the comparison operations known from programming languages can be used; < (less than), > (greater than), = (equal to), => (greater than or equal to), <= (less than or equal to), and != or <> (unequal to). For the numerical data, the matching is based on the numeric value. For the character data, the matching is based either on character code values or on the national order attached to data types based on national character sets. The order of date and time data is determined chronologically. SQL is able to make some automatic type conversions in matching. A suitable character string, for example, can be changed into a date or number before matching the value with a date or a number.
The matching of a value with a set of values can be made in the following ways
A value set is presented by listing the values that belong in the set inside parentheses with commas between them.
how to present a value set | ( item1, item2, ...., itemN) |
A value set may also be defined with the help of a so-called sub-query. This technique is discussed in more detail in the section on sub-queries.
A match qualifier defines how the elements of the set must fulfil the given condition. Qualifier SOME presupposes that at least one of the elements fulfils the condition. The qualifiers ALL presupposes that all the elements fulfil it.
Assume that the column TYPE contains the values 1, 2, 3, 4, 5 and 6. The following conditions are true if the type value is 3, 4 or 5.
examples | type between 3 and 5 type in (3,4,5) type not in (1,2,6) type = some (3,4,5) type <> all (1,2,6) |
Mask matching can be used only with character strings. In a mask, the percent character (%) substitutes any character string, and an underline character (_) substitutes any single character.
examples | name like 'Mi__' Names with 4 characters that start with 'Mi', for example Mike name like 'V%nt' Names starting with 'V' and ending with 'nt' for example Vincent |
The existence of a value means that there is some other value than null (NULL) in a column. The existence is tested by the condition expression 'column is not NULL' and the occurrence of an empty value accordingly 'column is NULL'. In the following example, data is queried about customers who don't have an e-mail address.
example | select * from customer where email is NULL |
In SQL comparing a null value with whatever value results in truth value UNKNOWN..
An arranged result is usually easier for the user to comprehend than an unarranged one. It also suits better for information retrieval. In SQL, the order of the database tables is unspecified, but the output may be ordered. The order of output may be specified using similar expressions that are in specifying the output data. The ordering is specified by including the order specification (ORDER BY -part) into the query.
The order is defined by listing the elements that specify the order. The first item in the list is the primary ordering criterion, the second item the secondary criterion, and so on. The secondary criterion specifies the order among rows that have the same position according to the primary criterion.
The order can be specified as either ascending (asc, default value) or descending (desc). In the following example, data is queried on customers who live in areas A01, A02 or A03. The result is arranged primarily by the area, secondarily by the street address and thirdly by customer's name;
example | select * from customer where areacode in ('A01','A02','A03') order by areacode, address, name |
When we look at the answer of this query its quite hard to see that the result is arranged. However if we organise the columns differently as in the query below, the order of rows is obvious.
example | select areacode, address, name, customerid,phone from customer where areacode in ('A01','A02','A03') order by areacode, address, name |
Using order specification is the only way to ensure the order of the output rows, and to ensure that the query works the same way after a version update of the database management system. Some query elements, like the use of keyword DISTINCT or the use of grouping, typically cause the ordering of output rows in most database management systems. However, there are some cases when, for example finding distinct values may be done more efficiently without arranging the rows and some query optimizer might well decide to do so.
Query output tables can be used as the parties in set theoretic operations. SQL provides the operations UNION, INTERSECT and EXCEPT (in Oracle the EXCEPT operation is called MINUS). As in relational algebra, the parties of these operations must have the same structure. They need not, however, have the same column names. The column names of the result table are determined by the first party. Several sub-queries that are combined by set theoretic operations can be gathered within a single query. Parentheses can be used to specify the order of computations. Queries combined with set theoretic operations can contain all the query parts, except the ORDER BY part. There can be only one ORDER BY part in the query, and it specifies the order of the combined output..
The UNION operation of SQL eliminates duplicate rows from the result. Thus it differs from the normal SQL principle of retaining the duplicates.
The following example query finds out the identifiers of such orders that have not been delivered, yet.
example | select OrderID from ordered where orderid not in (select orderid from delivery) union select orderid from delivery where whendelivered is null; |
You may try both subqueries separaterly.