Simple queries

Contents
Introduction
      Duplicate rows and their elimination
The elements of output row specifications and conditions
      Column names
      Literals
Arithmetic expressions
      String expressions
      Scalar functions
Features of output row specifications
      Naming the columns
      All columns at the same time
The condition (where) part of the query
      Matching two values with each other
      Matching a value to a set of values
      Mask matching
      Existence of a value
Arranging the Result Table
Set theoretic operations
Reading material

Elmasri&Navathe: Chapter 8.2 (pages 251-261)

Introduction

In simple  queries, one output row is formed for each row  (or combined row)  that fulfils the conditions given in the WHERE-part of the query. The output row contains the values of those columns that are listed in the result row specification (see figure below).

If the query uses only one table, the selection of rows according to the WHERE-conditions equals to the selection operation in the relational algebra. If the output rows are specified by listing some columns of the target table, the output of the SQL-query resembles the result of the projection operation in the relational algebra. The difference is that duplicate rows are not automatically discarded form the answer of an SQL query, as it is in a projection. Thus, a similar row can appear many times in the result table.

Duplicate rows and their elimination

The following query

select AreaCode from Customer; 

gives each areacode as many times as there are customers registered to locate in that area. Supposed there are 5 customers in area A01 and 7 customers in area A02, we would get the value A01  five times and the value A02  seven times.

Although SQL does not, by default, eliminate duplicate rows, it is possible to force it to do so, and thus to obtain a behaviour similar to the projection of the relational algebra. To do this we should include the keyword  DISTINCT as the first element of the output row specification.

The above query when modified to

select distinct AreaCode from customer;

gives each areacode only once.

To consider:
Why does SQL not discard duplicate rows automatically?
See Explanation

 

The elements of output row specifications and conditions

The same basic elements can be used in specifying output rows and in the conditions of  the WHERE-part of a query. These include:

 

All these elements may also be used as operands in the conditions of   the where-part of the query.

 

Column names

The format of the column name does not matter. A qualifier can be attached in front of a column name, to indicate  which table the column belongs to. A full column reference has the following format:

syntax [[[database_name.]schema_name.]table_name.]column_name
example test_db.pizza.customer.customerID

product.productID

Constant values

How to express constant values (literals) depends on the type of the constant.  The matter was discussed in the section on basic data types. 

Strings are enclosed in single quotation marks, for example 'THIS IS A STRING'

Numbers are represented without quotation marks, for example 2002. 

Date and time representations vary according to the database management system, but ISO Standard date format YYYY-MM-DD is most common. In the following examples we, however, use Oracle default date format DD-MON-YYYY (for example 24-FEB-2001) because our example database runs on Oracle. (unless we change the dbms or we find out how to force Oracle to use ISO-date). Dates are enclosed in single quotation marks, for example '24-FEB-2001'.

 Null values are referred as NULL (need not be capital).

 

Arithmetic expressions

The arithmetic expressions of SQL are correspond to the ones in programming languages in general. The operations in use are: + (addition), - (subtraction), * (multiplication), and / (division). Parenthesis may be used to indicate the order of computation.

example 2.5*(lenght-100)/weight

Dates and times can be added and subtracted. Subtraction of one date from another gives the number of days between these dates as an interval. Subtracting a timestamp from another gives the interval between the timestamps. Intervals may be added to dates, times and timestamps. In the following example a 10 day interval is added to the date '12-APR-2001' and a 12 interval (0.5 days)  is added to the timestamp '12-APR-2001 8.45'. These are expressed in Oracle SQL syntax.  

examples '12-APR-2001' + 10    = '22-APR-2000'

'12-APR-2001 8.45' + 0.5 = '12-APR-2001 20.45'

If some of the columns used in an arithmetic expression has the value NULL, the entire expression will have the value NULL. 

 

String expressions

Character strings may be concatenated. The symbol for this operation is ||. In the following example, the value of the column FirstNames is concatenated after the value of the column Surname, comma and a space character.

example Surname || ', ' || FirstNames

The treatment of null values in concatenation varies. They may be treated as they are treated in arithmetic expressions, i.e. the result is null if at least one of the operands is null. Oracle treats null values in concatenation differently. When a null valueis concatenated with some real value the result is the real value, only the concatenation of two null values results in a null value.

Standard  SQL allows the concatenation also between two  binary elements. 

Scalar functions

SQL offers a variety of scalar functions to process values. The number of  functions offered by the SQL standard has been and is still quite restricted, though it is increasing. In the standard of 1986, for example, there were no scalar functions. This is why the producers of database management systems have implemented  functions that are somewhat different from each other. Though new functions have been added to the standard, since, the producers (e.g. Oracle) have not changed their products to accord with the standard. This is why the use of scalar functions nearly always brings with it portability problems. 

In the table below, some of the main functions are presented in the standard as well as Oracle's format. Column names, literals and expressions can be used as parameters in these functions. There are many more functions in Oracle than in the standard.

Standard Oracle Meaning
abs(x) abs(x) absolute value of x
mod(x,y) mod(x,y) modulus of the division x/y
length(m) length(m) length of the string
position( p in m) instr(m,p) the position of string p in string m. Note that in SQL the first character has the position index 1.
lower(m) lower(m) string m as lowercase
upper(m) upper(m) string m as uppercase
substring( m from b for s) substr(m,b,s) a substring of m starting at position b and having the length s
cast(x as type) to_char(x, format)
to_number(x,format)
to_date(x,format)
x as another data type
case x
when y1 then z1 ...
[else e]
end
decode(x,y1,z1,... e) value of x selects the resulting value
nullif(x,y) decode(x,y,NULL,x) if x=y then NULL otherwise x
coalesce(x,y) nvl(x,y) if x is null then y otherwise x
extract(fn from t) to_char(t, format) the element fn of timevalue
current_date
current_time
current_timestamp
sysdate

 

now

In the following example, customer's first name and the dates of her orders are requested using both the standard and Oracle SQL. Customer's first name is capitalized and the date is expressed according to ISO-8601 standard

 

Standard
select upper(substring(name from 1 for position(' ',name)-1)),
  cast(extract(YEAR,whenMade),VARCHAR)||'-'||
  cast(extract(MONTH,whenMade),VARCHAR)||'-'||
  cast(extract(DAY,whenMade),VARCHAR)||
from customer, ordered
where ordered.customer=customer.customerID
order by whenmade desc;
Oracle
select upper(substr(name,1,instr(name,' ')-1)), 
  to_char(whenMade,'YYYY-MM-DD')
from customer, ordered
where ordered.customer=customer.customerID
order by whenmade desc;

There are also summary functions  in SQL. They can also be used in specifying output data. Summary functions are discussed in the section on summary queries. 

Features of output row specifications

Naming the columns

An element in the output row specification defines the contents of a result table column as well as the name of the column. If the element is a column name, the same name is given to the corresponding result column. If the element is something else, the naming principles vary according to the implementation. In Oracle, the name is the definition text of the element. In the example above, the result columns would thus be named as follows.

Names like this are difficult to use. To get better names the columns can be renamed.

format element [AS] new_name
example
select upper(substr(name,1,instr(name,' ')-1)) as firstname, 
  to_char(whenMade,'YYYY-MM-DD') as lastdate
from customer, ordered
where ordered.customer=customer.customerID
order by whenmade desc;

Now the names of the columns would be

Renaming columns is necessary if there would otherwise be several columns with the same name.

All columns at the same time

The shorthand character * (asterisk) can be used when all the columns of an original table or a combination of tables should be included in the result table. Then, the columns in the result table will have the same names as in the original table. A qualifier can be added in front of the asterisk to determine from which table the columns are to be included. The following example fetches the complete product table.

example select * from product;

  ****

The condition (where) part of the query

The where- part of a query is used for expressing the selection and join conditions. The conditions are expressed as a logical expression after the keyword 'where'. In this expression the atomic conditions can be connected with the logical operations 'AND' and 'OR'. Parentheses may be used in the expression to determine the evaluation order of the logical operations. If there are no parentheses, 'AND' is evaluated before 'OR'. A negation is defined using the operation 'NOT'.

There are four types of atomic conditions

Matching two single values with each other

For matching two values with each other, the comparison operations known from programming languages can be used; < (less than), > (greater than), = (equal to), => (greater than or equal to), <= (less than or equal to), and != or <> (unequal to). For the numerical data, the matching is based on the numeric value. For the character data, the matching is based either on character code values or on the national order attached to data types based on national character sets. The order of date and time data is determined chronologically. SQL is able to make some automatic type conversions in matching. A suitable character string, for example, can be changed into a date or number before matching the value with a date or a number.

  ****

Matching a single value with a set of values

The matching of a value with a set of values can be made in the following ways

A value set is presented by listing the values that belong in the set inside parentheses with commas between them.

how to present a value set   ( item1, item2, ...., itemN)

A value set may also be defined with the help of a so-called sub-query. This technique is discussed in more detail in the section on sub-queries.

A match qualifier defines how the elements of the set must fulfil the given condition. Qualifier SOME presupposes that at least one of the elements fulfils the condition. The qualifiers ALL presupposes that all the elements fulfil it.

Assume that the column TYPE contains the values 1, 2, 3, 4, 5 and 6. The following conditions are true if the type value is 3, 4 or 5.

examples
type between 3 and 5
type in (3,4,5)
type not in (1,2,6)
type = some (3,4,5)
type <> all (1,2,6)

****

Mask matching

Mask matching can be used only with character strings. In a mask, the percent character (%) substitutes any character string, and an underline character (_) substitutes any single character.

examples
name like 'Mi__'

  Names with 4 characters that start with 'Mi', 
  for example Mike 

name like 'V%nt'

  Names starting with 'V' and ending with 'nt'
  for example Vincent

****

Existence of a value

The existence of a value means that there is some other value than null (NULL) in a column. The existence is tested by the condition expression 'column is not NULL' and the occurrence of an empty value accordingly 'column is NULL'. In the following example, data is queried about customers who don't have an e-mail address.

example
select * from customer 
  where email is NULL

In SQL comparing a null value with whatever value results in truth value UNKNOWN..

  ****

Arranging the Result Table

An arranged result is usually easier for the user to comprehend than an unarranged one. It also suits better for information retrieval. In SQL, the order of the database tables is unspecified, but the output may be ordered. The order of output may be specified using similar expressions that are in specifying the output data. The ordering is specified by including the order specification (ORDER BY -part) into the query.

The order is defined by listing the elements that specify the order. The first item in the list is the primary ordering criterion, the second item the secondary criterion, and so on. The secondary criterion specifies the order among rows that have the same position according to the primary criterion.

The order can be specified as either ascending (asc, default value) or descending (desc). In the following example, data is queried on customers who live in areas A01, A02 or A03. The result is arranged primarily by the area, secondarily by the street address and thirdly by customer's name;

example
select *
from customer 
where areacode in ('A01','A02','A03') 
order by areacode, address, name

When we look at the answer of this query its quite hard to see that the result is arranged. However if we organise the columns differently as in the query below, the order of rows is obvious.

 

example
select areacode, address, name, customerid,phone
from customer 
where areacode in ('A01','A02','A03') 
order by areacode, address, name

Using order specification is the only way to ensure the order of the output rows, and to ensure that the query works the same way after a version update of the database management system. Some query elements, like the use of keyword DISTINCT or the use of grouping, typically cause the ordering of output rows in most database management systems.  However, there are some cases when, for example finding distinct values may be done more efficiently without arranging the rows and some query optimizer might well decide to do so.  

 

  ****

Set theoretic operations 

Query output tables can be used as the parties in set theoretic operations. SQL provides the operations UNION, INTERSECT and EXCEPT (in Oracle the EXCEPT operation is called MINUS). As in relational algebra, the parties of these operations must have the same structure. They need not, however, have the same column names.  The column names of the result table are determined by the first party. Several sub-queries that are combined by set theoretic operations can be gathered within a single query. Parentheses can be used to specify the order of computations. Queries combined with set theoretic operations can contain all the query parts, except the ORDER BY part. There can be only one ORDER BY part in the query, and it specifies the order of the combined output..

The UNION operation of SQL eliminates duplicate rows from the result. Thus it differs from the normal SQL principle of retaining the duplicates.

The following example query finds out the  identifiers of such orders that have not been delivered, yet.

example
select OrderID from ordered
  where orderid not in
     (select orderid from delivery)
union
  select orderid from delivery
  where whendelivered is null;
  

You may try  both subqueries separaterly.

 

  ****