Web applications

Harri Laine, 23.1.2007


Client and Server

A web application is a special case of a distributed information system. Its user interface is implemented as web pages. These pages are constructed on a server computer, delivered to the client computer over a network, and presented to the user by a web browser. Web pages may contain data from one or more servers, and they are built as a reply to a client request.

A web page consists of a skeleton and some supplementary material, for example images. The skeleton is typically expressed in HTML (Hypertext Markup Language). The server sends the skeleton as a reply to a request. The skeleton contains information about the attached supplementary material. After the browser has received the skeleton, it makes separate requests to fetch the supplementary material. This material need not reside on the same server that built the skeleton page.

Resource locator

Requests are expressed by means of uniform resource locators (URLs). A uniform resource locator identifies

Below is an example of a uniform resource locator.

http://www.cs.helsinki.fi/u/laine/tsoha/example.html
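
A minimal sketch, assuming Python and its standard urllib.parse module (neither is part of these notes), that splits the example locator into the protocol, the server name and the path of the requested resource:

```python
from urllib.parse import urlsplit

url = "http://www.cs.helsinki.fi/u/laine/tsoha/example.html"
parts = urlsplit(url)

print(parts.scheme)    # protocol used to make the request
print(parts.hostname)  # server that receives the request
print(parts.path)      # resource requested from that server
```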

Browser capabilities

Without external plugins the browsers are typically able to:

Plugins make it possible to

 

Server and services

A server may be a multi-part constellation of programs and computers. A traditional www-server, for example Apache, is able to serve both requests for static files stored in the file system of the server computer and requests to run programs that compose pages dynamically. A server may also be a more versatile application server that, in addition to the above functions, may handle session control, load balancing, authentication control, and connections to various external data sources.

Dynamic web pages and server programs

A web-application may use static web pages. These are pages with fixed contents, so they look the same to all users. Simple applications may be based solely on static pages. However, if the contents of the pages should be adjusted according to the user and her actions, we need dynamic pages. Dynamic pages do not exist as ready-made files in the server's file system. Instead, they are composed separately for each request.

CGI (Common Gateway Interface) is a traditional technique for producing dynamic web pages. It makes it possible for the web server to launch programs on the server computer, to pass request parameters to the launched program, and to direct the output of the program to the client that made the request. The program is launched as a separate process that dies when the execution of the program ends.

The program to be launched may be written in any programming language. The CGI interface specifies environment variables for passing request data to the program. These include

The launched program has access to all these data. The way the data are made available is specific to the programming language and to the operating environment. In web applications the launched program should build a web page to be passed to the client. Otherwise, there are no limitations on what the program can or needs to do. It may use databases to store and retrieve data. It may communicate with other computers and use their services.
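
As a rough illustration, a CGI program could look like the sketch below. It assumes Python as the implementation language and reads only the standard CGI variables REQUEST_METHOD and QUERY_STRING; the function name build_response is hypothetical. The program composes a small page and writes it to standard output, which the web server forwards to the client.

```python
import os
from html import escape


def build_response(environ):
    """Compose the complete CGI output: header, empty line, HTML page."""
    method = environ.get("REQUEST_METHOD", "GET")
    query = environ.get("QUERY_STRING", "")
    page = (
        "<html><body>"
        f"<p>Method: {escape(method)}</p>"
        f"<p>Query string: {escape(query)}</p>"
        "</body></html>"
    )
    # A CGI program writes a header block, an empty line, then the page.
    return "Content-Type: text/html\n\n" + page


if __name__ == "__main__":
    # The web server sets the environment variables before launching us.
    print(build_response(os.environ), end="")
```

Keeping the page construction in a function that takes the environment as an argument makes it possible to try the program without a web server.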

Parameters

Programs that construct dynamic web pages usually rely on parameters to specify what to do. Parameters may be delivered as cookies. HTML forms may also be used to collect parameter values. Form data is delivered to the program either as the value of the environment variable QUERY_STRING (if the passing method is 'GET') or as a separate file acting as the standard input to the program (if the passing method is 'POST'). Parameters may also be defined as part of the request URL. Such parameters are delivered in QUERY_STRING.
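
The two delivery routes for form data can be sketched as follows. This is an illustration only: the function name read_parameter_string is hypothetical, and the sketch assumes that the server reports the size of the POST data in the standard CGI variable CONTENT_LENGTH.

```python
import io


def read_parameter_string(environ, stdin):
    """Return the raw 'Name=Value' string of a GET or POST request."""
    if environ.get("REQUEST_METHOD", "GET").upper() == "POST":
        # POST: the form data arrives on standard input, and
        # CONTENT_LENGTH tells how many bytes to read.
        length = int(environ.get("CONTENT_LENGTH") or 0)
        return stdin.read(length).decode("ascii")
    # GET: the form data or URL parameters arrive in QUERY_STRING.
    return environ.get("QUERY_STRING", "")


# Simulated requests; a real CGI program would pass os.environ and
# sys.stdin.buffer instead.
body = b"dept=T&lang=FI"
get_env = {"REQUEST_METHOD": "GET", "QUERY_STRING": "dept=T&lang=FI"}
post_env = {"REQUEST_METHOD": "POST", "CONTENT_LENGTH": str(len(body))}

from_get = read_parameter_string(get_env, io.BytesIO(b""))
from_post = read_parameter_string(post_env, io.BytesIO(body))
print(from_get == from_post)
```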

All techniques present the parameter collection as a string that consists of 'Name=Value' pairs. In cookies the pairs are separated by a semicolon (;), otherwise the separator is an ampersand (&). Both the name and the value are strings. The same name may be used in several pairs. The ways to access the parameters vary according to the programming language. Typically there is one function to access parameter values by the name of the parameter and another to iterate over the list.
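
In Python, for instance, the standard library offers both kinds of access. The sketch below is only an illustration; the parameter string with a repeated name and the cookie names are made up for the example.

```python
from http.cookies import SimpleCookie
from urllib.parse import parse_qs, parse_qsl

query = "dept=T&lang=FI&lang=EN"  # the name 'lang' occurs in two pairs

# Access by name: every name maps to the list of its values.
by_name = parse_qs(query)
print(by_name["dept"])
print(by_name["lang"])

# Iteration: the pairs in the order they appear in the string.
pairs = parse_qsl(query)
for name, value in pairs:
    print(name, value)

# Cookie pairs are separated by semicolons instead of ampersands.
cookie = SimpleCookie()
cookie.load("user=laine; lang=FI")
print(cookie["user"].value)
```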

The following URL specifies two parameters 'dept' and 'lang'. They are passed to the program 'streg' as the value of the environment variable QUERY_STRING.

<a href="http://ilmo.cs.helsinki.fi/streg?dept=T&lang=FI">Registration</a>

QUERY_STRING = "dept=T&lang=FI"

The developer of the service program decides the names of the parameters and how their values are used. The values are delivered as strings. A string may, of course, also contain numerical data.

Parameters and HTML-forms

Parameter values may be delivered as part of the URL, as shown above. However, a more common way to pass parameters is to use HTML-forms. The server program to handle the form data is specified in the form header. Submitting the form causes a request to run this program. The elements of the form specify what parameters are attached to the request. The names of the elements are used as the names of the parameters and the values of the elements are used as the values of the parameters.

Let us consider the form:

First:   [Initial value 1]
Second:  [               ]
Really?  (o) Yes  ( ) No
         [Submit]
 

In HTML this is defined as:

<form name="exampleform" action="program" method="get">
<table><tr><td>First:</td><td><input type="text" name="first" size="20" value="Initial value 1"></td></tr>
<tr><td>Second:</td><td><input type="text" name="second" size="20" value=""></td></tr>
<tr><td>Really? &nbsp; </td><td>
<input type="radio" name="third" value="yes" checked>Yes
<input type="radio" name="third" value="no">No</td></tr>
<tr><td>&nbsp;</td><td><input type="submit" name="submit" value="Submit"></td></tr>
</table>
</form>

This form has two text fields, two radio buttons and a submit button. If the submit button is clicked without touching the other elements, the program receives as parameters the initial values of the elements (the values given in the value attributes of the elements). The parameter string to be passed is

first=Initial+value+1&second=&third=yes&submit=Submit

The parameter 'second' has an empty string as its value.

Let's assume that the user has manipulated the form so that the field first has the value "changed value 1", the field second has the value "edited", and the choice "No" has been selected instead of the default "Yes". The parameter string to be passed would be

first=changed+value+1&second=edited&third=no&submit=Submit

As these examples show, space characters are changed to plus characters (+). Some other characters, for example '&', '<', '>' and Scandinavian characters, are also changed and passed in URL-encoded form. The programming-language function used to access the parameter values should take care of the decoding. Thus, if the value of the parameter 'first' is requested, for example, using the Java servlet method HttpServletRequest.getParameter("first"), the obtained string value would be "changed value 1".
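
The encoding and decoding can be tried out with Python's standard urllib.parse module. This is a sketch only: it assumes UTF-8 as the character encoding, whereas a browser uses the encoding of the page that contains the form.

```python
from urllib.parse import parse_qs, quote_plus, urlencode

fields = [
    ("first", "changed value 1"),
    ("second", "edited"),
    ("third", "no"),
    ("submit", "Submit"),
]

# Encoding, as done by the browser when the form is submitted.
encoded = urlencode(fields)
print(encoded)  # first=changed+value+1&second=edited&third=no&submit=Submit

# Decoding, as done by the parameter access function on the server.
decoded = parse_qs(encoded, keep_blank_values=True)
print(decoded["first"][0])  # changed value 1

# Reserved and non-ASCII characters become %XX escapes.
print(quote_plus("a&b<c>"))  # a%26b%3Cc%3E
print(quote_plus("ä"))       # %C3%A4
```

The argument keep_blank_values=True matters for parameters such as 'second' in the first example: by default parse_qs drops pairs whose value is an empty string.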

In the above example the submit button has a name, too. Naming is necessary if the form has many submit buttons with different meanings. The value attached to the submit button is shown on the user interface and is passed as a parameter when the corresponding named button is pressed.
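
A handler can use this parameter to find out which button was pressed. The sketch below assumes a hypothetical form with two buttons that are both named 'submit' and carry the values "Save" and "Delete"; the function name pressed_button is likewise made up.

```python
from urllib.parse import parse_qs


def pressed_button(parameter_string):
    """Return the value of the 'submit' parameter, or None if absent."""
    values = parse_qs(parameter_string).get("submit")
    return values[0] if values else None


# Only the button that was actually pressed contributes a pair.
print(pressed_button("first=abc&submit=Save"))    # Save
print(pressed_button("first=abc&submit=Delete"))  # Delete
```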

Commonly used elements in HTML forms are

The browser constructs the parameter string by collecting the values of the elements in the order the corresponding fields appear in the HTML form. Not all programming interfaces, however, provide means to access this order. If many items share the same name, their values may be provided as a string array, and the connection between a value and its surroundings may be lost unless it is preserved by embedding positioning information either in the names or in the values of the elements. The following example has elements that share the same name. The connection between products and checkboxes is preserved by specifying the indices of the text elements as the values of the checkbox elements:

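Submitted unchanged, the form below delivers the value arrays textfield = {"111","222","333","444"} and checkbox = {"1","3"}. Only the checked checkboxes are sent.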

Product: [111]  Buy: [x]
Product: [222]  Buy: [ ]
Product: [333]  Buy: [x]
Product: [444]  Buy: [ ]

<form name="form1" method="post" action="">
<table width="75%" border="1" cellspacing="2" cellpadding="2">
<tr>
<td>Product:
<input type="text" name="textfield" value="111"></td>
<td>Buy:
<input type="checkbox" name="checkbox" value="1" checked></td>
</tr>
<tr><td>Product:
<input type="text" name="textfield" value="222"></td>
<td>Buy:
<input type="checkbox" name="checkbox" value="2"></td>
</tr>
<tr><td>Product:
<input type="text" name="textfield" value="333"></td>
<td>Buy:
<input type="checkbox" name="checkbox" value="3" checked></td>
</tr>
<tr>
<td>Product:
<input type="text" name="textfield" value="444"></td>
<td>Buy:
<input type="checkbox" name="checkbox" value="4"></td>
</tr>
</table>
</form>
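
On the server side the connection can be restored from the index values. A minimal sketch, assuming Python and assuming that the form is submitted with its initial values:

```python
from urllib.parse import parse_qs

# Parameter string for the unchanged form; unchecked checkboxes
# contribute no pair at all.
body = (
    "textfield=111&textfield=222&textfield=333&textfield=444"
    "&checkbox=1&checkbox=3"
)
params = parse_qs(body)
products = params["textfield"]
chosen = params.get("checkbox", [])

# Each checkbox value is the 1-based position of its product field.
to_buy = [products[int(index) - 1] for index in chosen]
print(to_buy)  # ['111', '333']
```

Without the index values the handler would only know that two boxes were checked, not which products they belong to.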

Web-application as a sequence of forms

The functionality of a web-application is achieved as a sequence of web pages. The handler of the form in one page builds up the next page.

When using the CGI technique all the form handlers are run as separate processes that die when the program terminates. All the data held in the program's memory is then lost. Successive pages usually have common data, for example, data about the user and her actions. Application servers typically make it possible to keep some data alive in main memory. But even then it is not possible simply to continue the same program from the point where it finished composing the previous page. When a handler program is started, it has no knowledge of any previous processing done by the server to serve the client. Such knowledge must be passed to the server program as part of the activation request, i.e. the knowledge is obtained from the client. This knowledge might be a session identifier that the program then uses to reconstruct the state of processing, based either on the data in the database or on the data kept alive within the application server. Larger amounts of data may also be circulated from one handler activation to another through the browser and the web pages. Techniques used in circulating data are

Benefits and drawbacks of these techniques are discussed in Gunter Ollmann's article: Web Based Session Management.
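
The use of a session identifier can be sketched as follows. Everything here is an assumption made for illustration: the parameter name 'session', the in-memory dictionary that stands in for a database or an application server, and the choice of a hidden form field for sending the identifier back with the next request.

```python
import uuid

# Hypothetical store standing in for a database or for data kept
# alive by an application server: session id -> state dictionary.
SESSIONS = {}


def handle_request(params):
    """Handle one form submission and return (session_id, state)."""
    session_id = params.get("session")
    if session_id not in SESSIONS:
        # First request or unknown identifier: start a new session.
        session_id = uuid.uuid4().hex
        SESSIONS[session_id] = {"pages_seen": 0}
    state = SESSIONS[session_id]
    state["pages_seen"] += 1
    return session_id, state


def next_page(session_id, state):
    """Build the next page; a hidden field circulates the identifier."""
    return (
        '<form action="program" method="post">'
        f'<input type="hidden" name="session" value="{session_id}">'
        f'<p>Pages seen: {state["pages_seen"]}</p>'
        '<input type="submit" name="submit" value="Next">'
        "</form>"
    )


# The first request carries no identifier; the second one returns it.
first_id, first_state = handle_request({})
page = next_page(first_id, first_state)
second_id, second_state = handle_request({"session": first_id})
print(second_id == first_id, second_state["pages_seen"])
```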