XQuery Java API (tutorial)

XQuest/server "Milestone 1"


This API provides methods for compiling and executing XQuery/XPath scripts from Java and for processing the results. This is very similar to using the SQL language through a Java interface like JDBC.

XQuest's implementation of XQuery provides a high-level query language with extended processing capabilities. It is therefore advisable to implement as much of an application as possible in XQuery, and to use Java only for retrieving the final results. This is especially true when connecting to a remote server, because this approach minimizes network traffic.

The API is itself used by the GUI and command-line applications provided with XQuest, as well as by the "Server Pages" extension, which embeds the XQuery engine in a Servlet.

The API allows to:

For calling Java methods from within XQuery expressions, see the XQuery Extensions documentation, section Java Binding.

For dealing specifically with XML databases (XML Libraries), see the XML Library API.

XQueryConnection [interface]

This is the fundamental interface for applications. An XQueryConnection provides an environment for creating Expressions, which can then be executed. Note that the word Connection does not necessarily imply a remote connection through a network. XQuest supports RMI, so the application and the "server" can work in a distributed object environment, but the client code can also run in the same Java Virtual Machine as the XQuery engine.

An XQueryConnection is obtained from an XQueryServer, or from an XQueryDataSource, which is an abstract connection factory.

Each application should have its own XQueryConnection. Access to the same connection by several threads must be explicitly synchronized.

XQueryServer [interface]

An abstract view of an XQuery engine (embedded or remotely accessible). It is mainly a provider of connections, through the method getConnection(), which can specify a particular XML Library (database).

XQueryBasicEngine [class]

An implementation of the XQuery engine (XQueryServer interface). This is a controller centralizing the management of resources (memory, XML Libraries, compiled XQuery modules, cache of parsed XML documents).

Configuring and starting an XQueryEngine depends on the context in which XQuest is used (standalone, J2EE). This topic is explained in a separate section below.

XQueryExpression

An XQueryExpression is the equivalent of a JDBC/SQL Statement. It is created from a Connection and receives an XQuery script.

Its purpose is to execute simple scripts. However, values can be bound to XQuery variables, as with XQueryPreparedExpression. It can be reused for several different scripts.

XQueryPreparedExpression

A Prepared Expression is very similar to an XQueryExpression, but it is used to execute the same XQuery script repeatedly with different variable settings. Global XQuery variables can be assigned a value through this interface.

XQResultSequence

This is the result of the execution of an Expression. An XQuery expression returns a value which is a sequence of items (XQItem). An item can have a simple value (string, number etc.) or be an XML node (a node of the XML Data Model).

XQResultSequence appears as an iterator which enumerates the items of the sequence. It provides methods to obtain and test the type and value of each item.

XQItem

This is an abstract interface which provides methods to obtain and test the type and the value of the item.

XQType and XQItemType

A representation of XQuery types. XQType is the most general: it can describe item types (XQItemType) or sequence types. The type of an XQValue is generally an XQType, while the type of an XQItem is always an XQItemType.

XQNode

A specialization of XQItem representing XML nodes. It provides access to the XML Data Model. Data Model interfaces are presented in a separate section below.

This section is an introduction to the use of the API, in the form of a tutorial. Reference material is available as Java documentation (Javadoc).

An XQuest application typically performs the following steps:

  1. Obtain an XQueryConnection. This is achieved by calling the getConnection method on an XQueryServer or on an XQueryDataSource.

  2. Optionally define settings on the connection. Such settings will be inherited by all Expressions created from this connection. This includes arbitrary named properties, predefined namespaces, collations, global variable values, default XML input (document or collection), default serial output.

  3. Create and execute application-specific Expressions (XQueryExpression or XQueryPreparedExpression). Before executing an expression, it is possible to redefine the settings mentioned above, specifically for the expression. In particular, initial values can be bound to global variables of the expression.

  4. Execute the Expression. This can be done in different ways, through the executeQuery methods.

  1. Obtain an XQueryConnection. For simplicity, we assume we already hold an XQueryDataSource, which is a kind of factory providing a connection:

    XQueryDataSource dataSource = ...;
    ...
    XQueryConnection connection = dataSource.getConnection();

  2. Setting static options: there are quite a few possible settings:

  3. Compile a Query:

    There are several variants of the method XQueryConnection.compileQuery. In essence it needs the query text (a CharSequence, typically a String), which can also be read from a stream or a File.

    A URI must be specified for use in error messages and traces. For a file or URL input this would typically be the string value of the path or the URL.

    String querySource = 
              " for $i in 1 to 3 return element E { attribute A { $i } } ";
    try {
       XQuery query = connection.compileQuery(querySource, "<source>", log);
       ...
    } catch( XQueryException e) {
       ...
    }

    An exception is raised on a syntax error (which prevents further compilation) or on static analysis errors (reported at the end of compilation).

  4. Setting run-time options:

    Typically, global variables (declared external in queries) can be initialized here. Initial values specified in queries can also be overridden. The method initGlobal has different variants, according to the value passed. An exception is raised if the value does not match the declared type.

    Initial values are part of the execution environment and do not affect compiled Queries which can be shared by several threads.

    Other options: default output for function x:serialize, node or node sequence used for XQuery function input(), implicit timezone, message log.

  5. Executing a compiled query:

    There are several ways to obtain results:

  6. Handle errors: execution can raise an EvalException. The message of the exception gives the reason for the error. It is also possible to display the call trace:

    try {
        Value v = connection.executeQuery( query );
        ...
    } catch (EvalException ee) {
        ee.printStack(log, 20);
    }

    The stack trace is printed to a Log object. The second argument gives the maximum depth of the trace (0 means no maximum).

This section describes the Java interfaces to the XML/XQuery Data Model. The Data Model is defined by a W3C specification: http://www.w3.org/TR/xpath-datamodel/. It is an extension of the XML Infoset which describes precisely the abstract objects (their contents, possible values, and relationship) which constitute XML Documents handled by XPath 2, XQuery and XSLT 2.

This Data Model differs from the W3C DOM in the following respects:

  • It supports XML Schema types and the notion of collections.

  • It does not keep track of physical features like entity boundaries, marked sections, or character references.

  • It does not define updating operations.

  • No language bindings are specified.

In XQuest the XML Data Model is seen mainly through the Node interface (net.axyana.xquest.dm.Node). It supports the accessors defined in the Data Model specifications plus extensions. See the XML Library Java API for more details.

There is also an XQuery version of Node, which is XQNode: it provides both the Node and the XQItem interfaces.

The net.axyana.xquest.dm package also contains a few related interfaces or classes, like NodeSequence and NodeTest, and service classes like XMLSerializer and FulltextQuery.

The utility package net.axyana.xquest.util contains ancillary classes for handling qualified names (QName and Namespace).

What follows is a short primer. For detailed information, refer to the Java Documentation.

Basic information access:

Here are the basic accessors:

String getNodekind()

Corresponds to the accessor dm:node-kind(), which returns string values like "document", "element", "attribute" etc.

int getNature()

Returns the node kind as an integer value, which is more convenient for programming: DOCUMENT, ELEMENT, ATTRIBUTE, TEXT, COMMENT, PROCESSING_INSTRUCTION, or NAMESPACE (all constant fields of the Node interface).

QName getNodeName()

Corresponds to the accessor dm:node-name(), which returns a qualified name if applicable (elements, attributes) or the null value.

Node parent()

Returns the parent node or null.

String getStringValue()

Returns the textual contents of the node, as defined in the DM specifications (The string value of an element is the concatenation of all text fragments encompassed by the element).

NodeSequence children()

For documents and elements, returns a NodeSequence, an abstract iterator which can enumerate the child nodes in document order. To iterate on children, the following code pattern is typically used:

NodeSequence children = node.children();
while(children.next()) {
    Node child = children.currentNode();
    //...
}

For other node kinds, the sequence is always empty.
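
The next()/currentNode() cursor idiom shown above can be tried without XQuest on the classpath. The following is only a sketch under stated assumptions: SimpleSequence is a hypothetical stand-in (not an XQuest class) that mimics the two methods over a plain list of strings, so the behaviour of the real NodeSequence may differ in details.

```java
import java.util.ArrayList;
import java.util.List;

public class CursorDemo {

    /** Hypothetical stand-in for a cursor-style sequence; not an XQuest class. */
    static final class SimpleSequence {
        private final List<String> items;
        private int position = -1;   // cursor starts before the first item

        SimpleSequence(List<String> items) {
            this.items = items;
        }

        /** Advances the cursor; returns false once the sequence is exhausted. */
        boolean next() {
            if (position + 1 >= items.size()) {
                return false;
            }
            position++;
            return true;
        }

        /** Returns the item under the cursor; valid only after next() returned true. */
        String currentNode() {
            if (position < 0) {
                throw new IllegalStateException("next() has not been called");
            }
            return items.get(position);
        }
    }

    /** Drains a sequence with the while(next()) idiom and collects the items. */
    static List<String> collect(SimpleSequence seq) {
        List<String> out = new ArrayList<>();
        while (seq.next()) {
            out.add(seq.currentNode());
        }
        return out;
    }

    public static void main(String[] args) {
        SimpleSequence children = new SimpleSequence(List.of("title", "para", "para"));
        System.out.println(collect(children));
    }
}
```

An empty sequence simply makes the first call to next() return false, so the loop body never runs; this matches the case of node kinds that have no children.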

NodeSequence attributes()

This method returns the sequence of attribute nodes belonging to an element. Example: a crude serialization of an element:

if( node.getNature() == Node.ELEMENT ) {
   output.print("<");
   // print element name: needs to convert QName to string
   output.printName(node.getNodeName());

   NodeSequence attributes = node.attributes();
   while( attributes.next() ) {
       // the cursor is now on the next attribute node
       Node attr = attributes.currentNode();
       output.print(" ");
       // print attribute name: needs to convert QName to string
       output.printName(attr.getNodeName());
       output.print("='");
       // print attribute value (needs escaping)
       output.print(attr.getStringValue());
       output.print("'");
   }
   output.print(">");
}
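
The snippet above notes that attribute values need escaping. A minimal sketch of such a helper follows; the class and method names are hypothetical (they are not part of XQuest), and it assumes the value is written inside a quoted attribute as in the example. For real output the XMLSerializer class is the intended tool.

```java
public class AttributeEscaper {

    /**
     * Escapes a string so that it can be written inside a quoted XML
     * attribute value (single or double quotes).
     * Hypothetical helper for illustration; not an XQuest method.
     */
    static String escapeAttribute(String value) {
        StringBuilder sb = new StringBuilder(value.length() + 16);
        for (int i = 0; i < value.length(); i++) {
            char c = value.charAt(i);
            switch (c) {
                case '&':  sb.append("&amp;");  break;
                case '<':  sb.append("&lt;");   break;
                case '>':  sb.append("&gt;");   break;
                case '\'': sb.append("&apos;"); break;
                case '"':  sb.append("&quot;"); break;
                // whitespace that attribute-value normalization would otherwise alter
                case '\t': sb.append("&#9;");   break;
                case '\n': sb.append("&#10;");  break;
                case '\r': sb.append("&#13;");  break;
                default:   sb.append(c);
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(escapeAttribute("a<b & 'c'"));
    }
}
```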

Extended accessors:

XQuest has extended methods which return sequences filtered by an abstract NodeTest.

BaseNodeTest is the most useful implementation of NodeTest: it can filter nodes according to their kind and their name, and it can also perform wildcard name matching. It has two convenience subclasses, ElementTest and AttributeTest.
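
To make the idea of a kind-and-name test concrete, here is a self-contained sketch that does not use XQuest at all. Everything in it is hypothetical: SimpleNode, the kind codes and nodeTest are illustration-only names, and treating a null namespace or local name as a wildcard is an assumption modelled on the AttributeTest( ns, null ) usage in this document.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;

public class NodeTestDemo {
    // Hypothetical kind codes, standing in for constants such as Node.ELEMENT.
    static final int ELEMENT = 1;
    static final int ATTRIBUTE = 2;

    /** Hypothetical, minimal node: a kind plus a namespace-qualified name. */
    static final class SimpleNode {
        final int kind;
        final String namespace;   // "" means no namespace
        final String localName;

        SimpleNode(int kind, String namespace, String localName) {
            this.kind = kind;
            this.namespace = namespace;
            this.localName = localName;
        }
    }

    /** Builds a test on kind and name; a null namespace or local name matches anything. */
    static Predicate<SimpleNode> nodeTest(int kind, String namespace, String localName) {
        return n -> n.kind == kind
                && (namespace == null || namespace.equals(n.namespace))
                && (localName == null || localName.equals(n.localName));
    }

    /** Returns the nodes that pass the test, preserving their order. */
    static List<SimpleNode> filter(List<SimpleNode> nodes, Predicate<SimpleNode> test) {
        List<SimpleNode> out = new ArrayList<>();
        for (SimpleNode n : nodes) {
            if (test.test(n)) {
                out.add(n);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<SimpleNode> nodes = List.of(
                new SimpleNode(ELEMENT, "", "section"),
                new SimpleNode(ELEMENT, "", "title"),
                new SimpleNode(ATTRIBUTE, "urn:example", "id"),
                new SimpleNode(ELEMENT, "", "section"));
        // elements named "section" in no namespace
        System.out.println(filter(nodes, nodeTest(ELEMENT, "", "section")).size());
        // all attributes in the namespace urn:example, whatever their local name
        System.out.println(filter(nodes, nodeTest(ATTRIBUTE, "urn:example", null)).size());
    }
}
```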

NodeSequence children( NodeTest test )

Returns the sequence of children which pass the test. For example, this code returns an iterator on children which have the name "section", with a blank namespace:

node.children( new ElementTest("section") )

which can also be written, more verbosely, as:

node.children( new BaseNodeTest( Node.ELEMENT, 
                                 Namespace.NONE, "section") )

NodeSequence attributes( NodeTest test )

Returns the sequence of attributes which pass the test. For example, this code returns an iterator on all attributes which have a name with namespace ns:

node.attributes( new AttributeTest( ns, null ) )

NodeSequence ancestors( NodeTest test )

NodeSequence ancestorsOrSelf( NodeTest test )

NodeSequence descendants( NodeTest test )

NodeSequence descendantsOrSelf( NodeTest test )

NodeSequence followingSiblings( NodeTest test )

NodeSequence following( NodeTest test )

NodeSequence precedingSiblings( NodeTest test )

NodeSequence preceding( NodeTest test )

Similar filtered iterators which implement XPath axes like ancestor, descendant etc.

Extended XQuery accessors:

The XQuery node (net.axyana.xquest.xquery.dm.XQNode) has slightly different methods which also return sequences filtered by an abstract NodeTest. The returned sequence is of type XQValue, the general XQuery result sequence.

XQValue getChildren( NodeTest test )

is equivalent to children(test)

XQValue getAttributes( NodeTest test )

etc.

Comparisons

There are methods for comparing the value or document order of two nodes:

int orderCompare( Node otherNode )

Returns -1 if this node is strictly before the other node in document order, 0 if the two nodes are identical, and 1 if this node is after the other node.

This method is generally very efficient.

int compareStringValues(Node node, java.text.Collator collator)

Compares the string values of two nodes, whatever their kinds, using an optional Collator.
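
The effect of passing a java.text.Collator can be seen with plain strings. The sketch below uses only the JDK; the helper compareStringValues here is a hypothetical function on strings (not the XQuest method on nodes), and falling back to String.compareTo when the collator is null is an assumption about what "optional" means.

```java
import java.text.Collator;
import java.util.Locale;

public class CollatorCompareDemo {

    /**
     * Compares two string values. With a null collator this sketch falls back
     * to String.compareTo (an assumption, not documented XQuest behaviour).
     */
    static int compareStringValues(String a, String b, Collator collator) {
        return collator == null ? a.compareTo(b) : collator.compare(a, b);
    }

    public static void main(String[] args) {
        Collator primary = Collator.getInstance(Locale.ENGLISH);
        // PRIMARY strength ignores case and accent differences
        primary.setStrength(Collator.PRIMARY);

        // without a collator the two values differ (case-sensitive comparison)
        System.out.println(compareStringValues("Section", "section", null) != 0);
        // with a primary-strength collator they compare as equal
        System.out.println(compareStringValues("Section", "section", primary) == 0);
    }
}
```

Both lines print true: the first comparison is sensitive to case, the second is not.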

Parsing

To obtain a Node from a document residing in a file or accessible through a URL, one can use the services of a DocumentParser or a DocumentManager.

  • DocumentParser provides basic parsing and tree construction services. It supports XML catalogs.

  • DocumentManager is an extension of DocumentParser which supports URI resolution and caching (so that a document accessed several times need not be reparsed). It can be used concurrently by several threads.

The simplest way of parsing a document given its URI (system Identifier in SAX terminology) is to use a static method of DocumentParser:

Node root = DocumentParser.parse(new InputSource(uri));

To use document caching, a DocumentManager has to be instantiated, then its findDocumentNode method can be used to get the root node of the document from its URI.
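
The benefit of caching by URI can be illustrated independently of XQuest. The following sketch is not the DocumentManager implementation: DocumentCache is a hypothetical class, the "parser" is just a function that counts its calls, and the use of ConcurrentHashMap is one plausible way to allow concurrent use by several threads.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Function;

public class DocumentCacheDemo {

    /** Hypothetical URI-keyed cache; parses each document at most once. */
    static final class DocumentCache<T> {
        private final Map<String, T> cache = new ConcurrentHashMap<>();
        private final Function<String, T> parser;

        DocumentCache(Function<String, T> parser) {
            this.parser = parser;
        }

        /** Returns the cached root for the URI, parsing it on first access only. */
        T find(String uri) {
            return cache.computeIfAbsent(uri, parser);
        }
    }

    /** Looks up the given URIs in order and reports how many parses were needed. */
    static int countParses(String... uris) {
        AtomicInteger parses = new AtomicInteger();
        DocumentCache<String> cache = new DocumentCache<>(uri -> {
            parses.incrementAndGet();      // stands in for an expensive parse
            return "root of " + uri;
        });
        for (String uri : uris) {
            cache.find(uri);
        }
        return parses.get();
    }

    public static void main(String[] args) {
        // three lookups, two distinct URIs
        System.out.println(countParses("a.xml", "b.xml", "a.xml"));
    }
}
```

With three lookups over two distinct URIs, the parser function runs twice; the repeated lookup of a.xml is served from the cache.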

Serialization

XMLSerializer is a class which supports all serialization tasks. It converts any node into a serialized form in XML, XHTML or HTML (if applicable) or plain text (discarding the tags).

After creating an XMLSerializer, options can be set, in particular an output stream:

XMLSerializer serial = new XMLSerializer("HTML");
FileOutputStream outputStream = new FileOutputStream("out.html");
serial.setOutput(outputStream, "ISO8859_1");
serial.setOption(XMLSerializer.OMIT_XML_DECLARATION, "yes");
serial.setOption(XMLSerializer.INDENT, "no");

A node can be serialized this way:

serial.output(node);

A Serializer can be reused. The XML or DOCTYPE declarations are output only if the node is a document node. It is also possible to control this at a lower level by using methods reset, terminate, startDocument, endDocument.

Serialization options are described in the Java documentation and in the User's Guide.

Tree building

Although the data model has no tree modification methods, the XQuest data model packages provide a class for building trees: EventDrivenBuilder.

This class works in a SAX-like way: an instance receives events like startElement, attribute, endElement, text, and builds the tree in main memory incrementally on each event. Finally the created tree can be retrieved with the method harvest().

Though not very intuitive, this approach can be powerful for transforming a tree, by combining source tree traversal with construction.

QName DOC = QName.get("doc");
QName CHILD = QName.get("child");
EventDrivenBuilder eb = new EventDrivenBuilder();
eb.evStartElement(DOC);
eb.evAttribute(QName.get("id"), "x0001");
eb.evStartElement(CHILD);
eb.evText("some text");
eb.evEndElement(CHILD);
eb.evComment(" a comment ");
eb.evEndElement(DOC);
XQNode result = eb.harvest();

This snippet would create the following XML tree:

<doc id="x0001"><child>some text</child><!-- a comment --></doc>
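
For readers who want to see how an event-driven builder can work internally, here is a small self-contained sketch. It is not the EventDrivenBuilder implementation: MiniBuilder and Elem are hypothetical classes, names are plain strings instead of QName, and no escaping is performed. It only shows the usual technique of keeping a stack of open elements and attaching each event to the element on top.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class MiniBuilderDemo {

    /** Hypothetical in-memory element; children are Elem objects or rendered strings. */
    static final class Elem {
        final String name;
        final Map<String, String> attributes = new LinkedHashMap<>();
        final List<Object> children = new ArrayList<>();

        Elem(String name) {
            this.name = name;
        }

        /** Crude serialization without escaping, enough for this demonstration. */
        String toXml() {
            StringBuilder sb = new StringBuilder("<").append(name);
            attributes.forEach((k, v) ->
                    sb.append(' ').append(k).append("=\"").append(v).append('"'));
            sb.append('>');
            for (Object child : children) {
                sb.append(child instanceof Elem ? ((Elem) child).toXml() : child.toString());
            }
            return sb.append("</").append(name).append('>').toString();
        }
    }

    /** Hypothetical SAX-like builder: a stack of open elements drives the construction. */
    static final class MiniBuilder {
        private final Deque<Elem> open = new ArrayDeque<>();
        private Elem root;

        void evStartElement(String name) {
            Elem e = new Elem(name);
            if (open.isEmpty()) {
                root = e;                       // first element becomes the root
            } else {
                open.peek().children.add(e);    // attach to the innermost open element
            }
            open.push(e);
        }

        void evAttribute(String name, String value) {
            open.peek().attributes.put(name, value);
        }

        void evText(String text) {
            open.peek().children.add(text);
        }

        void evComment(String text) {
            open.peek().children.add("<!--" + text + "-->");
        }

        void evEndElement(String name) {
            Elem e = open.pop();
            if (!e.name.equals(name)) {
                throw new IllegalStateException("mismatched end of element: " + name);
            }
        }

        /** Returns the finished tree; all elements must have been closed. */
        Elem harvest() {
            if (!open.isEmpty()) {
                throw new IllegalStateException("unclosed element: " + open.peek().name);
            }
            return root;
        }
    }

    /** Replays the same event sequence as the tutorial snippet. */
    static String buildSample() {
        MiniBuilder eb = new MiniBuilder();
        eb.evStartElement("doc");
        eb.evAttribute("id", "x0001");
        eb.evStartElement("child");
        eb.evText("some text");
        eb.evEndElement("child");
        eb.evComment(" a comment ");
        eb.evEndElement("doc");
        return eb.harvest().toXml();
    }

    public static void main(String[] args) {
        System.out.println(buildSample());
    }
}
```

Replaying the same events as the tutorial snippet, this sketch serializes to the same one-line document shown above.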

There is a convenience method copy which recursively copies any other node and its subtree:

eb.copy(node);