Jena Tutorial

DAML+OIL - Ontology Description

Jeremy Carroll

April 2002

DAML API Introduction

DAML+OIL is a means for describing the vocabulary used within some RDF. For example, in the vCards we have already seen we have used <vCard:FN> and <vCard:EMAIL>. But what are these supposed to mean? How should you use them? What is the relationship between them? The job of the DAML layer is to provide in-depth formal answers to such questions.

We have seen that this vocabulary was introduced using two descriptions in English (the vCard in RDF W3C note and the original vCard RFC). There is also a more formal description, given using an RDF Schema (local copy). The DAML+OIL description of the same vocabulary can be thought of as extending this schema, or as providing an alternative, deeper schema.

Within Jena, we use "DAML" where we should more properly say "DAML+OIL". DAML+OIL is the language component of DAML, which as a project had a number of other important deliverables. More information about DAML and DAML+OIL can be found at:

None of these are prerequisites for this tutorial. Students of this tutorial will be most interested in the walk-thru; but even that is quite challenging for people unfamiliar with the underlying concepts. Some students may find it easier to read this tutorial before struggling with the more advanced description logic concepts needed for the walk-thru. Some students may prefer studying both the DAML+OIL walk-thru and this tutorial at the same time, perhaps scanning both first and then stepping through them in more detail, switching from one to the other every so often. The two are very different and have different objectives. In particular, the DAML+OIL walk-thru covers more of the language than this tutorial, whereas this tutorial concentrates on the use of DAML within Java, rather than the XML serialization.

Contents

  1. DAML API Introduction
  2. Getting Started
    1. Creating an Ontology
    2. Writing your Ontology as RDF/XML and N-triple
    3. Using Accessors to Modify the Ontology's Properties
    4. Adding Properties to the Ontology
    5. Datatype Range Constraints
    6. Adding Classes to the Ontology
    7. Adding Domain Constraints
  3. Reading and Navigating an Ontology
    1. DAML+OIL and XML
    2. Loading an Ontology from a File
    3. Listing the Classes in an Ontology
    4. Listing the Properties in an Ontology
    5. Framing Properties
    6. Unique and Unambiguous Properties
  4. Instances
    1. T-Boxes and A-Boxes
    2. Loading Examples
    3. Listing the vCards
    4. Finding Daffney's vCard
    5. Contrasting Jena Model and DAMLModel functionality
  5. Advanced DAML
    1. Property Iterator
    2. Equivalent Names
    3. The Name of Our vCard Ontology
    4. Using XML Base
    5. Restrictions, Enumerations, Unions
  6. The Ontology Layer in Jena 2
    1. Difficult Improvements for Jena 1
    2. Architectural Changes
    3. Rollout Plan

Getting Started

After this section you will be able to:

This section includes a sequence of exercises that should be done in order. The solution for each exercise is a Java source file, and the easiest approach is to add a few lines to the answer of the previous exercise. If an exercise is either too easy or too hard for you, refer to my solution before moving to the next subsection.

It is important to grasp the Accessor design pattern used extensively within the DAML API. This is highlighted every so often in this section of the tutorial.

Creating an Ontology

The DAML API is found in the package com.hp.hpl.jena.daml. The first class to note is DAMLModel. All parts of an ontology get created within one of these.

So the first code we write is:

    DAMLModel model = new DAMLModelImpl();

Exercise Create a new main class that includes and executes the above line of code. (crib (java) (text))

This model is a Jena model, with some additional functionality to help you create DAML ontologies. Each DAML ontology is encoded as RDF triples that can be stored within a Jena model. We will create an ontology using the DAML API. All of the operations of the DAML API get translated into adding, deleting and navigating the triples of this model.

Browse through the Java doc for DAMLModel; find an appropriate method for creating a new ontology.

Reading the documentation for this method we see that we may choose a name (a URI) for our ontology. For now, we won't choose a name.

Exercise Modify your code to create an ontology, by using the method you have found. (crib (java) (text))

Writing your Ontology as RDF/XML and N-triple

Having created an ontology the next step is to write it out.

Remember that the DAMLModel is a Jena Model, and the ontology that you have created is stored within the DAMLModel. If necessary refer back to the first section of the tutorial to remember how to write the model out as XML.

Exercise Modify your code to write out the ontology as XML. (crib (java) (text))

The XML produced should look something like:

<rdf:RDF
  xmlns:rdf='http://www.w3.org/1999/02/22-rdf-syntax-ns#'
 >
  <rdf:Description rdf:about='#A0'>
    <rdf:type rdf:resource='http://www.daml.org/2001/03/daml+oil#Ontology'/>
  </rdf:Description>
</rdf:RDF>

That's a bit of a mess; maybe we should try the Jena pretty writer.

Exercise Modify your code to use the pretty writer (use "RDF/XML-ABBREV"). (crib (java) (text))

That's better; I am now getting:

<?xml version='1.0'?>
<rdf:RDF
    xmlns:daml='http://www.daml.org/2001/03/daml+oil#'
    xmlns:rdf='http://www.w3.org/1999/02/22-rdf-syntax-ns#'>
    <daml:Ontology/>
</rdf:RDF>

We have said that the ontology is represented within the RDF model as triples, but what triples are there? The N-triple format is the easiest for understanding precisely which triples have been used. Jena allows us to use this format using the "N-TRIPLE" writer, briefly mentioned in the first section.

Exercise Modify your code to also write out the ontology as N-triple. (crib (java) (text))

There is one triple so far, and it is:

_:A <rdf:type> <daml:Ontology> .

(Note: the official N-triple format, output by Jena, does not use namespace prefixes, but writes very long lines).
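To see what those very long lines look like, the abbreviated triple above can be expanded by hand. The following is a minimal sketch in plain Java; it does not use Jena, and the class and method names are made up for illustration. The two namespace URIs are the ones declared in the RDF/XML output shown earlier.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Illustration only: plain Java, no Jena classes involved.
class NTripleSketch {
    // Namespace prefixes taken from the RDF/XML output shown earlier.
    static final Map<String, String> PREFIXES = new LinkedHashMap<>();
    static {
        PREFIXES.put("rdf", "http://www.w3.org/1999/02/22-rdf-syntax-ns#");
        PREFIXES.put("daml", "http://www.daml.org/2001/03/daml+oil#");
    }

    // Expands a prefixed name such as "rdf:type" into a full URI in angle brackets.
    static String expand(String qname) {
        int colon = qname.indexOf(':');
        if (colon < 0) {
            throw new IllegalArgumentException("Not a prefixed name: " + qname);
        }
        String ns = PREFIXES.get(qname.substring(0, colon));
        if (ns == null) {
            throw new IllegalArgumentException("Unknown prefix in " + qname);
        }
        return "<" + ns + qname.substring(colon + 1) + ">";
    }

    // Formats one triple whose subject is a blank node label such as "_:A".
    static String line(String blankSubject, String predicate, String object) {
        return blankSubject + " " + expand(predicate) + " " + expand(object) + " .";
    }

    public static void main(String[] args) {
        System.out.println(line("_:A", "rdf:type", "daml:Ontology"));
    }
}
```

Running it prints the long form of the same single triple:

    _:A <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://www.daml.org/2001/03/daml+oil#Ontology> .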

Using Accessors to Modify the Ontology's Properties

This subsection introduces the Accessor design pattern. It is important to understand this pattern, and it is worth spending sufficient time on this simple example of its use.

Looking at the DAML walk-thru, we see that an ontology has a couple of properties: daml:versionInfo and rdfs:comment.

The corresponding methods in Jena are found in DAMLOntology and DAMLCommon. DAMLCommon is used as a place to put methods that are useful in more than one of the DAML interface definitions. Many things can have rdfs:comments, so the prop_comment() method is common. daml:versionInfo only applies to DAML ontologies, and so it is not common.

Both these methods return a LiteralAccessor. This reflects one of the design patterns used in the DAML API. Many of the properties in a DAML Ontology can have zero, one or more values. Natural operations include adding, removing, setting, getting and listing one or more values. Rather than duplicate this list of operations for each property within the API, the user of the API is required to chain two calls together, e.g.

    onto.prop_comment().addValue("This is another comment");

Exercise Modify your code to include a version number and two comments on the ontology. (crib (java) (text))
Exercise Generate output RDF/XML and N-Triple for your modified ontology. Use both the RDF/XML and the RDF/XML-ABBREV writers. (Repeat this exercise after any of the following exercises). (crib (java) (text))
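If the accessor idea is still unclear, it may help to see it stripped of everything else. The following is a minimal sketch in plain Java, not the Jena implementation: SketchOntology and SketchAccessor are made-up stand-ins, and apart from prop_comment() and addValue(), which appear above, the method names are assumptions chosen to match the list of operations (add, remove, set, get, list). The point is that the operations are written once, on the accessor, and each prop_* method merely selects which property they apply to.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Illustration only: a stand-in for the accessor idea, not the Jena classes.
class AccessorSketch {

    // One accessor type carries the add/remove/set/get/list operations
    // for whichever property it is bound to.
    static class SketchAccessor {
        private final List<String> values;

        SketchAccessor(List<String> values) {
            this.values = values;
        }

        void addValue(String v) { values.add(v); }

        void removeValue(String v) { values.remove(v); }

        // Replaces all existing values with a single value.
        void setValue(String v) { values.clear(); values.add(v); }

        // Returns the first value, or null when the property has none.
        String getValue() { return values.isEmpty() ? null : values.get(0); }

        List<String> listValues() { return Collections.unmodifiableList(values); }
    }

    // A resource stores values per property name; each prop_* method only
    // selects the property and hands back the shared accessor type.
    static class SketchOntology {
        private final Map<String, List<String>> store = new HashMap<>();

        private SketchAccessor accessor(String property) {
            return new SketchAccessor(store.computeIfAbsent(property, k -> new ArrayList<>()));
        }

        SketchAccessor prop_comment() { return accessor("rdfs:comment"); }

        // Hypothetical name for the version accessor.
        SketchAccessor prop_versionInfo() { return accessor("daml:versionInfo"); }
    }

    public static void main(String[] args) {
        SketchOntology onto = new SketchOntology();
        onto.prop_comment().addValue("This is a comment");
        onto.prop_comment().addValue("This is another comment");
        onto.prop_versionInfo().setValue("1.0");
        System.out.println(onto.prop_comment().listValues());
        System.out.println(onto.prop_versionInfo().getValue());
    }
}
```

Adding a new property to such a design costs one line (another prop_* method), which is the saving the pattern is after.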

Adding Properties to the Ontology

Most of the vCard specification is a list of properties, e.g. <vCard:FN>, the full name property.

We look at the JavaDoc for DAMLModel to find how to create a property. Unfortunately we find three different methods, all of which look promising:

The difference between these is as follows:

createDAMLObjectProperty
This is used for creating properties where the value is another resource, represented by another node in the RDF graph. That resource in turn may well have further properties.
createDAMLDatatypeProperty
This is used for creating properties where the value is a simple value, and we know its datatype (e.g. string or integer). It is not possible to specify further properties for such a value.
createDAMLProperty
This is the dustbin case. We use this when neither of the other two is usable. Examples are if we know that something is a simple value but are unsure of its datatype, or if we wish to permit both simple values and object values. (This last possibility is not compliant with the DAML+OIL specification).

We see that the <vCard:FN> property takes a value that is always a string. Within DAML+OIL, XML Schema Datatypes are used for describing datatypes. Within the Jena DAML subsystem the supported datatypes are:

Advanced: see XMLDatatypeRegistry for more information. In particular, it is possible to add more datatypes.

Exercise Add the property <vCard:FN> to your ontology by modifying your code. Do not yet include the information that the full name is a string. Remember the full URI for the vCard prefix is "http://www.w3.org/2001/vcard-rdf/3.0#". (crib (java) (text))

Datatype Range Constraints

In mathematical language a relationship or property relates two sets: the first is called its domain; and the second is called its range.

<vCard:FN> relates vCards (the domain) to strings (the range). We don't yet have a URI for describing vCards, so we will just concentrate on the range.

Within the DAML API datatypes are represented by a DAMLDatatype. These too are created using the relevant method on DAMLModel.

To set the range of a property or datatype property we use the method prop_range(). This returns a PropertyAccessor, which uses the same design pattern as the LiteralAccessor that we used for accessing the daml:versionInfo and rdfs:comment properties. Be sure to understand this design pattern before moving on.

Thus to set the range of <vCard:FN> we need to create the DAMLDatatype corresponding to xsd:string and then add that to the range of the property.

Exercise Add the relevant range constraint to the property <vCard:FN> by modifying your code. (crib (java) (text))
Optional Exercise If you want the practice, add the following fields: with the appropriate range constraints. (Note: this is diverging somewhat from the vCard specification, particularly section 3.2). This is a good time to refactor your code; it might be getting a little scrappy. (crib (java) (text))
Exercise Modify that part of your code that uses the RDF/XML-ABBREV writer to use the base URL of "http://www.w3.org/2001/vcard-rdf/3.0". Is the output prettier? (crib (java) (text))

Hint Use the base argument of the write(writer,"RDF/XML-ABBREV",base) method.

The exercise above produces an RDF/XML file that is intended to be found on the web at the URL "http://www.w3.org/2001/vcard-rdf/3.0". Later we will learn about using XML Base, which allows us to overcome this limitation.

Adding Classes to the Ontology

Some of the properties in the vCard specification have 'type parameters', see section 3.3 "Properties with Attributes". In particular, consider <vCard:TEL>, <vCard:EMAIL>, <vCard:ADR> and <vCard:LABEL>. (The inclusion of <vCard:TZ> in this part of the vCard specification appears to be a mistake).

Each of these is modelled in the schema using an RDF property (e.g. vcard:TEL), a list of RDF classes (e.g. vcard:home, vcard:work, etc.), and a superclass for those classes (e.g. vcard:TELTYPES).

To create a class from the DAML API we look at the DAMLModel javadoc and find the appropriate method. To create a class and a subclass, we first create two classes, and then use the prop_subClassOf() method on the subclass, and add the superclass. Once again we are using the accessor design pattern.

Exercise Add code creating the TELTYPES, home, work, pref classes, with appropriate subclass relationship between them. Add code for the TEL object property, and constrain its range to be a TELTYPES (or one of its subclasses). (crib (java) (text))

Adding Domain Constraints

The vCard properties such as <vCard:FN> are only meant to apply to vCards. Within DAML (and RDFS) the natural way to say this is to use a domain constraint. The domain constraint needs a class of vCards. However the vCard specification doesn't introduce such a class.

We will ignore the spec., and create such a class anyway. We can call it <vCard:VCARD> i.e. with URI "http://www.w3.org/2001/vcard-rdf/3.0#VCARD". Then we can specify the appropriate domain constraint on all our properties. We do this using the prop_domain() method.

Exercise Add code creating the <vCard:VCARD> class and add domain constraints to the properties you are already creating. (crib (java) (text))
Advanced Exercise Add more code so that your VCard ontology covers all the properties and classes in the VCard RDF Schema. (crib (java) (text))

Reading and Navigating an Ontology

This part of the tutorial reverses the relationship between the RDF/XML file and the Java code. Instead of creating the ontology in Java and writing it out as XML, I have created an ontology in XML and we will use this by reading it into Java.

DAML+OIL and XML

Most users of DAML+OIL seem to be comfortable with the XML syntax used in the DAML+OIL specifications. The normal way of defining an ontology is to find a file on the web (particularly at the DAML ontology library) with an appropriate ontology already in it.

Loading an Ontology from a File

We use the version of the vCard ontology that I produced. It is here. As in the last section, we create a DAMLModel and use it to load in the ontology.

The usual read methods are used to load an ontology, but a DAMLModel implements this in a special way to give the additional functionality.

Exercise Starting with a new Java file, with a main method, create a DAMLModel and read in the vCard ontology. (crib (java) (text))

Listing the Classes in an Ontology

Having loaded the ontology into Jena we can now explore it within our Java code. We will start by listing the classes in the ontology. listDAMLClasses() looks like a promising place to start! The iterator returned always gives a DAMLClass.

Exercise Change your code to print out the names of the classes in the vCard ontology. (crib (java) (text))

Notice:

Within RDF, resources may be named or unnamed. Some of the classes in a DAML ontology are normally unnamed. This seems a bit strange, because in most class-oriented languages class names are obligatory and tell us a lot about what the class is meant to be. In DAML, unnamed classes are ones which are defined using some operation within description logic. Within the ontology for vCards the class RDFAnon1 is defined as an intersection of other classes. Since intersection is conceptually straightforward, the lack of a meaningful name on this class is permitted (and might in some cases be helpful; a name can just get in the way). In the code that I used to generate this ontology this class really was unnamed, but when it got written out as RDF/XML it was forced to have a name. Sometimes RDF/XML allows resources to be unnamed; sometimes it needs them to have a name (the name the software chose was "RDFAnon1"). This is a significant known bug with the XML version of RDF. (We are considering a Jena-specific solution that uses not-quite-URIs of the form _:foobar for unnamed resources within RDF/XML.)

The classes that are DAMLRestrictions correspond to RDF/XML like:

    <daml:Restriction>
        <daml:onProperty rdf:resource='#TYPE'
            rdf:type='http://www.daml.org/2001/03/daml+oil#Property'/>
        <daml:toClass rdf:resource='http://www.w3.org/2000/10/XMLSchema#string'/>
    </daml:Restriction>

This too is an operation from description logic. It means the set of things that are the subject of a triple with predicate vcard:TYPE, whose object is an xsd:string.

Within the ontology this restriction class is used (within an intersection which is used) as the range constraint of some other property (e.g. LOGO). What that means is that the value of the property LOGO always has a string-valued TYPE property.

This is a particularly baroque and awkward way to say something quite straightforward. Readers may be pleased to hear that early indications are that the Web Ontology Language that will replace DAML+OIL is likely to provide a more intuitive, frame-oriented way of saying this sort of thing.

Listing the Properties in an Ontology

Let's try listDAMLProperties().

Exercise Change your code to print out the names of the properties in the vCard ontology. (crib (java) (text))

Framing Properties

The last exercise should have produced a fairly muddled output.

Properties of vCards will have been muddled with properties of ADRPROPERTIES and properties of NPROPERTIES. So the property Family, which only applies to NPROPERTIES, is not clearly separated from FN (which could also be read as Family Name), which applies to vCards.

DAML takes what is known as a description logic viewpoint, which is property-centric, rather than the more intuitive frame viewpoint, which is class-centric.

We can combine the previous two exercises, and use the method getDefinedProperties() to arrange the output in a frame-like fashion.

Exercise Change your code to print out the properties listed by the class they apply to. (crib (java) (text))
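The difference between the two viewpoints can be shown without any DAML machinery at all. The following is a minimal sketch in plain Java, not the Jena API: it starts from a property-centric table (each property with the class it applies to) and turns it round into a class-centric one. The class and method names are made up; FN and Family come from the discussion above, while Given and Street, and their placement, are assumptions added to make the grouping visible.

```java
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

// Illustration only: plain Java, no Jena classes involved.
class FramingSketch {
    // Inverts a property-centric table (property -> class it applies to)
    // into a class-centric one (class -> properties that apply to it).
    static Map<String, Set<String>> frame(Map<String, String> domainOf) {
        Map<String, Set<String>> byClass = new TreeMap<>();
        for (Map.Entry<String, String> e : domainOf.entrySet()) {
            byClass.computeIfAbsent(e.getValue(), k -> new TreeSet<>()).add(e.getKey());
        }
        return byClass;
    }

    public static void main(String[] args) {
        Map<String, String> domainOf = new TreeMap<>();
        domainOf.put("FN", "VCARD");
        domainOf.put("Family", "NPROPERTIES");
        domainOf.put("Given", "NPROPERTIES");    // assumed for illustration
        domainOf.put("Street", "ADRPROPERTIES"); // assumed for illustration
        for (Map.Entry<String, Set<String>> e : frame(domainOf).entrySet()) {
            System.out.println(e.getKey() + ": " + e.getValue());
        }
    }
}
```

In the framed listing Family sits under NPROPERTIES and FN under VCARD, so the two can no longer be confused.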

Notice that this has not worked for the DAMLRestrictions. We saw earlier that RDFAnon1 is described making it clear that vCard:TYPE applies to it. But in the listing produced by the last exercise no properties are associated with any restriction.

This reflects limitations of the Jena support for DAML, in that we do not (yet) have a full description logic engine. Thus within Jena, it is hard to answer some interesting questions about an ontology (such as which properties must be present on this resource).

Unique and Unambiguous Properties

Properties in DAML can also have cardinality constraints. These indicate the maximum number of object values a property might have for some specific subject; or conversely the maximum number of subjects of a property that correspond to one specific object.

Usually that maximum number is 1. This gives rise to unique properties which have at most one value on each subject; and unambiguous properties for which an object value corresponds to at most one subject.

The DAML API supports both unique and unambiguous properties, through the same design pattern of a set method (setIsUnique(boolean), setIsUnambiguous(boolean)) and a get method (isUnique(), isUnambiguous()).

Exercise Change your code to only print out unique and/or unambiguous properties listed by the class they apply to. (crib (java) (text))
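The two definitions are mirror images of each other, which is easy to miss in prose. The following is a minimal sketch in plain Java of what each constraint says about the data; it is not how Jena represents or enforces them (in the DAML API they are simply declared on the property). The class and method names and the sample values are made up for illustration.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Illustration only: plain Java, no Jena classes involved.
class CardinalitySketch {
    // Each statement is a {subject, object} pair for one fixed property.

    // Unique: no subject has more than one distinct object value.
    static boolean isUnique(List<String[]> statements) {
        return atMostOne(statements, 0, 1);
    }

    // Unambiguous: no object value is shared by more than one distinct subject.
    static boolean isUnambiguous(List<String[]> statements) {
        return atMostOne(statements, 1, 0);
    }

    // True when every key (column keyIdx) is paired with at most one
    // distinct value (column valueIdx).
    private static boolean atMostOne(List<String[]> statements, int keyIdx, int valueIdx) {
        Map<String, Set<String>> seen = new HashMap<>();
        for (String[] s : statements) {
            Set<String> values = seen.computeIfAbsent(s[keyIdx], k -> new HashSet<>());
            values.add(s[valueIdx]);
            if (values.size() > 1) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        // Two hypothetical cards that happen to carry the same full name.
        List<String[]> fn = Arrays.asList(
            new String[] {"card1", "Ann Example"},
            new String[] {"card2", "Ann Example"});
        System.out.println("unique: " + isUnique(fn));
        System.out.println("unambiguous: " + isUnambiguous(fn));
    }
}
```

The sample data satisfies uniqueness (each card has one name) but not unambiguity (one name belongs to two cards), so a name alone would not identify a card.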

Instances

So far, we have only talked about the ontology for vCards; we haven't actually seen a vCard in DAML. In this section, we use both the vCard ontology, and some sample vCards that conform to that ontology.

T-Boxes and A-Boxes

In most data and information frameworks the base data, and the information describing the base data are kept quite separate. A database schema is conceptually distinct from the database data. In the description logic world from which DAML+OIL emerged this separation was described using the word T-Box, the terminology box that lists the terms you can use and their interactions, and the word A-Box, the assertion box, the base facts described using the terminology from the T-Box.

In RDF, RDFS, and DAML+OIL, this clean separation between the base data and the schema or ontology is lost. From one point of view it is all muddled up together in a single soup. The Jena DAML API reflects that soup. Both the ontology terms and the instance data are loaded into the DAMLModel, and accessed primarily through the DAML API (although both can also be accessed through the Jena Model API).

Loading Examples

Optional Exercise Create suitable instance data files for the rest of the exercises. Here is some code (java) (text) that can randomly generate such data.

We load RDF/XML instance data files, just like we loaded the ontology file above.

Exercise Starting with a new Java file, with a main method, create a DAMLModel and read in the vCard ontology, and the instance data. (crib (java) (text))

Listing the vCards

Having loaded all the data into our DAMLModel, we can now access it programmatically. Look at the DAMLModel javadoc and choose the appropriate method to get to the instances. When we find an instance, it should have Java class DAMLInstance. The iterator returned by DAMLModel.listDAMLInstances() also returns objects of Java class DAMLDataInstance. These are not interesting for this exercise and must be ignored.

To display the instance we will need to access a property, e.g. vCard:FN. To do that we use the method accessProperty(). This returns a PropertyAccessor which uses the now familiar accessor design pattern.

Since the ontology has told us that vCard:FN is a unique property, there is at most one value. Thus we can use the method getDAMLValue() rather than the more general getAll(true). (The boolean argument to getAll concerns transitive properties, which are outside the scope of this tutorial.)

Using the explicitly typed sample data, the values returned by getDAMLValue() are DAMLDataInstances. To get the actual value to display, a further call to getValue() is necessary.

Exercise Modify your code to print out the name from each vCard in the knowledge base. (crib (java) (text))

It would be easier if we only looked at objects of the right class <vCard:VCARD>. This is possible; find an appropriate method in DAMLClass.

Exercise Modify your code to access the VCards through the class <vCard:VCARD>. (crib (java) (text))

Finding Daffney's vCard

A hint for the following exercises is that a quick way to get the property rdf:value is

     com.hp.hpl.mesa.rdf.jena.vocabulary.RDF.value

If we want to find a particular vCard, one approach is to use a brute force search through the knowledge base. Suppose we know someone's e-mail address and we want to find out more about them. We can go through the cards one by one, and access the vCard:EMAIL property (using getAll(true) since EMAIL is not a unique property). Each returned value will be a complex object (another DAMLInstance) and we need to access the rdf:value property to reach the actual e-mail address. The property rdf:value is not declared in our ontology, and is hence not treated as a DAML property. Using a DAML property accessor on rdf:value in this example does not work. Instead we must either make Jena calls or use getPropertyValue(). If any of the given e-mail addresses matches then we have found the one.

Exercise Modify your code to print out the name from the vCard for amanda_cartwright@example.org. (crib (java) (text))
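The shape of the brute force search is independent of the API used to reach the data. The following is a minimal sketch in plain Java that models only the structure described above: a card has one FN and any number of EMAIL values, and each EMAIL value is a complex object whose rdf:value holds the address. It does not use Jena; the class, field and method names are made up, and the names on the sample cards are hypothetical.

```java
import java.util.Arrays;
import java.util.List;

// Illustration only: plain Java, no Jena classes involved.
class EmailSearchSketch {
    // The complex object that an EMAIL property points at; the actual
    // address sits one step further away, under rdf:value.
    static class EmailNode {
        final String rdfValue;

        EmailNode(String rdfValue) {
            this.rdfValue = rdfValue;
        }
    }

    static class Card {
        final String fullName;        // vCard:FN, at most one value
        final List<EmailNode> emails; // vCard:EMAIL, any number of values

        Card(String fullName, EmailNode... emails) {
            this.fullName = fullName;
            this.emails = Arrays.asList(emails);
        }
    }

    // Brute force: visit every card, then every EMAIL value on it.
    // Returns the full name of the first matching card, or null if none matches.
    static String findNameByEmail(List<Card> cards, String address) {
        for (Card card : cards) {
            for (EmailNode node : card.emails) {
                if (address.equals(node.rdfValue)) {
                    return card.fullName;
                }
            }
        }
        return null;
    }

    public static void main(String[] args) {
        List<Card> cards = Arrays.asList(
            new Card("First Person", new EmailNode("first@example.org")),
            new Card("Second Person",
                new EmailNode("second@example.org"),
                new EmailNode("amanda_cartwright@example.org")));
        System.out.println(findNameByEmail(cards, "amanda_cartwright@example.org"));
    }
}
```

The cost is one visit per e-mail value in the knowledge base, which is acceptable for a handful of sample cards but is exactly why this is called brute force.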

Contrasting Jena Model and DAMLModel functionality

A hint for the following exercises is that a quick way to get the property rdf:type is

     com.hp.hpl.mesa.rdf.jena.vocabulary.RDF.type

Looking at DAMLCommon we find a method getRDFTypes. This can be used to find all the classes to which an object belongs.

Exercise Modify your code to select the vCard for amanda_cartwright@example.org, and then find all the types of the complex object (the DAMLInstance) whose rdf:value is amanda_cartwright@example.org. (crib (java) (text))

Notice the daml:Thing class in your answer; this is the top class in DAML.

Exercise Modify your code to select only the explicit types of the complex object. (crib (java) (text))

Notice that the DAML implementation does do a significant part of the reasoning to do with class membership. NB: the classes found are the result of following the subClassOf property, and are not all those classes that would be found with a description logic classifier.

With its closed flag set to false, getRDFTypes operates like the underlying Jena calls.
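The kind of reasoning involved here is small enough to write out. The following is a minimal sketch in plain Java of following subclass links upward from the explicitly stated types; it is an illustration of the idea, not the Jena implementation. The class and method names are made up, the boolean parameter is named after the closed flag mentioned above, and the sample uses the home and TELTYPES classes from earlier in the tutorial.

```java
import java.util.ArrayDeque;
import java.util.Collections;
import java.util.Deque;
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.Map;
import java.util.Set;

// Illustration only: plain Java, no Jena classes involved.
class TypeClosureSketch {
    static final String THING = "daml:Thing";

    // explicitTypes: the rdf:type values stated directly in the data.
    // superOf: class -> its direct superclasses, as given by the ontology.
    // closed = false returns only the explicit types; closed = true also
    // follows the subclass links upward and adds the top class.
    static Set<String> types(Set<String> explicitTypes,
                             Map<String, Set<String>> superOf,
                             boolean closed) {
        Set<String> result = new LinkedHashSet<>(explicitTypes);
        if (!closed) {
            return result;
        }
        Deque<String> todo = new ArrayDeque<>(explicitTypes);
        while (!todo.isEmpty()) {
            String current = todo.pop();
            for (String sup : superOf.getOrDefault(current, Collections.emptySet())) {
                // Only queue classes not seen before, so cycles terminate.
                if (result.add(sup)) {
                    todo.push(sup);
                }
            }
        }
        result.add(THING);
        return result;
    }

    public static void main(String[] args) {
        Map<String, Set<String>> superOf = new HashMap<>();
        superOf.put("vCard:home", Collections.singleton("vCard:TELTYPES"));
        Set<String> explicit = Collections.singleton("vCard:home");
        System.out.println(types(explicit, superOf, false));
        System.out.println(types(explicit, superOf, true));
    }
}
```

Nothing here looks inside class definitions such as restrictions or intersections, which is the gap between this kind of closure and a description logic classifier.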

Exercise Modify your code to use the Jena Model API for accessing the rdf:type (e.g. use listProperties(rdf:type) ). (crib (java) (text))

Notice that when we do not want any additional reasoning the Jena API calls are just as useful as the DAML API calls.

In general, modifying a DAMLModel through calls on the underlying Jena Model will lead to problems. We are currently exploring alternative layered architectures for Jena that will allow us to support such multiple views over the same underlying data.

Advanced DAML

This section consists of leads into further study. This tutorial does not provide an adequate basis for understanding either these advanced DAML functionalities, or their (partial) implementation within Jena.

Property Iterator

We saw in the last exercises that the DAML implementation can be smart about class membership. In general it can be smart about any property for which it knows the ontology. This functionality, as with class membership, is not a full description logic implementation, but does cover equivalence, transitivity, inverse properties, and the class and property hierarchies.

The javadoc can be found under the class PropertyIterator, and under many methods such as getEquivalentValues.

Equivalent Names

DAML provides a number of ways of declaring that two different names (URIs) actually refer to the same resource. It also provides a way of showing that two names refer to different resources. These are:

  1. daml:equivalentTo
  2. daml:sameClassAs
  3. daml:samePropertyAs
  4. daml:sameIndividualAs
  5. daml:differentIndividualFrom

Jena provides some support for these within the DAML API. However, at least some of the relevant processing is expensive; see setUseEquivalence(boolean). Moreover, there will be many implicit equivalences that can be inferred from the ontology using a DL processor but that Jena cannot infer. These are hence ignored, whatever the setting of the useEquivalence flag.

The Name of Our vCard Ontology

Within the semantic web it is assumed that the owner of a web-server on foobar.example.org has first bite at defining the schema or ontology for all URLs starting http://foobar.example.org.

So we see that the vCard ontology was published by the W3C and they allowed the author to use their web space to publish the vCard schema (at http://www.w3.org/2001/vcard-rdf/3.0).

We have seen that it is nevertheless possible to define in our own file on our own computer (or our own web server) an alternative schema or ontology for the same classes and properties. This is a bit naughty.

Using XML Base

In the first part of the DAML tutorial we saw that telling the writer to produce a file with base http://www.w3.org/2001/vcard-rdf/3.0 produced tidier output.

The schema we used in the second half of the DAML tutorial was initially produced programmatically by the sample solution for the first section. It used the tidier form of output. In order to tell the RDF/XML parser to pretend that it was at http://www.w3.org/2001/vcard-rdf/3.0 we use an xml:base declaration.

    xml:base="http://www.w3.org/2001/vcard-rdf/3.0"

Typically this is at the beginning of the file, but can occur on any XML element.
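What the declaration changes is how relative references such as rdf:resource='#TYPE' are turned into full URIs. The following is a minimal sketch using only java.net.URI from the standard library; it is not the Jena parser, the class and method names are made up, and the second base URL is a hypothetical location for a local copy.

```java
import java.net.URI;

// Illustration only: standard library URI resolution, no Jena classes involved.
class XmlBaseSketch {
    // Resolves a relative reference (as found in rdf:about or rdf:resource)
    // against the base URI in force for the document.
    static String resolve(String base, String reference) {
        return URI.create(base).resolve(reference).toString();
    }

    public static void main(String[] args) {
        // With the xml:base declaration, the parser behaves as if the file
        // had been fetched from the W3C address.
        System.out.println(resolve("http://www.w3.org/2001/vcard-rdf/3.0", "#TYPE"));
        // Without it, the same reference is resolved against wherever the
        // file actually came from (a hypothetical address here).
        System.out.println(resolve("http://example.org/my-copy.rdf", "#TYPE"));
    }
}
```

The first line printed is http://www.w3.org/2001/vcard-rdf/3.0#TYPE, the name we want; the second is http://example.org/my-copy.rdf#TYPE, which names a different resource altogether.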

Restrictions, Enumerations, Unions

DAML allows you to describe anonymous classes that are described in terms of the properties used by the objects within them. This is quite like talking about a set of objects by describing something about them, e.g. "the people with red hair", rather than by just naming the class "red-haired people". The class DAMLRestriction corresponds to these anonymous classes.

DAML also allows you to explicitly construct a set by giving its members. This corresponds to the prop_oneOf() method found on a DAMLClass. Similarly, prop_unionOf() @@@@ allow access to classes defined in terms of these set-theoretic operators.

None of these are fully implemented in Jena. The API allows you to create and explore ontologies which have these constructions, but you cannot navigate instance data using these constructions.

The underlying issue is that we have not yet invested in a DL reasoner, which is a moderately complex piece of specialised software needed to implement these requirements.

The Ontology Layer in Jena 2

This section discusses likely changes planned for future versions of Jena.

Difficult Improvements for Jena 1

The Jena team are aware that the user community wants deep improvements in the ontology support in Jena.

The goal is to allow both:

Inference Models
These models are ones which, for example, might include all the implicit information formed by joining an ontology with some base data. For example, a range constraint or a subclass rule might imply additional type information. In an inference model this additional type information would appear dynamically when relevant.
Cleaner enhancement classes
The DAML API adds functionality to the base Jena API. It should be possible for Jena developers to add their own additional functionality as well. It should be possible to use the additional DAML functionality over models other than memory models.

The current Jena releases (1.X.Y) have been found not to have sufficient architectural support for such changes.

Architectural Changes

We are designing a layered architecture in which both inference models and enhanced models can be layered on top of other models (their base model(s)). We believe that a layered architecture will provide clarity about how to access a model both with and without the additional functionality.

Rollout Plan

The first version of Jena 2 is likely to exercise the new architecture by providing RDF Schema support.

After that we wish to provide improved DAML support including a Description Logic reasoner and closure over implicit triples.

We hope to provide both of these within 2002.

In the event that the W3C WebOnt Working Group make rapid progress, we may bypass full DAML support in favour of supporting the new language.