Jeremy Carroll
April 2002
DAML+OIL is a means for describing the vocabulary used within some RDF. For example, in the vCards we have already seen we have used <vCard:FN> and <vCard:EMAIL>. But what are these supposed to mean? How should you use them? What is the relationship between them? The job of the DAML layer is to provide in-depth formal answers to such questions.
We have seen that this vocabulary was introduced using two descriptions in English (the vCard in RDF W3C note and the original vCard RFC). There is also a more formal description, given using an RDF Schema (local copy). The DAML+OIL description of the same vocabulary can be thought of as extending this schema, or as providing an alternative, deeper schema.
Within Jena, we use "DAML" where we should, more properly, say "DAML+OIL". DAML+OIL is the language component of DAML, which as a project had a number of other important deliverables. More information about DAML and DAML+OIL can be found at:
None of these are prerequisites for this tutorial. Students of this tutorial will be most interested in the walk-thru, but even that is quite challenging for people unfamiliar with the underlying concepts. Some students may find it easier to read this tutorial before struggling with the more advanced description logic concepts needed for the walk-thru. Others may prefer studying both the DAML+OIL walk-thru and this tutorial at the same time, perhaps scanning both first and then stepping through them in more detail, switching from one to the other every so often. The two are very different and have different objectives. In particular, the DAML+OIL walk-thru covers more of the language than this tutorial, whereas this tutorial concentrates on the use of DAML within Java, rather than the XML serialization.
After this section you will be able to:
This section includes a sequence of exercises that should be done in order. The solution for each exercise is a Java source file, and the easiest approach is to add a few lines to the answer of the previous exercise. If an exercise is either too easy or too hard for you, refer to my solution before moving to the next subsection.
It is important to grasp the Accessor design pattern used extensively within the DAML API. This is highlighted every so often in this section of the tutorial.
The DAML API is found in the package com.hp.hpl.jena.daml. The first class to note is DAMLModel. All parts of an ontology get created within one of these.
So the first code we write is:
DAMLModel model = new DAMLModelImpl();
Exercise Create a new main class that includes and executes the above line of code. (crib (java) (text) )
This model
is a Jena model, with some additional
functionality to help you create DAML ontologies.
Each DAML ontology is encoded as RDF triples that can be stored within
a Jena model. We will create an ontology using the DAML API.
All of the operations of the DAML API get translated into
adding, deleting and navigating the triples of this model.
Browse through the Java doc for DAMLModel; find an appropriate method for creating a new ontology.
Reading the documentation for this method we see that we may choose a name (a URI) for our ontology. For now, we won't choose a name.
Exercise Modify your code to create an ontology, by using the method you have found. (crib (java) (text))
Having created an ontology the next step is to write it out.
Remember that the DAMLModel is a Jena Model, and the ontology that you have created is stored within the DAMLModel. If necessary refer back to the first section of the tutorial to remember how to write the model out as XML.
Exercise Modify your code to write out the ontology as XML. (crib (java) (text))
The XML produced should look something like:
<rdf:RDF
  xmlns:rdf='http://www.w3.org/1999/02/22-rdf-syntax-ns#' >
  <rdf:Description rdf:about='#A0'>
    <rdf:type rdf:resource='http://www.daml.org/2001/03/daml+oil#Ontology'/>
  </rdf:Description>
</rdf:RDF>
That's a bit of a mess; maybe we should try the Jena pretty writer.
Exercise Modify your code to use the pretty writer (use "RDF/XML-ABBREV"). (crib (java) (text))
That's better, I am now getting:
<?xml version='1.0'?>
<rdf:RDF
  xmlns:daml='http://www.daml.org/2001/03/daml+oil#'
  xmlns:rdf='http://www.w3.org/1999/02/22-rdf-syntax-ns#'>
  <daml:Ontology/>
</rdf:RDF>
We have said that the ontology is represented within the RDF model as triples, but what triples are there? The N-triple format is the easiest for understanding precisely which triples have been used. Jena allows us to use this format using the "N-TRIPLE" writer, briefly mentioned in the first section.
Exercise Modify your code to also write out the ontology as N-triple. (crib (java) (text))
There is one triple so far, and it is:
_:A <rdf:type> <daml:Ontology> .
(Note: the official N-triple format, output by Jena, does not use namespace prefixes, but writes very long lines).
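To make the note above concrete, here is a minimal sketch in plain Java of how the abbreviated triple expands into a full N-triple line. The class and method names (NTripleSketch, expand, toNTriple) are hypothetical and are not part of Jena; only the two namespace URIs are taken from the RDF/XML output shown earlier.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Illustrative only: NOT part of Jena. Expands prefixed names to full URIs
// and formats one triple in N-triple style.
class NTripleSketch {
    static final Map<String, String> PREFIXES = new LinkedHashMap<>();
    static {
        PREFIXES.put("rdf", "http://www.w3.org/1999/02/22-rdf-syntax-ns#");
        PREFIXES.put("daml", "http://www.daml.org/2001/03/daml+oil#");
    }

    // Turns "rdf:type" into "<http://...#type>"; blank nodes such as "_:A" pass through.
    static String expand(String term) {
        if (term.startsWith("_:")) {
            return term;
        }
        int colon = term.indexOf(':');
        if (colon < 0) {
            throw new IllegalArgumentException("Not a prefixed name: " + term);
        }
        String namespace = PREFIXES.get(term.substring(0, colon));
        if (namespace == null) {
            throw new IllegalArgumentException("Unknown prefix in: " + term);
        }
        return "<" + namespace + term.substring(colon + 1) + ">";
    }

    static String toNTriple(String subject, String predicate, String object) {
        return expand(subject) + " " + expand(predicate) + " " + expand(object) + " .";
    }

    public static void main(String[] args) {
        System.out.println(toNTriple("_:A", "rdf:type", "daml:Ontology"));
    }
}
```

The single line printed is the long form of the triple listed above, with both prefixes written out in full.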
This subsection introduces the Accessor design pattern. It is important to understand this pattern, and it is worth spending sufficient time on this simple example of its use.
Looking at the DAML walkthru we see that an ontology has a couple of properties: daml:version and rdfs:comment. The corresponding methods in Jena are found in DAMLOntology and DAMLCommon. DAMLCommon is used as a place to put methods that are useful in more than one of the DAML interface definitions. Many things can have rdfs:comments, so the prop_comment() method is common. daml:version only applies to DAML ontologies, and so it is not common.
Both these methods return a LiteralAccessor. This reflects one of the design patterns used in the DAML API. Many of the properties in a DAML Ontology can have zero, one or more values. Natural operations include adding, removing, setting, getting and listing one or more values. Rather than duplicate this list of operations for each property within the API, the user of the API is required to chain two calls together, e.g.
onto.prop_comment().addValue("This is another comment");
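The idea behind the chained call can be sketched in plain Java without Jena. Everything below is a hypothetical illustration: ValueAccessor and OntologySketch are invented classes, and apart from prop_comment() and addValue() the method names are chosen for the example and should not be read as the signatures of the real LiteralAccessor.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// A plain-Java sketch of the accessor idea. These classes are hypothetical
// and are NOT the Jena DAML classes.
class AccessorSketch {

    // The add/remove/set/get/list operations are written once, here,
    // instead of being repeated for every property.
    static class ValueAccessor {
        private final List<String> values = new ArrayList<>();

        void addValue(String value) { values.add(value); }
        void removeValue(String value) { values.remove(value); }
        // Setting replaces whatever values were there before.
        void setValue(String value) { values.clear(); values.add(value); }
        // Returns the first value, or null if the property has none.
        String getValue() { return values.isEmpty() ? null : values.get(0); }
        List<String> getAll() { return Collections.unmodifiableList(values); }
    }

    // Each property is exposed by one method that returns its accessor.
    static class OntologySketch {
        private final ValueAccessor comment = new ValueAccessor();
        private final ValueAccessor version = new ValueAccessor();

        ValueAccessor prop_comment() { return comment; }
        ValueAccessor prop_version() { return version; } // hypothetical name
    }

    public static void main(String[] args) {
        OntologySketch onto = new OntologySketch();
        onto.prop_version().setValue("0.1");
        onto.prop_comment().addValue("This is a comment");
        onto.prop_comment().addValue("This is another comment");
        System.out.println(onto.prop_version().getValue());
        System.out.println(onto.prop_comment().getAll());
    }
}
```

The design choice is that a new property costs one extra method on the owning class, while every property still gets the full set of operations from the shared accessor.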
Exercise Modify your code to include a version number and two comments on the ontology. (crib (java) (text))
Exercise Generate output RDF/XML and N-Triple for your modified ontology. Use both the RDF/XML and the RDF/XML-ABBREV writers. (Repeat this exercise after any of the following exercises). (crib (java) (text))
Most of the vCard specification is a list of properties, e.g. <vCard:FN>, the full name property.
We look at the JavaDoc for DAMLModel to find how to create a property. Unfortunately we find three different methods, all of which look promising:
The difference between these is as follows:
createDAMLObjectProperty
createDAMLDatatypeProperty
createDAMLProperty
We see that the <vCard:FN> property takes a value that is always a string. Within DAML+OIL, XML Schema Datatypes are used for describing datatypes. Within the Jena DAML subsystem the supported datatypes are:
xsd:string, represented by the URI "http://www.w3.org/2000/10/XMLSchema#string"
xsd:integer, represented by the URI "http://www.w3.org/2000/10/XMLSchema#integer"
xsd:decimal, represented by the URI "http://www.w3.org/2000/10/XMLSchema#decimal"
xsd:real, represented by the URI "http://www.w3.org/2000/10/XMLSchema#real"
Advanced: see XMLDatatypeRegistry for more information. In particular, it is possible to add more datatypes.
Exercise Add the property <vCard:FN> to your ontology by modifying your code. Do not yet include the information that the full name is a string. Remember the full URI for the vCard prefix is "http://www.w3.org/2001/vcard-rdf/3.0#". (crib (java) (text))
In mathematical language a relationship or property relates two sets: the first is called its domain; and the second is called its range.
<vCard:FN> relates vCards (the domain) to strings (the range).
We don't yet have a URI for describing vCards, so we will just concentrate on the range.
Within the DAML API, datatypes are represented by a DAMLDatatype. These too are created using the relevant method on DAMLModel. To set the range of a property or datatype property we use the method prop_range(). This returns a PropertyAccessor which uses the same design pattern as the LiteralAccessor that we used for accessing the daml:version and rdfs:comment properties. Be sure to understand this design pattern before moving on.
Thus to set the range of <vCard:FN> we need to create the DAMLDatatype corresponding to xsd:string and then add that to the range of the property.
Exercise Add the relevant range constraint to the property <vCard:FN> by modifying your code. (crib (java) (text))
Optional Exercise If you want the practice, add the fields listed below, with the appropriate range constraints. (Note: this is diverging somewhat from the vCard specification, particularly section 3.2). This is a good time to refactor your code; it might be getting a little scrappy. (crib (java) (text))
NICKNAME
MAILER
GEO
TITLE
ROLE
CATEGORIES
NOTE
PRODID
SORT-STRING
CLASS
Exercise Modify that part of your code that uses the RDF/XML-ABBREV writer to use the base URL of "http://www.w3.org/2001/vcard-rdf/3.0". Is the output prettier? (crib (java) (text))
Hint Use the base argument of the write(writer,"RDF/XML-ABBREV",base) method.
The exercise above produces an RDF/XML file that is intended to be found on the web at the URL "http://www.w3.org/2001/vcard-rdf/3.0". Later we will learn about using XML Base, which allows us to overcome this limitation.
Some of the properties in the vCard specification have 'type parameters', see section 3.3 "Properties with Attributes". In particular, consider <vCard:TEL>, <vCard:EMAIL>, <vCard:ADR> and <vCard:LABEL>. (The inclusion of <vCard:TZ> in this part of the vCard specification appears to be a mistake). Each of these is modelled in the schema using an RDF property (e.g. vcard:TEL), a list of RDF classes (e.g. vcard:home, vcard:work, etc.), and a superclass for those classes (e.g. vcard:TELTYPES).
To create a class from the DAML API we look at the DAMLModel javadoc and find the appropriate method. To create a class and a subclass, we first create two classes, and then use the prop_subClassOf() method on the subclass, and add the superclass. Once again we are using the accessor design pattern.
Exercise Add code creating the TELTYPES, home, work, pref classes, with appropriate subclass relationships between them. Add code for the TEL object property, and constrain its range to be a TELTYPES (or one of its subclasses). (crib (java) (text))
The vCard properties such as <vCard:FN> are only meant to apply to vCards. Within DAML (and RDFS) the natural way to say this is to use a domain constraint. The domain constraint needs a class of vCards. However, the vCard specification doesn't introduce such a class. We will ignore the spec. and create such a class anyway. We can call it <vCard:VCARD>, i.e. with URI "http://www.w3.org/2001/vcard-rdf/3.0#VCARD". Then we can specify the appropriate domain constraint on all our properties. We do this using the prop_domain() method.
Exercise Add code creating the <vCard:VCARD> class and add domain constraints to the properties you are already creating. (crib (java) (text))
Advanced Exercise Add more code so that your VCard ontology covers all the properties and classes in the VCard RDF Schema. (crib (java) (text))
This part of the tutorial reverses the relationship between the RDF/XML file and the Java code. Instead of creating the ontology in Java and writing it out as XML, I have created an ontology in XML and we will use this by reading it into Java.
Most users of DAML+OIL seem to be unhappily comfortable with the XML syntax used in the DAML+OIL specifications. The normal way of defining an ontology is to find a file on the web (particularly at the DAML ontology library) with an appropriate ontology already in it.
We use the version of the vCard ontology that I produced. It is here. Like in the last section, we create a DAMLModel and use it to load in the ontology.
The usual read methods are used to load an ontology, but a DAMLModel implements this in a special way to give the additional functionality.
Exercise Starting with a new Java file, with a main method, create a DAMLModel and read in the vCard ontology. (crib (java) (text))
Having loaded the ontology into Jena we can now explore it within our Java code.
We will start by listing the classes in the ontology.
listDAMLClasses() looks like a promising place to start! The iterator returned always gives a DAMLClass.
Exercise Change your code to print out the names of the classes in the vCard ontology. (crib (java) (text))
Notice:
<Anonymous 1 ...
RDFAnon1
DAMLRestrictions
Within RDF, resources may be named or may be unnamed. Some of the classes in a DAML ontology are normally unnamed. This seems a bit strange, because in most class-oriented languages class names are obligatory and tell us a lot about what the class is meant to be.
In DAML, unnamed classes are ones which are defined using some operation within description logic. Within the ontology for vCards the class RDFAnon1 is defined as an intersection of other classes. Since intersection is conceptually straightforward, the lack of a meaningful name on this class is permitted (and might in some cases be helpful; the name can just get in the way). In the code that I used to generate this ontology this class really was unnamed, but when it got written out as RDF/XML it was forced to have a name. Sometimes RDF/XML allows resources to be unnamed, sometimes it needs them to have a name (the name the software chose was "RDFAnon1"). This is a significant known bug with the XML version of RDF. (We are considering having a Jena-specific solution of using not-quite-URIs of the form _:foobar for unnamed resources within RDF/XML).
The classes that are DAMLRestrictions correspond to RDF/XML like:
<daml:Restriction>
  <daml:onProperty rdf:resource='#TYPE' rdf:type='http://www.daml.org/2001/03/daml+oil#Property'/>
  <daml:toClass rdf:resource='http://www.w3.org/2000/10/XMLSchema#string'/>
</daml:Restriction>
This too is an operation from description logic. It means the set of things that are the subject of a triple with predicate vcard:TYPE, whose object is an xsd:string. Within the ontology this restriction class is used (within an intersection which is used) as the range constraint of some other property (e.g. LOGO). What that means is that the value of the property LOGO always has a string-valued TYPE property.
This is a particularly baroque and awkward way to say something quite straightforward. Readers may be pleased to hear that early indications are that the Web Ontology Language that will replace DAML+OIL is likely to provide a more intuitive frame oriented way of saying this sort of thing.
Let's try listDAMLProperties().
Exercise Change your code to print out the names of the properties in the vCard ontology. (crib (java) (text))
The last exercise should have produced a fairly muddled output. Properties of vCards will have been muddled with properties of ADRPROPERTIES and properties of NPROPERTIES. So the property Family, which only applies to NPROPERTIES, is not clearly separated from FN (which also means Family Name), which applies to vCards.
DAML takes what is known as a description logic viewpoint that is property centric, rather than the more intuitive frame viewpoint that is class centric.
We can combine the previous two exercises, and use the method getDefinedProperties() to arrange the output in a frame-like fashion.
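The regrouping from a property-centric listing to a class-centric (frame-like) one can be sketched in plain Java. This is only an illustration with hypothetical names (FrameViewSketch, groupByClass); it does not call the DAML API, and the sample pairs are hand-written rather than read from the ontology.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Illustrative only: plain Java, no Jena. Regroups (property, class) pairs
// into a map from each class to the properties that apply to it.
class FrameViewSketch {

    static Map<String, List<String>> groupByClass(String[][] propertyAndClass) {
        // TreeMap keeps the classes in alphabetical order for stable output.
        Map<String, List<String>> frames = new TreeMap<>();
        for (String[] pair : propertyAndClass) {
            String property = pair[0];
            String owningClass = pair[1];
            frames.computeIfAbsent(owningClass, k -> new ArrayList<>()).add(property);
        }
        return frames;
    }

    public static void main(String[] args) {
        // Hand-written sample pairs, following the pairings described in the text.
        String[][] pairs = {
            {"FN", "VCARD"},
            {"Family", "NPROPERTIES"},
            {"NICKNAME", "VCARD"},
        };
        for (Map.Entry<String, List<String>> frame : groupByClass(pairs).entrySet()) {
            System.out.println(frame.getKey() + ": " + frame.getValue());
        }
    }
}
```

With this layout Family appears only under NPROPERTIES and FN only under VCARD, which is the separation the flat property listing lacked.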
Exercise Change your code to print out the properties listed by the class they apply to. (crib (java) (text))
Notice that this has not worked for the DAMLRestrictions. We saw earlier that RDFAnon1 is described making it clear that vCard:TYPE applies to it. But in the listing produced by the last exercise no properties are associated with any restriction.
This reflects limitations of the Jena support for DAML, in that we do not (yet) have a full description logic engine. Thus within Jena, it is hard to answer some interesting questions about an ontology (such as which properties must be present on this resource).
Properties in DAML can also have cardinality constraints. These indicate the maximum number of object values a property might have for some specific subject; or conversely the maximum number of subjects of a property that correspond to one specific object.
Usually that maximum number is 1. This gives rise to unique properties which have at most one value on each subject; and unambiguous properties for which an object value corresponds to at most one subject.
The DAML API supports both unique and unambiguous properties, through the same design pattern of a set method (setIsUnique(boolean), setIsUnambiguous(boolean)) and a get method (isUnique(), isUnambiguous()).
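The two notions can be checked against plain data with a short Java sketch. It is a hypothetical illustration (the class CardinalitySketch and its methods are not part of the DAML API, and the sample values are made up): it tests whether a list of (subject, object) pairs for one property satisfies each definition given above.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Illustrative only: checks the two cardinality notions on plain
// (subject, object) pairs for a single property. NOT part of the DAML API.
class CardinalitySketch {

    // Unique: each subject has at most one distinct object value.
    static boolean isUnique(String[][] subjectObjectPairs) {
        return atMostOnePartner(subjectObjectPairs, 0, 1);
    }

    // Unambiguous: each object value corresponds to at most one distinct subject.
    static boolean isUnambiguous(String[][] subjectObjectPairs) {
        return atMostOnePartner(subjectObjectPairs, 1, 0);
    }

    private static boolean atMostOnePartner(String[][] pairs, int keyIndex, int valueIndex) {
        Map<String, Set<String>> partners = new HashMap<>();
        for (String[] pair : pairs) {
            Set<String> seen = partners.computeIfAbsent(pair[keyIndex], k -> new HashSet<>());
            seen.add(pair[valueIndex]);
            if (seen.size() > 1) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        // Made-up data: two cards that share the same full name.
        String[][] fullNames = {{"card1", "A. Name"}, {"card2", "A. Name"}};
        System.out.println("unique: " + isUnique(fullNames));
        System.out.println("unambiguous: " + isUnambiguous(fullNames));
    }
}
```

In the made-up data each card has one name, so the pairs satisfy the unique condition, but the shared name means they do not satisfy the unambiguous one.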
Exercise Change your code to only print out unique and/or unambiguous properties listed by the class they apply to. (crib (java) (text))
So far, we have only talked about the ontology for vCards, we haven't actually seen a vCard in DAML. In this section, we use both the vCard ontology, and some sample vCards that conform to that ontology.
In most data and information frameworks the base data, and the information describing the base data, are kept quite separate. A database schema is conceptually distinct from the database data. In the description logic world from which DAML+OIL emerged this separation was described using the word T-Box, the terminology box that lists the terms you can use and their interactions, and the word A-Box, the assertion box, the base facts described using the terminology from the T-Box.
In RDF, RDFS, and DAML+OIL, this clean separation between the base data and the schema or ontology is lost. From one point of view it is all muddled up together in a single soup. The Jena DAML API reflects that soup. Both the ontology terms and the instance data are loaded into the DAMLModel, and accessed primarily through the DAML API (although both can also be accessed through the Jena Model API).
Optional Exercise Create suitable instance data files for the rest of the exercises. Here is some code (java) (text) that can randomly generate such data.
We load RDF/XML instance data files, just like we loaded the ontology file above.
Exercise Starting with a new Java file, with a main method, create a DAMLModel and read in the vCard ontology, and the instance data. (crib (java) (text))
Having loaded all the data into our DAMLModel, we can now access it programmatically. Look at the DAMLModel javadoc and choose the appropriate method to get to the instances. When we find an instance, it should have Java class DAMLInstance. The iterator returned by DAMLModel.listDAMLInstances() also returns objects of Java class DAMLDataInstance. These are not interesting for this exercise and must be ignored.
To display the instance we will need to access a property, e.g. vCard:FN. To do that we use the method accessProperty(). This returns a PropertyAccessor which uses the now familiar accessor design pattern. Since the ontology has told us that vCard:FN is a unique property, this means that there is at most one value. Thus we can use the method getDAMLValue() rather than the more general getAll(true). (The boolean argument to getAll concerns transitive properties that are outside the scope of this tutorial). Using the explicitly typed sample data, the values returned by getDAMLValue() are DAMLDataInstances. To get the actual value to display, a further call to getValue() is necessary.
Exercise Modify your code to print out the name from each vCard in the knowledge base. (crib (java) (text))
It would be easier if we only looked at objects of the right class <vCard:VCARD>. This is possible; find an appropriate method in DAMLClass.
Exercise Modify your code to access the VCards through the class <vCard:VCARD>. (crib (java) (text))
A hint for the following exercises is that a quick way to get the property rdf:value is com.hp.hpl.mesa.rdf.jena.vocabulary.RDF.value
If we want to find a particular vCard, one approach is to use a brute force search through the knowledge base. Suppose we know someone's e-mail address and we want to find out more about them. We can go through the cards one by one, and access the vCard:EMAIL property (using getAll(true) since EMAIL is not a unique property). Each returned value will be a complex object (another DAMLInstance) and we need to access the rdf:value property to reach the actual e-mail address. The property rdf:value is not declared in our ontology, and is hence not treated as a DAML property. Using a DAML property accessor on rdf:value in this example does not work. Instead we must either make Jena calls or use getPropertyValue(). If any of the given e-mail addresses matches then we have found the one.
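The shape of the brute force search can be sketched in plain Java. This is a simplified, hypothetical illustration: the Card class stands in for a vCard instance, its e-mail list stands in for the rdf:value of each vCard:EMAIL object, and the names and the second address in the sample data are made up. It does not use the DAML API.

```java
import java.util.Arrays;
import java.util.List;

// Illustrative only: a brute force search over in-memory cards.
// Card is a hypothetical stand-in for a vCard instance; NOT the DAML API.
class EmailSearchSketch {

    static class Card {
        final String fullName;      // stands in for vCard:FN (at most one value)
        final List<String> emails;  // stands in for the rdf:value of each vCard:EMAIL

        Card(String fullName, String... emails) {
            this.fullName = fullName;
            this.emails = Arrays.asList(emails);
        }
    }

    // Goes through the cards one by one; returns the name on the first card
    // with a matching e-mail address, or null if no card matches.
    static String findNameByEmail(List<Card> cards, String email) {
        for (Card card : cards) {
            for (String candidate : card.emails) {
                if (candidate.equals(email)) {
                    return card.fullName;
                }
            }
        }
        return null;
    }

    public static void main(String[] args) {
        // Made-up sample data.
        List<Card> cards = Arrays.asList(
            new Card("First Person", "first@example.org", "first.alt@example.org"),
            new Card("Amanda Cartwright", "amanda_cartwright@example.org"));
        System.out.println(findNameByEmail(cards, "amanda_cartwright@example.org"));
    }
}
```

The inner loop is needed because EMAIL is not a unique property, so a card may carry several addresses.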
Exercise Modify your code to print out the name from the vCard for amanda_cartwright@example.org. (crib (java) (text))
A hint for the following exercises is that a quick way to get the property rdf:type is com.hp.hpl.mesa.rdf.jena.vocabulary.RDF.type
Looking at DAMLCommon we find a method getRDFTypes. This can be used to find all the classes to which an object belongs.
Exercise Modify your code to select the vCard for amanda_cartwright@example.org, and then find all the types of the complex object (the DAMLInstance) whose rdf:value is amanda_cartwright@example.org. (crib (java) (text))
Notice the daml:Thing class in your answer; this is the top class in DAML.
Exercise Modify your code to select only the explicit types of the complex object. (crib (java) (text))
Notice that the DAML implementation does do a significant part of the reasoning to do with class membership. NB: the classes found are the result of following the isSubClassOf property, and are not all those classes that would be found with a description logic classifier. With the closed flag set to false, this operates like the underlying Jena calls.
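The difference between the explicit types and the types found by following the subclass hierarchy can be sketched in plain Java. This is a hypothetical illustration rather than the Jena implementation: the class TypeClosureSketch, its types method and the way the closed flag is modelled are assumptions made for the example, and the class names in the sample hierarchy are borrowed from the TEL example earlier in this tutorial.

```java
import java.util.ArrayDeque;
import java.util.Arrays;
import java.util.Collections;
import java.util.Deque;
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Illustrative only: NOT the Jena implementation. Shows explicit types
// versus types closed over a direct-superclass table.
class TypeClosureSketch {

    // closed == false: return only the explicitly stated types.
    // closed == true: also follow superclass links and add the top class.
    static Set<String> types(Set<String> explicitTypes,
                             Map<String, List<String>> directSuperclasses,
                             boolean closed) {
        Set<String> result = new LinkedHashSet<>();
        if (!closed) {
            result.addAll(explicitTypes);
            return result;
        }
        Deque<String> todo = new ArrayDeque<>(explicitTypes);
        while (!todo.isEmpty()) {
            String current = todo.pop();
            if (result.add(current)) { // skip classes already visited
                List<String> supers =
                    directSuperclasses.getOrDefault(current, Collections.<String>emptyList());
                for (String superclass : supers) {
                    todo.push(superclass);
                }
            }
        }
        result.add("daml:Thing"); // the top class contains everything
        return result;
    }

    public static void main(String[] args) {
        Map<String, List<String>> supers = new HashMap<>();
        supers.put("vcard:home", Arrays.asList("vcard:TELTYPES"));
        Set<String> explicit = Collections.singleton("vcard:home");
        System.out.println(types(explicit, supers, false));
        System.out.println(types(explicit, supers, true));
    }
}
```

This only walks stated subclass links, which matches the limitation noted above: classes that a description logic classifier could infer from restrictions or intersections are not found this way.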
Exercise Modify your code to use the Jena Model API for accessing the rdf:type (e.g. use listProperties(rdf:type)). (crib (java) (text))
Notice that when we do not want any additional reasoning the Jena API calls are just as useful as the DAML API calls.
In general, modifying a DAMLModel through calls on the underlying Jena Model will lead to problems. We are currently exploring alternative layered architectures for Jena that will allow us to support such multiple views over the same underlying data.
This section consists of leads into further study. This tutorial does not provide an adequate basis for understanding either these advanced DAML functionalities, or their (partial) implementation within Jena.
We saw in the last exercises that the DAML implementation can be smart about class membership. In general it can be smart about any property for which it knows the ontology. This functionality, as with class membership, is not a full description logic implementation, but does cover equivalence, transitivity, inverse properties, and the class and property hierarchies.
The javadoc can be found under the class PropertyIterator, and under many methods such as getEquivalentValues.
DAML provides a number of ways of declaring that two different names (URIs) actually refer to the same resource. It also provides a way of showing that two names refer to different resources. These are:
daml:equivalentTo
daml:sameClassAs
daml:samePropertyAs
daml:sameIndividualAs
daml:differentIndividualFrom
Jena provides some support for these within the DAML API. However, at least some of the relevant processing is expensive; see setUseEquivalence(boolean). Moreover, there will be many implicit equivalences that can be inferred from the ontology using a DL processor, but that Jena cannot infer. These are hence ignored, whatever the setting of the useEquivalence flag.
Within the semantic web it is assumed that the owner of a web-server on foobar.example.org has first bite at defining the schema or ontology for all URLs starting http://foobar.example.org. So we see that the vCard ontology was published by the W3C and they allowed the author to use their web space to publish the vCard schema (at http://www.w3.org/2001/vcard-rdf/3.0).
We have seen that it is nevertheless possible to define in our own file on our own computer (or our own web server) an alternative schema or ontology for the same classes and properties. This is a bit naughty.
In the first part of the DAML tutorial we saw that telling the writer to produce a file with base http://www.w3.org/2001/vcard-rdf/3.0 produced tidier output. The schema we used in the second half of the DAML tutorial was initially produced programmatically by the sample solution for the first section. It used the tidier form of output. In order to tell the RDF/XML parser to pretend that it was at http://www.w3.org/2001/vcard-rdf/3.0 we use an xml:base declaration.
xml:base="http://www.w3.org/2001/vcard-rdf/3.0"
Typically this is at the beginning of the file, but can occur on any XML element.
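The effect of the base declaration on a relative reference such as '#FN' can be sketched with the standard java.net.URI class. This is an illustration of URI resolution only, not of the RDF/XML parser itself; the class name XmlBaseSketch and its resolve helper are hypothetical.

```java
import java.net.URI;

// Illustrative only: resolves a relative reference against a base URI,
// which is the effect an xml:base declaration has on references like '#FN'.
class XmlBaseSketch {

    static String resolve(String base, String reference) {
        return URI.create(base).resolve(reference).toString();
    }

    public static void main(String[] args) {
        String base = "http://www.w3.org/2001/vcard-rdf/3.0";
        // prints http://www.w3.org/2001/vcard-rdf/3.0#FN
        System.out.println(resolve(base, "#FN"));
    }
}
```

With the base set this way, a short reference in the file stands for the full vCard property URI regardless of where the file is actually stored.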
DAML allows you to describe anonymous classes that are described in terms of the properties used by the objects within them. This is quite like talking about a set of objects by describing something about them, e.g. "the people with red hair", rather than by just naming the class "red-haired people". The class DAMLRestriction corresponds to these anonymous classes.
DAML also allows you to explicitly construct a set by giving its members. This corresponds to the prop_oneOf() method found on a DAMLClass. Similarly prop_unionOf() @@@@ allow access to classes defined in terms of these set-theoretic operators.
None of these are fully implemented in Jena. The API allows you to create and explore ontologies which have these constructions, but you cannot navigate instance data using these constructions.
The underlying issue is that we have not yet invested in a DL reasoner, which is a moderately complex piece of specialised software needed to implement these requirements.
This section discusses likely changes planned for future versions of Jena.
The Jena team are aware that the user community wants deep improvements in the ontology support in Jena.
The goal is to allow both:
The current Jena releases (1.X.Y) have been found not to have sufficient architectural support for such changes.
We are designing a layered architecture in which both inference models and enhanced models can be layered on top of other models (their base model(s)). We believe that a layered architecture will provide clarity about how to access a model both with and without the additional functionality.
The first version of Jena 2 is likely to exercise the new architecture by providing RDF Schema support.
After that we wish to provide improved DAML support including a Description Logic reasoner and closure over implicit triples.
We hope to provide both of these within 2002.
In the event that the W3C WebOnt Working Group make rapid progress, we may bypass full DAML support in favour of supporting the new language.