Dave Reynolds, 5/12/01
The jena/rdb module provides an implementation of the jena model
interface which stores the RDF statement information in a relational database.
The implementation can support a variety of database table layouts and can customize
the SQL code to cope with the vagaries of different database implementations.
Getting started - creating and accessing database instances
Multiple models per database
Constraints
Database layouts
Notes
Database-backed RDF models are instances of the class jena.rdb.ModelRDB. As well as implementing the full jena.model.Model interface, ModelRDB provides static methods to create, extend and reopen database instances.
First consider the situation where we have an available database but as yet it has no RDF models stored in it and we want to format it for holding RDF statements. In that case we would use:
    DBConnection dbcon = new DBConnection(DATABASE_URI, user, password);
    ModelRDB model = ModelRDB.create(dbcon, LAYOUT_STYLE, DATABASE_TYPE);

The DBConnection class provides different methods for specifying the underlying database. In particular it can be specified, as in the example above, as a jdbc uri (e.g. jdbc:interbase:\\localhost:\databases\test.gdb) along with any required user name and password. Alternatively, the database connection can be opened using the standard jdbc calls and the resulting jdbc Connection object can be wrapped up as a DBConnection for passing on to ModelRDB.create.
The ModelRDB.create call takes two arguments in addition to the database connection itself. Firstly, the LAYOUT_STYLE is a string defining the type of database table structure to be used. Typical values for this include:
Generic | General layout; all statements are stored in a single table. Resources and literals are indexed using integer ids generated by database sequence generators. |
GenericProc | Variant on the generic layout that uses stored procedures for all model updates; this can have a 30-50% performance advantage in some cases. |
MMGeneric | Similar layout to "Generic" but can support more than one jena model in a single database. |
Hash | Similar layout to "Generic" but uses MD5 hashes to generate the ids for resources and literals. This avoids relying on the database generators and is more portable, at the cost of a small performance hit. |
MMHash | Similar layout to "Hash" but can support more than one jena model in a single database. |
The second argument, DATABASE_TYPE, is a string defining the type of the database. Whilst jdbc offers good database independence, most SQL code remains database-dependent: for example, sequence generators, stored procedures and limitations on table indexes all vary across databases. The jena RDB modules cope with this by allowing implementors to customize the SQL code to suit the database server to be used. If using a portable layout such as "Generic" or "Hash" then the DATABASE_TYPE of "Generic" may work; otherwise use a specific database name here. The distribution includes configuration files for "Interbase", "Mysql", "Postgresql" and "Oracle". Others can be created. The matrix of currently supported layouts is:
Database | Layouts |
Postgresql | Generic, MMGeneric, Hash, MMHash |
Mysql | Generic, MMGeneric, Hash, MMHash |
Interbase | Generic, MMGeneric, Hash, MMHash [Implementations of GenericProc, MMGenericProc are also provided but not supported and are subject to code-rot.] |
Oracle | MMGeneric |
The call to ModelRDB.create will create the appropriate database tables and record within the database a note of the layout chosen. This means that a previously created database can be reopened using:
    DBConnection dbcon = new DBConnection(DATABASE_URI, user, password);
    ModelRDB model = ModelRDB.open(dbcon);
Note that no layout or database information is needed this time - it is retrieved from the pre-formatted database.
Some database layouts only support one jena model per database. Other layouts can support multiple models within a single database; these have slightly lower performance but can be more convenient. Thus if dbConnection is a connection to an already formatted database whose layout supports multiple models then the call:
    ModelRDB model = ModelRDB.createModel(dbConnection, modelName);
will create an additional model within the same database. The modelName can be used to reopen the same model in the future using:
    ModelRDB model = ModelRDB.open(dbConnection, modelName);
and
    Iterator it = ModelRDB.listModels(dbConnection);
will list the names of all the models stored in the database.
The ModelRDB interface supports all the standard jena facilities for navigating the model. This allows us to, for example, find all statements with a given pattern of subject, property and object values. If we wish to perform partial matching on object literal values (e.g. finding all statements whose literal object value starts with "foo" or is an integer in the range [2,8), say) then we have to use the Selector mechanism. Unfortunately, in this case all candidate statements with matching subject and property values will be retrieved and then filtered by the supplied Selector.test() code.
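The cost of that approach can be illustrated with a small stand-alone sketch. It does not use jena's Selector interface; the class ClientSideFilterSketch, its methods and the sample literals are all hypothetical, and only the matching rule (starts with "foo" or is an integer in [2,8)) is taken from the text above. The point is that the test only runs once every candidate has already been pulled out of the database.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.Predicate;

/** Hypothetical stand-in for Selector-style filtering; not part of jena. */
class ClientSideFilterSketch {

    /** Accepts literals that start with "foo" or parse as an integer in [2,8). */
    static boolean test(String literal) {
        if (literal.startsWith("foo")) {
            return true;
        }
        try {
            int value = Integer.parseInt(literal);
            return value >= 2 && value < 8;
        } catch (NumberFormatException e) {
            return false; // not an integer and no "foo" prefix
        }
    }

    /** Every candidate has already been fetched; the test runs afterwards, in the client. */
    static List<String> filter(List<String> candidates, Predicate<String> test) {
        List<String> kept = new ArrayList<>();
        for (String literal : candidates) {
            if (test.test(literal)) {
                kept.add(literal);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        // Hypothetical object literals of statements sharing one subject and property.
        List<String> candidates = Arrays.asList("foobar", "7", "8", "bar", "2");
        System.out.println(filter(candidates, ClientSideFilterSketch::test));
    }
}
```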
The RDB package allows us to use the underlying database implementation by providing an alternative mechanism for listing statements - that of constraints. For example,
    IConstraints constraints = modelrdb.createConstraints();
    constraints.addSubjectConstraint(foo)
               .addPropertyConstraint(prop);
    Iterator statements = modelrdb.listStatements(constraints);
will return an iterator over all statements in the model with subject foo and property prop. More interestingly the code:
    IConstraints constraints = modelrdb.createConstraints();
    constraints.addSubjectConstraint(foo)
               .addPropertyConstraint(prop)
               .addStringConstraint("NOT LIKE", "%bar%");
    Iterator statements = modelrdb.listStatements(constraints);
will list just that subset of the above statements whose object value is a literal string which does not contain the substring "bar". The first argument of the addStringConstraint call can be any standard SQL string match operation. Note that this is a potential source of porting problems across different databases: most databases support "LIKE" but some don't use the ANSI SQL pattern characters (e.g. '%') and some have other operators ("CONTAINS", "STARTSWITH", "REGEXP").
As well as string matching there is some experimental support for integer-valued literals.
When and if jena is extended to support true typed literals a fuller match constraint
mechanism might be possible. In the meantime, to support the common case of integer literals we
note any literal in the database which could be interpreted as an integer. In this way we
can support code such as:
    IConstraints constraints = modelrdb.createConstraints();
    constraints.addSubjectConstraint(foo)
               .addIntConstraint("<=", 42)
               .addIntConstraint(">", 4);
    Iterator statements = modelrdb.listStatements(constraints);
Note that in all these cases the constraints object can be reused, which may avoid the overhead of generating and parsing the required SQL code (depending on the nature of the jdbc driver in use).
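To make the portability and reuse points concrete, here is a minimal sketch of how a constraints object could accumulate a parameterised SQL condition. It is not jena's implementation: the class ConstraintSqlSketch is hypothetical, its method names merely mirror the calls shown above, the numeric ids are invented, and the column names (subject, predicate, literal_idx, int_ok, int_literal) are borrowed from the table layouts listed elsewhere in this document. The operator string is pasted into the SQL text as given, which is why a non-portable operator becomes a porting problem, while the values stay as bound parameters so the same SQL text could be prepared once and reused.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical constraint builder; jena's own SQL generation may differ. */
class ConstraintSqlSketch {
    private final List<String> clauses = new ArrayList<>();
    private final List<Object> parameters = new ArrayList<>();

    ConstraintSqlSketch addSubjectConstraint(Object subjectId) {
        clauses.add("subject = ?");
        parameters.add(subjectId);
        return this;
    }

    ConstraintSqlSketch addPropertyConstraint(Object propertyId) {
        clauses.add("predicate = ?");
        parameters.add(propertyId);
        return this;
    }

    /** The operator text goes into the SQL unchanged; only the pattern is bound. */
    ConstraintSqlSketch addStringConstraint(String operator, String pattern) {
        clauses.add("literal_idx " + operator + " ?");
        parameters.add(pattern);
        return this;
    }

    /** Integer comparisons only apply to literals flagged as parseable integers. */
    ConstraintSqlSketch addIntConstraint(String operator, int value) {
        clauses.add("(int_ok = 1 AND int_literal " + operator + " ?)");
        parameters.add(value);
        return this;
    }

    /** Condition text with placeholders, suitable for a prepared statement. */
    String whereClause() {
        return String.join(" AND ", clauses);
    }

    List<Object> parameters() {
        return parameters;
    }

    public static void main(String[] args) {
        ConstraintSqlSketch c = new ConstraintSqlSketch()
                .addSubjectConstraint(17)   // hypothetical id of resource foo
                .addPropertyConstraint(23)  // hypothetical id of property prop
                .addStringConstraint("NOT LIKE", "%bar%");
        System.out.println(c.whereClause() + " " + c.parameters());
    }
}
```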
One of the aims of the RDB package was to support experimentation with different database layouts. Some of this experimentation was done during the package development (see performance notes) but the main supported layouts included in this release are small variants on the standard triple table schemas. Viz:
RDF_STATEMENTS
Column name | Type | Comments |
subject | id-ref | |
predicate | id-ref | |
object | id-ref | |
object_isliteral | smallint | flags whether "object" is in literal or resource table |
model | id-ref | only used in multiple-model variants |
isreified | smallint | not used at present |
RDF_LITERALS
Column name | Type | Comments |
id | id-ref | |
language | varchar | xml:lang value if available |
literal_idx | varchar | the literal itself or the largest subset of that which is indexable by the database |
literal | blob | the full literal value if the literal won't fit in literal_idx |
int_ok | smallint | flag to indicate that a parse of the literal into an integer is available |
int_literal | int | the integer value of the literal, only valid if int_ok=1 |
well_formed | smallint | preserve jena flag that the literal is well-formed xml |
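As a rough illustration of how the RDF_LITERALS columns relate to one another, the sketch below derives the literal_idx, literal, int_ok and int_literal values for a single literal. Several things here are assumptions rather than documented behaviour: the class LiteralRowSketch is hypothetical, the 250-character index limit is invented (the real limit depends on the database), "largest indexable subset" is taken to mean a simple prefix, and surrounding whitespace is ignored when parsing integers.

```java
/** Hypothetical view of one RDF_LITERALS row; not jena's actual code. */
class LiteralRowSketch {
    static final int MAX_INDEXED_CHARS = 250; // assumed limit, database dependent

    final String literalIdx; // literal_idx: whole value, or its indexable prefix
    final String literal;    // literal (blob): null when literal_idx holds everything
    final int intOk;         // int_ok: 1 if the value parses as an integer
    final int intLiteral;    // int_literal: only meaningful when intOk == 1

    LiteralRowSketch(String value) {
        boolean fits = value.length() <= MAX_INDEXED_CHARS;
        this.literalIdx = fits ? value : value.substring(0, MAX_INDEXED_CHARS);
        this.literal = fits ? null : value;

        int parsed = 0;
        int ok = 0;
        try {
            parsed = Integer.parseInt(value.trim());
            ok = 1;
        } catch (NumberFormatException e) {
            // not an integer: int_ok stays 0 and int_literal is ignored
        }
        this.intOk = ok;
        this.intLiteral = parsed;
    }

    public static void main(String[] args) {
        LiteralRowSketch row = new LiteralRowSketch("42");
        System.out.println("int_ok=" + row.intOk + " int_literal=" + row.intLiteral);
    }
}
```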
RDF_RESOURCES
Column name | Type | Comments |
id | id-ref | |
namespace | id-ref | pointer to namespace table |
localname | varchar |
RDF_NAMESPACES
Column name | Type | Comments |
id | id-ref | |
uri | varchar |
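The split between RDF_RESOURCES and RDF_NAMESPACES can be pictured with the following sketch, which breaks a resource URI into a namespace and a local name and hands out one id per distinct namespace. The splitting rule used here (cut after the last '#' or '/') is a simplifying assumption, not necessarily the rule jena applies, and the class UriSplitSketch, its in-memory map and the example URI are hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical namespace/localname split; jena's real rule may differ. */
class UriSplitSketch {
    // Stands in for the RDF_NAMESPACES table: uri -> id.
    private final Map<String, Integer> namespaceIds = new LinkedHashMap<>();

    /** Index where the local name starts: just after the last '#' or '/'. */
    static int splitPoint(String uri) {
        return Math.max(uri.lastIndexOf('#'), uri.lastIndexOf('/')) + 1;
    }

    static String namespace(String uri) {
        return uri.substring(0, splitPoint(uri));
    }

    static String localName(String uri) {
        return uri.substring(splitPoint(uri));
    }

    /** Returns the existing id for a namespace, or allocates the next one. */
    int namespaceId(String namespace) {
        Integer id = namespaceIds.get(namespace);
        if (id == null) {
            id = namespaceIds.size() + 1;
            namespaceIds.put(namespace, id);
        }
        return id;
    }

    public static void main(String[] args) {
        String uri = "http://example.org/ns#thing"; // hypothetical resource URI
        UriSplitSketch table = new UriSplitSketch();
        System.out.println(table.namespaceId(namespace(uri)) + " | " + localName(uri));
    }
}
```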
RDF_MODELS
Column name | Type | Comments |
id | id-ref | |
name | varchar | Used when reopening a persistent model in a database that supports more than one model. |
RDF_LAYOUT_INFO - name/value pairs which define the layout properties
Column name | Type | Comments |
name | varchar | |
val | varchar |
The id-ref type used above is typically mapped to either an int or a char string. For some schemes we allocate integer ids for the statements, resources etc. by using database sequence generators or auto-increment columns, in which case all id-refs are ints. An alternative approach is to use a content hash, such as MD5 or SHA-1, to generate a globally unique ID which can be used across databases. Depending on the database jdbc driver we can either store this hash-id in a CHAR(16) or base64-encode it as a string into a CHAR(24) value. For more details see the porting notes.
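The two column widths follow directly from the hash size: an MD5 digest is 16 bytes, and base64 encodes 16 bytes as 24 characters including padding. The sketch below shows this with the standard Java library. The class HashIdSketch is hypothetical, and hashing the UTF-8 bytes of the URI or literal text is an assumption; the exact input that jena hashes is not stated here.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Base64;

/** Illustrative only: derives a content-hash id from a URI or literal string. */
class HashIdSketch {

    /** Raw 16-byte MD5 digest of the UTF-8 encoded content. */
    static byte[] md5(String content) {
        try {
            MessageDigest digest = MessageDigest.getInstance("MD5");
            return digest.digest(content.getBytes(StandardCharsets.UTF_8));
        } catch (NoSuchAlgorithmException e) {
            // MD5 is one of the algorithms every Java platform must provide.
            throw new IllegalStateException(e);
        }
    }

    /** Text form of the id: 16 bytes always encode to 24 base64 characters. */
    static String base64Id(String content) {
        return Base64.getEncoder().encodeToString(md5(content));
    }

    public static void main(String[] args) {
        String uri = "http://example.org/ns#thing"; // hypothetical resource URI
        String id = base64Id(uri);
        System.out.println(id + " (" + id.length() + " chars)");
    }
}
```

Because the id depends only on the content, the same resource or literal gets the same id in every database, which is what makes the hash layouts portable.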
The layouts currently defined are:
Layout | Supports multiple-models? | Uses hash ids? | Comments |
Generic | no | no | See above for details |
MMGeneric | yes | no | |
GenericProc | no | no | Variant on generic that uses stored procedures for updates |
MMGenericProc | yes | no | Variant on generic that uses stored procedures for updates |
Hash | no | yes | |
MMHash | yes | yes |
The supplied configuration files for Interbase support all 6 variants; those for Mysql and Postgresql support all except the two "proc" variants. In this case "support" means that the configuration passes all jena regression tests.
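When choosing a LAYOUT_STYLE string programmatically, the layout table can be encoded as data. The enum below is a hypothetical helper, not part of jena; it simply restates the table (layout name, multiple-model support, hash ids, stored procedures) and lists the layout names that match a given requirement. Whether a given database actually supports a returned layout still has to be checked against the supplied configuration files.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical encoding of the layout table; not part of jena. */
enum LayoutSketch {
    GENERIC("Generic", false, false, false),
    MM_GENERIC("MMGeneric", true, false, false),
    GENERIC_PROC("GenericProc", false, false, true),
    MM_GENERIC_PROC("MMGenericProc", true, false, true),
    HASH("Hash", false, true, false),
    MM_HASH("MMHash", true, true, false);

    final String layoutStyle;      // string passed as LAYOUT_STYLE
    final boolean multiModel;      // supports multiple models per database
    final boolean hashIds;         // uses hash ids instead of generated integers
    final boolean storedProcedures; // uses stored procedures for updates

    LayoutSketch(String layoutStyle, boolean multiModel, boolean hashIds,
                 boolean storedProcedures) {
        this.layoutStyle = layoutStyle;
        this.multiModel = multiModel;
        this.hashIds = hashIds;
        this.storedProcedures = storedProcedures;
    }

    /** Layout names whose multi-model and hash-id properties match the request. */
    static List<String> candidates(boolean needMultiModel, boolean needHashIds) {
        List<String> names = new ArrayList<>();
        for (LayoutSketch layout : values()) {
            if (layout.multiModel == needMultiModel && layout.hashIds == needHashIds) {
                names.add(layout.layoutStyle);
            }
        }
        return names;
    }

    public static void main(String[] args) {
        System.out.println(candidates(true, false));
    }
}
```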