1. Introduction

rdf2java is a small tool written in Java. It allows easy handling of RDF data. Instead of using an RDF API for creating and searching for RDF triples, i.e., (subject, predicate, object), you just work with Java objects representing RDF subjects / objects.

Typically, you first create an RDFS (RDF Schema) file, containing declarations for classes and properties. RDF data (e.g. in an RDF file) refers to this RDFS file, just like an XML file can refer to a DTD or to an XML Schema file. The RDF Schema specifies what kind of RDF subjects (i.e. instances of these classes) are allowed and how they are linked to each other (-> properties). So, you have RDFS specifications and RDF data (instances) matching this RDF Schema.

Using rdf2java (or, to be more precise, the rdfs2class utility of the rdf2java package), we first create Java classes corresponding to the RDFS classes. Then we can use rdf2java to "read in" RDF data, and convert them to Java objects, being instances of the respective Java classes.

The original tool was written by Michael Sintek. Information and download of the old tool can be found here. I developed his tool further to suit our increasing needs in the FRODO project.

1.1 Example:

Consider the following RDFS file (XML syntax) declaring a class Person having two properties: (1) name is a string-valued property (-> Literal) and (2) hasParent demands a reference to another instance of Person. [I won't go into detail about RDFS and its typical XML syntax here.]

<?xml version='1.0' encoding='ISO-8859-1'?>
<!DOCTYPE rdf:RDF [
     <!ENTITY rdf ''>
     <!ENTITY rdfs ''>
     <!ENTITY example1 'http://org.dfki/rdf2java/example1#'>
]>
<rdf:RDF xmlns:rdf="&rdf;"
     xmlns:rdfs="&rdfs;">

<rdfs:Class rdf:about="&example1;Person">
    <rdfs:subClassOf rdf:resource="&rdfs;Resource"/>
</rdfs:Class>

<rdf:Property rdf:about="&example1;hasParent">
    <rdfs:range rdf:resource="&example1;Person"/>
    <rdfs:domain rdf:resource="&example1;Person"/>
</rdf:Property>

<rdf:Property rdf:about="&example1;name">
    <rdfs:domain rdf:resource="&example1;Person"/>
    <rdfs:range rdf:resource="&rdfs;Literal"/>
</rdf:Property>

</rdf:RDF>

Just to give you an idea, here's the plain set of RDF triples (subject, predicate, object) for the RDF Schema above. Resources from the RDF and RDF Schema namespaces are written with their rdf: and rdfs: prefixes. I sorted the triples for your convenience; remember that a set of triples has no inherent order:

http://org.dfki/rdf2java/example1#Person    , rdf:type        , rdfs:Class
http://org.dfki/rdf2java/example1#Person    , rdfs:subClassOf , rdfs:Resource
http://org.dfki/rdf2java/example1#hasParent , rdf:type        , rdf:Property
http://org.dfki/rdf2java/example1#hasParent , rdfs:domain     , http://org.dfki/rdf2java/example1#Person
http://org.dfki/rdf2java/example1#hasParent , rdfs:range      , http://org.dfki/rdf2java/example1#Person
http://org.dfki/rdf2java/example1#name      , rdf:type        , rdf:Property
http://org.dfki/rdf2java/example1#name      , rdfs:domain     , http://org.dfki/rdf2java/example1#Person
http://org.dfki/rdf2java/example1#name      , rdfs:range      , rdfs:Literal

Assume there is an RDF file containing some instances of this RDFS class (Person):

<?xml version='1.0' encoding='ISO-8859-1'?>
<!DOCTYPE rdf:RDF [
     <!ENTITY rdf ''>
     <!ENTITY rdfs ''>
     <!ENTITY example1 'http://org.dfki/rdf2java/example1#'>
]>
<rdf:RDF xmlns:rdf="&rdf;"
     xmlns:rdfs="&rdfs;"
     xmlns:example1="&example1;">

<example1:Person rdf:about="&example1;example1_00005"
     example1:name="Bart Simpson">
    <example1:hasParent rdf:resource="&example1;example1_00006"/>
</example1:Person>

<example1:Person rdf:about="&example1;example1_00006"
     example1:name="Homer Simpson"/>

</rdf:RDF>

Again, just to give you an idea of the plain RDF, here are the triples, ordered for your convenience (rdf:type abbreviates the type predicate from the RDF namespace):

http://org.dfki/rdf2java/example1#example1_00005 , rdf:type                                        , http://org.dfki/rdf2java/example1#Person
http://org.dfki/rdf2java/example1#example1_00005 , http://org.dfki/rdf2java/example1#hasParent     , http://org.dfki/rdf2java/example1#example1_00006
http://org.dfki/rdf2java/example1#example1_00005 , http://org.dfki/rdf2java/example1#name          , Bart Simpson
http://org.dfki/rdf2java/example1#example1_00006 , rdf:type                                        , http://org.dfki/rdf2java/example1#Person
http://org.dfki/rdf2java/example1#example1_00006 , http://org.dfki/rdf2java/example1#name          , Homer Simpson

When you visualize this set of triples, you get the following picture (only the RDF data is shown, not the RDF Schema). Boxes represent resources (RDF subjects), whereas ellipses show literals (strings), which can only be RDF objects. The named and directed edges visualize the predicates.
[Figure: graph representation of these RDF triples]

This RDF data contains two Persons, Bart Simpson and Homer Simpson, and Bart has a parent, namely Homer.

Now assume further that there is a Java class named Person with the following structure:

public class Person
{
    public void putName(String name) {...}
    public String getName() {...}

    public void putHasParent(Person parent) {...}
    public Person getHasParent() {...}
}

The classes may get more complicated later, and the implementation is still missing, but for the moment that should do to get the picture. Besides, the structure (the public getters and putters) is more important than the concrete implementation.

Using RDF Import (rdf2java API), you can transform the RDF data above into two Java objects, i.e., instances of the class Person. These instances will be created using Java's reflection API, so, they are created during run-time. After creation, both will be initialized calling the public method putName, and for one of them the method putHasParent will be called, too. This will result in two Java objects, one for "Bart" and one for "Homer". Additionally, the Bart-object links to the Homer-object.
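To make the reflection step a bit more tangible, here is a minimal sketch of how objects can be created and initialized by name at run-time. It is not the RDF Import implementation; the Person class is a simplified stand-in for the generated class, and ReflectionSketch with its methods create and put is a hypothetical helper invented for this illustration.

```java
import java.lang.reflect.Method;

// Simplified stand-in for the generated class; the real one is more complex.
class Person {
    private String name;
    private Person hasParent;

    public void putName(String name) { this.name = name; }
    public String getName() { return name; }

    public void putHasParent(Person parent) { this.hasParent = parent; }
    public Person getHasParent() { return hasParent; }
}

// Hypothetical helper, only for illustration.
class ReflectionSketch {
    // Create an instance of the named class via its no-argument constructor.
    static Object create(String className) {
        try {
            return Class.forName(className).getDeclaredConstructor().newInstance();
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("cannot instantiate " + className, e);
        }
    }

    // Look up a public method by name and parameter type, then call it with the value.
    static void put(Object target, String methodName, Class<?> paramType, Object value) {
        try {
            Method putter = target.getClass().getMethod(methodName, paramType);
            putter.invoke(target, value);
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("cannot call " + methodName, e);
        }
    }

    // Build the two objects from the example: Bart links to Homer.
    static Person[] buildSimpsons() {
        Object homer = create(Person.class.getName());
        put(homer, "putName", String.class, "Homer Simpson");

        Object bart = create(Person.class.getName());
        put(bart, "putName", String.class, "Bart Simpson");
        put(bart, "putHasParent", Person.class, homer);

        return new Person[] { (Person) bart, (Person) homer };
    }
}
```

Note that nothing in create or put mentions Person at compile time; the class and method names arrive as strings, which is what allows an importer to work with whatever classes were generated from the schema.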

There is of course a bit more to the objects, e.g., Person actually extends a class called THING, which in turn extends a class called RDFResource, but this will be handled later. Just to give you a hint: RDFResource provides getters and putters for the object's URI (uniform resource identifier). Hence, calling the method getURI on the Bart-object, you receive http://org.dfki/rdf2java/example1#example1_00005, whereas when you call getURI on the Homer-object, you get http://org.dfki/rdf2java/example1#example1_00006. See the RDF file above to understand why.

What you don't see in the example is that classes can be sub-classes of other classes. As Java doesn't provide multiple inheritance for classes, rdf2java doesn't support multiple inheritance either, although RDFS itself allows a class to have several superclasses. This remark is for people using Protégé-2000 (see below) to generate and work on RDF/S.

1.2 What rdf2java can do

In contrast to plain RDF, rdf2java distinguishes between what you could call "near" and "remote" objects. The reason is as follows: RDF data is just a set of triples (subject, predicate, object), which means that a link between objects is always represented by a triple (object, relation, other-object). As soon as this triple exists, the relation exists; no matter whether the linked resource exists or not, the reference to that resource always does.

On the Java side, this is different. Keeping in mind the simple structure of the class Person above, the referenced Java object must be available to establish the relation. For example, if the Homer-object is not available (there is no Java pointer to it), you cannot reference the Homer-object, and hence you cannot establish a link (relation) between the two, although there might be such a Homer-object "living" on some other computer or in some other Java-VM (Java virtual machine).

Therefore, rdf2java distinguishes between the two cases: (1) An object is either available in the current Java-VM, then it is an instance of THING; (2) The object is not available in the current Java-VM, then it is represented by an RDFResource, providing at least the URI of the referenced object. A THING is an RDFResource, which means, that also THINGS know about their URIs.
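The following sketch illustrates the distinction with tiny stand-in classes. They only borrow the names RDFResource and THING from rdf2java; the constructors, the putURI name, the RDFResource-typed hasParent slot, and the NearRemoteSketch helper are assumptions made for this illustration, and the real classes in dfki.rdf.util provide more than this.

```java
// Stand-in: a reference that knows nothing but the URI of a resource.
class RDFResource {
    private String uri;
    RDFResource(String uri) { this.uri = uri; }
    public String getURI() { return uri; }
    public void putURI(String uri) { this.uri = uri; }
}

// Stand-in: an object that is available in the current Java-VM.
class THING extends RDFResource {
    THING(String uri) { super(uri); }
}

// Simplified Person whose parent slot accepts a "near" or a "remote" value.
class Person extends THING {
    private String name;
    private RDFResource hasParent;

    Person(String uri) { super(uri); }

    public void putName(String name) { this.name = name; }
    public String getName() { return name; }

    public void putHasParent(RDFResource parent) { this.hasParent = parent; }
    public RDFResource getHasParent() { return hasParent; }
}

// Hypothetical helper, only for illustration.
class NearRemoteSketch {
    // A THING is "near" (its data is here); a bare RDFResource is "remote".
    static String describe(RDFResource resource) {
        String kind = (resource instanceof THING) ? "near " : "remote ";
        return kind + resource.getURI();
    }
}
```

In this sketch, a Bart object whose parent has not been loaded would hold a plain RDFResource carrying Homer's URI; once Homer is available as a Person, the same slot can hold the object itself.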

1.3 What rdf2java can not do

We are using Java objects as representatives for RDF subjects / objects. As such a Java object is an instance of some Java class, Java binds the object's Java pointer forever to this object, and furthermore to its class. Hence, the class of a Java object can never change, which means that the represented RDF subject can never change the class it belongs to. This is different from the plain RDF world, where such a change is possible. However, when you know that an RDF subject will never change its class, you won't have any problems using rdf2java.

1.4 rdf2java works well with RDFS produced by Protégé-2000

RDFS unfortunately lacks some property constraints, such as, specifying whether a property can only contain a single value (only one triple with that predicate allowed), or whether there can be multiple values for that property.

We are using Protégé-2000 to create our Models / Meta-Models and file out the result as RDF/S. Protégé-2000 provides some interesting class and property declarations that we don't want to miss when using rdf2java later on. Therefore rdf2java interprets some of the Protégé-2000 specific declarations.

RDF files exported via rdf2java's RDF Export functionality (Java objects --> RDF data) can, of course, be read in by Protégé-2000.

Note: rdf2java does not support multiple inheritance (because Java doesn't either).

If you like, look at some files in the example directory. However, the RDF/S files found there are slightly different from the ones I pasted above. That's because they have been generated using Protégé-2000 and, hence, they contain additional, Protégé-2000 specific modeling. If you have installed Protégé-2000, you can also open the Protégé project file (example1.pprj) there, which contains the modeling of the Person class as well as the modeling of the two instances for Bart and Homer.

1.5 Online documentation (javadoc) of the rdf2java API

. . . can be found here.

2. Installation

rdf2java is a pure Java tool, written using Java 2, JDK 1.3.1_02. Hence, I recommend NOT using an older JDK. However, the sources are included, so you could try to recompile the tool with whatever JDK you like. If you are using JBuilder 6 (or later), you will even find a project file "rdf2java.jpx", so you can open it directly in the JBuilder IDE and compile it, debug it, whatever...

Just download and extract it to some directory, e.g., C:\java\ (an rdf2java directory will be created there automatically). However, this ZIP can be quite old; contact the developers (see link at the top of the page) for a newer one. The reason is that, as we are using Subversion for development, the ZIP is not created automatically. Sorry for the inconvenience!

Top-level directories:
doc : documentation (javadoc)
import : jar-files needed by rdf2java; these must be included in the CLASSPATH ("java -cp ..."), when using rdf2java.
lib : jar-file of the rdf2java tool; of course this must be included in the CLASSPATH, as well.
src : the complete source code
testdata : some test data for examples

All you need to use rdf2java is to get the CLASSPATH right, whenever starting a Java-VM. You can either set the CLASSPATH environment variable, or specify the classpath temporarily via the "-cp" parameter whenever you call the java interpreter.

lib/rdf2javaApidoc.jar contains the API documentation (javadoc) in a jar-file. It is much more compact that way and can thus be stored more easily in our CVS repository.

All tools and elements included in rdf2java are in the package dfki.rdf.util.

3. Create the Java files from an RDFS file

If you want to use RDF Import / Export functionality for some RDF data, you need a set of Java classes which correspond exactly to the RDF data, which means, that there must be a Java class for each RDF class instantiated in the RDF data. Furthermore, all relations between RDF subjects / objects must correspond to getter and putter methods in the Java classes. For example, if there is a triple (a, b, c) in the RDF data (meaning a --b--> c), the Java object for a must provide the methods getB and putB.
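The naming rule can be sketched as follows. This is only an illustration of the convention described above, assuming that the method name is the prefix "get" or "put" followed by the property's local name with its first letter capitalized; AccessorNames and its two methods are hypothetical and not part of the rdf2java API.

```java
// Hypothetical helper, only for illustration.
class AccessorNames {
    // Local name of a property URI: everything after the last '#' or '/'.
    static String localName(String propertyUri) {
        int cut = Math.max(propertyUri.lastIndexOf('#'), propertyUri.lastIndexOf('/'));
        return propertyUri.substring(cut + 1);
    }

    // Prefix ("get" or "put") plus the local name with a capitalized first letter.
    // Assumes a non-empty local name.
    static String accessor(String prefix, String propertyLocalName) {
        return prefix
            + Character.toUpperCase(propertyLocalName.charAt(0))
            + propertyLocalName.substring(1);
    }
}
```

For the example schema, the property http://org.dfki/rdf2java/example1#hasParent has the local name hasParent, which corresponds to the methods getHasParent and putHasParent of the Person class shown earlier.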

3.1 RDFS2Class (dfki.rdf.util.RDFS2Class)

If there's an RDFS file available specifying the allowed classes and properties (i.e. relations), you can let rdf2java create the needed corresponding Java classes for you. The tool RDFS2Class, which is part of rdf2java, will generate the source code for these classes from a given RDFS file.

Look up the batch-file testdata/assign/rdfs2class.bat for an example of how to call the RDFS2Class tool.

Generally, you will have to call RDFS2Class as follows:

java -cp <...CLASSPATH...> dfki.rdf.util.RDFS2Class <...FLAGS...>
                                                    <...N1...>   <...P1...>
                                                    <...N2...>   <...P2...>
                                                    <...N3...>   <...P3...>
                                                             . . .


FLAGS:    -q: quiet operation, no output
          -s: include toString()-stuff in generated java-files
          -S: include recursive toString()-stuff in generated java-files
              (used instead of -s)
          -o: retain ordering of triples (usage of rdf:Seq in rdf-file)
              by using arrays instead of sets
          -I: insert stuff for incremental file-generation
              (needed for potential later usage of -i)
              ATTENTION: this option completely re-creates java-files and
              erases every user-defined methods and slots, 
              maybe you'd better use "-i" ?!
          -i: incremental generation of java-files, i.e. user added slots
              are kept in the re-generated java-files
              (this option includes already -I)
          Order of the following options is important!
          rdfs=<namespace>    : set different RDFS namespace
          rdf=<namespace>     : set different RDF namespace
          protege=<namespace> : set different Protege namespace

RDFS-FILE      : the RDFS-file declaring the classes and properties

OUTPUT-SRC-DIR : generated source files (not class files) go to this directory

(N1, P1), ...  : pairs of namespaces and package-names; this specifies how to
                 map namespaces to package-names

We typically use "-is" as FLAGS, which includes a toString method and allows for editing the Java classes without problems.

As RDF/S normally uses namespaces to ensure uniqueness of RDF resources all over the world, rdf2java maps the needed namespaces to package-names. This means, that RDFS2Class will create the corresponding directory structure, as well.

For our small example above, let's assume, you've got an appropriate RDF Schema file in C:\TEMP\example1.rdfs. Then you could call RDFS2Class with the following parameters:

java -cp <...CLASSPATH...> dfki.rdf.util.RDFS2Class

Please note the last two(!) parameters. We are used to mapping the namespaces to packages with nearly identical names (slashes become dots), but you are not obliged to do that. You could even map different namespaces to just one package. However, then you could not export (RDF Export) Java objects to RDF data.
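As a small illustration of the "slashes become dots" habit, the sketch below derives a package name from a namespace and inverts a namespace-to-package map. NamespaceMapping and its methods are hypothetical helpers, not part of rdf2java, and the explanation built into invert is my assumption: the export needs a package-to-namespace map, which cannot be built unambiguously when two namespaces share one package.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper, only for illustration.
class NamespaceMapping {
    // "http://org.dfki/rdf2java/example1#" becomes "org.dfki.rdf2java.example1".
    static String suggestPackage(String namespace) {
        String s = namespace;
        int scheme = s.indexOf("://");
        if (scheme >= 0) {
            s = s.substring(scheme + 3);              // drop "http://"
        }
        while (s.endsWith("#") || s.endsWith("/")) {
            s = s.substring(0, s.length() - 1);       // drop trailing separators
        }
        return s.replace('/', '.');                   // slashes become dots
    }

    // Turn namespace -> package into package -> namespace; fails if ambiguous.
    static Map<String, String> invert(Map<String, String> namespaceToPackage) {
        Map<String, String> packageToNamespace = new HashMap<>();
        for (Map.Entry<String, String> entry : namespaceToPackage.entrySet()) {
            String previous = packageToNamespace.put(entry.getValue(), entry.getKey());
            if (previous != null) {
                throw new IllegalArgumentException(
                    "package " + entry.getValue() + " is used for more than one namespace");
            }
        }
        return packageToNamespace;
    }
}
```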

For more information, please look at some files in the example directory. However, the RDF/S files found there are slightly different from the ones I pasted above. That's because they have been generated using Protégé-2000 and, hence, they contain additional, Protégé-2000 specific modeling. Also, the generated Java class org.dfki.rdf2java.example1.Person is a bit more complex than the structure sketched above, but after having a look at it, you should roughly get the picture. Besides, although you can, you won't look into these classes very often. They are structures for keeping and representing some RDF data. In most cases, you will do nothing more than call the getters and putters of these objects.

4. RDF Import / Export (RDF data <--> Java instances)

Look at the SimpleImportExport example class to get an understanding of how the import / export works. For a better understanding, I pasted an extract of the most important parts of the code below (marginal stuff like, e.g., exception handling has been removed):

import java.util.HashMap;
import java.util.Iterator;
import java.util.Map;

import dfki.rdf.util.RDFImport;
import dfki.rdf.util.RDFExport;

import org.dfki.rdf2java.example1.Person;

public class SimpleImportExport
{
    public static void main( String[] args )
    {
        final String NAMESPACE_1 = "http://org.dfki/rdf2java/example1#";
        final String PACKAGE_1   = "org.dfki.rdf2java.example1";

        Map mapNamespace2Package = new HashMap();
        mapNamespace2Package.put( NAMESPACE_1, PACKAGE_1 );
        RDFImport rdfImport = new RDFImport( mapNamespace2Package );

        // 1. import from RDF file "example1.rdf"
        Map mapObjects = rdfImport.importObjects( "testdata/example/example1.rdf" );
        printout( mapObjects );

        // 2. make some changes (or additions)
        //    e.g. add an object for "Lisa Simpson"
        Person lisa = new Person();
        lisa.putName( "Lisa Simpson" );
        // get object for "Homer"; you must know Homer's URI: "http://...#example1_00006"
        Person homer = (Person)mapObjects.get( "http://org.dfki/rdf2java/example1#example1_00006" );
        lisa.putHasParent( homer );
        // add lisa object to mapObjects (so she can be exported in step 3 below)
        mapObjects.put( lisa.getURI(), lisa );
        System.out.println( "\n\n----------\nadded lisa\n----------\n" );
        printout( mapObjects );

        // 3. export to RDF file "example1_lisa.rdf"
        //    (the definition of NAMESPACE_FOR_NEW_INSTANCES is not part of this extract)
        Map mapPackage2Namespace = new HashMap();
        mapPackage2Namespace.put( PACKAGE_1, NAMESPACE_1 );
        RDFExport rdfExport = new RDFExport( NAMESPACE_FOR_NEW_INSTANCES, mapPackage2Namespace );
        rdfExport.exportObjects( mapObjects.values(), "testdata/example/example1_lisa.rdf" );
    }

    // print every object in the map, headed by its URI
    public static void printout( Map mapObjects )
    {
        for( Iterator it = mapObjects.keySet().iterator(); it.hasNext(); )
        {
            String sURI = (String)it.next();
            Object obj = mapObjects.get( sURI );
            System.out.println( "\n########## " + sURI + " ##########\n" + obj );
        }
    }
}

5. Knowledge Base (dfki.rdf.util.KnowledgeBase)

The KnowledgeBase class stores and maintains Java objects (representing some RDF data). At first sight, it provides hardly more than a simple data storage class; you could do just as well with some java.util.Collection. However, the stored Java objects represent RDF data, which unfortunately causes some problems. RDF (Resource Description Framework) is designed to talk about things (resources), with the explicit intention of talking about resources far away; think of all the distributed documents (e.g. web pages) around the world. You can put statements in one document describing another document somewhere else on the planet.

As already mentioned before (see 1.2 What rdf2java can do), rdf2java distinguishes between the two cases: (1) An object is either available in the current Java-VM, then it is an instance of THING; (2) The object is not available in the current Java-VM, then it is represented by an RDFResource, providing at least the URI of the referenced object. A THING is an RDFResource, which means, that also THINGS know about their URIs.

When importing RDF data, you get a set of Java objects. Some of them will maintain links to others, some of them will link to resources not available by Java objects. In the latter case, such a Java object links to an RDFResource.

Now assume you have imported some RDF data and, hence, you have some Java objects representing this RDF data. When you later want to import more RDF data that has relationships to the RDF data you imported earlier, you have to take care when using all these Java objects. In plain RDF, being hardly more than a set of triples (subject, predicate, object), you just have to add more triples and you're done. In the Java world, however, you've got the distinction between THINGS and RDFResources. When you import new RDF and hence get new Java objects, you have to make sure that all RDFResources pointing to your new Java object are converted to direct Java pointers to the new Java object. But it's even worse: assume you have independently imported two RDF files, meaning you have got two different collections of Java objects. Big problems arise as soon as you try to merge these two. You may say this problem can be avoided. But think about distributed scenarios, think about agents sending RDF data among themselves (in our research project FRODO our agents do that). What happens very often is that agents send updates of their data, which is RDF data. When using Java objects representing this RDF, you come to the point where you have to assert new RDF data to old RDF data.

The KnowledgeBase class solves this problem by providing methods for such assertions. As an assertion of an object works on an older version of the same object, these methods have been named assign instead of assert, but maybe we should change this...?

Now, let's come to the semantics of this assign "operator". Assume you have some objects {a_i} in the knowledge base and want to assign {b_j}. First of all, this is identical to assigning each b_j individually, one after the other. The assignment {a_i} <- b_j works according to the following rules:

  • Case 1: there is an a_k with a_k.getURI() = b_j.getURI():
    This more or less means that b_j is already in the knowledge base (as a_k), but the b_j being assigned can contain other values and/or links.
    Therefore a_k stays in the knowledge base, but all its properties are updated. Assume a_k has properties p_1(a_k), ..., p_n(a_k) with values v_1(a_k), ..., v_n(a_k); analogously, b_j has values v_1(b_j), ..., v_n(b_j). Then proceed for all m = 1, ..., n in the following way:
    • Case 1.a: p_m is a single-value slot: v_m(a_k) <- v_m(b_j), i.e., take the newer value (this may even be no value, which empties the property).
    • Case 1.b: p_m is a multiple-value slot and the values are strings (literals): just take all the values from v_m(b_j), because those are the new ones.
    • Case 1.c: p_m is a multiple-value slot and the values are links to resources (other objects): this is the most interesting case...
      First of all, all values in v_m(a_k) not contained in v_m(b_j) are removed. Then, for the remaining values (i.e. values contained in both), check:
      • IF both v_m(a_k) and v_m(b_j) are THINGS, THEN assign v_m(a_k) <- v_m(b_j); this leads to a recursion, so the assignment is a quite inefficient procedure!
      • OTHERWISE just take out v_m(a_k) and take in v_m(b_j). Exactly this "take in" and "take out" has (for now) one great disadvantage: it may ruin the order of the multiple values. For normal RDF this causes no trouble, as there is no specification about the order of the triples, but as soon as one uses ordering constructs like rdf:Seq, there are problems ahead! Although rdf2java supports ordering of multiple values, we aren't using this, because it's quite against the idea of RDF. If we need ordering, we model it explicitly through explicit structures.
  • Case 2: there exists no k with a_k.getURI() = b_j.getURI():
    This is, of course, the easiest case: as b_j is just a new object, we put it into the knowledge base and we're done.
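The rules above can be sketched in a few lines of Java. This is not the KnowledgeBase implementation, just a minimal model under stated assumptions: Ref and Obj are hypothetical stand-ins for RDFResource and THING, slots are kept in plain maps instead of getter / putter methods, values that occur only in the new object are simply added, and a visited set (my addition) keeps the recursion from running forever on cyclic links.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Stand-in for RDFResource: a reference that only knows a URI.
class Ref {
    final String uri;
    Ref(String uri) { this.uri = uri; }
}

// Stand-in for THING: an object whose property values are known.
class Obj extends Ref {
    final Map<String, Object> single = new HashMap<>();       // single-value slots
    final Map<String, List<Object>> multi = new HashMap<>();  // multi-value slots (String or Ref values)
    Obj(String uri) { super(uri); }
}

class AssignSketch {
    final Map<String, Obj> kb = new HashMap<>();  // the knowledge base, keyed by URI

    void assign(Obj b) {
        Obj a = kb.get(b.uri);
        if (a == null) {
            kb.put(b.uri, b);                     // case 2: b is new, just store it
        } else {
            update(a, b, new HashSet<String>());  // case 1: a stays, its slots are updated
        }
    }

    private void update(Obj a, Obj b, Set<String> visited) {
        if (a == b || !visited.add(a.uri)) return;  // nothing to do, or already handled

        // case 1.a: take the newer value; a slot missing in b is emptied
        a.single.clear();
        a.single.putAll(b.single);

        Set<String> properties = new HashSet<>(a.multi.keySet());
        properties.addAll(b.multi.keySet());
        for (String p : properties) {
            List<Object> oldValues = a.multi.getOrDefault(p, new ArrayList<Object>());
            List<Object> newValues = new ArrayList<>();
            // only values present in b survive, so old-only values are removed
            for (Object vb : b.multi.getOrDefault(p, new ArrayList<Object>())) {
                Object va = (vb instanceof Ref) ? findByUri(oldValues, ((Ref) vb).uri) : null;
                if (va instanceof Obj && vb instanceof Obj) {
                    update((Obj) va, (Obj) vb, visited);  // case 1.c, both are objects: recurse
                    newValues.add(va);
                } else {
                    newValues.add(vb);                    // case 1.b, or 1.c "take in" the new value
                }
            }
            if (newValues.isEmpty()) a.multi.remove(p); else a.multi.put(p, newValues);
        }
    }

    private static Object findByUri(List<Object> values, String uri) {
        for (Object v : values) {
            if (v instanceof Ref && ((Ref) v).uri.equals(uri)) return v;
        }
        return null;
    }
}
```

Because the new value list is rebuilt from the values of the assigned object, the order of multiple values in this sketch follows the new data, not the old.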

After this assignment procedure, there's a second pass done, which updates values being only references (RDFResource), although a THING is now available. These RDFResource values are then converted to Java pointers pointing to the respective THING. This is done by method updateRDFResourceSlots, which is automatically called in the assign method (in the KnowledgeBase class).
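A minimal sketch of such a second pass is shown below. It is not the code of updateRDFResourceSlots; Ref and Obj are hypothetical stand-ins for RDFResource and THING, the links map replaces the real getter / putter methods, and ResolveSketch.resolve is an invented name.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Stand-in for RDFResource: a reference that only knows a URI.
class Ref {
    final String uri;
    Ref(String uri) { this.uri = uri; }
}

// Stand-in for THING: an object with link-valued slots.
class Obj extends Ref {
    final Map<String, List<Ref>> links = new HashMap<>();  // property name -> linked resources
    Obj(String uri) { super(uri); }
}

// Hypothetical helper, only for illustration.
class ResolveSketch {
    // Replace every bare reference by the object with the same URI, if the
    // knowledge base (keyed by URI) now contains one. The value lists must be mutable.
    static void resolve(Map<String, Obj> kb) {
        for (Obj object : kb.values()) {
            for (List<Ref> values : object.links.values()) {
                for (int i = 0; i < values.size(); i++) {
                    Ref value = values.get(i);
                    Obj target = kb.get(value.uri);
                    if (!(value instanceof Obj) && target != null) {
                        values.set(i, target);  // the reference becomes a direct Java pointer
                    }
                }
            }
        }
    }
}
```

References to resources that are still unknown to the knowledge base are left untouched, so they keep at least their URI.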


We change the old example slightly, as it lacks interesting cases. We add an inverse property to hasParent, namely hasChild. Furthermore, we have to add the inverse values for this slot, as this isn't done automatically. The result is example2.* in the example directory.

Assume our knowledge base contains Java objects corresponding to RDF data declaring Bart and Homer Simpson and their relationships to each other (properties hasParent and hasChild). The RDF graph for this is shown in the left image after this paragraph (namespaces have been abbreviated). Assume further that we want to assign a new object for Lisa Simpson plus relevant relationships to this knowledge base. You see the graphical representation of the RDF data to assign in the right image below. As we will see below, the assignment won't lead to the result you might expect. Later we will show the correct RDF data to assign...
[Figure, left: the knowledge base so far]
[Figure, right: RDF data to assign to the knowledge base]

When looking at these pictures, keep in mind, that they are nothing more than the graphical representation of RDF triples; boxes represent RDF resources, ellipses represent literals (strings). Via rdf2java, resources (boxes) for instances of RDFS classes are represented by corresponding Java objects in the knowledge base, and outgoing edges are stored in these Java objects and are available via getter methods.

Hence, before the assignment, the knowledge base (left image) contains two Java objects: one for Bart (example1:example1_00005) and one for Homer (example1:example1_00006). Furthermore, both objects store the respective value for the name property. Finally, the Bart object stores a link to the Homer object for the hasParent property, and the Homer object stores the inverse link for the hasChild property. Note that there's no object for example1:Person, as this is a class and not an instance.

Analogously, the RDF data to assign (right image above) will be represented (e.g. after RDF Import) by two objects, one for Lisa's resource plus outgoing edges (properties name and hasParent) and one for Homer's resource plus outgoing edges (property hasChild). Note, that this Homer object is neither the same object as the one in the knowledge base, nor has it identical values; That Homer object, e.g., doesn't have a value for the name property, whereas the Homer object in the knowledge base has.

When assigning the RDF data (right image above), both objects (Lisa and Homer) have to be assigned. The assignment takes place in two steps (see above): (1) the main assignment procedure, (2) updating all the property values linking to an RDFResource which, after the assignment, is available as a THING. The first step of assigning Lisa's object is easy, as it doesn't yet exist in the knowledge base. Therefore, the object is simply added to the knowledge base, including all information about its outgoing edges. Assigning the Homer object, or to be more precise, the RDF data stored in the Homer object, is a bit more complicated. First of all, there already exists a Homer object in the knowledge base. Hence, the object cannot simply be added, but must be assigned to the already existing one. Therefore, all Homer properties have to be examined. name: the left-side Homer stores a value, the right-side Homer doesn't. Possible interpretation: the right side lacks (leaves out) this information? No! Our interpretation is: the right-side Homer is the newer one and contains all new data. Hence, the value for the name property will be deleted on the left-side Homer object. hasChild: the two Homer objects have different (disjoint) values for that property, so we take the new ones, i.e., the ones on the right side. The result (knowledge base) is as follows:

The result is not quite the one you might have expected when I started this example. There are several mistakes: the Homer object forgot its name. The Bart object still "knows" about its parent, but the Homer object no longer reflects this relationship. As RDF/S doesn't know anything about inverse slots, this is still correct RDF, but it's not what we wanted. The problem arose right at the beginning, where we specified what RDF data to assign to the knowledge base. Let's make another assignment:
[Figure, left: the knowledge base so far]
[Figure, right: RDF data to assign to the knowledge base]

Note that example1:example1_00005 (in the right image) will only be represented by an RDFResource object, because the right-side Homer object just needs to link to a reference of this resource. There's no need to know about or change this resource. The Homer object has to be changed because of the hasChild property. Hence, we need to represent it by a THING object and add / change the property values we like. The result of this assignment will be as follows:

Last modified on 12/10/09 10:10:28