wiki:WikiStart

Version 8 (modified by schwarz, 19 years ago)

--

0. Development Repository

We use Subversion (SVN) for development.

repository URL: https://rdf2java.opendfki.de/repos/rdf2java

subclipse (eclipse SVN plugin): http://subclipse.tigris.org/

Contacting the developers is currently done via email: Sven.Schwarz@…, Malte.Kiesel@…

1. Introduction

rdf2java is a small tool written in Java. It allows easy handling of RDF data: instead of using an RDF API for creating and searching RDF triples, i.e., (subject, predicate, object), you just work with Java objects representing RDF subjects / objects.

Typically, you first create an RDFS (RDF Schema) file, containing declarations for classes and properties. RDF data (e.g. in an RDF file) refers to this RDFS file, just like an XML file can refer to a DTD or to an XML Schema file. The RDF Schema specifies what kind of RDF subjects (i.e. instances of these classes) are allowed and how they are linked to each other (-> properties). So, you have RDFS specifications and RDF data (instances) matching this RDF Schema.

Using rdf2java (or, to be more precise, the rdfs2class utility of the rdf2java package), we first create Java classes corresponding to the RDFS classes. Then we can use rdf2java to "read in" RDF data and convert it into Java objects that are instances of the respective Java classes.

The original tool was written by Michael Sintek. Information and download of the old tool can be found here. I developed his tool further to suit our increasing needs in the FRODO project.

1.1 Example:

Consider the following RDFS file (XML syntax) declaring a class Person having two properties: (1) name is a string-valued property (->Literal) and (2) hasParent demands a reference to another instance of Person. [I won't go into detail wrt. RDFS and its typical XML syntax; see http://www.w3.org/TR/rdf-schema/ for more information on this topic.]

<?xml version='1.0' encoding='ISO-8859-1'?>
<!DOCTYPE rdf:RDF [
     <!ENTITY rdf 'http://www.w3.org/1999/02/22-rdf-syntax-ns#'>
     <!ENTITY rdfs 'http://www.w3.org/TR/1999/PR-rdf-schema-19990303#'>
     <!ENTITY example1 'http://org.dfki/rdf2java/example1#'>
]>

<rdf:RDF 
     xmlns:rdf="&rdf;"
     xmlns:rdfs="&rdfs;"
     xmlns:example1="&example1;">

<rdfs:Class rdf:about="&example1;Person">
    <rdfs:subClassOf rdf:resource="&rdfs;Resource"/>
</rdfs:Class>

<rdf:Property rdf:about="&example1;hasParent">
    <rdfs:range rdf:resource="&example1;Person"/>
    <rdfs:domain rdf:resource="&example1;Person"/>
</rdf:Property>

<rdf:Property rdf:about="&example1;name">
    <rdfs:domain rdf:resource="&example1;Person"/>
    <rdfs:range rdf:resource="&rdfs;Literal"/>
</rdf:Property>

</rdf:RDF>

Just to give you an idea, here's the plain set of RDF triples (subject, predicate, object) for the RDF Schema above (you may need to use your scroll bar). I sorted the triples for your convenience; don't forget that RDF triples really are unordered:

http://org.dfki/rdf2java/example1#Person    , http://www.w3.org/1999/02/22-rdf-syntax-ns#type             , http://www.w3.org/TR/1999/PR-rdf-schema-19990303#Class
http://org.dfki/rdf2java/example1#Person    , http://www.w3.org/TR/1999/PR-rdf-schema-19990303#subClassOf , http://www.w3.org/TR/1999/PR-rdf-schema-19990303#Resource
http://org.dfki/rdf2java/example1#hasParent , http://www.w3.org/1999/02/22-rdf-syntax-ns#type             , http://www.w3.org/1999/02/22-rdf-syntax-ns#Property
http://org.dfki/rdf2java/example1#hasParent , http://www.w3.org/TR/1999/PR-rdf-schema-19990303#domain     , http://org.dfki/rdf2java/example1#Person
http://org.dfki/rdf2java/example1#hasParent , http://www.w3.org/TR/1999/PR-rdf-schema-19990303#range      , http://org.dfki/rdf2java/example1#Person
http://org.dfki/rdf2java/example1#name      , http://www.w3.org/1999/02/22-rdf-syntax-ns#type             , http://www.w3.org/1999/02/22-rdf-syntax-ns#Property
http://org.dfki/rdf2java/example1#name      , http://www.w3.org/TR/1999/PR-rdf-schema-19990303#domain     , http://org.dfki/rdf2java/example1#Person
http://org.dfki/rdf2java/example1#name      , http://www.w3.org/TR/1999/PR-rdf-schema-19990303#range      , http://www.w3.org/TR/1999/PR-rdf-schema-19990303#Literal

Assume there is an RDF file containing some instances of this RDFS class (Person):

<?xml version='1.0' encoding='ISO-8859-1'?>
<!DOCTYPE rdf:RDF [
     <!ENTITY rdf 'http://www.w3.org/1999/02/22-rdf-syntax-ns#'>
     <!ENTITY rdfs 'http://www.w3.org/TR/1999/PR-rdf-schema-19990303#'>
     <!ENTITY example1 'http://org.dfki/rdf2java/example1#'>
]>

<rdf:RDF 
     xmlns:rdf="&rdf;"
     xmlns:rdfs="&rdfs;"
     xmlns:example1="&example1;">

<example1:Person rdf:about="&example1;example1_00005"
     example1:name="Bart Simpson">
    <example1:hasParent rdf:resource="&example1;example1_00006"/>
</example1:Person>

<example1:Person rdf:about="&example1;example1_00006"
     example1:name="Homer Simpson"/>

</rdf:RDF>

Again, just to give you an idea of the plain RDF, here are the triples, again sorted for your convenience (you may need to use your scroll bar):

http://org.dfki/rdf2java/example1#example1_00005 , http://www.w3.org/1999/02/22-rdf-syntax-ns#type , http://org.dfki/rdf2java/example1#Person
http://org.dfki/rdf2java/example1#example1_00005 , http://org.dfki/rdf2java/example1#hasParent     , http://org.dfki/rdf2java/example1#example1_00006
http://org.dfki/rdf2java/example1#example1_00005 , http://org.dfki/rdf2java/example1#name          , Bart Simpson
http://org.dfki/rdf2java/example1#example1_00006 , http://www.w3.org/1999/02/22-rdf-syntax-ns#type , http://org.dfki/rdf2java/example1#Person
http://org.dfki/rdf2java/example1#example1_00006 , http://org.dfki/rdf2java/example1#name          , Homer Simpson

When you visualize this set of triples, you get the following picture (only the RDF data is shown, not the RDF Schema). Boxes represent resources (RDF subjects), whereas ellipses show literals (strings), which can only be RDF objects. The named and directed edges visualize the predicates.

http://www.dfki.uni-kl.de/~schwarz/rdf2java/testdata/example/example1.a.75.gif
graph representation of these RDF triples

This RDF data contains two Persons, Bart Simpson and Homer Simpson, where Bart has a parent, namely Homer.

Now, assume further, there is a Java class named Person, having the following structure:

public class Person
{
    public void putName(String name) {...}
    public String getName() {...}

    public void putHasParent(Person parent) {...}
    public Person getHasParent() {...}
}

The classes may get more complicated later, and the implementation is still missing, but for the moment this should suffice to get the picture. Besides, the structure (the public getters and putters) is more important than the concrete implementation.
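Just to make the sketch concrete, a minimal (hypothetical) implementation of this structure could look like the following; the classes actually generated by rdfs2class carry more machinery (see below):

```java
// Hypothetical minimal implementation of the Person structure above.
// The generated classes additionally extend THING / RDFResource (see below).
class Person
{
    private String name;
    private Person hasParent;

    public void putName( String name )        { this.name = name; }
    public String getName()                   { return name; }

    public void putHasParent( Person parent ) { this.hasParent = parent; }
    public Person getHasParent()              { return hasParent; }

    public static void main( String[] args )
    {
        Person homer = new Person();
        homer.putName( "Homer Simpson" );
        Person bart = new Person();
        bart.putName( "Bart Simpson" );
        bart.putHasParent( homer );
        System.out.println( bart.getName() + " has parent " + bart.getHasParent().getName() );
    }
}
```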

Using RDF Import (rdf2java API), you can transform the RDF data above into two Java objects, i.e., instances of the class Person. These instances are created using Java's reflection API, so they are created at run time. After creation, both will be initialized by calling the public method putName, and for one of them the method putHasParent will be called, too. This results in two Java objects, one for "Bart" and one for "Homer". Additionally, the Bart-object links to the Homer-object.

There is of course a bit more salt in the objects; e.g., Person really extends a class called THING, which in turn extends a class called RDFResource, but this will be handled later. Just to give you a hint: RDFResource provides getters and putters for the object's URI (uniform resource identifier). Hence, calling the method getURI on the Bart-object, you receive http://org.dfki/rdf2java/example1#example1_00005, whereas calling getURI on the Homer-object yields http://org.dfki/rdf2java/example1#example1_00006. See the RDF file above to see why.

What the example doesn't show is that classes can be subclasses of other classes. As RDFS doesn't provide multiple inheritance, rdf2java doesn't support multiple inheritance either. This remark is for people using Protégé-2000 (see below) to generate and work on RDF/S.

1.2 What rdf2java can do

As opposed to plain RDF, rdf2java distinguishes between, you could say, "near" and "remote" objects. The reason is as follows: RDF data is just a set of triples (subject, predicate, object), which means that links between objects are always represented by a triple (object, relation, other-object). As soon as this triple exists, the relation exists; no matter whether the linked resource itself is available or not, the reference to that resource always is.

On the Java side, this is different. Keeping in mind the simple structure of the class Person above, the referenced Java object must be available to establish the relation. For example, if the Homer-object is not available (there is no Java pointer to it), you cannot reference the Homer-object, and hence you cannot establish a link (relation) between the two, although there might be such a Homer-object "living" on some other computer or in some other Java-VM (Java virtual machine).

Therefore, rdf2java distinguishes between the two cases: (1) the object is available in the current Java-VM; then it is an instance of THING. (2) The object is not available in the current Java-VM; then it is represented by an RDFResource, providing at least the URI of the referenced object. A THING is an RDFResource, which means that THINGs also know their URIs.
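In code, this distinction might be sketched roughly as follows (simplified, hypothetical classes for illustration; the real dfki.rdf.util classes provide more functionality):

```java
// Sketch of the THING / RDFResource distinction (hypothetical, simplified).
// RDFResource: merely a reference to a resource, knows at least its URI.
class RDFResource
{
    private String uri;

    public RDFResource( String uri ) { this.uri = uri; }

    public String getURI()           { return uri; }
    public void putURI( String uri ) { this.uri = uri; }
}

// THING: an object actually available in the current Java-VM.
// Since a THING is an RDFResource, THINGs know their URIs, too.
class THING extends RDFResource
{
    public THING( String uri ) { super( uri ); }

    public static void main( String[] args )
    {
        RDFResource remote = new RDFResource( "http://org.dfki/rdf2java/example1#example1_00006" );
        THING local = new THING( "http://org.dfki/rdf2java/example1#example1_00005" );
        System.out.println( "remote reference: " + remote.getURI() );
        System.out.println( "local THING:      " + local.getURI() );
    }
}
```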

1.3 What rdf2java cannot do

We are using Java objects as representatives for RDF subjects / objects. As such a Java object is an instance of some Java class, Java binds the object forever to its class. Hence, the class of a Java object can never change, which means that the represented RDF subject can never change the class it belongs to. This is different from the plain RDF world, where such a change is possible. However, if you know that an RDF subject will never change its class, you won't have any problems using rdf2java.

1.4 rdf2java works well with RDFS produced by Protégé-2000

RDFS unfortunately lacks some property constraints, such as specifying whether a property may only contain a single value (only one triple with that predicate allowed) or whether there can be multiple values for that property.

We are using Protégé-2000 to create our models / meta-models and file out the result as RDF/S. Protégé-2000 provides some interesting class and property declarations that we don't want to miss when using rdf2java later on. Therefore rdf2java interprets some of the Protégé-2000 specific declarations.

RDF files exported via rdf2java's RDF Export functionality (Java objects --> RDF data) can, of course, be read in by Protégé-2000.

Note: rdf2java does not support multiple inheritance (because Java doesn't either).

If you like, look at some files in the example directory. However, the RDF/S files found there are slightly different from the ones I pasted above. That's because they have been generated using Protégé-2000 and hence contain additional, Protégé-2000 specific modeling. If you have installed Protégé-2000, you can also open the Protégé project file (example1.pprj) there, containing the modeling of the Person class as well as the modeling of the two instances for Bart and Homer.

1.5 Online documentation (javadoc) of the rdf2java API

. . . can be found here.

2. Installation

rdf2java is a pure Java tool, written using Java 2, JDK 1.3.1_02. Hence, I propose NOT to use an older JDK. However, the sources are included, so you could try to recompile the tool with whatever JDK you like. If you are using JBuilder 6 (or later), you will even find a project file "rdf2java.jpx", so you can open it directly in the JBuilder IDE and compile it, debug it, whatever...

Just download rdf2java.zip and extract it to some directory, e.g., C:\java\ (an rdf2java directory will be created there automatically).

Top-level directories:
doc : documentation (javadoc)
import : jar-files needed by rdf2java; these must be included in the CLASSPATH ("java -cp ..."), when using rdf2java.
lib : jar-file of the rdf2java tool; of course this must be included in the CLASSPATH, as well.
src : the complete source code
testdata : some test data for examples

All you need to use rdf2java is to get the CLASSPATH right, whenever starting a Java-VM. You can either set the CLASSPATH environment variable, or specify the classpath temporarily via the "-cp" parameter whenever you call the java interpreter.

lib/rdf2javaApidoc.jar contains the API documentation (javadoc) in a jar-file. It is much more compact that way and can thus be stored better in our CVS repository.

All tools and elements included in rdf2java are in the package dfki.rdf.util.

3. Create the Java files from a RDFS file

If you want to use the RDF Import / Export functionality for some RDF data, you need a set of Java classes that correspond exactly to the RDF data, which means that there must be a Java class for each RDF class instantiated in the RDF data. Furthermore, all relations between RDF subjects / objects must correspond to getter and putter methods in the Java classes. For example, if there is a triple (a, b, c) in the RDF data (meaning a --b--> c), the Java object for a must provide the methods getB and putB.

3.1 RDFS2Class (dfki.rdf.util.RDFS2Class)

If there's an RDFS file available specifying the allowed classes and properties (i.e., relations), you can let rdf2java create the needed corresponding Java classes for you. The tool RDFS2Class, which is part of rdf2java, will generate the source code for these classes out of a given RDFS file.

Look up the batch-file testdata/assign/rdfs2class.bat for an example of how to call the RDFS2Class tool.

Generally, you will have to call RDFS2Class as follows:

java -cp <...CLASSPATH...> dfki.rdf.util.RDFS2Class <...FLAGS...>
                                                    <...RDFS-FILE...>
                                                    <...OUTPUT-SRC-DIR...>
                                                    <...N1...>   <...P1...>
                                                    <...N2...>   <...P2...>
                                                    <...N3...>   <...P3...>
                                                             . . .

where

FLAGS:    -q: quiet operation, no output
          -s: include toString()-stuff in generated java-files
          -S: include recursive toString()-stuff in generated java-files
              (used instead of -s)
          -o: retain ordering of triples (usage of rdf:Seq in rdf-file)
              by using arrays instead of sets
          -I: insert stuff for incremental file-generation
              (needed for potential later usage of -i)
              ATTENTION: this option completely re-creates java-files and
              erases all user-defined methods and slots;
              maybe you'd better use "-i" ?!
          -i: incremental generation of java-files, i.e., user-added slots
              are kept in the re-generated java-files
              (this option already includes -I)
          The order of the following options is important!
          rdfs=<namespace>    : set different RDFS namespace
          rdf=<namespace>     : set different RDF namespace
          protege=<namespace> : set different Protege namespace

RDFS-FILE      : the RDFS-file declaring the classes and properties

OUTPUT-SRC-DIR : generated source files (not class files) go to this directory

(N1, P1), ...  : pairs of namespaces and package-names; this specifies how to
                 map namespaces to package-names

We typically use "-is" as FLAGS, which includes a toString method and allows for editing the Java classes without problems.

As RDF/S normally uses namespaces to ensure uniqueness of RDF resources all over the world, rdf2java maps the needed namespaces to package names. This means that RDFS2Class will create the corresponding directory structure as well.

For our small example above, let's assume you've got an appropriate RDF Schema file in C:\TEMP\example1.rdfs. Then you could call RDFS2Class with the following parameters:

java -cp <...CLASSPATH...> dfki.rdf.util.RDFS2Class
                           -is
                           C:\TEMP\example1.rdfs
                           C:\TEMP\src
                           http://org.dfki/rdf2java/example1#
                                  org.dfki.rdf2java.example1

Please note the last two(!) parameters. We are used to mapping the namespaces to packages with nearly identical names (slashes become dots), but you are not obliged to do that. You could even map different namespaces to just one package. However, then you could not export (RDF Export) Java objects to RDF data.

For more information, please look at some files in the example directory. However, the RDF/S files found there are slightly different from the ones I pasted above. That's because they have been generated using Protégé-2000 and hence contain additional, Protégé-2000 specific modeling. Also, the generated Java class org.dfki.rdf2java.example1.Person is a bit more complex than the structure snippet above, but after having a look at it, you should roughly get the picture. Besides, although you can, you won't look into these classes very often. They are structures for keeping and representing some RDF data. In most cases, you will do nothing more than just call the getters and putters of these objects.

4. RDF Import / Export (RDF data <--> Java instances)

Look at SimpleImportExport.java to get an understanding of how the import / export works. For a better understanding I pasted an extract of the most important parts of the code below (marginal stuff like, e.g., exception handling has been removed):

import java.util.HashMap;
import java.util.Iterator;
import java.util.Map;

import dfki.rdf.util.RDFImport;
import dfki.rdf.util.RDFExport;

import org.dfki.rdf2java.example1.Person;

public class SimpleImportExport
{
    public static void main( String[] args )
    {
        final String NAMESPACE_1 = "http://org.dfki/rdf2java/example1#";
        final String PACKAGE_1   = "org.dfki.rdf2java.example1";

        Map mapNamespace2Package = new HashMap();
        mapNamespace2Package.put( NAMESPACE_1, PACKAGE_1 );
        RDFImport rdfImport = new RDFImport( mapNamespace2Package );

        // 1. import from RDF file "example1.rdf"
        Map mapObjects = rdfImport.importObjects( "testdata/example/example1.rdf" );
        printout( mapObjects );

        // 2. make some changes (or additions)
        //    e.g. add an object for "Lisa Simpson"
        final String NAMESPACE_FOR_NEW_INSTANCES = NAMESPACE_1;
        Person lisa = new Person();
        lisa.makeNewURI( NAMESPACE_FOR_NEW_INSTANCES );
        lisa.putName( "Lisa Simpson" );
        // get object for "Homer"; you must know Homer's URI: "http://...#example1_00006"
        Person homer = (Person)mapObjects.get( "http://org.dfki/rdf2java/example1#example1_00006" );
        lisa.putHasParent( homer );
        // add lisa object to mapObjects (so she can be exported in step 3 below)
        mapObjects.put( lisa.getURI(), lisa );
        System.out.println( "\n\n----------\nadded lisa\n----------\n" );
        printout( mapObjects );

        // 3. export to RDF file "example1_lisa.rdf"
        Map mapPackage2Namespace = new HashMap();
        mapPackage2Namespace.put( PACKAGE_1, NAMESPACE_1 );
        RDFExport rdfExport = new RDFExport( NAMESPACE_FOR_NEW_INSTANCES, mapPackage2Namespace );
        rdfExport.exportObjects( mapObjects.values(), "testdata/example/example1_lisa.rdf" );
    }

    public static void printout( Map mapObjects )
    {
        for( Iterator it = mapObjects.keySet().iterator(); it.hasNext(); )
        {
            String sURI = (String)it.next();
            Object obj = mapObjects.get( sURI );
            System.out.println( "\n########## " + sURI + " ##########\n" + obj );
        }
    }
}

5. Knowledge Base (dfki.rdf.util.KnowledgeBase)

The KnowledgeBase class stores and maintains Java objects (representing some RDF data). At first sight, it provides hardly more than a simple data storage class; you could as well do fine with some java.util.Collection. However, the stored Java objects are supposed to represent RDF data, which unfortunately brings some problems. RDF (Resource Description Framework) is designed to talk about things (resources) with the explicit intention to talk about resources far away; think of all the distributed documents (e.g. web pages) around the world. You can put statements in one document describing another document somewhere else on the planet.

As already mentioned before (see 1.2 What rdf2java can do), rdf2java distinguishes between the two cases: (1) the object is available in the current Java-VM; then it is an instance of THING. (2) The object is not available in the current Java-VM; then it is represented by an RDFResource, providing at least the URI of the referenced object. A THING is an RDFResource, which means that THINGs also know their URIs.

When importing RDF data, you get a set of Java objects. Some of them will maintain links to others, some of them will link to resources not available by Java objects. In the latter case, such a Java object links to an RDFResource.

Now assume you have imported some RDF data and hence have some Java objects representing it. When you later want to import other RDF data that has relationships to the RDF you imported earlier, you have to take care of all these Java objects. In plain RDF, being hardly more than a set of triples (subject, predicate, object), you just add more triples and you're done. In the Java world, however, you've got the distinction between THINGs and RDFResources. When you import new RDF data and hence get new Java objects, you have to make sure that all RDFResources pointing to a new Java object are converted into direct Java pointers to that object. But it's even worse: assume you have independently imported two RDF files, meaning you have got two different collections of Java objects. Big problems arise as soon as you try to merge these two. You may say this problem can be avoided, but think about distributed scenarios, think about agents sending RDF data among themselves (in our research project FRODO our agents do that). What happens very often is that agents send updates of their data, being RDF data. When using Java objects representing this RDF, you come to the point where you have to assert new RDF data against old RDF data.

The KnowledgeBase class solves this problem by providing methods for such assertions. As an assertion of an object works on an older version of the same object, these methods have been named assign instead of assert, but maybe we should change this...?

Now, let's come to the semantics of this assign "operator". Assume you have got some objects {ai} in the knowledge base and want to assign {bj}. First of all, this is identical to an iterative assignment of all bj individually. The assignment {ai} <- bj works according to the following rules:

  • 1. case: there is an ak with ak.getURI() = bj.getURI():
    This more or less means: bj is already in the knowledge base (that's ak), but the bj assigned can contain other values and/or links.
    Therefore ak stays in the knowledge base, but all its properties get updated. Assume ak has got properties p1(ak),...,pn(ak) with values v1(ak),...,vn(ak); analogously bj has values v1(bj),...,vn(bj). Then proceed for all m=1,...,n in the following way:
    • case 1.a: pm is a single value slot: vm(ak) <- vm(bj), i.e., take the newer value (this may even be no value at all, which empties the property).
    • case 1.b: pm is a multiple value slot and the values are strings (literals): just take all the values from vm(bj), because those are the new ones.
    • case 1.c: pm is a multiple value slot and the values are links to resources (other objects): this is the most interesting case...
      First of all, all values in vm(ak) not contained in vm(bj) are removed. Then, for the rest of the values (i.e., values contained in both) check:
      • IF both vm(ak) and vm(bj) are THINGs, THEN assign vm(ak) <- vm(bj); this leads to a recursion, so the assignment is a quite inefficient procedure!
      • OTHERWISE just take out vm(ak) and take in vm(bj). Exactly this "take in" and "take out" has (by now) one great disadvantage: it may ruin the order of the multiple values. For normal RDF this causes no trouble, as there is no specification about the order of the triples, but as soon as one uses ordering constructs like rdf:Seq, there's trouble ahead! Although rdf2java supports ordering of multiple values, we aren't using this, because it's quite against the idea of RDF. If we need ordering, we model it explicitly through explicit structures.
  • 2. case: there exists no such ak with ak.getURI() = bj.getURI():
    This is, of course, the easiest case: as bj is just a new object, we put it into the knowledge base and we're done.

After this assignment procedure, a second pass is done, which updates values that are only references (RDFResource) although a THING is now available. These RDFResource values are then converted to Java pointers pointing to the respective THING. This is done by the method updateRDFResourceSlots, which is automatically called in the assign method (in the KnowledgeBase class).
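To make the first-pass semantics concrete, here is a deliberately simplified sketch in Java (hypothetical code: the classes Thing and SimpleKnowledgeBase are invented for illustration; the real dfki.rdf.util.KnowledgeBase additionally handles the THING recursion of case 1.c, ordering, and the second RDFResource-updating pass):

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Deliberately simplified sketch of the assign semantics (hypothetical code).
class Thing
{
    String uri;
    String name;                       // a single value slot (case 1.a)
    Set hasChild = new HashSet();      // a multiple value slot (cases 1.b/1.c)

    Thing( String uri ) { this.uri = uri; }
}

class SimpleKnowledgeBase
{
    private Map mapURI2Object = new HashMap();   // URI -> Thing

    Thing get( String uri ) { return (Thing)mapURI2Object.get( uri ); }

    // assign {a_i} <- b_j
    void assign( Thing b )
    {
        Thing a = (Thing)mapURI2Object.get( b.uri );
        if( a == null )
        {   // case 2: b is a new object; just put it into the knowledge base
            mapURI2Object.put( b.uri, b );
            return;
        }
        // case 1: a stays in the knowledge base, but its properties get updated
        a.name = b.name;                         // case 1.a: take the newer value (may be null!)
        a.hasChild = new HashSet( b.hasChild );  // cases 1.b/1.c: take the new values
    }

    public static void main( String[] args )
    {
        SimpleKnowledgeBase kb = new SimpleKnowledgeBase();
        Thing homer = new Thing( "#homer" );
        homer.name = "Homer Simpson";
        homer.hasChild.add( "#bart" );
        kb.assign( homer );                      // case 2: new object

        Thing update = new Thing( "#homer" );    // same URI, no name, other children
        update.hasChild.add( "#lisa" );
        kb.assign( update );                     // case 1: updates the old object

        System.out.println( kb.get( "#homer" ).name );      // the name value got emptied
        System.out.println( kb.get( "#homer" ).hasChild );  // only the new children remain
    }
}
```

Note how the update with an empty name slot empties the stored name; this is exactly the behavior of the Homer example worked through below.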

Examples

We change the old example slightly, as the old example lacks interesting cases. Therefore, we add an inverse property to hasParent, namely hasChild. Furthermore, we have to add the inverse values for this slot, as this isn't done automatically. The result is example2.* in the example directory.

Assume our knowledge base contains Java objects corresponding to RDF data declaring Bart and Homer Simpson and their relationships to each other (properties hasParent and hasChild). The RDF graph for this is shown in the left image after this paragraph (namespaces have been abbreviated). Assume further we want to assign a new object for Lisa Simpson plus relevant relationships to this knowledge base. You see the graphical representation of the RDF data to assign in the right image below. As we will see, the assignment won't lead to the result you might expect. Later we will show the correct RDF data to assign...

http://www.dfki.uni-kl.de/~schwarz/rdf2java/testdata/example/example2.a.short.75.gif
the knowledge base so far

http://www.dfki.uni-kl.de/~schwarz/rdf2java/testdata/example/example2.b.short.gif
RDF data to assign to the knowledge base

When looking at these pictures, keep in mind that they are nothing more than the graphical representation of RDF triples; boxes represent RDF resources, ellipses represent literals (strings). Via rdf2java, resources (boxes) for instances of RDFS classes are represented by corresponding Java objects in the knowledge base, and outgoing edges are stored in these Java objects and are available via getter methods.

Hence, before the assignment, the knowledge base (left picture) contains two Java objects: one for Bart (example1:example1_00005) and one for Homer (example1:example1_00006). Furthermore, both objects store the respective value for the name property. Finally, the Bart object stores a link to the Homer object for the hasParent property, and the Homer object stores the inverse link for the hasChild property, respectively. Note that there's no object for example1:Person, as this is a class and not an instance.

Analogously, the RDF data to assign (right image above) will be represented (e.g. after RDF Import) by two objects: one for Lisa's resource plus outgoing edges (properties name and hasParent) and one for Homer's resource plus outgoing edges (property hasChild). Note that this Homer object is neither the same object as the one in the knowledge base, nor does it have identical values; this Homer object, e.g., doesn't have a value for the name property, whereas the Homer object in the knowledge base does.

When assigning the RDF data (right image above), both objects (Lisa and Homer) have to be assigned. The assignment takes place in two steps (see above): (1) the main assignment procedure, (2) updating all property values linking to an RDFResource which after the assignment is available as a THING. The first step of assigning Lisa's object is easy, as it doesn't yet exist in the knowledge base. Therefore, the object will simply be added (including all information about outgoing edges) to the knowledge base. Assigning the Homer object, or to be more precise, the RDF data stored in the Homer object, is a bit more complicated. First of all, there already exists a Homer object in the knowledge base. Hence, the object cannot simply be added, but must be assigned to the already existing one. Therefore, all Homer properties have to be examined. name: the left-side Homer stores a value, the right-side Homer doesn't. Possible interpretation: the right side lacks (leaves out) this information? No! Our interpretation is: the right-side Homer is the newer one and contains all new data. Hence, the value for the name property will be deleted on the left-side Homer object. hasChild: both Homer objects have different (disjoint) values for that property, so we take the new ones, i.e., the ones on the right side. The result (knowledge base) is as follows:

http://www.dfki.uni-kl.de/~schwarz/rdf2java/testdata/example/example2.c.short.75.gif

The result is not quite the one you might have expected when I started this example. There are several mistakes: the Bart object forgot its name. It still "knows" about its parent, but the Homer object doesn't reflect this relationship. As RDF/S doesn't know anything about inverse slots, this is still correct RDF; it's just not what we wanted to get. The problem arose right at the beginning, where we specified what RDF data to assign to the knowledge base. Let's make another assignment:

http://www.dfki.uni-kl.de/~schwarz/rdf2java/testdata/example/example2.a.short.75.gif
the knowledge base so far

http://www.dfki.uni-kl.de/~schwarz/rdf2java/testdata/example/example2.b2.short.75.gif
RDF data to assign to the knowledge base

Note that example1:example1_00005 (in the right image) will only be represented by an RDFResource object, because the right-side Homer object merely needs a reference to this resource; there's no need to know about or change it. The Homer object, on the other hand, has to be changed because of the hasChild property. Hence, we need to represent it by a THING object and add / change the property values we like. The result of this assignment will be as follows:

http://www.dfki.uni-kl.de/~schwarz/rdf2java/testdata/example/example2.d.short.75.gif

6. Additional Stuff

In addition to really needed functionality like, e.g., the KnowledgeBase class, other useful tools emerged naturally.

6.1 RDF Diff (dfki.rdf.util.RDFDiff)

You often come to the point where you would like to see the difference between two RDF files. In contrast to XML, we don't have to deal with philosophical questions about equality of RDF things, because (1) we've got only triples here; the correspondence to some RDFS is the only constraint on the triples; (2) each resource you want to talk about, i.e., each RDF subject or object, has a unique resource identifier (URI). Hence, equality can be reduced to the triple level. Therefore, the difference between two RDF files is just the set difference of both triple sets.

A call to "RDFDiff <abc.rdf> <xyz.rdf>" will result in two additional files: the newly created file abc.rdf.diff.rdf will contain all statements of abc.rdf minus the ones in xyz.rdf. Analogously, a file xyz.rdf.diff.rdf is created, too.
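The triple-level set difference can be sketched in a few lines of Java (hypothetical code: triples are represented here as plain strings, and the class TripleDiff is invented for illustration; the real RDFDiff of course parses the RDF files and writes the results as RDF):

```java
import java.util.HashSet;
import java.util.Set;

// Sketch of the RDF diff idea: triples are simply compared as set elements
// (hypothetical code; RDFDiff itself reads and writes RDF files).
class TripleDiff
{
    static Set diff( Set triplesA, Set triplesB )
    {
        Set result = new HashSet( triplesA );
        result.removeAll( triplesB );     // A minus B on the triple level
        return result;
    }

    public static void main( String[] args )
    {
        Set a = new HashSet();
        a.add( "(#bart, #name, \"Bart Simpson\")" );
        a.add( "(#bart, #hasParent, #homer)" );
        Set b = new HashSet();
        b.add( "(#bart, #hasParent, #homer)" );
        System.out.println( diff( a, b ) );  // only the name triple remains
    }
}
```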

6.2 RDF Nice (dfki.rdf.util.RDFNice)

Despite the fact that RDF is, generally speaking, nothing but a graph, typical RDF data often has a hierarchical structure. And as the typical serialization of RDF is XML, which explicitly provides hierarchy, it would be a pity not to use a hierarchical presentation where it is useful. The reason we usually don't is, of course, that RDF cannot always be presented hierarchically in general. But in non-general, say typical, scenarios, your RDF is nearly always structured hierarchically to a large extent. That is why I wrote this small tool.

The idea behind RDF Nice is to give the XML serializer some hints about which predicates are hierarchy-driving relations. You do this by assigning positive or negative numbers to some predicates. A positive number indicates that following this predicate goes DOWN the hierarchy; a negative number indicates that following the predicate goes UP the hierarchy. You can prioritize these UP/DOWN indications by using higher absolute values (+100 is stronger than +1; -10 is also stronger than +1, but weaker than +100). At the moment you have to cope with integer numbers; a lexical ordering would, of course, be better. Note that -10 is only ten times stronger than +1, which means that eleven predicates of weight +1 outperform one of weight -10!

So, what's happening with all that numbers?

  • First of all, RDF Nice puts all RDF subjects in a bag. Iteratively, subjects are drawn out of this bag and attached to some XML branch.
  • Every time a subject has to be chosen to be taken out of the bag, RDF Nice calculates a value for each remaining subject. This value is the sum of the weights of all outgoing predicates plus the negative(!) sum of the weights of all incoming predicates. The subject with the highest value is taken.
  • After the XML element has been created, the outgoing predicates, together with the objects these predicates point to, are attached below the subject node. These predicates are ordered according to their value, i.e., the predicate with the highest value becomes the first child of the subject node.
  • Whenever a predicate points to a subject still in the bag, the algorithm recurses and thereby creates the hierarchical presentation. If the predicate points to a subject not in the bag, then either that subject is not part of the RDF data, or it has already been attached elsewhere in the XML tree. In either case the subject is not serialized again, but just referenced in the RDF-typical manner.
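The scoring step above can be sketched as follows. This is a minimal sketch under assumed data structures; the actual RDFNice implementation differs, and all names here are illustrative:

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Sketch of RDF Nice's subject scoring: each subject's value is the sum of
// the weights of its outgoing predicates minus the sum of the weights of
// its incoming predicates; the highest-scoring subject is serialized first.
public class NiceScoreSketch {
    // A triple is just three strings: subject, predicate, object.
    record Triple(String s, String p, String o) {}

    static String pickBestSubject(List<Triple> triples, Map<String, Integer> weights) {
        Map<String, Integer> score = new HashMap<>();
        for (Triple t : triples) {
            int w = weights.getOrDefault(t.p(), 0);
            score.merge(t.s(), +w, Integer::sum);  // outgoing predicate: +weight
            score.merge(t.o(), -w, Integer::sum);  // incoming predicate: -weight
        }
        return score.entrySet().stream()
                .max(Map.Entry.comparingByValue())
                .map(Map.Entry::getKey).orElse(null);
    }

    public static void main(String[] args) {
        List<Triple> triples = List.of(
            new Triple("homer", "hasChild", "bart"),
            new Triple("bart", "hasParent", "homer"));
        Map<String, Integer> weights = Map.of("hasChild", 1000, "hasParent", -1000);
        // homer: +1000 (outgoing hasChild) - (-1000) (incoming hasParent) = 2000
        // bart:  -1000 (incoming hasChild) + (-1000) (outgoing hasParent) = -2000
        System.out.println(pickBestSubject(triples, weights));  // prints "homer"
    }
}
```

With these weights, Homer wins and becomes the root of the XML hierarchy, which matches the example2 serialization shown further below.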

Whatever values you assign to the predicates, the resulting RDF always stays the same; only the presentation changes. However, I wouldn't swear to RDF Nice being bug-free. ;-)

Call RDF Nice the following way to get a "nice" version of the RDF file for our small "Person" example:

java -cp <...CLASSPATH...> dfki.rdf.util.RDFNice
                           C:\TEMP\example1.rdf
                           http://org.dfki/rdf2java/example1#hasParent -1000
                           http://org.dfki/rdf2java/example1#name      1

So, the first parameter is the RDF file, followed by arbitrarily many pairs(!) of parameters: one for the predicate (the full URI, not only the local name) and one for the value you want to assign to that predicate. RDF Nice produces a new file (so no original RDF file will be harmed) with the same path and filename plus the postfix ".nice.rdf", i.e., "example1.rdf.nice.rdf" in our example. Doing this, you get the following RDF, which is not very impressive, but that is because the simple "Person" example is a bad one: its only hierarchical relation points in the wrong direction, namely upwards, while XML serialization always works in the other direction, namely downwards. Note that the predicate name comes before hasParent, because its value is greater (greater, not stronger!).

<?xml version='1.0' encoding='ISO-8859-1'?>
<!DOCTYPE rdf:RDF [
    <!ENTITY example1 'http://org.dfki/rdf2java/example1#'>
    <!ENTITY rdf 'http://www.w3.org/1999/02/22-rdf-syntax-ns#'>
    <!ENTITY rdfs 'http://www.w3.org/TR/1999/PR-rdf-schema-19990303#'>

]>
<rdf:RDF 
    xmlns:example1="&example1;"
    xmlns:rdf="&rdf;"
    xmlns:rdfs="&rdfs;">

    <example1:Person  rdf:about="&example1;example1_00006"
         example1:name="Homer Simpson"/>
    <example1:Person  rdf:about="&example1;example1_00005"
         example1:name="Bart Simpson">
        <example1:hasParent  rdf:resource="&example1;example1_00006"/>
    </example1:Person>
</rdf:RDF>

To get a better picture of RDF Nice, we switch to example2.* in the example directory. Remember: we added a new property to the RDF Schema, the property hasChild, which is the inverse of hasParent. Changing the call to RDF Nice (see rdfnice.bat) to take the new property into account (hasChild is assigned the negated value of hasParent: +1000), we receive the following XML serialization for example2_lisa.rdf (including Lisa):

<?xml version='1.0' encoding='ISO-8859-1'?>
<!DOCTYPE rdf:RDF [
    <!ENTITY example1 'http://org.dfki/rdf2java/example1#'>
    <!ENTITY rdf 'http://www.w3.org/1999/02/22-rdf-syntax-ns#'>
    <!ENTITY rdfs 'http://www.w3.org/TR/1999/PR-rdf-schema-19990303#'>

]>
<rdf:RDF 
    xmlns:example1="&example1;"
    xmlns:rdf="&rdf;"
    xmlns:rdfs="&rdfs;">

    <example1:Person  rdf:about="&example1;example1_00006"
         example1:name="Homer Simpson">
        <example1:hasChild>
            <example1:Person  rdf:about="&example1;example1_00005"
                 example1:name="Bart Simpson">
                <example1:hasParent  rdf:resource="&example1;example1_00006"/>
            </example1:Person>
        </example1:hasChild>
        <example1:hasChild>
            <example1:Person  rdf:about="&example1;example2_00007"
                 example1:name="Lisa Simpson">
                <example1:hasParent  rdf:resource="&example1;example1_00006"/>
            </example1:Person>
        </example1:hasChild>
    </example1:Person>
</rdf:RDF>

6.3 RDF Dump (dfki.rdf.util.RDFDump)

Nothing much to say about this tool. It just loads an RDF file and dumps the triples to System.out. The triples are output in pure ASCII in the following format: subject \t predicate \t object \n. So TAB and NEWLINE are the only delimiters; there is no SPACE, no bracket, no comma, nor anything else. Look at and edit the Java code if you long for more...
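The output format can be sketched like this (a minimal sketch; the actual RDFDump class parses real RDF files, and the class and method names here are made up for illustration):

```java
// Sketch of RDF Dump's output format: one triple per line, with TAB
// between subject, predicate and object, and nothing else.
public class DumpFormatSketch {
    static String format(String subject, String predicate, String object) {
        return subject + "\t" + predicate + "\t" + object + "\n";
    }

    public static void main(String[] args) {
        System.out.print(format(
            "http://org.dfki/rdf2java/example1#example1_00005",
            "http://org.dfki/rdf2java/example1#hasParent",
            "http://org.dfki/rdf2java/example1#example1_00006"));
    }
}
```

Because the delimiters are fixed, the output is trivially parseable, e.g. with a split on TAB per line.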

The End

For a complete list of local wiki pages, see TitleIndex.