14.1. Data Representation

This section covers how Virtuoso stores RDF triples. It introduces the IRI_ID built-in data type and the default table structures used for triple persistence.

14.1.1. IRI_ID Type

The central notion of RDF is the IRI, or URI, which serves as the globally unique label of named nodes. The subject and predicate of a triple are always IRIs, and the object may be an IRI or any other XML Schema scalar data type. In any case, an IRI is always distinct from any instance of any other data type.

Virtuoso supports a native IRI_ID data type, internally an unsigned 32-bit integer value. When compared with other IRI_IDs, the collation is the same as for unsigned 32-bit integers. An IRI_ID is never equal to any instance of any other type.

Thus, the object column of a table storing triples can be declared as ANY, and IRI values will be distinguishable without recourse to any extra flag. IRIs will naturally occupy their own contiguous segment in the ANY type collation sequence, and indices can be defined over such columns. An IRI_ID is never automatically cast into any other type, nor is any other type automatically cast into IRI_ID.
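The following Python sketch models the two properties described above: an IRI_ID never compares equal to a value of another type, and IRI_IDs sort as unsigned integers inside their own segment of a mixed-type column. It is only an illustration, not Virtuoso's implementation. The class IriId, the function any_sort_key and the relative rank given to each type are assumptions; the text does not say where the IRI segment falls in the ANY collation sequence.

```python
from dataclasses import dataclass


@dataclass(frozen=True, order=True)
class IriId:
    """Hypothetical stand-in for the native IRI_ID type."""
    num: int

    def __post_init__(self):
        # The text describes IRI_ID as an unsigned 32-bit value.
        if not 0 <= self.num < 2**32:
            raise ValueError("IRI_ID out of unsigned 32-bit range")


# Assumed ordering of type segments. Only the contiguity of the IRI
# segment comes from the text; its position here is arbitrary.
_TYPE_RANK = {int: 0, IriId: 1, str: 2}


def any_sort_key(value):
    """Sort key for a mixed column: type segment first, then the value."""
    rank = _TYPE_RANK[type(value)]
    payload = value.num if isinstance(value, IriId) else value
    return (rank, payload)


column = [IriId(7), 3, "x", IriId(2), 10]
ordered = sorted(column, key=any_sort_key)
```

In this model IriId(5) == 5 evaluates to false, and the two IriId values end up adjacent in ordered, which mirrors the "own contiguous segment" behavior.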

The functions iri_id_num (in i IRI_ID) and iri_id_from_num (in n INT) convert between signed 32-bit integers and IRI_IDs. The function isiri_id (in i any) returns nonzero if the argument is of type IRI_ID, and zero otherwise.

The syntax for an IRI_ID literal is:

#i<nnn>, where nnn is up to 10 decimal digits.
#i12345 is equal to iri_id_from_num (12345)

When received by a SQL client application, an IRI_ID is bound by the ODBC driver or interactive SQL to a character buffer, producing the #innn syntax. To pass an IRI_ID from a client, pass an integer and convert it server side with the iri_id_from_num () function in the statement. A SQL client is normally not exposed to IRI_IDs, since the SPARQL implementation returns IRIs in their text form rather than as internal ids. They will, however, be seen when reading the internal tables directly.
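A client that reads the internal tables directly will receive strings in the #innn form and may need the numeric id, for example to pass it back through iri_id_from_num (). The Python sketch below shows one way to do that conversion on the client side. The helper names parse_iri_id_literal and format_iri_id_literal are hypothetical, and the range check assumes the unsigned 32-bit range stated earlier in this section.

```python
import re

# "#i" followed by up to 10 decimal digits, as described in the text.
_IRI_ID_LITERAL = re.compile(r"#i([0-9]{1,10})")

# Assumption: ids fit the unsigned 32-bit range mentioned in the text.
MAX_IRI_ID = 2**32 - 1


def parse_iri_id_literal(text: str) -> int:
    """Return the numeric part of an IRI_ID literal such as '#i12345'."""
    match = _IRI_ID_LITERAL.fullmatch(text.strip())
    if match is None:
        raise ValueError(f"not an IRI_ID literal: {text!r}")
    num = int(match.group(1))
    if num > MAX_IRI_ID:
        raise ValueError(f"IRI_ID out of range: {text!r}")
    return num


def format_iri_id_literal(num: int) -> str:
    """Render a numeric id in the '#innn' literal form."""
    if not 0 <= num <= MAX_IRI_ID:
        raise ValueError(f"IRI_ID out of range: {num}")
    return f"#i{num}"
```

The integer returned by parse_iri_id_literal can then be bound as an ordinary integer parameter and converted server side.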


14.1.2. RDF_QUAD and other tables

The main tables of the default RDF storage system are:

create table DB.DBA.RDF_QUAD (
  G IRI_ID,
  S IRI_ID,
  P IRI_ID,
  O any,
  primary key (G,S,P,O) );
create unique index RDF_QUAD_PGOS on DB.DBA.RDF_QUAD (P, G, O, S);

Each triple is represented by one row in RDF_QUAD. The columns represent the graph, subject, predicate and object. The IRI_ID type columns reference RDF_URL, which translates the internal id to the external name of the IRI. The O column is of type ANY. If the O value is a non-string scalar, such as a number or date or IRI, it is stored in its native binary representation. If it is a string, it will be stored in a 'short' form, meaning a packed binary string with fixed fields for the data type, language, the content and a possible reference to RDF_OBJ if the string is too long to be held in-line in this table.

create table DB.DBA.RDF_URL (
    RU_IID IRI_ID not null primary key,
    RU_QNAME varchar );
create unique index RU_QNAME on DB.DBA.RDF_URL (RU_QNAME);

This is simply a mapping between internal IRI ids and their external form.

create table DB.DBA.RDF_OBJ (
    RO_ID integer primary key,
    RO_VAL varchar,
    RO_LONG long varchar );

When an O value of RDF_QUAD is longer than a certain limit, the string is stored in this table. Depending on the length of the value, it goes into the varchar or the long varchar column. The RO_ID is contained in a fixed position in the string that is stored in the O column. The truncated value of O can still be used for determining equality and for range matching, even if < and > comparisons of closely matching values need to look at the real string in RDF_OBJ.
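The idea of comparing on the truncated value and consulting RDF_OBJ only when needed can be modeled with a short Python sketch. This is a conceptual illustration only: the real short form is a packed binary string, whereas here it is a plain prefix plus an optional id. The names ObjStore, ShortValue and INLINE_LIMIT are hypothetical, the limit of 8 characters is arbitrary, and the sketch assumes that identical long strings share one RO_ID.

```python
from dataclasses import dataclass
from typing import Dict, Optional

INLINE_LIMIT = 8  # hypothetical; the text only says "a certain limit"


@dataclass(frozen=True)
class ShortValue:
    prefix: str            # inline (possibly truncated) part of the string
    ro_id: Optional[int]   # id of the overflow row, None if fully inline


class ObjStore:
    """Toy model of the overflow table for long strings."""

    def __init__(self) -> None:
        self._rows: Dict[int, str] = {}
        self._ids: Dict[str, int] = {}

    def shorten(self, value: str) -> ShortValue:
        if len(value) <= INLINE_LIMIT:
            return ShortValue(value, None)
        ro_id = self._ids.get(value)
        if ro_id is None:
            # Assumption: equal strings are stored once and share an id.
            ro_id = len(self._rows) + 1
            self._rows[ro_id] = value
            self._ids[value] = ro_id
        return ShortValue(value[:INLINE_LIMIT], ro_id)

    def full(self, short: ShortValue) -> str:
        return short.prefix if short.ro_id is None else self._rows[short.ro_id]

    def less_than(self, a: ShortValue, b: ShortValue) -> bool:
        # Differing prefixes already decide the order.
        if a.prefix != b.prefix:
            return a.prefix < b.prefix
        # Closely matching values: look at the real strings.
        return self.full(a) < self.full(b)
```

Equality needs no lookup in this model, because two short values are equal exactly when both the prefix and the id match. Only an ordering comparison between values with identical prefixes falls through to the stored strings.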

create table DB.DBA.RDF_DATATYPE (
    RDT_IID IRI_ID not null primary key,
    RDT_TWOBYTE integer not null unique,
    RDT_QNAME varchar );

The XML Schema data type of a typed string O is represented as 2 bytes in the O varchar value. This table maps that two-byte value into the broader IRI space, where the type URI is given an IRI number.

create table DB.DBA.RDF_LANGUAGE (
    RL_ID varchar not null primary key,
    RL_TWOBYTE integer not null unique );

The varchar representation of an O which is a string with a language has a two-byte field for the language. This table maps the short integer language id to the real language name, such as 'en', 'en-US' or 'x-any'.

Note that unlike datatype names, language names are not URIs.

The same short integer value can appear in both the RDF_DATATYPE and RDF_LANGUAGE tables with different meanings. For example, the integer 257 stands for 'unspecified datatype' in the former and for 'unspecified language' in the latter.


14.1.3. Short, Long and SQL Values

When processing an O, the SPARQL implementation may have it in one of three internal formats, called 'valmodes'. The following cases apply to strings:

The short format is the format where an O is stored in RDF_QUAD.

The long value is a SQL vector of five fields:

The SQL value is the string as a narrow string representing the UTF8 encoding of the value, stripped of data type and language tag.

The SQL form of an IRI is the string from RU_QNAME. The long and short forms are the IRI_ID referencing RU_IID of RDF_URL.

For all non-string, non-IRI types, the short, long and SQL values are the same SQL scalar of the appropriate native SQL type. A SQL host variable meant to receive an O should be of the ANY type.

The SPARQL implementation will usually translate results to the SQL format before returning them. Internally, it uses the shortest possible form suited to the operation. For equalities and joining, the short form is always sufficient. For range comparisons, the long form is needed. For arithmetic, all three forms will do, since the arguments are expected to be numbers, which are stored in their native binary form in O. Thus the O column, unaltered and uncast, can serve as an argument of arithmetic or of numeric comparison with, say, a SQL literal constant.


14.1.4. Special Cases and XML Schema Compatibility

Since numbers are stored as the equivalent SQL binary type, the distinction between byte, boolean and similar types is not preserved. These all become integer. If preserving such detail is for some reason important, storage as a typed string is possible, but this is not done at present for reasons of compactness and performance.


14.1.5. SQL Compiler Support - QUIETCAST option

The type cast behaviors of SQL and SPARQL are different. SQL will generally signal an error when an automatic cast fails. For example, a string can be compared to a date column if the string can be parsed as a date, but otherwise the comparison should signal an error. In SPARQL, such situations are supposed to fail silently. Generally, SPARQL is much more relaxed with respect to data types.

These differences are especially noticeable when actual SQL data is processed with SPARQL via some sort of schema mapping that translates references to triples into native tables and columns.

Also, even when dealing with the triple-oriented RDF_QUAD table, there are cases of joining between S and O such that the O can be a heterogeneous set of IRIs and other data, whereas the S is always an IRI. The non-IRI to IRI comparison should not give cast errors but should fail silently. Also, in order to keep queries simple and easily optimizable, it should not be necessary to introduce extra predicates for testing whether the O is an IRI before comparing it with the S.

Due to these considerations, Virtuoso introduces a SQL statement option called QUIETCAST. When given in the OPTION clause of a SELECT, it switches to silent fail mode for automatic type casting.

The syntax is as follows:

select ... from .... option (QUIETCAST)

This option is automatically added by the SPARQL to SQL translator. The scope is the enclosing procedure body.
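The difference between the two cast behaviors can be illustrated with a small Python model of the string-to-date example given earlier in this section. This is a sketch of the semantics only, not of how Virtuoso implements the option. The function compare_to_date and its quiet flag are hypothetical, and ISO date parsing stands in for whatever date formats the server accepts.

```python
from datetime import date


def compare_to_date(column_value: date, literal: str, quiet: bool = False) -> bool:
    """Compare a date column value with a string, casting the string first.

    quiet=False models the default SQL behavior: a failed cast is an error.
    quiet=True models the QUIETCAST behavior: a failed cast means no match.
    """
    try:
        parsed = date.fromisoformat(literal)
    except ValueError:
        if quiet:
            return False  # silent failure: the row simply does not qualify
        raise             # strict mode: signal the cast error
    return column_value == parsed
```

With quiet set, a row whose value cannot be cast is filtered out instead of aborting the whole query, which is the behavior SPARQL expects when S is compared with a heterogeneous O.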