Data Virtualization, Lakes, Semantics, and Security

Created on 2016-05-24 16:16

Published on 2016-05-24 17:29

The mantra of any good security engineer is: ‘Security is not a product, but a process.’ It’s more than designing strong cryptography into a system; it’s designing the entire system such that all security measures, including cryptography, work together. -- Bruce Schneier

As is too often the case, data security remains a low priority from product selection all the way through project inception. Today, Data Virtualization atop disparate data sources is enjoying a resurgence in mindshare, without equal attention being paid to the security concerns around data access.

Challenge

We have a plethora of software applications (or apps) in use across a variety of computing devices, all equipped with connectivity to enterprise and personal databases. How, then, does one create practical and scalable solutions to the obvious data security challenges this presents?

Solution

Use entity relationship graphs (networks, webs, or clouds) to represent how all the actors (users and applications) involved in the data access process are related — and to a level of granularity that also includes the semantics of their associations.

Here's an illustration of how Virtuoso uses entity relationships and their semantics with regard to secure data virtualization over heterogeneous data sources (or Data Lakes).

By separating, rather than conflating, the identities of users and applications (software agents), and then expressing the semantics of the relationship types that exist between users and their applications, you end up with a scalable solution for creating data access policies using existing open standards (URIs, HTTP, X.509, TLS, RDF, and First-order Logic). That solution is constrained only by imagination rather than by architectural myopia.
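The separation described above can be sketched with a tiny in-memory entity relationship graph. This is only an illustration under stated assumptions: the URIs, the `ex:hasDelegate` and `ex:actsOnBehalfOf` predicate names, and the helper functions are hypothetical placeholders, not Virtuoso's vocabulary or API. The point is that users and applications are distinct nodes, and the relationship between them is itself data that can be checked from both sides.

```python
# All URIs and predicate names below are hypothetical placeholders.
ALICE = "https://example.org/people/alice#me"
REPORT_APP = "https://example.org/apps/report-tool#this"
OTHER_APP = "https://example.org/apps/other#this"

# A tiny entity relationship graph as (subject, predicate, object) triples.
TRIPLES = {
    (ALICE, "rdf:type", "foaf:Person"),
    (REPORT_APP, "rdf:type", "ex:SoftwareAgent"),
    (OTHER_APP, "rdf:type", "ex:SoftwareAgent"),
    # Stated in the user's profile document.
    (ALICE, "ex:hasDelegate", REPORT_APP),
    # Stated in the application's profile document.
    (REPORT_APP, "ex:actsOnBehalfOf", ALICE),
    # One-sided claim: the user never named this application.
    (OTHER_APP, "ex:actsOnBehalfOf", ALICE),
}


def objects(subject, predicate, triples=TRIPLES):
    """Return every object linked to `subject` by `predicate`."""
    return {o for s, p, o in triples if s == subject and p == predicate}


def may_act_for(app, user, triples=TRIPLES):
    """True only when both profiles assert the relationship."""
    return (app in objects(user, "ex:hasDelegate", triples)
            and user in objects(app, "ex:actsOnBehalfOf", triples))


if __name__ == "__main__":
    print(may_act_for(REPORT_APP, ALICE))
    print(may_act_for(OTHER_APP, ALICE))
```

In this sketch the second application is rejected because only one side of the relationship is asserted; requiring agreement from both profile documents is one plausible reading of "reflected in the profile documents of both entities".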

Process & Component Breakdown 

  1. Every data access policy boils down to a collection of RDF Language sentences that describe said policy.
  2. Every software application is identified by an X.509 certificate, which includes a Hyperlink (HTTP URI) that identifies said software application while also resolving to a software application oriented profile document.
  3. Every user is identified by a Hyperlink (HTTP URI) that resolves to a personal profile document.
  4. The semantics of the relationship type that associates a user with one or more software applications are reflected in the profile documents of both entities.
  5. Authentication is scoped to applications rather than users, but remains cognizant of how a given application and its associated users are related.
  6. Actual data access is controlled by policy evaluation scoped to the user rather than the application. Of course, one can also construct specific policies that are scoped to a combination of application and user.
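The steps above can be sketched as two separate checks: authentication against the application, then policy evaluation against the user. This is a simplified model, not Virtuoso's implementation. Every class, function, URI, and key value here is hypothetical; plain dictionaries stand in for profile documents that would really be fetched over HTTP, a string stands in for the public key proven during the TLS handshake, and a small record stands in for a policy that would really be expressed as RDF sentences.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AppProfile:
    public_key: str               # key published in the application's profile
    acts_on_behalf_of: frozenset  # user URIs the application claims to act for


@dataclass(frozen=True)
class UserProfile:
    delegates: frozenset          # application URIs the user has named


@dataclass(frozen=True)
class Policy:
    resource: str                 # protected named graph or document
    mode: str                     # e.g. "read" or "write"
    users: frozenset              # user URIs granted this mode


def authenticate(cert_uri, cert_key, claimed_user, app_profiles, user_profiles):
    """Authenticate the application, then confirm it may act for the user.

    Returns the user URI on success, or None on any failed check.
    """
    app = app_profiles.get(cert_uri)
    if app is None or app.public_key != cert_key:
        return None  # certificate does not match the application's profile
    user = user_profiles.get(claimed_user)
    if user is None:
        return None  # no profile document for the claimed user
    # The relationship must be stated in both profile documents.
    if claimed_user in app.acts_on_behalf_of and cert_uri in user.delegates:
        return claimed_user
    return None


def authorize(user, resource, mode, policies):
    """Evaluate policies against the user, not the application."""
    return any(p.resource == resource and p.mode == mode and user in p.users
               for p in policies)


# Hypothetical sample data.
APP = "https://example.org/apps/report-tool#this"
ALICE = "https://example.org/people/alice#me"
BOB = "https://example.org/people/bob#me"
GRAPH = "urn:example:protected:graph"

APP_PROFILES = {APP: AppProfile("KEY-1", frozenset({ALICE, BOB}))}
USER_PROFILES = {
    ALICE: UserProfile(frozenset({APP})),
    BOB: UserProfile(frozenset()),  # Bob never named the application
}
POLICIES = [Policy(GRAPH, "read", frozenset({ALICE}))]

if __name__ == "__main__":
    who = authenticate(APP, "KEY-1", ALICE, APP_PROFILES, USER_PROFILES)
    print(who, authorize(who, GRAPH, "read", POLICIES))
```

Keeping the two functions apart mirrors the split in the list: the certificate only ever proves which application is connecting, while the decision to grant access depends on the user that application is acting for.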

It's important to note that this approach works with existing open standards that collectively drive the current Internet & Web.

It's also important to note that only applications (not users) require X.509 certificates, which brings the following benefits:

  1. X.509 Certificate Management Tedium isn't a factor.
  2. TLS implementation across popular software applications like Web browsers doesn't introduce challenging UI/UX impedance (e.g., browser restarts on each user identity change or switch), since actual user identity is tested by ACLs, while at authentication time it's delegated to the application combined with application<->user relationship semantics.

Live Examples

  1. http://tinyurl.com/zbcqvfz -- SPARQL Query Results page where the query targets entity relationships in a protected database (a/k/a Named Graph or Document) that's only accessible to specific users identified by a WebID (HTTP URI or Hyperlink that identifies a Person, Organization, or Software Agent) and authenticated using the WebID+TLS or WebID+TLS+Delegation protocols
  2. http://tinyurl.com/hpphdq8 -- SPARQL Query Results page where the query targets entity relationships in a protected database (a/k/a Named Graph or Document) that's only accessible to users authenticated using any of the protocols presented in the Virtuoso Authentication Dialog
  3. http://tinyurl.com/hj9rjeq -- Faceted Browser page where the query targets entity relationships in a protected database (a/k/a Named Graph or Document) that's only accessible to specific users identified by a WebID (HTTP URI or Hyperlink that identifies a Person, Organization, or Software Agent) and authenticated using the WebID+TLS or WebID+TLS+Delegation protocols
  4. http://tinyurl.com/hss58dw -- Faceted Browser page where the query targets entity relationships in a protected database (a/k/a Named Graph or Document) that's only accessible to users authenticated using any of the protocols presented in the Virtuoso Authentication Dialog