Created on 2021-06-18 00:15
Published on 2021-06-18 17:53
It's the year 2021, and there seems to have been little to no progress in the general quest to alleviate the costly challenges of integrating data across disparate data sources. Fundamentally, this problem lies at the very core of the Technical Debt accumulated due to the dominance of Application Centricity over Data Centricity in solution architecture.
Enterprises and individuals alike are still asking the same age-old question:
What sits between me and my data?
Data Source Names (DSNs) are what sit between you and your data — regardless of application type or usage medium. They bring the power of naming to the realm of software applications.
In computing, the functionality we associate with names is delivered via pointers, one of the field's fundamental concepts.
Pointers are used to store the location (or address) of data rather than the data itself — thereby enabling “data access by reference,” courtesy of identifier → address indirection.
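The identifier → address indirection described above can be sketched in a few lines. This is only an illustration: Python has no raw pointers, so the sketch simulates them with two dictionaries, and the names `symbol_table`, `memory`, and `dereference`, the address `0x1000`, and the record values are all hypothetical.

```python
# Simulated memory: address -> data (hypothetical address and values).
memory = {0x1000: "CRM record v1"}

# Simulated symbol table: identifier -> address.
symbol_table = {"customer": 0x1000}


def dereference(identifier):
    """Resolve an identifier to its address, then the address to its data."""
    address = symbol_table[identifier]
    return memory[address]


before = dereference("customer")

# The data at the address changes; the name and the address do not.
memory[0x1000] = "CRM record v2"
after = dereference("customer")
```

Because callers hold only the name, the data behind it can change without any caller needing to be updated, which is the property that Data Source Names rely on.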
Similar indirection underlies the powerful effects enabled through identification by name — i.e., denotation [identity] → connotation [description] indirection — which is the process triggered whenever something is mentioned or referred to by its name, in any medium.
In the realm of software application development, the term “DSN” is strongly associated with Open Database Connectivity (ODBC), an open standard from Microsoft that moved the industry away from DBMS-specific Application Programming Interfaces (APIs) to a DBMS-agnostic alternative. Eventually, use of this term extended to ODBC’s Java counterpart, Java Database Connectivity (JDBC).
DSN=CRM;HOST=example.org;DATABASE=CRM
A DSN denotes an ODBC-accessible Database managed by a SQL-compliant RDBMS.
jdbc:openlink://example.org/DATABASE=CRM
A DSN denotes a JDBC-accessible Database managed by a SQL-compliant RDBMS.
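To make the structure of the two DSN strings above concrete, here is a minimal sketch that splits each one into its parts. The function names `parse_odbc_dsn` and `parse_jdbc_url` are hypothetical, not part of ODBC or JDBC. The ODBC parser is deliberately naive and ignores the quoting rules real driver managers apply. The JDBC parser assumes that any attributes after the host are slash-separated key=value pairs, which the example above suggests but does not confirm.

```python
from urllib.parse import urlsplit


def parse_odbc_dsn(connection_string):
    """Split a semicolon-delimited key=value string into a dict (naive sketch)."""
    attributes = {}
    for part in connection_string.split(";"):
        part = part.strip()
        if not part:
            continue  # tolerate trailing or doubled semicolons
        key, _, value = part.partition("=")
        attributes[key.strip().upper()] = value.strip()
    return attributes


def parse_jdbc_url(jdbc_url):
    """Split a jdbc:<driver>://<host>/<key=value> URL into its components."""
    prefix = "jdbc:"
    if not jdbc_url.startswith(prefix):
        raise ValueError("not a JDBC URL: " + jdbc_url)
    parts = urlsplit(jdbc_url[len(prefix):])
    # Assumption: attributes after the host are slash-separated key=value pairs.
    attributes = parse_odbc_dsn(parts.path.strip("/").replace("/", ";"))
    return {"driver": parts.scheme, "host": parts.hostname, "attributes": attributes}


odbc_attrs = parse_odbc_dsn("DSN=CRM;HOST=example.org;DATABASE=CRM")
jdbc_parts = parse_jdbc_url("jdbc:openlink://example.org/DATABASE=CRM")
```

In both cases the application only ever handles the name; the host and database details it resolves to can be changed without touching application code.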
Despite the usefulness of both ODBC and JDBC, each is limited to offering programming-language–specific implementations of DSNs — ODBC is fundamentally a C/C++ API, while JDBC is a Java API — both falling short whenever platform independence is a requirement.
Hyperlinks, especially those based on the HTTP protocol, provide a platform-agnostic implementation of the DSN concept by way of an abstraction that encapsulates the Domain Name System (DNS) and TCP/IP protocols. In a nutshell, you can use HTTP to create a name for anything, letting you identify things (entities) unambiguously while also being able to look up their descriptions, which may be available in a variety of document types.
This variant of DSN comes in two forms, URIs and IRIs, differentiated by their respective use of ASCII and Unicode character sets.
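The relationship between the two forms can be illustrated by mapping an IRI to a URI: characters outside ASCII are encoded as UTF-8 and then percent-encoded. The sketch below is a simplification under stated assumptions. The function name `iri_to_uri` is hypothetical, it only handles the path, query, and fragment, and it leaves the host untouched (internationalized host names need separate treatment that is not shown here).

```python
from urllib.parse import quote, urlsplit, urlunsplit

# Characters left unescaped; "%" is included so that existing
# percent-escapes are not encoded a second time.
SAFE_CHARS = "/?:@!$&'()*+,;=%"


def iri_to_uri(iri):
    """Percent-encode non-ASCII characters in the path, query, and fragment."""
    parts = urlsplit(iri)
    return urlunsplit((
        parts.scheme,
        parts.netloc,  # assumption: the host is already ASCII
        quote(parts.path, safe=SAFE_CHARS),
        quote(parts.query, safe=SAFE_CHARS),
        quote(parts.fragment, safe=SAFE_CHARS),
    ))


# Hypothetical IRI containing U+00E9 (e with acute accent) twice.
uri = iri_to_uri("http://example.org/r\u00e9sum\u00e9")
```

An identifier that is already pure ASCII passes through unchanged, so every URI is also a valid IRI.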
Linked Data Principles provide a straightforward approach to structured data representation along the following lines:
These principles unleash the power of Hyperlinks as DSNs by delivering the following benefits:
Here are two live examples of Hyperlink-based DSNs that demonstrate the effects of these principles.
In both cases, the DSNs resolve to a Knowledge Graph deployed via HTML by default; an HTTP User Agent (such as a Web Browser) can negotiate representation of the same Knowledge Graph using alternative document media types including JSON, JSON-LD, RDF-Turtle, and others.
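The negotiation described above comes down to the HTTP `Accept` request header. The following sketch builds such a request with Python's standard library without sending it. The URL `http://example.org/resource/CRM`, the function name `build_negotiated_request`, and the `MEDIA_TYPES` table are assumptions made for illustration, and whether a given server honors a particular media type depends on that server.

```python
import urllib.request

# Short labels mapped to registered media types.
MEDIA_TYPES = {
    "html": "text/html",
    "json": "application/json",
    "json-ld": "application/ld+json",
    "turtle": "text/turtle",
}


def build_negotiated_request(url, representation):
    """Build an HTTP request that asks for one representation of a resource."""
    media_type = MEDIA_TYPES[representation]
    return urllib.request.Request(url, headers={"Accept": media_type})


# Hypothetical hyperlink-based DSN; replace with a real resource URL.
request = build_negotiated_request("http://example.org/resource/CRM", "turtle")

# Sending it would look like this (not executed here):
# with urllib.request.urlopen(request) as response:
#     document = response.read().decode("utf-8")
```

The same name is used for every representation; only the `Accept` header changes, which is what lets one DSN serve both people using browsers and programs consuming structured data.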
Hypertext, Hyperdata, and Knowledge Graphs Illustrated
Finally, bringing everything together, here’s a YouTube video that demonstrates the effects of using Hyperlink-based DSNs for Knowledge Graph construction.
Heterogeneous Data Access, Integration, Interoperability, and Management have challenged the computer industry since its inception. As outlined and demonstrated above, using Hyperlinks as Data Source Names (DSNs) provides a uniquely scalable and cost-effective solution to this age-old problem.