Platform Agnostic Depiction of a Data Product -- Juha Korpela

Generating an FAQ and Defined Terms Knowledge Graph from a LinkedIn Post

Created on 2024-02-23 20:01

Published on 2024-02-23 20:23

Situation

I stumbled upon an interesting LinkedIn post, courtesy of Kurt Cagle, about the term ‘Data Product,’ authored by Juha Korpela.

Naturally, I appreciate the essence of the post, which emphasizes the importance of defining terminology in a platform-agnostic manner. As I read further, it became clear that this thread also serves as an excellent demonstration of how LLM-based tools, like ChatGPT, combined with a SPARQL-compliant backend (e.g., our Virtuoso Platform), can collectively address a number of data-related challenges such as:

  1. Transformation from text to an entity-relationship graph.

  2. Exporting the entity-relationship graph to a platform capable of manifesting a Knowledge Graph, i.e., a knowledge base that leverages Linked Data principles for Data Access, Integration, and Management.

  3. Incorporating attribution as part of provenance metadata into the Knowledge Graph.

  4. Providing SPARQL, SQL, or GraphQL access for additional querying using tools that support those query languages.
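To make the first three items concrete, here is a minimal sketch of the kind of entity-relationship graph involved, expressed as JSON-LD using schema.org terms. The function name `build_graph`, the fragment identifiers, and the placeholder term and FAQ text are illustrative assumptions of mine; they are not the actual output produced by ChatGPT for this post.

```python
import json


def build_graph(source_url, author_name, terms, faqs):
    """Build a JSON-LD document with a DefinedTermSet and an FAQPage.

    terms: list of (name, description) pairs
    faqs:  list of (question, answer) pairs
    """
    term_set_id = f"{source_url}#defined-terms"  # hypothetical identifier scheme

    # Attribution: both top-level entities point back at the source post.
    provenance = {
        "isBasedOn": {
            "@type": "SocialMediaPosting",
            "@id": source_url,
            "author": {"@type": "Person", "name": author_name},
        }
    }
    defined_terms = [
        {
            "@type": "DefinedTerm",
            "@id": f"{source_url}#term-{i}",
            "name": name,
            "description": description,
            "inDefinedTermSet": {"@id": term_set_id},
        }
        for i, (name, description) in enumerate(terms, start=1)
    ]
    questions = [
        {
            "@type": "Question",
            "@id": f"{source_url}#question-{i}",
            "name": question,
            "acceptedAnswer": {"@type": "Answer", "text": answer},
        }
        for i, (question, answer) in enumerate(faqs, start=1)
    ]
    return {
        "@context": "https://schema.org",
        "@graph": [
            {
                "@type": "DefinedTermSet",
                "@id": term_set_id,
                "hasDefinedTerm": defined_terms,
                **provenance,
            },
            {
                "@type": "FAQPage",
                "@id": f"{source_url}#faq",
                "mainEntity": questions,
                **provenance,
            },
        ],
    }


if __name__ == "__main__":
    # Placeholder content only; the real definitions come from the post.
    doc = build_graph(
        "https://www.linkedin.com/feed/update/urn:li:activity:7166756269079289856",
        "Juha Korpela",
        terms=[("Data Product", "placeholder definition")],
        faqs=[("What is a Data Product?", "placeholder answer")],
    )
    print(json.dumps(doc, indent=2))
```

Giving every entity an `@id` is what lets a Linked Data platform turn each term and question into a link that can be followed later.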

How-To Guide

I addressed the issues mentioned above through a series of steps, leveraging a suite of powerful tools: the OpenLink Structured Data Sniffer (OSDS) browser extension, OpenLink Personal Assistant (OPAL) (a ChatGPT-based Smart Agent), and our versatile Virtuoso platform, which serves as a multi-model (encompassing tables and entity-relationship graphs) and multipurpose platform (integrating middleware, DBMS, file, and application server functionalities).

[1] Open up the post directly by clicking on the following URL: https://www.linkedin.com/feed/update/urn:li:activity:7166756269079289856.

[2] Click on the OSDS “doggie” icon, then click on the “GPT” action icon.

[3] Alter the prompt as needed. In this case, I wanted to generate both an FAQ and a Defined Terms Set (or Concept Scheme).

[4] Send the prompt to ChatGPT for processing.

[5] Iteratively refine the response if needed, e.g., if it contains syntax errors.

ChatGPT Prompt captured via OpenLink Personal Assistant (OPAL)
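The syntax check in step 5 can also be done outside the browser. The following is a rough sketch, assuming the response carries its JSON-LD either as the whole message or inside a markdown code fence; the helper name `parse_json_ld` is hypothetical and is not part of OSDS or OPAL.

```python
import json
import re

FENCE = "`" * 3  # markdown code-fence marker, built here to keep this listing readable

# Matches a fenced block with an optional language tag such as "json" or "json-ld".
FENCED_BLOCK = re.compile(
    re.escape(FENCE) + r"[\w-]*[ \t]*\n(.*?)" + re.escape(FENCE),
    flags=re.DOTALL,
)


def parse_json_ld(response_text):
    """Return (document, None) on success or (None, error message) on failure."""
    match = FENCED_BLOCK.search(response_text)
    payload = match.group(1) if match else response_text
    try:
        return json.loads(payload), None
    except json.JSONDecodeError as err:
        # The position can be quoted back to the LLM in a follow-up prompt.
        return None, f"line {err.lineno}, column {err.colno}: {err.msg}"
```

When the second value is not `None`, it can be pasted into the next prompt so the model knows where its output failed to parse.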

[6] Use the structured data dropdown (inserted by OSDS) to visualize the generated entity-relationship graph, using a property-sheet layout.

OpenLink Structured Data Sniffer (OSDS) visualizing JSON-LD generated by ChatGPT

[7] Upload the entity-relationship graph to a SPARQL-compliant backend (e.g., the Virtuoso RDBMS instance behind our publicly available URIBurner service).

OSDS Preparing to export data to a Virtuoso instance via its SPARQL endpoint
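OSDS handles the upload through its own dialog, so no code is needed for this step. For readers who want to see what a comparable upload looks like on the wire, here is a sketch based on the standard SPARQL 1.1 Update protocol. The endpoint URL, graph IRI, and helper names are assumptions of mine, the IRI-versus-literal test is a deliberate simplification, and a real endpoint will normally require authentication before accepting writes.

```python
import urllib.request


def literal(value):
    """Render a Python string as a SPARQL string literal with basic escaping."""
    escaped = value.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")
    return f'"{escaped}"'


def build_insert(graph_iri, triples):
    """Build an INSERT DATA statement from (subject, predicate, object) strings.

    Simplification: objects that start with http:// or https:// are treated
    as IRIs; everything else becomes a plain string literal.
    """
    lines = []
    for s, p, o in triples:
        obj = f"<{o}>" if o.startswith(("http://", "https://")) else literal(o)
        lines.append(f"    <{s}> <{p}> {obj} .")
    body = "\n".join(lines)
    return f"INSERT DATA {{\n  GRAPH <{graph_iri}> {{\n{body}\n  }}\n}}"


def build_request(endpoint, update):
    """Prepare (but do not send) a direct-POST SPARQL Update request."""
    return urllib.request.Request(
        endpoint,
        data=update.encode("utf-8"),
        headers={"Content-Type": "application/sparql-update"},
        method="POST",
    )


if __name__ == "__main__":
    update = build_insert(
        "https://example.org/graphs/data-product",  # hypothetical named graph
        [
            (
                "https://example.org/post#term-1",
                "https://schema.org/name",
                "Data Product",
            )
        ],
    )
    request = build_request("https://example.org/sparql", update)  # hypothetical endpoint
    print(update)
    # urllib.request.urlopen(request) would send it; left out on purpose.
```

Writing into a named graph keeps the generated statements separate from other data on the same server, which makes them easy to query or drop as a unit.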

[8] Explore the generated Knowledge Graph by clicking on the links presented on the SPARQL Query Results page, which is generated automatically in response to the entity-relationship graph upload. For example, this SPARQL results page offers a sampling of entities and entity types for further exploration.

SPARQL Query Results Page that presents Knowledge Graph Exploration Entry Point
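A results page like this one, with a sample entity per entity type, can be approximated with a grouped SPARQL query. The sketch below is my guess at a comparable query rather than the exact one OSDS issues, and the endpoint URL, graph IRI, and function names are hypothetical.

```python
from urllib.parse import urlencode


def sample_query(graph_iri, limit=50):
    """Return a query listing one sample entity and a count per entity type."""
    return (
        "SELECT (SAMPLE(?s) AS ?sample) (COUNT(*) AS ?count) ?entityType\n"
        f"WHERE {{ GRAPH <{graph_iri}> {{ ?s a ?entityType }} }}\n"
        "GROUP BY ?entityType\n"
        "ORDER BY DESC(?count)\n"
        f"LIMIT {int(limit)}"
    )


def query_url(endpoint, query):
    """Build a SPARQL protocol GET URL carrying the query."""
    return endpoint + "?" + urlencode({"query": query})


if __name__ == "__main__":
    query = sample_query("https://example.org/graphs/data-product")  # hypothetical graph
    print(query)
    print(query_url("https://example.org/sparql", query))  # hypothetical endpoint
```

Opening the resulting URL in a browser, or fetching it with an `Accept` header for the desired result format, gives the entry point for the exploration described in the next step.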

[9] Follow your nose by clicking on values in the Sample or EntityType columns to discover related data, information, and knowledge. For example, this page describes the generated Defined Terms Set.

Defined Terms Set Description from generated Knowledge Graph

Conclusion

We are in the midst of a massive inflection point that will redefine the software industry as we know it. Recent AI and Generative AI innovations greatly reduce the challenges associated with constructing and managing structured data. Decision makers, users, architects, and developers can now achieve previously unattainable productivity levels without being held captive from the outset by any particular platform, except on the strength of its credentials as an appropriate solution for the task at hand.

Tools Used

Related