Data Wrangling: Generating a U.S. Presidential Election Knowledge Graph from a Google Spreadsheet

Created on 2020-11-03 14:57

Published on 2020-11-03 15:02

Arising from curiosity about the potential outcome of the 2020 US Presidential Election, I’ve constructed a spreadsheet comprising data about eligible voters in said election, a breakdown of early-votes cast by state, and the final tallies for the 2016 election.

I am interested in states that show very high early-vote counts as a percentage of the total vote count in the prior 2016 Presidential Election.

Data Wrangling Goal?

I want to generate an explorable Knowledge Graph from my Google Spreadsheet which itself comprises a merge of data from various sources that include:

I will use the Virtuoso instance behind the public URIBurner Service (which includes an enabled Sponger Transformation Middleware Module), as my tool of choice. 

Steps?

[1] Create Google Spreadsheet. Here’s my example. 

No alt text provided for this image

[2] Pass the Spreadsheet URL to URIBurner using options for returning CSV rather than HTML — https://linkeddata.uriburner.com/about/html/https/docs.google.com/spreadsheets/d/1PTqUkqv-9BPWY1V1cFq13xR92EL6vJ9b8N4Ox3TUagE/gviz/tq?tqx=out:csv&range=A2:O53&sheet=2020_Election_Analysis

That returns an HTML-Entity Description Page which is also an entry point into a newly generated Knowledge Graph. This pages includes a listing of Entities derived from Rows in the original Spreadsheet. 

No alt text provided for this image

For instance, clicking on the hyperlink “Record2” results in a lookup that returns data for the state of Texas.

No alt text provided for this image

[3] Faceted Browsing View — http://linkeddata.uriburner.com/describe/?url=https%3A%2F%2Flinkeddata.uriburner.com%2Fabout%2Fid%2Fentity%2Fhttps%2Fdocs.google.com%2Fspreadsheets%2Fd%2F1PTqUkqv-9BPWY1V1cFq13xR92EL6vJ9b8N4Ox3TUagE%2Fgviz%2Ftq%3Ftqx%3Dout%3Acsv%26range%3DA2%3AO53%26sheet%3D2020_Election_Analysis&distinct=1

Alternatively, you can click on the “Browse using” drop-down or manually construct an alternative URL-pattern to obtain a different HTML-based Entity Description page that includes powerful Faceted Browsing capability i.e., the use of Entity Attributes to provide pivot-style exploration across various dimensions.

For instance, here is a list of instances of the Class "2020 Election Analysis" derived from the Google Spreadsheet.

No alt text provided for this image

Here's a view of a specific Record, a specific instance of the "2020 Election Analysis" Class derived from the Google Spreadsheet.

No alt text provided for this image

Here's the Ontology derived from the Google Spreadsheet.

No alt text provided for this image

A Little Reasoning & Inference

At this point, I have two views of my Knowledge Graph, but the labelling for each state could be clearer. 

To solve the preferred labelling problem, I’ve applied a SPARQL INSERT statement to generate an additional attribute that will ultimately be used as my preferred label, courtesy of the skos:prefLabel term from the SKOS Ontology.

PREFIX lod:                 <http://lod.openlinksw.com/>
PREFIX rdfs:                <http://www.w3.org/2000/01/rdf-schema#>
PREFIX schema:              <http://schema.org/>
PREFIX owl:                 <http://www.w3.org/2002/07/owl#>
PREFIX skos:                <http://www.w3.org/2004/02/skos/core#>
PREFIX election-2020-data:  <https://docs.google.com/spreadsheets/d/1PTqUkqv-9BPWY1V1cFq13xR92EL6vJ9b8N4Ox3TUagE/gviz/tq?tqx=out:csv&range=A2:O53&sheet=2020_Election_Analysis#>
INSERT {
         GRAPH <urn:us:election:2020:data:cleanup>
           { ?record skos:prefLabel ?label }
       }
WHERE { ?record election-2020-data:State_Name ?label }

The following screenshots depict effects of the INSERT statement above i.e., how the underlying Faceted Browsing Engine has selected the preferred Entity Attribute for Display Labeling used in the “About:” heading. 

No alt text provided for this image

And here's a revamped view of "Texas" rather than "Record2" .

No alt text provided for this image

Conclusion

Courtesy of the data management and transformation power of Virtuoso, I’ve successfully used a single mouse-click to generate a Knowledge Graph deployed using Linked Data principles from my Google Spreadsheet — comprising data about the U.S. Presidential Election collated from a variety of sources.

Related