Web Data Commons - RDFa, Microdata, and Microformats Data Sets - December 2014  

A schema:WebPage, within Data Space: kingsley.idehen.net, associated with source document(s)

This document provides statistics about the Web Data Commons RDFa, Microdata and Microformats data sets which have been extracted from the December 2014 release of the Common Crawl.

Attributes
Values
type
Date Modified
label
  • Web Data Commons - RDFa, Microdata, and Microformats Data Sets - December 2014
comment
  • This document provides statistics about the Web Data Commons RDFa, Microdata and Microformats data sets which have been extracted from the December 2014 release of the Common Crawl.
SeeAlso
Description
  • In summary, the project found structured data in 620 million of the 2.01 billion HTML pages contained in the crawl (30%). These pages originate from 2.72 million of the 15.68 million pay-level domains covered by the crawl (17%). Altogether, the extracted data sets consist of 20.48 billion RDF quads. Instructions for downloading the RDFa, Microdata, and Microformats data sets are given on the "how to get the data" page.
Format
  • text/html
about
mentions
xhv:related
https://twitter.com/hashtag/ht#this
is about of
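
The coverage figures quoted in the Description can be re-derived from the counts it gives. The following is a minimal sketch, assuming the rounded counts in the text are close enough to the exact ones; the constant and function names are hypothetical and chosen only for illustration.

```python
# Counts as quoted in the Description (rounded in the source text).
PAGES_WITH_DATA = 620_000_000      # HTML pages containing structured data
PAGES_IN_CRAWL = 2_010_000_000     # all pages in the December 2014 crawl
DOMAINS_WITH_DATA = 2_720_000      # pay-level domains with structured data
DOMAINS_IN_CRAWL = 15_680_000      # all pay-level domains in the crawl
RDF_QUADS = 20_480_000_000         # extracted RDF quads


def share(part: int, whole: int) -> float:
    """Return part/whole as a percentage."""
    return 100.0 * part / whole


page_share = share(PAGES_WITH_DATA, PAGES_IN_CRAWL)
domain_share = share(DOMAINS_WITH_DATA, DOMAINS_IN_CRAWL)
# Average number of quads per page that carries structured data.
quads_per_page = RDF_QUADS / PAGES_WITH_DATA

print(f"pages with structured data:   {page_share:.1f}%")
print(f"domains with structured data: {domain_share:.1f}%")
print(f"quads per page with data:     {quads_per_page:.1f}")
```

With the rounded inputs, both shares fall within one percentage point of the quoted 30% and 17%. The quads-per-page value is a derived average that the source does not state.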
