Ctrl + All Tooling Overview
This note describes the tooling used by the Ctrl + All. Computing the Social project. The tooling is used for:
- Organizing historical data, converting it to modern formats, and adding annotations and metadata.
- Collecting plain-text notes as well as resources in other formats (e.g. PDFs) that are relevant to research.
and provides:
- A user interface for exploring data and notes.
- Export of data to machine-readable formats (RDF) as well as endpoints for semantic queries (SPARQL).
Knowledge Graph
The Ctrl + All Knowledge Graph consists of three types of data:
- Binary artifacts: Examples include NIPS files
- RDF: Structured and semantic data about binary artifacts, data converted from binary artifacts, annotations and metadata.
- Notes: Plain-text notes as well as other supporting resources (such as PDFs)
Importing Data into the Knowledge Graph
Data can be imported into the knowledge graph using the ctrlall import command:
$ python -m ctrlall import ctrlall.ttl vocab/*.ttl
urn:eris:BIAFQJFTQIAGSPBJ7XMRNHOBWA42UUAS5CCHU24U6FF43C5TADJSHDAN4KTTAPONQX62ES4XKWC6PNW5KXCBFLRIETMDS6KKWOVEN564GQ
file:///vocab/dcat__vocab.ttl
file:///vocab/discovery__vocab.ttl
file:///vocab/hes-bmf__vocab.ttl
file:///vocab/nipsv__vocab.ttl
file:///vocab/prov__vocab.ttl
file:///vocab/qb__vocab.ttl
This reads the content of the ctrlall.ttl file (an RDF/Turtle file) as well as the RDF vocabularies in the vocab directory and imports the content into the knowledge graph.
The first URN printed is the identifier of the content in the ctrlall.ttl file. You can view the content by entering the URN into the search bar in the top right.
References to other files in the repository using file: URIs are resolved, and the referenced files are imported as well. For example, the ctrlall.ttl file contains a reference to file:data/HES/HES.HAMLA.67.ttl. When importing the ctrlall.ttl file, the file data/HES/HES.HAMLA.67.ttl will also be imported.
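The resolution of file: references described above can be sketched as follows. This is a minimal illustration, not the importer's actual code: the function name find_file_references and the regular expression are hypothetical, and a real implementation would presumably use an RDF parser rather than pattern matching.

```python
import re
from pathlib import PurePosixPath

# Matches relative references such as <file:data/HES/HES.HAMLA.67.ttl>.
# Absolute forms like <file:///vocab/x.ttl> are deliberately skipped.
# (Hypothetical pattern, for illustration only.)
FILE_REF = re.compile(r"<file:([^>/][^>]*)>")


def find_file_references(turtle_text, base_dir="."):
    """Return the paths referenced via relative file: URIs, in order of appearance."""
    base = PurePosixPath(base_dir)
    found = []
    for match in FILE_REF.finditer(turtle_text):
        path = str(base / match.group(1))
        if path not in found:  # report each referenced file once
            found.append(path)
    return found


example = "<> <http://example.org/includes> <file:data/HES/HES.HAMLA.67.ttl> ."
print(find_file_references(example))
```

An importer built along these lines would then read each returned path and import it in turn, repeating the step for any file: references found in those files.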
Imported data is content-addressed, meaning that the identifiers are computed from the content of the data itself. This allows deterministic identifiers without any centralized coordination. The computed identifier is printed when importing data.
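The idea of content addressing can be illustrated with a small sketch. Note that this is not the scheme used by the tooling (the printed identifiers use the urn:eris: prefix); the SHA-256 hash, the urn:example: prefix, and the function name content_id are assumptions chosen only to show the principle.

```python
import base64
import hashlib


def content_id(content):
    """Derive an identifier from the bytes alone (illustrative scheme, not ERIS)."""
    digest = hashlib.sha256(content).digest()
    # Unpadded base32 gives an upper-case, URL-friendly string.
    encoded = base64.b32encode(digest).decode("ascii").rstrip("=")
    return "urn:example:sha256:" + encoded


first = content_id(b"<> a <http://example.org/Thing> .")
again = content_id(b"<> a <http://example.org/Thing> .")
other = content_id(b"<> a <http://example.org/Other> .")

# Identical content yields identical identifiers; different content does not.
print(first == again, first == other)
```

Because the identifier depends only on the content, two people importing the same file independently arrive at the same identifier without consulting a registry.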
Data File Descriptions
One specific type of RDF content is the description of a data file. These descriptions are stored as plain-text RDF/Turtle files in the data subdirectory of the ctrlall repository. For example, HES.HAMLA.68.ttl:
@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix dcterms: <http://purl.org/dc/terms/> .
@prefix nips: <http://computingthesocial.net/ns/nips/> .

<> a nips:DataFile ;
    dcterms:title "HES.HAMLA.68.NIPS" ;
    dcterms:format "NIPS" ;
    dcterms:publisher "NARA" ;
    rdf:value <file:HES.HAMLA.68.NIPS> .
The previous invocation of the import command imported these statements and gave them the content-addressed identifier: urn:eris:BIANYKNCSDX2ZHV6DG6NYFXFXUQ7PMVUZLYLR26BBFKBNLA65CEJJ6CK3C6PKQKXYZ2H5LEAGRJZXOXSA3GYX4QLMOQHZ3XTDLP5USEXGQ.
It describes the binary file HES.HAMLA.68.NIPS with some metadata using predicates from Dublin Core. The rdf:value predicate is used to link the metadata with the actual binary content, which is also assigned a content-addressed identifier when importing.
Conversion to RDF
The tooling can convert NIPS Data Files as well as HES Basic Master Files to a Resource Description Framework (RDF) representation. See Conversion of NIPS Data Files to RDF and Conversion of HES Basic Master File to RDF.
Notes
Notes are Markdown documents named using the Denote file-naming scheme:
DATE--TITLE__KEYWORDS.md
The DATE serves as a unique identifier for notes.
Technical documents and other assets that are not Markdown files follow the same file-naming scheme.
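A file name following this scheme can be split into its parts with a short sketch like the one below. It assumes Denote's default conventions, which are not spelled out above: a DATE of the form YYYYMMDDTHHMMSS, title words separated by hyphens, and keywords separated by underscores. The function name parse_note_name and the sample file name are hypothetical.

```python
import re

# DATE--TITLE__KEYWORDS.EXT, with the keywords part optional.
# The timestamp format is an assumption based on Denote's defaults.
DENOTE = re.compile(
    r"^(?P<date>\d{8}T\d{6})"
    r"--(?P<title>[^_.]+)"
    r"(?:__(?P<keywords>[^.]+))?"
    r"\.(?P<ext>[^.]+)$"
)


def parse_note_name(filename):
    """Return the parts of a Denote-style file name, or None if it does not match."""
    match = DENOTE.match(filename)
    if match is None:
        return None
    keywords = match.group("keywords")
    return {
        "date": match.group("date"),
        "title": match.group("title").replace("-", " "),
        "keywords": keywords.split("_") if keywords else [],
        "ext": match.group("ext"),
    }


print(parse_note_name("20240115T093000--tooling-overview__ctrlall_rdf.md"))
```

Since non-Markdown assets follow the same scheme, the same function applies to them; only the ext field differs.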
Web-based user-interface
SPARQL endpoint
Related Projects and Inspiration
- Decentralized Collaborative Knowledge Management using Git (2018): Paper describing a similar setup.
- Trusty URIs: Verifiable, Immutable, and Permanent Digital Artifacts for Linked Data: Motivates the use of content-addressed identifiers. Technically, we use a different (more robust) scheme for content addressing, as defined in RDF/CBOR.