Plan for 2024 Q4
See also the Research Questions.
What are the NIPS files?
The primary database of the Hamlet Evaluation System (HES) seems to be the HES Basic Master File.
Then what are the NIPS Data Files that are released by NARA?
- What is the content of the NIPS files? Do they match what is in the Basic Master File?
- How were the NIPS files generated? By whom and when?
System Evolution
A new Basic Master File seems to have been created every time new data (Update Forms) was read.
- What is the Basic Master File that we have access to (Hamlet Evaluation System (HES) Basic Master File 7/1969 - 1/1973)? When was it last updated? Are all previous updates present and visible in the file?
- Can system evolutions be observed in the Basic Master File we have?
- What are the consequences of the various versions? What changed at the data level? What changed in reports generated by the system?
Annotations and Enactment
- Use PREMIS to annotate system evolution (see the sketch after this list).
- It might be necessary to enact certain parts of system operation to understand system evolution. Output of the enactment could be the annotation.
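As a starting point, one system evolution step could be expressed as a PREMIS event in RDF. Below is a minimal sketch using rdflib and the PREMIS 3 OWL ontology; the URIs, the note text, and the use of PROV properties to link the event to the files are all illustrative assumptions, not settled modelling decisions.

```python
from rdflib import Graph, Literal, Namespace, URIRef
from rdflib.namespace import PROV, RDF

# PREMIS 3 OWL ontology namespace (Library of Congress).
PREMIS = Namespace("http://www.loc.gov/premis/rdf/v3/")

# Hypothetical identifiers; the actual URI scheme for HES artifacts
# has not been decided yet.
event = URIRef("https://example.org/hes/event/update-1970-06")
old_master = URIRef("https://example.org/hes/file/basic-master-1970-05")
new_master = URIRef("https://example.org/hes/file/basic-master-1970-06")

g = Graph()
g.bind("premis", PREMIS)

# One system evolution step modelled as a PREMIS Event: an update run
# reads the old Basic Master File and produces a new one (assumed).
g.add((event, RDF.type, PREMIS.Event))
g.add((event, PREMIS.note,
       Literal("Update run: new Basic Master File created from Update Forms (assumed)")))
g.add((old_master, RDF.type, PREMIS.File))
g.add((new_master, RDF.type, PREMIS.File))

# PREMIS 3 defers event/object relations to PROV.
g.add((event, PROV.used, old_master))
g.add((event, PROV.generated, new_master))

print(g.serialize(format="turtle"))
```

If enactment output is to become the annotation, a run of the enacted update program could emit exactly such a graph as its log.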
Errors in Data
- Are there any obvious errors in the data? (A scanning sketch follows this list.)
- To what extent did the manual data review and verification steps that were part of HES System Operations (see section 3 of the HES Operations Manual) prevent data errors?
- Are there technical measures built into the system to prevent bit-flips or other forms of data corruption?
- Could errors propagate to report generation? What was the effect of the errors?
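Even before these questions are answered in full, simple structural checks can surface obvious errors. The following is a minimal sketch, assuming the Basic Master File consists of fixed-length EBCDIC records; the record length of 400 bytes and the file name are placeholders until the actual layout is confirmed.

```python
# Minimal structural scan of a fixed-length-record file. RECORD_LEN is
# a placeholder; the real value must come from the layout documentation.
RECORD_LEN = 400

def scan(path: str) -> None:
    with open(path, "rb") as f:
        data = f.read()
    # A truncated or corrupted file is not a whole multiple of the
    # record length.
    records, trailing = divmod(len(data), RECORD_LEN)
    print(f"{records} records, {trailing} trailing bytes")
    for i in range(records):
        chunk = data[i * RECORD_LEN : (i + 1) * RECORD_LEN]
        # NIPS-era files are typically EBCDIC (assumption); bytes that
        # decode to control characters under cp037 are suspicious.
        text = chunk.decode("cp037")
        if not all(ch.isprintable() for ch in text):
            print(f"record {i}: contains non-printable characters")

scan("basic-master-file.bin")  # hypothetical file name
```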
Enactment
In order to understand the system (especially system evolution and scoring), it seems necessary to be able to enact parts of it. We will focus on:
- Work towards enacting a system update (Program R7102P: HES Edit A, Program R7103P: HES Edit B, and Program R7104P: HES Update).
- As an initial (and much simpler) example, enact Program R7119P: Gazetteer File to generate a Hamlet Evaluation System (HES) Gazetteer (see the sketch below).
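A minimal sketch of what such an enactment could look like, in Python rather than the original NIPS environment: read fixed-length records from the Basic Master File and emit an identifier/name listing. The record length and field offsets are hypothetical placeholders; the real values have to be taken from the file layout documentation and from the R7119P program itself.

```python
RECORD_LEN = 400            # placeholder record length
HAMLET_ID = slice(0, 6)     # assumed position of the hamlet identifier
HAMLET_NAME = slice(6, 36)  # assumed position of the hamlet name

def gazetteer(path: str):
    """Yield (id, name) pairs from a fixed-length-record master file."""
    with open(path, "rb") as f:
        while record := f.read(RECORD_LEN):
            if len(record) < RECORD_LEN:
                break  # ignore a trailing partial record
            text = record.decode("cp037")  # EBCDIC (assumption)
            yield text[HAMLET_ID].strip(), text[HAMLET_NAME].strip()

for hamlet_id, name in gazetteer("basic-master-file.bin"):
    print(hamlet_id, name)
```

Getting even this simple listing right would validate our understanding of the record layout before attempting the much more involved Edit and Update programs.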
Ctrl+All tooling
UX improvements
- Understand how researchers would want to interact with and explore the data, and whether the current user interface is appropriate.
- Implement changes to make the tooling more usable.
Infrastructure
Synchronize Knowledge Graph
Currently, updating published data on deeparcher requires uploading a large SQLite database. By design, we should be able to synchronize knowledge graph state over the network, since everything is content-addressed and uses a network-optimized encoding. This just needs to be implemented. A sketch of the reconciliation step follows.
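The sketch below assumes each encoded graph fragment is addressed by its SHA-256 hash; `remote_hashes` and `fetch` stand in for whatever network protocol is eventually used.

```python
import hashlib

def content_address(fragment: bytes) -> str:
    """Content address of an encoded graph fragment (SHA-256 assumed)."""
    return hashlib.sha256(fragment).hexdigest()

def synchronize(local: dict[str, bytes], remote_hashes: set[str], fetch) -> set[str]:
    """Pull the fragments the local store is missing.

    `fetch(address) -> bytes` is a placeholder for the network layer.
    """
    missing = remote_hashes - local.keys()
    for address in missing:
        fragment = fetch(address)
        # Content-addressing makes transfers self-verifying: reject
        # anything that does not hash to its claimed address.
        if content_address(fragment) != address:
            raise ValueError(f"fragment does not match address {address}")
        local[address] = fragment
    return missing
```

Because fragments are immutable and self-verifying, only the set difference would ever cross the network, instead of the full SQLite database.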
RDF/CBOR
The RDF/CBOR serialization we use for content-addressing RDF is not yet finalized and requires some updates.
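To illustrate the general shape (not the actual specification): a triple encoded deterministically with CBOR can be content-addressed from its encoded bytes. The list-based term encoding below is an illustrative placeholder, and cbor2's canonical mode stands in for whatever deterministic encoding the final serialization mandates.

```python
import hashlib
import cbor2

# Illustrative term encoding only; the real RDF/CBOR serialization
# defines its own representation of IRIs, literals, and blank nodes.
triple = [
    "https://example.org/hes/file/basic-master-file",  # subject IRI
    "http://purl.org/dc/terms/title",                  # predicate IRI
    "HES Basic Master File",                           # object literal
]

# Canonical CBOR yields a deterministic byte string, so equal triples
# always hash to the same content address.
encoded = cbor2.dumps(triple, canonical=True)
address = hashlib.sha256(encoded).hexdigest()
print(address)
```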