Conversion of NIPS Data Files to RDF
The Ctrl + All Tooling can convert NIPS Data Files to a Resource Description Framework (RDF) representation. This note describes the conversion and some quirks encountered in NIPS data files.
Overview
NIPS data files are described in RDF using the RDF Data Cube Vocabulary (QB) and the DDI-RDF Discovery Vocabulary (Disco). A dedicated vocabulary, NIPSV, is defined for specialized classes and properties.
Data in NIPS data files is organized into sets: one Fixed Set and multiple Periodic Sets. The conversion maps every NIPS set to a QB data set, and NIPS logical records correspond to QB observations.
For example, the conversion of the NIPS Data File HES.HAMLA.68.NIPS results in 2 data sets for every set:
Additionally, a special QB data set is created for all logical records. It holds records that are not Data File Records and are not members of the Fixed Set or a Periodic Set.
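The mapping from sets and logical records to QB can be sketched with plain Python tuples standing in for RDF triples. This is only an illustration: the example.org base URI, the URI layout and the helper names are hypothetical, and the URIs produced by the actual conversion may differ. Only qb:DataSet, qb:Observation and qb:dataSet are taken from the QB vocabulary.

```python
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
QB = "http://purl.org/linked-data/cube#"
# Hypothetical base URI, used for illustration only.
BASE = "http://example.org/nips/"


def set_to_triples(file_name, set_name, record_ids):
    """Describe one NIPS set as a QB data set and its logical records as QB observations."""
    dataset = f"{BASE}{file_name}/set/{set_name}"
    triples = [(dataset, RDF_TYPE, QB + "DataSet")]
    for record_id in record_ids:
        observation = f"{BASE}{file_name}/record/{record_id}"
        triples.append((observation, RDF_TYPE, QB + "Observation"))
        # qb:dataSet links each observation to the data set it belongs to.
        triples.append((observation, QB + "dataSet", dataset))
    return triples


def to_ntriples(triples):
    """Serialize URI-only triples as N-Triples lines."""
    return "\n".join(f"<{s}> <{p}> <{o}> ." for s, p, o in triples)


print(to_ntriples(set_to_triples("HES.HAMLA.68.NIPS", "fixed", [0, 1])))
```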
Implementation
The conversion is implemented in two steps:
- Parsing of the NIPS Data File to Python values using the PyNIPS library
- Serialization of the Pythonic representation defined by PyNIPS to RDF. This is implemented in the ctrlall.cmds.nips2rdf module.
Elements
Elements in NIPS Data Files are user-defined fields of information. The length, data type and other attributes of NIPS elements are defined by Element Format Records that are themselves records in the NIPS data file (see NIPS Data File).
In the QB data sets for the fixed set and periodic sets of the NIPS data file, elements correspond to the components of the observations. Because elements are defined dynamically by the NIPS file, they are not fixed RDF predicates defined by some vocabulary. Instead, the components are Element Format Records, the records that define the elements in the NIPS data file. For example, the POPUL element corresponds to an Element Format Record that is also used as a component/measure property for observations in the fixed set.
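A minimal sketch of this idea: the predicate of each observation triple is the URI of the Element Format Record that defines the element. The base URI, the URI layout, the helper names and the value 1200 are all made up for the example and are not taken from the conversion.

```python
# Hypothetical base URI, used for illustration only.
BASE = "http://example.org/nips/HES.HAMLA.68.NIPS/"


def element_property(element_name):
    """URI of the Element Format Record; it doubles as the component property."""
    return f"{BASE}element/{element_name}"


def observation_triples(record_id, element_values):
    """Emit one triple per element value, using the element's record as predicate."""
    observation = f"{BASE}record/{record_id}"
    return [
        (observation, element_property(name), value)
        for name, value in element_values.items()
    ]


# The record number and the POPUL value are invented for this example.
triples = observation_triples(7, {"POPUL": 1200})
```

Because the predicate is itself a resource in the same graph, a consumer can follow it to look up the element's length and data type.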
Binary representation of records
At the logical record level, the translation to RDF is lossless: the binary representation of the logical record is also included in the conversion using the rdf:value predicate.
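The round trip can be illustrated as follows. This sketch assumes the bytes are carried as an xsd:hexBinary typed literal; the datatype actually used by the conversion is not stated in this note, and the helper names are hypothetical.

```python
XSD_HEXBINARY = "http://www.w3.org/2001/XMLSchema#hexBinary"


def record_to_literal(raw: bytes) -> str:
    """Encode raw record bytes as an N-Triples typed literal (assumed: xsd:hexBinary)."""
    return f'"{raw.hex().upper()}"^^<{XSD_HEXBINARY}>'


def literal_to_record(literal: str) -> bytes:
    """Recover the original bytes from the literal's lexical form."""
    lexical = literal.split('"')[1]
    return bytes.fromhex(lexical)


raw = bytes([0x00, 0x1C, 0xC8, 0xC5, 0xE2])
# Decoding the literal yields exactly the bytes that went in.
assert literal_to_record(record_to_literal(raw)) == raw
```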
Information not preserved by conversion
Blocks
NIPS data files are organized into blocks containing multiple logical records. This is done to optimize tape usage and read times (see Section 8.8 Blocked Records of Programming the IBM 360). The choice of block size might provide some insight into the core memory available to the processing system.
Applications such as NIPS 360 FFS File Structuring are capable of reading blocked or unblocked file records (see Section 2.5.1.1 File Organization of Users Manual Volume 3: File Maintenance (FM)).
NIPSTRAN does not preserve blocking information when converting to ASCII records.
The conversion indirectly preserves this information, as it tracks the offset and length of each logical record. These can be used to work out where blocks begin and end.
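One way to recover block boundaries from offsets and lengths is sketched below. It rests on an assumption that is not confirmed by this note: records inside a block are stored back to back, so any gap between the end of one record and the start of the next (for example a block header) marks the start of a new block. The function name and the sample offsets are hypothetical.

```python
def group_into_blocks(records):
    """Group (offset, length) pairs of logical records into blocks.

    Assumption: records within a block are contiguous, so a gap before
    a record means that a new block starts there.
    """
    blocks = []
    expected = None  # offset at which the next record of the same block would start
    for offset, length in sorted(records):
        if offset != expected:
            blocks.append([])
        blocks[-1].append((offset, length))
        expected = offset + length
    return blocks


# Invented offsets: a 4-byte gap precedes the records at offsets 4 and 158.
records = [(4, 100), (104, 50), (158, 80), (238, 20)]
blocks = group_into_blocks(records)
# blocks == [[(4, 100), (104, 50)], [(158, 80), (238, 20)]]
```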
Quirks encountered in NIPS data files
ASCII encoded record length
Some NIPS data files (e.g. from the Pacification Attitude Analysis System (PAAS) Data File, March 1970 - December 1972) have the record length encoded in the first 4 bytes as ASCII. There are also no block headers present; instead, the block length is fixed to 1004 bytes.
PyNIPS can handle this (see 80be4f0f75). This was implemented by figuring out what NIPSTRAN does; we have not yet been able to find documentation about this encoding.
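The difference between the two encodings can be sketched as follows. This is not the logic used by PyNIPS or NIPSTRAN: the digit-based heuristic and the function names are assumptions of the example. The binary case is read as an IBM-style record descriptor word, a big-endian 16-bit length followed by two further bytes.

```python
import struct

BLOCK_LENGTH = 1004  # fixed block length reported for the affected files


def record_length(header: bytes) -> int:
    """Read a record length from the first 4 bytes of a record.

    Heuristic (an assumption of this sketch): four ASCII digits are
    taken as an ASCII-encoded length; anything else is read as a
    binary record descriptor word.
    """
    prefix = header[:4]
    if len(prefix) < 4:
        raise ValueError("need at least 4 bytes")
    if prefix.isdigit():
        return int(prefix.decode("ascii"))
    length, _ = struct.unpack(">HH", prefix)
    return length


def split_fixed_blocks(data: bytes, block_length: int = BLOCK_LENGTH):
    """Cut a file without block headers into fixed-length blocks."""
    return [data[i:i + block_length] for i in range(0, len(data), block_length)]
```

A binary length whose bytes happen to look like ASCII digits would be misread by this heuristic, so a real reader would need a more reliable signal, such as a per-file setting.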
Duplicate FFT
The NIPS file HES.HES71.72.NIPS has multiple File Format Tables. See HES.HES71.72.NIPS for more information.
Invalid Set IDs
It seems that the encoded set ID of some records in HES.HES71.72.NIPS is invalid. See HES.HES71.72.NIPS for more information.