Conversion of NIPS Data Files to RDF

The Ctrl + All Tooling is capable of converting NIPS Data Files to a Resource Description Framework (RDF) representation. This note describes the conversion as well as some quirks encountered in NIPS data files.

Overview

NIPS Data Files are described in RDF using the RDF Data Cube Vocabulary (QB) and the DDI-RDF Discovery Vocabulary (Disco). A dedicated vocabulary, NIPSV, is defined for specialized classes and properties.

Data in NIPS data files are organized into sets: one Fixed Set and multiple Periodic Sets. Every NIPS set is mapped to a QB data set by the conversion. NIPS logical records correspond to QB observations.

For example, the conversion of the NIPS Data File HES.HAMLA.68.NIPS results in two data sets, one for each set:

  1. Fixed Set
  2. Periodic Set 1

Additionally, a special QB data set is created for all logical records. It is used to hold records that are not Data File Records and are not members of the Fixed Set or a Periodic Set.
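The mapping from sets and logical records to QB data sets and observations can be sketched as follows. This is a minimal illustration and not the code in the conversion itself: the base URI, the URI layout and the function names are assumptions made for the example. Only the terms qb:DataSet, qb:Observation and qb:dataSet come from the RDF Data Cube Vocabulary.

```python
# Sketch: emit N-Triples that link logical records (observations) to their set (data set).
QB = "http://purl.org/linked-data/cube#"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
BASE = "http://example.org/nips/"  # hypothetical base URI, not used by the real tooling


def triple(subject, predicate, obj):
    """Format one N-Triples statement whose three terms are all URIs."""
    return f"<{subject}> <{predicate}> <{obj}> ."


def set_to_triples(file_id, set_name, record_ids):
    """Describe one NIPS set as a qb:DataSet and its records as qb:Observations."""
    dataset = f"{BASE}{file_id}/{set_name}"  # hypothetical URI layout
    lines = [triple(dataset, RDF_TYPE, QB + "DataSet")]
    for record_id in record_ids:
        observation = f"{dataset}/record/{record_id}"
        lines.append(triple(observation, RDF_TYPE, QB + "Observation"))
        lines.append(triple(observation, QB + "dataSet", dataset))
    return lines


triples = set_to_triples("HES.HAMLA.68.NIPS", "fixed-set", [0, 1])
print("\n".join(triples))
```

Each set yields one data set statement plus two statements per logical record, so the catch-all data set for otherwise unassigned records could be produced by the same function with a different set name.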

Implementation

The conversion is implemented in two steps:

  1. Parsing of the NIPS Data File to Python values using the PyNIPS library.
  2. Serialization of the Python representation defined by PyNIPS to RDF. This is implemented in the ctrlall.cmds.nips2rdf module.

Elements

Elements in NIPS Data Files are user-defined fields of information. The length, data type and other attributes of NIPS elements are defined by Element Format Records that are themselves records in the NIPS data file (see NIPS Data File).

In the QB data sets for the fixed set and periodic sets of the NIPS data file, elements correspond to the components of the observations. Because they are defined dynamically by the NIPS file, they are not fixed RDF predicates defined by some vocabulary. Instead, the components are the Element Format Records, that is, the records that define the elements in the NIPS data file. For example, the POPUL element corresponds to an Element Format Record that is also used as a component/measure property for observations in the fixed set.
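The idea that an Element Format Record doubles as the predicate of an observation can be sketched as follows. The base URI, the URI layout, the function name and the sample value are hypothetical. Typing the record as qb:MeasureProperty is an assumption based on the "component/measure property" wording above.

```python
# Sketch: the URI of an Element Format Record is reused as the predicate of an observation.
QB = "http://purl.org/linked-data/cube#"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
BASE = "http://example.org/nips/"  # hypothetical base URI


def element_triples(observation_id, element_name, value):
    """Return N-Triples for one element value of one fixed-set observation.

    The value is assumed to contain no characters that need escaping.
    """
    element_record = f"{BASE}element/{element_name}"  # stands for the Element Format Record
    observation = f"{BASE}fixed-set/record/{observation_id}"
    return [
        f"<{element_record}> <{RDF_TYPE}> <{QB}MeasureProperty> .",
        f'<{observation}> <{element_record}> "{value}" .',
    ]


popul = element_triples(0, "POPUL", "1234")  # "1234" is a made-up value
print("\n".join(popul))
```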

Binary representation of records

At the logical record level, the translation to RDF is lossless as the binary representation of the logical record is also included in the conversion using the rdf:value predicate.
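A small sketch of why carrying the raw bytes makes the record-level translation lossless: the bytes are attached with rdf:value and can be decoded back unchanged. The use of an xsd:hexBinary literal, the record URI and the function names are assumptions made for this example. This note does not state which literal encoding the conversion actually uses.

```python
# Sketch: attach the raw bytes of a logical record with rdf:value and read them back.
RDF_VALUE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#value"
XSD_HEXBINARY = "http://www.w3.org/2001/XMLSchema#hexBinary"  # assumed literal datatype


def record_value_triple(record_uri, raw):
    """Return an N-Triples line carrying the record bytes as a hex literal."""
    hex_text = raw.hex().upper()
    return f'<{record_uri}> <{RDF_VALUE}> "{hex_text}"^^<{XSD_HEXBINARY}> .'


def raw_from_triple(line):
    """Recover the original bytes from a line produced by record_value_triple."""
    hex_text = line.split('"')[1]
    return bytes.fromhex(hex_text)


raw = bytes([0xC8, 0xC5, 0xE2, 0x00, 0x40])  # arbitrary example bytes
line = record_value_triple("http://example.org/nips/record/0", raw)
restored = raw_from_triple(line)
print(line)
```

Because the round trip returns the same bytes, anything the structured part of the conversion does not model can still be recovered from the literal.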

Information not preserved by conversion

Blocks

NIPS data files are organized into blocks containing multiple logical records. This is done to optimize tape usage and read times (see Section 8.8 Blocked Records of Programming the IBM 360). The choice of block size might provide some insight into the core memory available to the processing system.

Applications such as NIPS 360 FFS File Structuring are capable of reading blocked or unblocked file records (see Section 2.5.1.1 File Organization of Users Manual Volume 3: File Maintenance (FM)).

NIPSTRAN does not preserve blocking information when converting to ASCII records.

The conversion indirectly preserves this information as it tracks the offset and length of each logical record. This can be used to work out where the block boundaries fall.
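One way the tracked offsets and lengths could be used is sketched below. It rests on an assumption that is not confirmed by this note: records inside a block are stored back to back, and any bytes not covered by a record (for example a block header) separate one block from the next. The function name and the sample extents are hypothetical.

```python
# Sketch: infer candidate blocks from gaps between consecutive logical records.
def group_into_blocks(extents):
    """Group (offset, length) record extents into runs of contiguous records.

    Assumption: a gap between the end of one record and the start of the
    next marks a block boundary.
    """
    blocks = []
    previous_end = None
    for offset, length in sorted(extents):
        if previous_end is None or offset != previous_end:
            blocks.append([])  # the first record, or a gap, starts a new candidate block
        blocks[-1].append((offset, length))
        previous_end = offset + length
    return blocks


# Made-up extents: two records, a 4-byte gap, then two more records.
extents = [(4, 100), (104, 100), (208, 50), (258, 50)]
blocks = group_into_blocks(extents)
print(blocks)
```

If a file carries no bytes between blocks, this approach finds a single run and the block size would have to be inferred some other way.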

Quirks encountered in NIPS data files

ASCII encoded record length

Some NIPS data files (e.g. from Pacification Attitude Analysis System (PAAS) Data File, March 1970 - December 1972) have the record length encoded in the first 4 bytes as ASCII. There are also no block headers present; instead, the block length is fixed to 1004 bytes.

PyNIPS can handle this (see 80be4f0f75). This was implemented by figuring out what NIPSTRAN does. We have not been able to find documentation about this encoding yet.
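The general idea of reading such a file can be illustrated as follows. This is not the PyNIPS implementation and does not claim to reproduce NIPSTRAN. Since the encoding is undocumented, the sketch relies on assumptions that may not hold for real files: the four ASCII digits give the record length including the prefix itself, records do not span blocks, and the rest of a block after the last record is non-digit padding. The function name and the sample data are hypothetical.

```python
# Sketch: split fixed-size blocks into records that carry a 4-byte ASCII length prefix.
BLOCK_SIZE = 1004  # fixed block length stated above


def split_records(data, block_size=BLOCK_SIZE):
    """Yield (offset, payload) for each record, under the assumptions listed above."""
    for block_start in range(0, len(data), block_size):
        block = data[block_start:block_start + block_size]
        pos = 0
        while pos + 4 <= len(block):
            prefix = block[pos:pos + 4]
            if not prefix.isdigit():
                break  # assumed: anything else is padding up to the end of the block
            length = int(prefix.decode("ascii"))
            if length < 4 or pos + length > len(block):
                break  # implausible length: stop rather than guess
            yield block_start + pos, block[pos + 4:pos + length]
            pos += length


# Made-up data: two records in the first block, one in the second, blank padding.
data = (b"0010ABCDEF" + b"0006XY").ljust(BLOCK_SIZE, b" ") + b"0005Z".ljust(BLOCK_SIZE, b" ")
records = list(split_records(data))
print(records)
```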

Duplicate FFT

The NIPS file HES.HES71.72.NIPS has multiple File Format Tables. See HES.HES71.72.NIPS for more information.

Invalid Set IDs

It seems as if the encoded set IDs of some records in HES.HES71.72.NIPS are invalid. See HES.HES71.72.NIPS for more information.