HES.HES71.72.NIPS

HES.HES71.72.NIPS is a NIPS Data File that is part of the NARA file unit Hamlet Evaluation System 1970 (HES70) Files, 1970 - 1971. It is notable because it contains several irregularities, which are described here.

See also the RDF Data File and the NIPSEXT info output containing layout information.

Multiple FFTs

The NIPS data file appears to contain multiple File Format Tables (FFTs).

This corresponds to what NARA has documented in the HES70 Technical Documentation:

During the ASCII conversion process, staff found that HES.HES71.72.NIPS contains multiple embedded File Format Tables (FFTs). Staff identified three distinct FFTs, beginning at file positions 0 (pos:0000:0000), 20377432 (pos:0136:EF58), and 20776952 (pos:013D:07F8), respectively.

To find all the File Format Tables (that is, all the Data File Control Records), we wrote a small Python script (also available in the PyNIPS repository as examples/find_all_ffts.py):

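import sys
from pprint import pprint

import nips.logical_record
from nips.fft import DataFileControlRecord


def find_all_ffts(nips_file):
    """Print every Data File Control Record found in nips_file."""
    with open(nips_file, "rb") as file:
        for logical_record in nips.logical_record.read(file):
            # Logical records of type "C" are Data File Control Records.
            if logical_record.type == "C":
                data_file_control_record = DataFileControlRecord(logical_record)
                pprint(data_file_control_record)


if __name__ == "__main__":
    # Usage: python find_all_ffts.py <path to NIPS data file>
    find_all_ffts(sys.argv[1])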

The output is:

DataFileControlRecord(offset=264,
                      length=300,
                      os_control=0,
                      delete_code=b'\x00',
                      type='C',
                      record_control_group_position=6,
                      record_control_group_length=9,
                      set_id_position=15,
                      set_id_length=1,
                      subset_control_group_position=16,
                      subset_control_group_length=4,
                      number_of_periodic_sets=4,
                      element_format_record_significant_data_position=20,
                      fixed_set_structure=UserDataFileRecordStructure(length=8,
                                                                      binary_words=1,
                                                                      binary_block_position=27),
                      periodic_set_structure=[UserDataFileRecordStructure(length=36,
                                                                          binary_words=4,
                                                                          binary_block_position=39),
                                              UserDataFileRecordStructure(length=40,
                                                                          binary_words=7,
                                                                          binary_block_position=51),
                                              UserDataFileRecordStructure(length=18,
                                                                          binary_words=2,
                                                                          binary_block_position=31),
                                              UserDataFileRecordStructure(length=18,
                                                                          binary_words=2,
                                                                          binary_block_position=31)])
DataFileControlRecord(offset=20377696,
                      length=300,
                      os_control=0,
                      delete_code=b'\x00',
                      type='C',
                      record_control_group_position=6,
                      record_control_group_length=9,
                      set_id_position=15,
                      set_id_length=1,
                      subset_control_group_position=16,
                      subset_control_group_length=4,
                      number_of_periodic_sets=4,
                      element_format_record_significant_data_position=20,
                      fixed_set_structure=UserDataFileRecordStructure(length=8,
                                                                      binary_words=1,
                                                                      binary_block_position=27),
                      periodic_set_structure=[UserDataFileRecordStructure(length=36,
                                                                          binary_words=4,
                                                                          binary_block_position=39),
                                              UserDataFileRecordStructure(length=40,
                                                                          binary_words=7,
                                                                          binary_block_position=51),
                                              UserDataFileRecordStructure(length=18,
                                                                          binary_words=2,
                                                                          binary_block_position=31),
                                              UserDataFileRecordStructure(length=18,
                                                                          binary_words=2,
                                                                          binary_block_position=31)])
DataFileControlRecord(offset=20777212,
                      length=288,
                      os_control=0,
                      delete_code=b'\x00',
                      type='C',
                      record_control_group_position=6,
                      record_control_group_length=10,
                      set_id_position=16,
                      set_id_length=1,
                      subset_control_group_position=17,
                      subset_control_group_length=4,
                      number_of_periodic_sets=1,
                      element_format_record_significant_data_position=22,
                      fixed_set_structure=UserDataFileRecordStructure(length=18,
                                                                      binary_words=2,
                                                                      binary_block_position=31),
                      periodic_set_structure=[UserDataFileRecordStructure(length=11,
                                                                          binary_words=2,
                                                                          binary_block_position=31)])

This matches NARA's findings: there are three FFTs. Note that the offsets are not the same. The offsets printed by the Python script mark the start of each Data File Control Record, whereas NARA probably gives the position of the Classification Record.
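As a quick check on the offset difference noted above, the gap between each position reported by NARA and the corresponding offset printed by the script can be computed directly. The sketch below uses only numbers quoted on this page. The helper name `nara_pos` is made up for illustration; it simply formats an integer in the `pos:XXXX:XXXX` style that appears in the NARA text.

```python
# Positions reported by NARA and offsets printed by the script, copied from above.
nara_positions = [0, 20377432, 20776952]
script_offsets = [264, 20377696, 20777212]


def nara_pos(position):
    """Format a byte position as pos:HHHH:LLLL (hypothetical helper)."""
    return f"pos:{position >> 16:04X}:{position & 0xFFFF:04X}"


# Distance from each NARA position to the matching Data File Control Record.
gaps = [offset - position for position, offset in zip(nara_positions, script_offsets)]

for position, offset, gap in zip(nara_positions, script_offsets, gaps):
    print(nara_pos(position), offset, gap)
```

The gaps are 264, 264, and 260 bytes, so each Data File Control Record starts a few hundred bytes after the position NARA reports. That would fit NARA pointing at an earlier record of each FFT, but the numbers alone do not confirm which record.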

Note that the first two FFTs describe the same structure. This also matches NARA's findings:

Further inspection of the NIPS file found that the first two FFTs (those beginning at file positions 0 and 20377432) describe data elements in the HES71 documentation; however, the third (file position 20776952) does not. While the third FFT appears to contain a usable layout for the records, due to the lack of adequate documentation, staff are unsure as to what kind of data the records associated with the third FFT contain.
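To make the comparison between the three FFTs concrete, the sketch below copies the scalar fields from the script output into plain dictionaries and lists the fields that differ. It does not use PyNIPS; the dictionaries and the helper `differing_fields` are illustrative stand-ins, and the nested set structures and the offsets are left out.

```python
# Scalar fields copied from the pprint output above (nested structures omitted).
fft_1 = {
    "length": 300,
    "record_control_group_position": 6,
    "record_control_group_length": 9,
    "set_id_position": 15,
    "set_id_length": 1,
    "subset_control_group_position": 16,
    "subset_control_group_length": 4,
    "number_of_periodic_sets": 4,
    "element_format_record_significant_data_position": 20,
}
fft_2 = dict(fft_1)  # the second FFT prints the same values for these fields
fft_3 = {
    "length": 288,
    "record_control_group_position": 6,
    "record_control_group_length": 10,
    "set_id_position": 16,
    "set_id_length": 1,
    "subset_control_group_position": 17,
    "subset_control_group_length": 4,
    "number_of_periodic_sets": 1,
    "element_format_record_significant_data_position": 22,
}


def differing_fields(a, b):
    """Return {field: (value_in_a, value_in_b)} for fields whose values differ."""
    return {name: (a[name], b[name]) for name in a if a[name] != b[name]}


print(differing_fields(fft_1, fft_2))
print(differing_fields(fft_1, fft_3))
```

For these fields the first two FFTs are identical, while the third differs in its record length, record control group length, set ID position, subset control group position, number of periodic sets, and significant data position.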

Invalid Set IDs

224 data file records (starting at offset 20788704, apparently at the end of the file) have invalid set IDs. The set IDs as parsed are between 240 and 249.

In PyNIPS these are categorized as "orphan" data file records.
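One observation that may help when investigating these records, offered as a hypothesis rather than a finding: the byte values 240 to 249 (0xF0 to 0xF9) are the EBCDIC codes for the digits 0 to 9. The sketch below decodes that range with Python's built-in `cp037` codec. Choosing `cp037` as the EBCDIC variant is an assumption, and nothing here establishes how these records actually encode their set IDs.

```python
# The invalid set ID values seen in the file, as raw bytes (240..249).
invalid_set_ids = bytes(range(240, 250))

# cp037 is one common EBCDIC code page shipped with Python (an assumed choice).
decoded = invalid_set_ids.decode("cp037")
print(decoded)
```

If the set ID byte of these records holds an EBCDIC digit character rather than a small binary number, that could explain why the parsed values fall exactly in this range.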

NIPSTRAN encounters errors while parsing the file (see HES.HES71.72.NIPS.REPORT.ASCII.txt):

**Warning** Periodic Table 4, record 15451 is incorrect length (length with line termination is 164)

TODO: check whether this NIPSTRAN warning is related to the invalid set IDs.

A bug in PyNIPS

We note that record_control_group_length is 9 in the first two FFTs. While debugging, we discovered that PyNIPS was not reading the element names correctly: it read from record_control_group_position to record_control_group_length instead of from record_control_group_position to record_control_group_position + record_control_group_length.

The bug was fixed with a7e44503ec.
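The effect of this kind of slicing mistake can be shown on a stand-in byte string. This is only an illustration of the class of bug, not the PyNIPS code: the `record` contents are invented, and treating the position as a 0-based index is an assumption.

```python
record = bytes(range(32))  # stand-in record; not real NIPS data

record_control_group_position = 6
record_control_group_length = 9

# Buggy: the length is mistakenly used as the end index of the slice.
buggy = record[record_control_group_position:record_control_group_length]

# Fixed: the end index is position + length.
end = record_control_group_position + record_control_group_length
fixed = record[record_control_group_position:end]

print(len(buggy), len(fixed))
```

With a position of 6 and a length of 9, the buggy slice returns only 3 bytes where 9 are expected, so the field is silently truncated instead of raising an error.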