NIPS Data File

A NIPS data file is a collection of formatted records containing data readable by NIPS 360 FFS.

The idea to organize data in records comes from the way IBM mainframes structure data (see Data set (IBM)).

The term data set is also used to describe a collection of records. In this note, data file means a file that contains a collection of records, and data set means a conceptual collection of records that might also be held in memory. In that sense, a data file is a data set.

This note describes the layout of a NIPS data file. See also Appendix A of the User's Manual Volume 1 for a good description of the physical file format.

See also Conversion of NIPS Data Files for some notes on the process of converting NIPS files and some quirks encountered.

Block

A NIPS data file is organized in blocks, each of which groups multiple records. Some documentation uses the term physical record for a block.

The first four characters of a block are called the block count field. The block count field gives the length of the block in bytes (including the block count field itself).

The remainder of the block is its content, which in a NIPS data file is a sequence of logical records.

Blocks seem to be an optimization for I/O devices. Using blocks allows writes to a device (such as a magnetic tape) to be buffered: a buffer is filled with the content of a block, and the block is then written in one go.
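The block layout described above can be sketched as a small reader. This is only an illustration, not the PyNIPS implementation. The function name iter_blocks is hypothetical, and the encoding of the block count field is an assumption: the sketch follows the usual IBM variable-blocked convention, in which the first two bytes hold the length as a big-endian integer and the other two bytes are reserved.

```python
import struct
from typing import Iterator


def iter_blocks(data: bytes) -> Iterator[bytes]:
    """Yield the content of each block, without its block count field."""
    offset = 0
    while offset < len(data):
        if offset + 4 > len(data):
            raise ValueError(f"truncated block count field at offset {offset}")
        # Assumption: bytes 0-1 hold the block length (big-endian),
        # bytes 2-3 are reserved and ignored here.
        (block_length,) = struct.unpack_from(">H", data, offset)
        # The length includes the four bytes of the block count field.
        if block_length < 4 or offset + block_length > len(data):
            raise ValueError(
                f"implausible block length {block_length} at offset {offset}"
            )
        yield data[offset + 4 : offset + block_length]
        offset += block_length
```

Reading a whole file into memory first keeps the sketch short; a reader for large tape images would more likely read the four-byte field and then the rest of the block from a stream.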

Logical Records

Logical records are collections of data elements that are together considered distinct and complete. A NIPS data file is a collection of logical records.

There are different types of logical records (e.g. Classification Record or Data File Control Record) but they all share the same basic structure:

Length     Offset    Description
4          0         Used for OS control and contains the record length.
1          4         Delete code which indicates that the record should be removed from the
variable   5         Record Key containing data to uniquely identify record. The first character contains the type code (in EBCDIC).
remainder  variable  The actual data of the record.
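As a rough sketch, the common record structure can be split into its parts as follows. Several things here are assumptions rather than facts from the manual: the record length is taken from the first two bytes of the OS control field as a big-endian integer that includes the control field itself, the key length is passed in by the caller because it is variable, and the IBM code page cp037 stands in for EBCDIC. The names LogicalRecord and iter_logical_records are hypothetical.

```python
import struct
from dataclasses import dataclass
from typing import Iterator


@dataclass
class LogicalRecord:
    delete_code: int  # byte at offset 4
    type_code: str    # first character of the key, decoded from EBCDIC
    key: bytes        # record key, starting at offset 5
    data: bytes       # remainder of the record


def iter_logical_records(content: bytes, key_length: int) -> Iterator[LogicalRecord]:
    """Split the content of one block into logical records."""
    if key_length < 1:
        raise ValueError("the key must at least contain the type code")
    offset = 0
    while offset < len(content):
        if offset + 4 > len(content):
            raise ValueError(f"truncated control field at offset {offset}")
        # Assumption: bytes 0-1 of the control field hold the record length.
        (record_length,) = struct.unpack_from(">H", content, offset)
        if record_length < 5 + key_length or offset + record_length > len(content):
            raise ValueError(
                f"implausible record length {record_length} at offset {offset}"
            )
        record = content[offset : offset + record_length]
        key = record[5 : 5 + key_length]
        yield LogicalRecord(
            delete_code=record[4],
            type_code=key[:1].decode("cp037"),  # assumption: cp037 as the EBCDIC variant
            key=key,
            data=record[5 + key_length :],
        )
        offset += record_length
```

Taking key_length as a parameter keeps the sketch independent of where the real key length is stored, which this note does not specify.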

File Format Table (FFT)

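The File Format Table (FFT) is a structure that describes the format of the records in a data file. It is created by the File Structuring Component (FS) of NIPS 360 FFS and consists of three record types, each described below: the classification record, the data file control record, and the element format record.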

Note that some NIPS data files seem to have multiple FFTs for unknown reasons (see Analysis of HES.HES71.72.NIPS).

Classification Record

The classification record (code B) carries a classification label.

In the Hamlet Evaluation System (HES) files the classification record does not seem to be used, and the labels all read XXXXXXXXXXXX.

Data File Control Record

The data file control record (code C) carries information on the format of subsequent element format records and data file records.

Information specified includes (non-exhaustive):

Element Format Record

Elements are user-defined fields of information. The element format records (code F) hold each element's name and attributes as well as its position and length in data file records.

NIPS defines various data value modes. The mode of each element is specified in its element format record. Data value modes include:

See Appendix A.3.3 of the NIPS User's Manual Volume 1 or the PyNIPS documentation for all the information that is stored in an element format record (it's quite a lot).

Data File Record

Data file records (code R) hold the real data in a data file.

A data file record is essentially a sequence of fields laid out as defined by the element format records.

The format is identical for the fixed set and periodic sets. Fixed sets just have a different set of elements defined than the periodic sets.
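To illustrate how element definitions drive the decoding of a data file record, here is a minimal sketch. The Element type, the extract_fields function and the element names in the usage example are all hypothetical. The sketch also assumes that positions are 0-based byte offsets into the record and that every field is EBCDIC text (decoded with cp037), which ignores the other data value modes.

```python
from typing import Dict, List, NamedTuple


class Element(NamedTuple):
    name: str
    position: int  # assumption: 0-based byte offset into the record
    length: int    # field length in bytes


def extract_fields(record: bytes, elements: List[Element]) -> Dict[str, str]:
    """Slice a record into named fields according to the element definitions."""
    fields: Dict[str, str] = {}
    for element in elements:
        raw = record[element.position : element.position + element.length]
        if len(raw) != element.length:
            raise ValueError(f"record too short for element {element.name}")
        # Assumption: the field is EBCDIC text; trailing padding is stripped.
        fields[element.name] = raw.decode("cp037").rstrip()
    return fields


# Usage with made-up element definitions and a made-up record.
elements = [Element("NAME", 0, 10), Element("CODE", 10, 3)]
record = ("ALPHA".ljust(10) + "042").encode("cp037")
fields = extract_fields(record, elements)
```

Because fixed and periodic sets share the same format, the same function would apply to both; only the list of elements passed in would differ.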

Other Record Types

Other record types are described in NIPS data files: