NIPS Data File
A NIPS data file is a collection of formatted records containing data readable by NIPS 360 FFS.
The idea of organizing data in records comes from the way IBM mainframes structure data (see Data set (IBM)).
The term data set is also used to describe a collection of records. We use the term data file for a file that contains a collection of records, and data set for a conceptual collection of records that may also reside in memory. In that sense, a data file is a data set.
This note describes the layout of a NIPS data file. See also Appendix A of the User's Manual Volume 1 for a good description of the physical file format.
See also Conversion of NIPS Data Files for some notes on the process of converting NIPS files and some quirks encountered.
Block
A NIPS data file is organized in blocks, each of which groups multiple records. Some documentation uses the term physical record for a block.
The first four characters of a block are called the block count field. The block count field gives the length of the block in bytes (including the block count field itself).
The remainder of the block is its content; in the case of NIPS data files, this is a sequence of logical records.
Blocks appear to be an optimization for I/O devices. Using blocks allows writes to a device (such as a magnetic tape) to be buffered: a buffer is filled with the content of a block, and the whole block is then written in one go.
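Reading a data file block by block comes down to reading the block count field, then reading the rest of the block. The Python sketch below illustrates this. It assumes the block count field follows the IBM variable-blocked (RECFM=VB) block descriptor word layout: a big-endian 2-byte length followed by 2 reserved bytes. If the field instead holds the length as character digits, only the length-parsing line needs to change. The function name `iter_blocks` is hypothetical.

```python
import io
import struct
from typing import BinaryIO, Iterator


def iter_blocks(stream: BinaryIO) -> Iterator[bytes]:
    """Yield the content of each block, without its 4-byte block count field.

    Assumption: the first two bytes of the block count field hold the block
    length as a big-endian unsigned integer (counting the field itself) and
    the last two bytes are reserved, as in IBM variable-blocked data sets.
    """
    while True:
        header = stream.read(4)
        if not header:
            return  # end of file reached on a block boundary
        if len(header) < 4:
            raise ValueError("truncated block count field")
        (block_length,) = struct.unpack(">H", header[:2])
        if block_length < 4:
            raise ValueError(f"invalid block length {block_length}")
        content = stream.read(block_length - 4)
        if len(content) != block_length - 4:
            raise ValueError("truncated block")
        yield content


# Example: two blocks carrying 3 and 1 bytes of content.
sample = struct.pack(">HH", 7, 0) + b"abc" + struct.pack(">HH", 5, 0) + b"z"
print(list(iter_blocks(io.BytesIO(sample))))  # [b'abc', b'z']
```

Each yielded value is the block content, that is, the sequence of logical records that still has to be split up.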
Logical Records
Logical Records are collections of data elements that are together considered distinct and complete. A NIPS Data file is a collection of logical records.
There are different types of logical records (e.g., Classification Record or Data File Control Record), but they all share the same basic structure:
Length (bytes) | Offset (bytes) | Description |
---|---|---|
4 | 0 | Used for OS control; contains the record length. |
1 | 4 | Delete code, which indicates that the record should be removed from the data file. |
variable | 5 | Record key containing data that uniquely identifies the record. The first character contains the type code (in EBCDIC). |
remainder | variable | The actual data of the record. |
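The shared structure in the table is enough to split a block's content into logical records and read each record's delete code and type code. The sketch below is a minimal illustration under two assumptions: the 4-byte OS control field follows the IBM record descriptor word layout (big-endian 2-byte length including the field itself, then 2 reserved bytes), and the EBCDIC variant is code page 037. The names `LogicalRecord` and `iter_records` are hypothetical. Because the key length varies, the key and data are returned together.

```python
import struct
from dataclasses import dataclass
from typing import Iterator


@dataclass
class LogicalRecord:
    delete_code: int     # raw byte at offset 4; its values are not interpreted here
    type_code: str       # first character of the record key, decoded from EBCDIC
    key_and_data: bytes  # offset 5 onward; the key length is not known at this level


def iter_records(block_content: bytes) -> Iterator[LogicalRecord]:
    """Split the content of one block into logical records.

    Assumption: the 4-byte OS control field is a big-endian 2-byte length
    (including the field itself) followed by 2 reserved bytes, and the
    EBCDIC code page is 037.
    """
    pos = 0
    while pos < len(block_content):
        if pos + 4 > len(block_content):
            raise ValueError(f"truncated record length field at offset {pos}")
        (length,) = struct.unpack_from(">H", block_content, pos)
        # A record needs at least the 4-byte length, the delete code and a type code.
        if length < 6 or pos + length > len(block_content):
            raise ValueError(f"bad record length {length} at offset {pos}")
        record = block_content[pos:pos + length]
        yield LogicalRecord(
            delete_code=record[4],
            type_code=record[5:6].decode("cp037"),
            key_and_data=record[5:],
        )
        pos += length


# Example: one data file record ('R') with a hypothetical key/data payload.
payload = "R0001ABC".encode("cp037")
block = struct.pack(">HH", 5 + len(payload), 0) + b"\x00" + payload
for rec in iter_records(block):
    print(rec.type_code, rec.key_and_data.decode("cp037"))  # R R0001ABC
```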
File Format Table (FFT)
The File Format Table (FFT) is a structure that describes the format of the records in a data file. It is created by the File Structuring Component (FS) of NIPS 360 FFS and consists of three record types:
- A single Classification Record
- A single Data File Control Record
- Multiple Element Format Records
Note that some NIPS data files seem to have multiple FFTs for unknown reasons (see Analysis of HES.HES71.72.NIPS).
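The three FFT record types can be told apart by their type codes (`B`, `C` and `F`, described below). The following Python sketch counts them in a sequence of decoded type codes. The function name `summarize_fft` is hypothetical. Treating the number of classification (`B`) records as the number of FFTs is an assumption: it holds only if every FFT contains exactly one classification record. This kind of count is one way to spot files that carry more than one FFT.

```python
from collections import Counter
from typing import Iterable

# Type codes of the three FFT record types.
FFT_TYPE_CODES = {
    "B": "Classification Record",
    "C": "Data File Control Record",
    "F": "Element Format Record",
}


def summarize_fft(type_codes: Iterable[str]) -> dict[str, int]:
    """Count FFT records by type name and estimate the number of FFTs.

    Assumption: each FFT holds exactly one classification ('B') record,
    so the 'B' count is used as the FFT count.
    """
    counts = Counter(code for code in type_codes if code in FFT_TYPE_CODES)
    summary = {name: counts[code] for code, name in FFT_TYPE_CODES.items()}
    summary["Estimated FFT count"] = counts["B"]
    return summary


print(summarize_fft(["B", "C", "F", "F", "F", "R", "R"]))
```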
Classification Record
The classification record (code `B`) carries a classification label.
In Hamlet Evaluation System (HES) the classification record does not seem to be used, and the labels all read `XXXXXXXXXXXX`.
Data File Control Record
The data file control record (code `C`) carries information on the format of subsequent element format records and data file records.
The information it specifies includes (non-exhaustive):
- Position and length of the Record Control Group (the record identifier).
- Position and length of the set identifier (which set the record belongs to).
- Number of periodic sets.
Element Format Record
Elements are user-defined fields of information. The element format records (code `F`) hold element names and attributes, as well as each element's position and length within data file records.
NIPS defines several data value modes, and each element's mode is specified in its element format record. Data value modes include:
- Alphameric Mode (code `A`)
- Numeric Mode (code `B`)
- Geographic Coordinate Mode (code `C`)
- Decimal Mode (code `D`)
See Appendix A.3.3 of the NIPS User's Manual Volume 1 or the PyNIPS documentation for all the information that is stored in an element format record (it's quite a lot).
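A converter typically maps the mode codes listed above to an enumeration before choosing how to decode a field. The Python sketch below shows that mapping. It only decodes alphameric fields, and that decoding rests on two assumptions: the text is EBCDIC code page 037, and fields are padded with trailing blanks. The binary layouts of the numeric, geographic coordinate and decimal modes are not covered here. `DataValueMode` and `decode_alphameric` are hypothetical names.

```python
from enum import Enum


class DataValueMode(Enum):
    """Element data value modes and their codes."""
    ALPHAMERIC = "A"
    NUMERIC = "B"
    GEOGRAPHIC_COORDINATE = "C"
    DECIMAL = "D"


def decode_alphameric(raw: bytes) -> str:
    """Decode an alphameric field.

    Assumptions: EBCDIC code page 037 and trailing-blank padding.
    """
    return raw.decode("cp037").rstrip()


mode = DataValueMode("C")
print(mode.name)                                     # GEOGRAPHIC_COORDINATE
print(decode_alphameric("HAMLET  ".encode("cp037")))  # HAMLET
```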
Data File Record
Data file records (code `R`) hold the actual data in a data file.
Basically, data file records are a sequence of fields as defined by the element format records.
The format is identical for the fixed set and periodic sets; fixed sets simply define a different set of elements than the periodic sets.
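Because each element format record gives an element's position and length, pulling the raw fields out of a data file record is a matter of slicing. The sketch below uses a hypothetical, simplified `Element` type that keeps only name, position and length. It assumes positions are 0-based byte offsets from the start of the record. The manual may count differently, for example 1-based or relative to the data portion. Decoding each slice according to its data value mode is left out.

```python
from dataclasses import dataclass


@dataclass
class Element:
    """Hypothetical, simplified view of an element format record."""
    name: str
    position: int  # assumed 0-based byte offset within the data file record
    length: int    # field length in bytes


def extract_fields(record: bytes, elements: list[Element]) -> dict[str, bytes]:
    """Slice the raw bytes of each element out of a data file record."""
    fields = {}
    for element in elements:
        end = element.position + element.length
        if end > len(record):
            raise ValueError(f"element {element.name} extends past the record")
        fields[element.name] = record[element.position:end]
    return fields


layout = [Element("ID", 0, 4), Element("NAME", 4, 3)]
print(extract_fields(b"0001ABCxyz", layout))  # {'ID': b'0001', 'NAME': b'ABC'}
```

The same function works for fixed and periodic sets, since only the list of elements passed in differs.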
Other Record Types
Other record types found in NIPS data files:
- `L` and `M`: File Maintenance Logic Records. Appear in HES files such as HES.HAMLA.67.NIPS (NIPSEXT info).
- `N`: Statistical Records
- `P`: Segment Records
- `Q`: Unknown (NIPSTRAN labels this as "Specified but Unknown Type Records"). Appears in Vietnam Data Base (VNDBA) files.
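When dumping an unfamiliar file, it helps to label each record by its type code. The Python sketch below collects the type codes mentioned in this note into one lookup table and resolves a raw type byte. It assumes EBCDIC code page 037 for that byte. `RECORD_TYPES` and `describe_record_type` are hypothetical names.

```python
# Record type codes mentioned in this note, mapped to their descriptions.
RECORD_TYPES = {
    "B": "Classification Record",
    "C": "Data File Control Record",
    "F": "Element Format Record",
    "R": "Data File Record",
    "L": "File Maintenance Logic Record",
    "M": "File Maintenance Logic Record",
    "N": "Statistical Record",
    "P": "Segment Record",
    "Q": "Specified but Unknown Type Record",
}


def describe_record_type(type_byte: int) -> str:
    """Describe a record from its raw type code byte (EBCDIC code page 037 assumed)."""
    code = bytes([type_byte]).decode("cp037")
    return RECORD_TYPES.get(code, f"Unrecognized type code {code!r}")


print(describe_record_type(0xD9))  # Data File Record  (0xD9 is 'R' in cp037)
```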