Conversion of HES Basic Master File to RDF
The Ctrl + All Tooling can convert HES Basic Master Files to a Resource Description Framework (RDF) representation. This note describes the conversion.
Overview
HES Basic Master Files are described in RDF using the RDF Data Cube Vocabulary (QB). A dedicated vocabulary, HES-BMF, is used to describe the HES Basic Master File and the various fields of the many record types.
HES Basic Master Files consist of 22 different record types. For each record type, the conversion defines a QB Data Set. Additionally, one QB Data Set is defined that covers all records in the Basic Master File. See, for example, the data set of all records contained in the HES Basic Master File HES.Y6907.Y7302.
Implementation
The conversion is implemented in two steps (similar to the Conversion of NIPS Data Files to RDF):
- Parsing of HES Basic Master File records to Python values. This is implemented in the module hes.basic_master_file.
- Serialization of the Python representation to RDF. This is implemented in the module ctrlall.cmds.bmf2rdf.
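The second step can be illustrated with a minimal sketch that turns already-parsed field values into N-Triples. It is not the code in ctrlall.cmds.bmf2rdf: the function names, the example.org namespaces and the identifiers "record/1" and "dataset/qtab-question-description" are hypothetical, and plain string formatting is used instead of an RDF library to keep the example dependency-free. Only qb:Observation and qb:dataSet are taken from the RDF Data Cube Vocabulary.

```python
QB = "http://purl.org/linked-data/cube#"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
XSD_INTEGER = "http://www.w3.org/2001/XMLSchema#integer"

# Hypothetical namespaces: the real HES-BMF vocabulary and data URIs
# are not stated in this note.
HESBMF = "http://example.org/hes-bmf#"
DATA = "http://example.org/data/"


def literal(value):
    """Render a Python int or str as an N-Triples literal."""
    if isinstance(value, int):
        return f'"{value}"^^<{XSD_INTEGER}>'
    escaped = (
        str(value)
        .replace("\\", "\\\\")
        .replace('"', '\\"')
        .replace("\n", "\\n")
    )
    return f'"{escaped}"'


def record_to_ntriples(record_id, dataset_id, fields):
    """Describe one parsed record as a qb:Observation in a QB data set."""
    subject = f"<{DATA}{record_id}>"
    lines = [
        f"{subject} <{RDF_TYPE}> <{QB}Observation> .",
        f"{subject} <{QB}dataSet> <{DATA}{dataset_id}> .",
    ]
    # One triple per parsed field; property names reuse the Python field names.
    for name, value in fields.items():
        lines.append(f"{subject} <{HESBMF}{name}> {literal(value)} .")
    return lines


# Field values as they might come out of the parsing step.
fields = {"record_code": "3029", "question_code": "HMZ03", "question_number": 3}
triples = record_to_ntriples("record/1", "dataset/qtab-question-description", fields)
for line in triples:
    print(line)
```

Keeping the two steps separate, as in this sketch, means the serializer only ever sees plain Python values and does not need to know the fixed-width record layout.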
Open Issues/Questions
Question Response Record
The question response records (Hamlet Monthly, Hamlet Quarterly, Village Monthly and Village Quarterly) contain encoded question responses. For example:
A44B00000009C0002D0000000Z999999999
How can this be decoded and mapped to questions from the QTAB Questions?
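As a starting point for investigating this, the sample can be split into runs of digits that are each introduced by a letter. This is only an exploratory sketch: that the letters mark separate response slots is an assumption, not something confirmed by the Basic Master File documentation, and the function name is hypothetical.

```python
import re

SAMPLE = "A44B00000009C0002D0000000Z999999999"


def split_letter_groups(encoded):
    """Split an encoded string into (letter, digits) pairs.

    Assumes (unverified) that each upper-case letter starts a new group
    and is followed by one or more digits.
    """
    return re.findall(r"([A-Z])(\d+)", encoded)


tokens = split_letter_groups(SAMPLE)
for letter, digits in tokens:
    print(letter, digits, len(digits))
```

Printing the length of each digit run makes it easier to compare the groups against field widths given in the documentation, should such widths exist.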
QTAB Question Description Records - VARIABLE QUESTION NOT USED THIS CYCLE.
Some QTAB Question Description Records hold a single question text group (the structure that stores question and response text):
71017105VARIABLE QUESTION NOT USED THIS CYCLE.
According to the documentation (Basic Master File Description), this does not seem to be valid. Currently, our code parses it incorrectly as response number 71 of a question with no question text:
QTABQuestionDescriptionRecord(length=87,
                              os_control=0,
                              usid=' ',
                              record_code='3029',
                              record_start_date='7101',
                              record_stop_date='7301',
                              activity_code='0',
                              question_code='HMZ03',
                              level_code='HM',
                              topic_group_code='Z',
                              question_number=3,
                              maximum_response=0,
                              question_text_group_count=1,
                              question_text='',
                              responses={71: '017105VARIABLE QUESTION NOT USED '
                                             'THIS CYCLE.'})
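The misparse above can be reproduced in isolation, together with one possible way to flag such groups before they are turned into responses. The sketch assumes, based only on the output shown, that the parser reads the first two characters of a text group as the response number; both function names are hypothetical and not part of hes.basic_master_file.

```python
PLACEHOLDER = "VARIABLE QUESTION NOT USED THIS CYCLE."


def naive_parse_group(group):
    """Assumed current behaviour: two-digit response number, then text."""
    return int(group[:2]), group[2:]


def is_unused_placeholder(group):
    """Flag text groups that only carry the 'not used' placeholder text."""
    return group.rstrip().endswith(PLACEHOLDER)


group = "71017105VARIABLE QUESTION NOT USED THIS CYCLE."

number, text = naive_parse_group(group)
print(number, repr(text))

if is_unused_placeholder(group):
    print("placeholder group; should not be stored as a response")
```

Whether such records should be skipped, or represented explicitly in RDF, depends on the answers to the questions below.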
Questions:
- Are there question responses that use these questions?
- How should such unused questions be described in RDF?
- Are there HES programs that understand this format? We might be able to find indications in the respective program manuals.