The data input to CODAP normally comes from keypunched or optically scanned survey booklets, together with records merged from personnel files. Although this data may start out in very diverse formats, CODAP requires it to be merged and formatted as card image data files. The data for a single respondent must therefore be broken up into multiple records of no more than 80 (or 84) characters each, and additional control information is needed to identify and sequence all the records associated with a given respondent.
Because a single case may contain a large amount of data, as many as 200 card images might be required to hold all the information collected in each survey booklet. To describe your reformatted raw data file to CODAP, you must create a set of CODAP FORMAT cards for input to the AUDITR and INPSTD programs. The AUDITR program checks each case data card image against its corresponding FORMAT card and produces a valid output file. This valid output file from AUDITR MUST be input to INPSTD to convert the card image records into usable CODAP records on the Case Data file.
The traditional usage of columns for CODAP data and FORMAT cards is:
Data Columns   Usage
01 - 04        Booklet Control ID
05 - 73        Background, Task, or "check" characters
74 - 75        Card ID
76 - 79        Study number
80             Factor/Data Type Code
Except for the Booklet Control IDs, which are required to be in columns 1 through 4, the usage of these columns is only a recommendation and not an actual requirement. Each of these column definitions will be described below in order to explain the source of the data found in those columns. The following section will focus more on the mechanics of specification including check & transparent characters.
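As an illustration of the traditional layout in the table above, the following Python sketch slices one card image into its column groups. The names TRADITIONAL_COLUMNS and split_card, and the sample card, are hypothetical and are not part of CODAP; only the column ranges come from the table.

```python
# Traditional column layout (1-based, inclusive), taken from the table above.
# Only columns 1-4 are actually required; the rest is the recommended usage.
TRADITIONAL_COLUMNS = {
    "booklet_id": (1, 4),
    "body": (5, 73),
    "card_id": (74, 75),
    "study_number": (76, 79),
    "factor_code": (80, 80),
}


def split_card(card):
    """Split one 80-column card image into its traditionally named fields."""
    card = card.ljust(80)  # pad short records with blanks
    return {
        name: card[start - 1:end]
        for name, (start, end) in TRADITIONAL_COLUMNS.items()
    }


# Hypothetical sample card: ID 'A123', 69 body columns, card 03, study 0042, factor 1.
sample_card = "A123" + "5" * 69 + "03" + "0042" + "1"
sample_fields = split_card(sample_card)
print(sample_fields["booklet_id"], sample_fields["card_id"],
      sample_fields["study_number"], sample_fields["factor_code"])
```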
Columns 01 - 04
The first four columns on each FORMAT card MUST contain 'B...' to represent the actual Booklet Control ID found in the first four columns of each data record. Because CODAP will accept up to 20,000 cases for input, the Booklet Control ID may include alphas to code unique IDs above 9999. This is usually accomplished by reformatting a 5-digit number in the booklet and converting the left two digits into a single character, such as '00'='0', '10'='A', '11'='B', '12'='C', etc. CODAP requires the Booklet Control ID to be declared as the first background variable. Traditionally this variable has been called the "Case Control Number", but because of the alphas it is not, strictly speaking, a "number". Note that this 'B...' counts as only one background variable even though it appears at the front of every FORMAT card. For every other FORMAT card code (except '.'), multiple occurrences mean multiple variables.
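The reformatting of a 5-digit booklet number into a 4-character Booklet Control ID might be sketched as follows. The text gives only the pairs '00'='0', '10'='A', '11'='B', and '12'='C'; the rest of the mapping used here ('01' to '09' as '1' to '9', and the letters continuing through 'Z') is an assumption, as is the function name encode_booklet_id.

```python
import string

# Assumed mapping of the left two digits to one character:
# 00-09 -> '0'-'9', 10-35 -> 'A'-'Z'. Only '00', '10', '11' and '12'
# are given in the text; the other entries are extrapolated.
PREFIX_CHARS = string.digits + string.ascii_uppercase


def encode_booklet_id(number):
    """Pack a 5-digit booklet number into a 4-character Booklet Control ID."""
    if not 0 <= number <= 35999:
        raise ValueError("booklet number out of range for this scheme")
    prefix, rest = divmod(number, 1000)  # left two digits, right three digits
    return PREFIX_CHARS[prefix] + f"{rest:03d}"


print(encode_booklet_id(123))    # 0123
print(encode_booklet_id(10456))  # A456
print(encode_booklet_id(12000))  # C000
```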
Columns 05-73
The background (or "history") variables can be any information merged in from other sources or collected in the survey booklet (excluding task responses). This can include age, race, marital status, job-related responses such as tools used, and training, courses, or schooling completed. This information will ultimately be stored in character format on the Case Data File. Background variables are assigned names V0001 to Vxxxx, where xxxx is the number of the last background item. The Booklet Control ID must be V0001. The starting position and order of the data fields in your reformatted raw data file dictate the position of your 'V' or 'Z' FORMAT codes and the order of the variable definitions. These variable definition/name assignments are made using VARIABLE-IDS cards, explained in the background variable titles section.
Task information can be collected for any factor, including those listed below. An AUDITR - INPSTD sequence must be run for each unique factor being processed (e.g., time spent, training emphasis, primary or secondary factors). These data will ultimately be stored as partial word integers on the Case Data File. Although task responses are typically single-digit numbers on input, they can be multiple-column responses containing up to six digits each. All task responses must, however, be the same number of digits in length. For example, if all task responses are 3 digits long, a response of 1 would be represented as '001' and the space on the FORMAT card would be encoded as 'T..' according to the description below. Secondary factors are not distinguished from primary factors other than in the use of " T" instead of "T " on the FORMAT cards. As with background variables, the starting position and order of the data fields in your reformatted raw data file dictate the position of your 'T' FORMAT code and the order of the Task-Duty Titles. These task statement definitions are made using TASK-DUTY/TITLES cards, explained in the task/duty titles section.
Although tasks may be described using statements which include identifiers like "A 17" or "C 2" (Duty Letter & Task Number within Duty), CODAP internally identifies all tasks sequentially from one to the number of tasks, regardless of duty. In most specifications to CODAP programs, tasks MUST be referenced by sequential task number, NOT by 'Duty-Task' identifier. Refer to a TASKXX output generated by INPSTD to see the sequential identifiers regardless of how the statements are labeled. Also, in these references, "T0001" would mean the "percentage of time on task 1" while "R0001" would mean the "original raw rating on task 1".
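The fixed-width rule for task responses can be illustrated with a short Python sketch. The function names task_format_code and encode_task_responses are hypothetical; the sketch only shows how zero-padded responses line up with the corresponding 'T' codes, and it is not how CODAP itself prepares data.

```python
def task_format_code(width):
    """Return the FORMAT code for one task response, e.g. 'T..' for 3 digits."""
    if not 1 <= width <= 6:
        raise ValueError("task responses may be 1 to 6 digits wide")
    return "T" + "." * (width - 1)


def encode_task_responses(responses, width):
    """Zero-pad every response to the same width and join them as card columns."""
    pieces = []
    for value in responses:
        text = str(value).zfill(width)
        if value < 0 or len(text) > width:
            raise ValueError(f"response {value} does not fit in {width} digits")
        pieces.append(text)
    return "".join(pieces)


responses = [1, 25, 100]
print(task_format_code(3) * len(responses))  # T..T..T..
print(encode_task_responses(responses, 3))   # 001025100
```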
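The conversion from a 'Duty-Task' identifier to a sequential task number can be sketched in Python as below. The duty sizes in TASKS_PER_DUTY are invented for illustration, and sequential_task_number is a hypothetical helper; for a real study, the TASKXX output from INPSTD remains the authoritative source of sequential numbers.

```python
# Hypothetical inventory: number of tasks in each duty, in booklet order.
TASKS_PER_DUTY = {"A": 20, "B": 15, "C": 10}


def sequential_task_number(label, tasks_per_duty=TASKS_PER_DUTY):
    """Convert a label such as 'C 2' into a sequential task number."""
    duty, number = label.split()
    number = int(number)
    offset = 0  # tasks belonging to all earlier duties
    for letter, count in tasks_per_duty.items():
        if letter == duty:
            if not 1 <= number <= count:
                raise ValueError(f"duty {duty} has no task {number}")
            return offset + number
        offset += count
    raise KeyError(f"unknown duty {duty!r}")


for label in ("A 17", "C 2"):
    n = sequential_task_number(label)
    # T-names refer to percent time, R-names to the original raw rating.
    print(label, "->", n, f"T{n:04d}", f"R{n:04d}")
```

With these assumed duty sizes, "A 17" is task 17 and "C 2" is task 37, so the latter would be referenced as T0037 or R0037.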
Columns 74 - 75
To ensure proper sequencing of the records in each case, the FORMAT cards should contain the card ID in columns 74-75. Card IDs may include alpha characters to identify more than 99 cards per case.
Columns 76 - 79
Columns 76-79 may be used to encode the source study number. This is especially important if you maintain a Master File of all raw data cards.
Column 80
Column 80 should include a Factor/Data type code for the survey according to the following convention:
Time Spent Data                            1
Task Difficulty Data                       2
Consequences of Inadequate Performance     3
Task Delay Tolerance (Immediacy)           4
Training Emphasis (Field Recommended)      5
<Reserved for Task Transpose Data>         6
As new data types or factors (Absolute Frequency, Level of Involvement) begin to enter these standard formats, an effort will be made to identify new codes for inclusion in this table.
FORMAT Card Code Specifications
Because space is limited, an entire deck of FORMAT cards cannot be shown, but the example below shows the proper construction of the FORMAT cards. The first two lines are only scale lines showing the card column numbers for the information being represented.
CC: 0000                                       77    8
    1234                         . . .         45    0
    B...V...........V...VVVVVVVV . . . V.. V...01ssss1
    B...V.V.V.V.VVVVV.V.VVVVVVVV . . . VVVVVVVV02ssss1
    B...TTTTTTTTTTTTTTTTTTTTTTTT . . . TTTTTTTT03ssss1
    B...TTTTTTTTT                . . .         04ssss1
Background response characters
These characters represent the location of a background variable and may be denoted by a 'V' followed by as many periods as are necessary to represent the proper size field in the raw data. A single 'V' represents a one character field. These responses are always treated as character data and are not range checked in any of the CODAP programs. Additionally, a single 'Z' may be used for one character dichotomous fields. Any response other than 0 or 1 will be recoded to a '+'. Blanks are considered as zeros only in a 'Z' field. There is a maximum of 2000 words available for background variables.
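The recoding rule for 'Z' fields can be written out as a small Python sketch. The function name recode_dichotomous is hypothetical, and storing a blank as the character '0' is one reading of "blanks are considered as zeros"; the actual stored representation inside CODAP is not specified here.

```python
def recode_dichotomous(ch):
    """Recode one 'Z' field character.

    Blank is treated as zero, '0' and '1' are kept, and any other
    response becomes '+'.
    """
    if ch == " ":
        return "0"
    return ch if ch in ("0", "1") else "+"


raw = "01 9X"
print("".join(recode_dichotomous(c) for c in raw))  # 010++
```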
Task response characters
These characters represent the location of a task variable and are denoted as a 'T' followed by the proper number of periods, if any. Task responses are range checked by the programs and must all be the same number of digits in length on the FORMAT cards. An error will result if some tasks are represented as 'T..' and some are represented as 'T'. The maximum number of tasks allowed is 3000.
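A check for the uniform-width rule might look like the following Python sketch. It is a simplified illustration, not the AUDITR logic: the names FIELD, task_widths, and check_task_widths are hypothetical, and the sketch assumes primary-factor 'T' codes only (it does not handle the " T" form used for secondary factors).

```python
import re

# A variable code letter followed by its continuation periods, e.g. 'T..'.
FIELD = re.compile(r"[VTZ]\.*")


def task_widths(format_cards):
    """Collect the distinct widths of all 'T' fields in a deck of FORMAT cards."""
    widths = set()
    for card in format_cards:
        for match in FIELD.finditer(card):
            if match.group().startswith("T"):
                widths.add(len(match.group()))
    return widths


def check_task_widths(format_cards):
    """Return the common task width, or raise if the deck mixes widths."""
    widths = task_widths(format_cards)
    if len(widths) > 1:
        raise ValueError(f"task fields have mixed widths: {sorted(widths)}")
    return widths.pop() if widths else 0


print(check_task_widths(["B...T..T..T..", "B...T..T.."]))  # 3
```

A deck such as ["B...T..T"] would raise an error in this sketch, because it mixes a 3-column and a 1-column task field.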
Check characters
These characters are any characters found on the FORMAT cards other than 'T', 'V', 'Z', '.' or ' '. Any check characters found will be matched against the character in the corresponding location of each case. If they do not match, the case will be dropped as a bad case. This may be used in AUDITR to pull subsamples from a master raw data file by specifying the required exact-match value in the appropriate column(s). Note: check characters are not considered to be variables.
Transparent Characters
Any space characters found on the format cards will be treated as transparent characters. This means that any characters in the corresponding locations in the raw data will be ignored regardless of the value. These characters will not end up in the CODAP system.
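To tie the four kinds of FORMAT card characters together, the following Python sketch applies one FORMAT card to one data card. It is a simplified model under stated assumptions, not the AUDITR implementation: apply_format is a hypothetical name, columns 1-4 are taken as the Booklet Control ID without further checking, range checking of task responses and 'Z' recoding are omitted, and the " T" secondary-factor form is not handled.

```python
def apply_format(format_card, data_card):
    """Extract the variables from one data card using its FORMAT card.

    Returns (booklet_id, fields), or None if a check character does not match.
    """
    data_card = data_card.ljust(len(format_card))
    booklet_id = data_card[:4]  # columns 1-4, declared as 'B...'
    fields = []                 # [code, text] pairs in card order
    open_field = False
    for code, ch in zip(format_card[4:], data_card[4:]):
        if code in "VTZ":       # start of a new variable
            fields.append([code, ch])
            open_field = True
        elif code == ".":       # continuation of the current variable
            if not open_field:
                raise ValueError("'.' with no field to continue")
            fields[-1][1] += ch
        elif code == " ":       # transparent: data column is ignored
            open_field = False
        else:                   # check character: must match exactly
            if ch != code:
                return None
            open_field = False
    return booklet_id, [tuple(f) for f in fields]


fmt = "B...V..Z TT01"
print(apply_format(fmt, "A123ABC1x3501"))
print(apply_format(fmt, "A123ABC1x3502"))  # check character mismatch
```

In the first call the 'x' under the blank is ignored and the trailing '01' matches the check characters, so the fields are returned; in the second call the last column fails the check and the card is rejected.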