Last updated 19 Nov 97
VARGEN is a FORTRAN preprocessor that generates programs capable of creating new computed variables. The new variables can be computed by interacting computed, background, or task variables in a mathematical equation to produce a new value for each case. A capability also exists to interact task responses with a task factor to compute an aggregate value for each case.
Any mathematical computation supported by FORTRAN may be used to interact the variables of interest. Functions and subroutines are available to interact task factors with tasks performed by each case. Details concerning the use of these capabilities, and the form of the computations in the LOGIC control cards section follow in later sections of this document.
Automatic Process Generation
If a DFILE specifying an output Variable Title file is present, a MAKDFL is run on the output Case Data file. If the P option is given, a DICTXX is executed after the MAKDFL.

Program Invocation
The specific syntax for program invocation varies between operating systems. This document displays the code for the AIX version.
opt: L  List generated source code and compile
     N  NOGO option, perform control card check
     P  Print a DICTXX listing
ssss: The study number
CDf: Input Case Data file
TFf: Input Task Factor file
TT: Input Task Title file
VTf: Input Variable Title file
CDf2: Output Case Data file
CSf: Input Cluster Solution file
VARGEN Control Card
VARGEN: cc 01-06 The literal 'VARGEN'
study: cc 08-11 The study number
title: cc 13-72 The title for the output report
DFILE Specification (optional)
If a DFILE card is specified, a MAKDFL will be run using the output Case Data file as input, and the file specified as the output Variable Title file. The method used to specify an output Variable Title file is dependent upon the operating system used. Both methods are described below.
DFILE Card (VM, AIX, and Unisys)
DFILE: cc 01-05 The literal 'DFILE'
VTf: cc 07-46 The output Variable Title file
DFILE Invocation Line Parameter (MVS)
The specification of an output Variable Title file is slightly different on the MVS system. The DFILE is specified as an optional parameter on the EXEC (execute) statement.
Variable Definitions
The variable definitions consist of five sets of control cards which specify the case data variables, presentation sequences, membership masks and task factors which will be used in the LOGIC control cards section. The first four sections, VARIABLES, CLUSTERS, MEMBERS and FACTORS, are only required if those specific variables are used in the LOGIC control cards section. The fifth section, the OUTPUT control cards section, is always required. If no output variables are defined, the program generated by VARGEN will copy the Case Data file as is to the output file. This is useful for copying tape files to mass storage.
These five sets of control cards must be in the sequence given below, if used. If any of the first four sets of control cards is unnecessary, none of the cards for that set should be used (e.g. if no task factors are input, do not include the FACTORS card or the corresponding @EOF card).
Case Data Variable Definition Set
This set of control cards will assign names to the variables selected. Each case data variable used in computations in the LOGIC control cards section must be named in this way. If no case data variables are used, the control cards in this set should not be included.
VARIABLES Control Card
VAR: cc 01-03 The literal 'VAR'
Case Data Variable Definition Cards
name: cc 01-06 The name to use in the LOGIC section to refer to the variable specified. The first character must be alphabetic.
=: cc 07 The literal '='
var: cc 08-12 The variable ID from the Case Data file. This can be any computed (Cxxxx), task (Txxxx), or background (Vxxxx) variable.
type: cc 13 An 'A' defines this variable as alpha; otherwise it is numeric (background variables only).
These cards are terminated by an '@eof' image.
Presentation Sequence Definition Set
These control cards assign names to the presentation sequences selected. These names are then used in the LOGIC section to refer to the corresponding KPATH number for that presentation sequence for each case. These variables may be used only with the KPATH function. If the KPATH function is not used, the control cards in this set should not be included.
CLUSTERS Control Card
This card is the beginning of a set of presentation sequence definition cards.
CLU: cc 01-03 The literal 'CLU'
Presentation Sequence Definition Cards
This set of control cards should contain one card for each presentation sequence used in the LOGIC control cards section.
name: cc 01-06 The name to use in the LOGIC section to refer to the presentation sequence specified. This name may be from one to six characters. The first character must be alphabetic.
=: cc 07 The literal '='
pres: cc 08-13 The presentation sequence (PSxxxx)
These cards are terminated by an '@eof' image.
Membership Definition Set
This set of control cards will assign names to the membership masks selected. These names are then used in the LOGIC section to refer to the corresponding mask. The names in this section can only be used as parameters of the MEMBER function. If the MEMBER function is not used, this set of control cards should not be used. There is a maximum of 25 membership masks and/or presentation sequences that can be defined in a VARGEN runstream.
MEMBERS Control Card
This card is the beginning of a set of membership mask definition cards.
MEM: cc 01-03 The literal 'MEM'
Membership Mask Definition Cards
This set of control cards should contain one card for each membership mask used in the LOGIC control cards section.
name: cc 01-06 The name used in the LOGIC control cards section to refer to this membership mask. The first character must be alphabetic.
=: cc 07 The literal '='
var: cc 08-13 The membership mask to be used in the LOGIC control cards section.
These cards are terminated by an '@eof' image.
Task Factor Definition Set
This set of control cards will assign names to the task factors selected. These names are then used in the LOGIC control cards section to refer to the corresponding factor. These names should not be used in computations because they are not simple numeric variables. The names defined in this section should only be used as parameters of the function calls described in the LOGIC control cards section. Each task factor used in the LOGIC control cards section must be named in this way. If no VARGEN functions are used, the control cards in this set should not be included.
FACTORS Control Card
This card informs VARGEN that this is the beginning of a set of task factor definition cards.
FAC: cc 01-03 The literal 'FAC'
Task Factor Definition Cards
This set of control cards should contain one card for each task factor used in the LOGIC control cards section.
name: cc 01-06 The name used in the LOGIC control cards to refer to this task factor. This name can be 1 to 6 characters in length. The first character must be alphabetic.
=: cc 07 The literal '='
fac: cc 08-13 The name of the task factor.
/: cc 14 The literal '/'
type: cc 15-17 The task factor type. This can be any type except 'MBR'.
These cards are terminated by an '@eof' image.
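The column layout of a task factor definition card can be illustrated with a short Python sketch. This is not part of VARGEN: the function name parse_factor_card and the error handling are assumptions made for illustration. Only the column positions and the 'MBR' restriction are taken from the card description.

```python
def parse_factor_card(card):
    """Split one task factor definition card into (name, factor, type)."""
    card = card.ljust(17)                  # pad short cards with blanks
    name = card[0:6].strip()               # cc 01-06: working storage name
    factor = card[7:13].strip()            # cc 08-13: task factor name
    ftype = card[14:17].strip()            # cc 15-17: task factor type
    if card[6] != "=" or card[13] != "/":  # cc 07 and cc 14 hold literals
        raise ValueError("expected '=' in cc 07 and '/' in cc 14")
    if not name or not name[0].isalpha():
        raise ValueError("name must start with an alphabetic character")
    if ftype == "MBR":
        raise ValueError("type 'MBR' is not allowed for a task factor")
    return name, factor, ftype


print(parse_factor_card("FAC01 =TF0001/RMN"))
```

The same slicing approach applies to the other definition cards, with the column ranges adjusted to each card's layout.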
Weight Substitution Value Definition Cards
This set of control cards is used to assign new weight substitution values for each of the points on the rating scale.
SUBVAL Control Card
This card informs VARGEN that this is the beginning of the set of weight substitution value definition cards.
SUB: cc 01-03 The literal 'SUB'
Weight Substitution Value Definition Set
This set of control cards should contain one card for each point on the rating scale. The rating scale must consist of fewer than 100 unique scale points.
value: cc 01-20 A free-format field. The user may enter an integer, a decimal number, or a FORTRAN mathematical equation.
These cards are terminated by an '@eof' image.
Output Computed Variable Definition Cards
This set of control cards is used to name the new computed variables. These names are then used in the LOGIC control cards section to refer to the corresponding variable.
OUTPUT Control Card
This card informs VARGEN that this is the beginning of the set of output computed variable definition cards.
OUT: cc 01-03 The literal 'OUT'
Output Computed Variable Definition Set
This set of control cards should contain one card for each output variable created in the LOGIC control cards section.
name: cc 01-06 The name which will be used in the LOGIC control cards section to refer to the variable specified. This name can be 1 to 6 characters in length. The first character must be alphabetic.
=: cc 07 The literal '='
title: cc 08-80 The title for the new computed variable. This title will be added to the variable dictionary on the output file.
These cards are terminated by an '@eof' image.
User Computation and Logic Specifications
The information provided in this section is added to the generated program so that the instructions will be performed once for each case on the input file. This section may contain instructions for computing values as well as conditional expressions for controlling the performance of computations. In addition to the operators and mathematical functions available for use through FORTRAN, a series of functions is available for use in VARGEN which interact specified task factors with the task response information for each case. Each one of these functions returns a value based on the result of the interaction. Each of the functions available is described in the VARGEN Functions section below.
LOGIC Control Card
This card begins the user logic.
LOG: cc 01-03 The literal 'LOG'
User Logic Specification Cards
The cards in this section determine the values assigned to the new computed variables. All input and output variables must be defined as described in the previous sections. All names assigned to the variables in the previous sections are floating point decimal values. Any variables created with the cards described in this section (not defined in a previous section) will be treated as integers if the first character of the variable name is one of the letters I through N. Any variable name which begins with a letter other than I through N will be treated as a floating point decimal number. Any variables not defined in a previous section but used in the logic section will be treated as temporary scratch variables. The values contained by these variables will not be recorded on any output data files. In addition, no attempt will be made by VARGEN to initialize any variables created in this manner. This responsibility has been left to the user. The cards described below may be used in any combination desired. The cards will be executed once for each case on the input file.
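The typing rule for scratch variables (names not defined in an earlier section) can be restated as a small Python sketch. The function name scratch_variable_type is hypothetical and exists only to make the rule concrete. VARGEN itself relies on the FORTRAN compiler to apply this rule.

```python
def scratch_variable_type(name):
    """Return the type implied by the first letter of a scratch variable."""
    first = name[0].upper()
    if not first.isalpha():
        raise ValueError("variable names must start with a letter")
    # Names starting with I through N are integers; all others are
    # floating point decimal numbers.
    return "integer" if "I" <= first <= "N" else "real"


for name in ("ICOUNT", "N", "TEMP1", "OUT"):
    print(name, scratch_variable_type(name))
```

Because the rule only affects undefined names, a scratch counter such as ICOUNT truncates any fractional value assigned to it, while TEMP1 keeps it.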
A special feature exists for statements which are too long to fit on a card. The statement may be continued on subsequent cards by adding a '#' character as the last non-blank character of the card.
The '#' character is called a continuation sentinel and must occur before column 67 on the card containing the unfinished statement. This will force VARGEN to append the statement on the next card to the unfinished statement. A statement may span as many as 25 cards using the continuation sentinel. The continuation sentinel is not recognized on comment cards.
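The continuation rule can be sketched in Python as follows. This is an illustration under stated assumptions, not VARGEN's actual implementation: the function name join_continued_cards is hypothetical, the pieces are joined without an intervening blank, and a card is treated as a comment only when it is not itself a continuation of an unfinished statement.

```python
def join_continued_cards(cards, max_cards=25):
    """Merge logic cards ending in the '#' continuation sentinel."""
    statements = []
    parts = []                      # pieces of the statement being assembled
    for card in cards:
        if not parts and card.startswith("*"):
            # Comment card: the sentinel is not recognized here.
            statements.append(card[:66].rstrip())
            continue
        text = card[:66].strip()    # only card columns 1-66 are read
        if text.endswith("#"):
            parts.append(text[:-1].rstrip())
            if len(parts) >= max_cards:
                raise ValueError("statement needs more than 25 cards")
            continue
        parts.append(text)
        statements.append("".join(parts))
        parts = []
    if parts:
        raise ValueError("final card ends with '#' but no card follows")
    return statements


cards = ["OUT1 = VAR01 * #", "       VAR02", "* note #", "OUT2 = 1.0"]
print(join_continued_cards(cards))
```

A '#' placed in column 67 or later falls outside the columns that are read, so it does not continue the statement.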
Another useful feature of VARGEN is the availability of a variable named BADVAL. This is a decimal constant set to the standard invalid value used throughout the CODAP system. This value can be used in conditional expressions to prevent erroneous results from occurring when one of the values used in a computation has an invalid value.
The general format for the cards in this section is described below. Spaces are ignored by VARGEN, and it is recommended that spaces be used to improve the readability of the cards. All information on the cards in this set must be contained in card columns 1-66. Any information outside of these columns will be ignored. The first three sections following the general card format contain detailed information about the specific types of statements permitted.
statement: cc 01-66 Any expression or statement which
conforms to the specifications given in
the next three sections.
#: The # character can be used to force
the next statement to be regarded as a
continuation of the current card
Computation Cards
Computation cards contain mathematical expressions which assign a value to a variable. Any mathematical operation or function available to the FORTRAN compiler may be used on these cards. The general form for computation cards is as follows.
<variable> = <expression>
The information on this card may be contained anywhere on the card within the first 66 columns as long as it conforms to the form described above.
Condition Specification Cards
The condition specification cards are used to cause or prevent the performance of a statement or a group of statements. IF-THEN-ELSE structures are the most common type of condition specifications. Computed IF statements, GOTO statements, and statement labels are prohibited.
Comment Cards
Any information which helps to explain what the desired results are for the user logic cards may be added to the runstream using this form. If an asterisk is found in card column one of the card, the card is a comment. It is recommended that comment cards be used as needed to clarify what the desired vector should contain if all goes as planned.
[*] <statement>
*: cc 01 The literal '*'
statement: cc 02-66 Any information desired
The end of all control cards is signified by an '@eof' image or an end-of-file condition.
VARGEN Function References
Several functions are available in VARGEN which interact the task responses for each case with a task factor which has been defined in the Task Factor Definitions above. Each function has a single parameter field which is used to specify the working storage name of the desired task factor.
variable: The name of the variable which will receive the
result computed by the function for each case.
function: The name of the function to use. This can be any one
of the functions described below.
name: The working storage name of the task factor to be
used in the computations performed by the function.
The following series of descriptions outline the functions currently available for use in VARGEN.
Calling this function causes the variable specified to be set to a value which is the sum of the absolute differences between the task factor values and the corresponding raw task ratings for all tasks rated. The following equation is used to compute this value for each case.
\[ \text{value} = \sum_{i=1}^{\text{NTASK}} \left| \text{FACTOR}(i) - \text{RAW}(i) \right| \]
Note that a check is made before each subtraction to insure that the task is rated and the corresponding task factor value is valid. If either of these conditions is not met then the subtraction is not performed and nothing is added to the sum for that task.
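The sum of absolute differences, including the two checks made before each subtraction, can be sketched in Python. The data representation is an assumption made for this illustration: an invalid task factor value is represented by None and an unrated task by a raw rating of 0. The function name sum_abs_diff is hypothetical.

```python
def sum_abs_diff(factor, raw):
    """Sum |FACTOR(i) - RAW(i)| over rated tasks with a valid factor value."""
    total = 0.0
    for f, r in zip(factor, raw):
        if f is None or not r:
            # Invalid factor value or task not rated: nothing is added.
            continue
        total += abs(f - r)
    return total


factor = [3.0, None, 5.0, 2.0]
raw = [1, 4, 0, 6]
print(sum_abs_diff(factor, raw))
```

In this example only the first and last tasks contribute, since the second has an invalid factor value and the third is not rated.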
Calling this function causes the variable specified to be set to a value which is the average of the task factor values for all tasks performed by a case. The following equation is used to compute this value for each case.
value = SUMFAC ( FACTOR ) / NSET
Notice that this equation yields a result which is equal to the result of using the SUMFAC function with the desired task factor and dividing it by NSET. NSET is the number of tasks performed by a case with a corresponding task factor value that is valid.
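The relationship between the sum, NSET, and the average can be shown with a Python sketch. It assumes, for illustration only, that an invalid task factor value is represented by None and a task that is not performed by a time value of 0. The names sum_factor and average_factor are hypothetical, and returning None when NSET is zero is a choice made for this sketch; the source does not say what VARGEN returns in that case.

```python
def sum_factor(factor, time):
    """Return (sum of valid factor values for performed tasks, NSET)."""
    total = 0.0
    nset = 0
    for f, t in zip(factor, time):
        if f is None or not t:
            continue            # invalid factor value or task not performed
        total += f
        nset += 1
    return total, nset


def average_factor(factor, time):
    """Average factor value over the NSET tasks that pass both checks."""
    total, nset = sum_factor(factor, time)
    if nset == 0:
        return None             # no qualifying tasks; behavior is assumed
    return total / nset


print(average_factor([2.0, None, 4.0, 9.0], [10.0, 20.0, 30.0, 0.0]))
```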
Calling this function causes the variable specified to be set to a value which is the average task measure per unit time spent. This is computed by interacting the task factor specified by name (a task factor containing the task measure, most commonly task learning difficulty) with the tasks performed by each case to compute the value. The following equation is used to compute this value for each case.
\[ \text{value} = \frac{\sum_{i=1}^{\text{NTASK}} \text{FACTOR}(i) \cdot \text{TIME}(i)}{100.0} \]
Note that a check is made before each multiplication in the equation above to insure that the task is performed and the corresponding task factor value is valid. If either of these conditions is not met then the multiplication is not performed and nothing is added to the sum for that task.
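A minimal Python sketch of the time-weighted average follows. It assumes that TIME(i) holds percent time spent (so the values for a case sum to 100), that a task not performed has a time of 0, and that an invalid factor value is represented by None. The function name average_per_unit_time is hypothetical.

```python
def average_per_unit_time(factor, time):
    """Time-weighted average of the task measure for one case."""
    total = 0.0
    for f, t in zip(factor, time):
        if f is None or not t:
            continue            # invalid factor value or task not performed
        total += f * t          # weight the measure by percent time spent
    return total / 100.0


print(average_per_unit_time([4.0, None, 6.0], [50.0, 25.0, 25.0]))
```

Tasks skipped because of an invalid factor value still leave their share of time out of the numerator, so the result is lower than a renormalized average would be.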
Calling this function causes the variable specified to be set to a value which is the correlation between the raw task responses and the specified task factor. The following series of equations is used to compute this value for each case.
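\[ \text{PART1} = \text{NSET} \cdot \text{SD}(f) \cdot \text{SD}(r) \]
\[ \text{PART2} = \sum_{i=1}^{\text{NTASK}} \text{FACTOR}(i) \cdot \text{RAW}(i) \]
\[ \text{value} = \frac{\text{PART2} - \text{NSET} \cdot \text{MEAN}(f) \cdot \text{MEAN}(r)}{\text{PART1}} \]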
SD(f) is the standard deviation for the task factor, SD(r) is the standard deviation for the raw responses, MEAN(f) is the mean for the task factor, MEAN(r) is the mean for the raw responses, and NSET is the number of tasks rated which have a corresponding valid task factor value.
Note that a check is made before each computation to insure that the task is performed and the corresponding task factor value is valid. If either of these conditions is not met then the task offers no contribution to the resulting correlation and is ignored.
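The correlation equations can be checked with a Python sketch. Several points are assumptions rather than statements from the source: the standard deviations are taken as population values (divided by NSET), which is what makes the formula equal the Pearson correlation; means and standard deviations are computed over the NSET qualifying tasks only; an invalid factor value is represented by None and an unrated task by 0; and None is returned when the correlation is undefined. The function name factor_correlation is hypothetical.

```python
import math


def factor_correlation(factor, raw):
    """Correlation between factor values and raw responses for one case."""
    pairs = [(f, r) for f, r in zip(factor, raw) if f is not None and r]
    nset = len(pairs)
    if nset == 0:
        return None
    mean_f = sum(f for f, _ in pairs) / nset
    mean_r = sum(r for _, r in pairs) / nset
    sd_f = math.sqrt(sum((f - mean_f) ** 2 for f, _ in pairs) / nset)
    sd_r = math.sqrt(sum((r - mean_r) ** 2 for _, r in pairs) / nset)
    part1 = nset * sd_f * sd_r
    if part1 == 0:
        return None             # no variation in one of the two series
    part2 = sum(f * r for f, r in pairs)
    return (part2 - nset * mean_f * mean_r) / part1


print(factor_correlation([1.0, 2.0, 3.0, None], [2, 4, 6, 5]))
```

In the example the three qualifying pairs are perfectly proportional, so the result is 1.0 up to floating point rounding.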
Calling this function causes the variable specified to be set to the KPATH number for the particular case from the presentation sequence specified. The function argument must be defined in the CLUSTERS control card section.
This function causes the variable specified to be set to a zero or one, depending upon whether the case is a member of the specified group (1=member, 0=non-member). The function argument must be defined in the MEMBERS control card section.
Calling this function causes the variable specified to be set to a value which is the overlap between the tasks performed and the specified task factor. The following equation is used to compute this value for each case.
\[ \text{value} = \sum_{i=1}^{\text{NTASK}} \min\left( \text{FACTOR}(i), \text{TIME}(i) \right) \]
Note that a check is made before each computation of a minimum value to insure that the task is performed and the corresponding task factor value is valid. If either of these conditions is not met then the computation is not performed and nothing is added to the sum for that task.
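The overlap computation can be sketched in Python under the same illustrative assumptions as before: None marks an invalid task factor value and a time of 0 marks a task that is not performed. The function name overlap is hypothetical.

```python
def overlap(factor, time):
    """Sum of min(FACTOR(i), TIME(i)) over performed tasks with valid factors."""
    total = 0.0
    for f, t in zip(factor, time):
        if f is None or not t:
            continue            # invalid factor value or task not performed
        total += min(f, t)
    return total


print(overlap([10.0, None, 40.0, 5.0], [20.0, 30.0, 0.0, 50.0]))
```

When the task factor itself holds percent time values (for example, a group's average time profile), the result is the percentage of time the case and the factor have in common.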
Calling this function causes the variable specified to be set to a value which is the percentage of valid tasks from the task factor performed by each case.
Calling this function causes the variable specified to be set to a value which is the sum of the squared differences between the task factor values and the corresponding raw task ratings for all tasks rated. The following equation is used to compute this value for each case.
\[ \text{value} = \sum_{i=1}^{\text{NTASK}} \left( \text{FACTOR}(i) - \text{RAW}(i) \right)^{2} \]
Note that a check is made before each subtraction to insure that the task is rated and the corresponding task factor value is valid. If either of these conditions is not met then the subtraction is not performed and nothing is added to the sum for that task.
Calling this function causes the variable specified to be set to a value which is the sum of the task factor values for all tasks performed by a case. The following equation is used to compute this value for each case.
\[ \text{value} = \sum_{i=1}^{\text{NTASK}} \text{FACTOR}(i) \cdot \frac{\text{TIME}(i)}{\text{TIME}(i)} \]
Note that a check is made before each multiplication to insure that the task is performed and the corresponding task factor value is valid. If either of these conditions is not met then the multiplication is not performed and nothing is added to the sum for that task.
Calling this function causes the variable specified to be set to a value which is the sum of the product of the task factor values and the raw response values for all tasks rated by a case. The following equation is used to compute this value for each case.
\[ \text{value} = \sum_{i=1}^{\text{NTASK}} \text{FACTOR}(i) \cdot \text{RAW}(i) \]
Note that a check is made before each multiplication to insure that the task is rated and the corresponding task factor value is valid. If either of these conditions is not met then the multiplication is not performed and nothing is added to the sum for that task.
Calling this function causes the variable specified to be set to a value which is the sum of the product of the task factor values and the percent time spent values for all tasks performed by a case. The following equation is used to compute this value for each case.
\[ \text{value} = \sum_{i=1}^{\text{NTASK}} \text{FACTOR}(i) \cdot \text{TIME}(i) \]
Note that a check is made before each multiplication to insure that the task is performed and the corresponding task factor value is valid. If either of these conditions is not met then the multiplication is not performed and nothing is added to the sum for that task.
This function sets the variable specified to the average of the cross product of the task factor and the corresponding raw rating values for all tasks rated by a case. The following equation is used to compute this value for each case.
\[ \text{value} = \frac{\sum_{i=1}^{\text{NTASK}} \text{FACTOR}(i) \cdot \text{RAW}(i)}{\text{NSET}} \]
Note that a check is made before each multiplication to insure that the task is rated and the corresponding task factor value is valid. If either of these conditions is not met then the multiplication is not performed and nothing is added to the sum for that task. After the sum has been computed it is divided by NSET. NSET is the number of tasks rated by a case with a corresponding task factor value that is valid.
This function sets the variable to the number of non-zero task values remaining for a case, after its tasks have been multiplied by the specified task factor. The following equation is used to compute this value for each case.
For i = 1 to Number of Tasks in the Inventory:
    New Task Value(i) = FACTOR(i) * Old Task Value(i)
    Output Value = Add one if New Task Value(i) is Greater than Zero
Note that a check is made before each multiplication to insure that the task is performed and the corresponding task factor value is valid. If either of these conditions is not met then the multiplication is not performed and nothing is added to the number of tasks performed for that case.
NOTE: The TASK VALUES ARE CHANGED, with only non-zero values being written out. When used with a (1,0) (1=keep, 0=drop) task factor, selected subsets of tasks may be used for analysis (say Management Tasks only, First Enlistment Tasks, etc.; see TSKFAC). Time, however, is NOT re-accumulated to 100% unless the REPTS subroutine (described below) is used AFTER this command. Example:
OUT001 = ZAP(FAC001)
CALL REPTS
Several program actions are available in VARGEN which adjust the task responses within each case. These actions are invoked as FORTRAN Subroutines ("CALL <name>") as opposed to the functions described above. Subroutines may or may not use a parameter field which is used to pass information to or from the subroutine.
The following series of descriptions outline the subroutines currently available for use in VARGEN.
This subroutine will recompute the task values for every case on the case data file. The tasks are converted back to the original raw responses and then divided by the specified divisor (div). This divisor will replace the computed variable C0008.
Calling this subroutine causes the percent time spent for all remaining task responses to be recomputed. This subroutine is usually used after the ZAP function. The following equation is used to recompute the task responses.
\[ \text{Task}(j) = \frac{\text{Task}(j)}{\sum_{i=1}^{\text{NTASK}} \text{Task}(i)} \cdot 100.0, \qquad j = 1, \ldots, \text{NTASK} \]
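The combined effect of ZAP followed by REPTS can be sketched in Python. This is an illustration only: the lowercase function names zap and repts, the in-place list update, and the use of None for an invalid factor value and 0 for a task that is not performed are assumptions. Following the description of the validity check literally, a task whose factor value is invalid is left unchanged here.

```python
def zap(tasks, factor):
    """Multiply task values by the factor in place; count non-zero results."""
    count = 0
    for i, (t, f) in enumerate(zip(tasks, factor)):
        if f is None or not t:
            continue            # invalid factor value or task not performed
        tasks[i] = t * f
        if tasks[i] > 0:
            count += 1
    return count


def repts(tasks):
    """Rescale the remaining task values so they again sum to 100 percent."""
    total = sum(tasks)
    if total == 0:
        return                  # nothing left to rescale
    for j in range(len(tasks)):
        tasks[j] = tasks[j] / total * 100.0


tasks = [20.0, 30.0, 50.0]      # percent time spent on three tasks
kept = zap(tasks, [1.0, 0.0, 1.0])
repts(tasks)
print(kept, tasks)
```

With a (1,0) keep/drop factor, the second task is dropped and the two remaining tasks are rescaled from 20 and 50 to shares of 100 percent.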
@codap vargen - cd100 tf100 tt001 vt100 cd101 cs100
VARGEN ssss Title for VARGEN, creating new variables
DFILE vt101
VARIABLES
VAR01 =T0123
VAR02 =V0019
@eof
CLUSTERS
PRES01=PS0002
@eof
MEMBERS
GP01  =ST0056
GP02  =GP0002
@eof
FACTORS
FAC01 =TF0001/RMN
FAC02 =TF0005/DEC
@eof
OUTPUT
OUT1  =Product of T0123 times V0019
OUT2  =Overlap of tasks performed with TF0001/RMN
OUT3  =Average task learning difficulty
OUT4  =KPATH number for presentation sequence PS0002
OUT5  =Member of GP0002 or ST0056 (1=yes, 0=no)
@eof
LOGIC
OUT1=VAR01*VAR02
OUT2=OVRLAP(FAC01)
OUT3=AVGPUT(FAC02)
OUT4=KPATH(PRES01)
TEMP1=MEMBER(GP01)
OUT5=MEMBER(GP02)
IF (TEMP1.EQ.1.0) OUT5=1.0
@eof
Five computed variables will be added to output file cd101. A MAKDFL will be run, creating the new DFILE vt101.