OVRLAP: Compute Similarity Between Cases

Last updated 19 Nov 97

OVRLAP computes a similarity measure between every pair of cases contained on a Case Data file. The measure for each pair is computed by comparing the corresponding responses for all tasks. Because each added task means one more comparison for every pair of cases, even a small increase in the number of tasks in an inventory causes a dramatic increase in the total number of comparisons that must be made. In fact, the number of comparisons made in almost any CODAP clustering is extremely large, and an OVRLAP run places a heavy load on computer system resources. For this reason, OVRLAP should be run only when necessary, and always in strict compliance with CODAP standards.
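To give a feel for the scale involved, the following Python sketch counts the pairs and task-level comparisons for a run. It is an illustration only, not part of CODAP: the function names are hypothetical, the 500-task inventory is an invented figure, and the sketch assumes each unordered pair of cases is compared exactly once.

```python
def pair_count(n_cases):
    """Number of distinct unordered pairs among n_cases cases."""
    return n_cases * (n_cases - 1) // 2


def comparison_count(n_cases, n_tasks):
    """Task-level comparisons if every pair is compared on every task."""
    return pair_count(n_cases) * n_tasks


# 7000 cases is the documented OVRLAP maximum; 500 tasks is hypothetical.
print(pair_count(7000))             # 24496500
print(comparison_count(7000, 500))  # 12248250000
```

Under these assumptions, adding a single task to the inventory adds one comparison for each of the 24,496,500 pairs.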

OVRLAP will cluster a maximum of 7000 cases or tasks. When run, it produces a file containing a similarity matrix that the GROUP program can read. For further information on the techniques and theory of hierarchical clustering, refer to the following technical report:

Computation of Group Job Descriptions From Occupational Survey Data, Wayne B. Archer, December 1966, PRL-TR-66-12

Program Invocation

The specific syntax for program invocation varies between operating systems. This document displays the code for the AIX version.


@codap ovrlap <opt> <ssss> <CDf> <OMf>

opt:   P   Use proportional values in the overlap equation
       A   Adjust values used for computing proportional values
ssss:      The study number
CDf:       Input Case Data file
OMf:       Output OVRLAP Matrix file.

OVRLAP Control Card


OVRLAP <study> <measure>

OVRLAP:  cc 01-06   The literal 'OVRLAP'
study:   cc 08-11   The study number for this run
measure: cc 13      The similarity measure to be used.  This value defines
                    which overlap equation will be used.  Valid values are
                    A, B, C, D, 0 - 9, or ' ' (default).
                    ' '    OVij  =  SUM over tasks of MIN ( I(itask), J(itask))
                     A     OVij  =  100 * Pij / (Ni + Nj - Nij)
                     B     OVij  =  100 * ( 1 - (Ni + Nj - Nij) / ntask)
                     C     OVij  =  200 * Nij / (Ni + Nj)
                     D     OVij  =  100 * Nij / SQRT(Ni*Nj)
                     k     OVij  =  50 * ( Nij / (Ni+k) + Nij / (Nj+k))

      where:      k  =  Specified constant, zero through nine
                  Ni =  Number of tasks responded to by case i
                  Nj =  Number of tasks responded to by case j
                  Nij=  Number of tasks responded to by both i and j
                  Pij=  Number of tasks responded to by both i and j if the 'A'
                        or 'P' option is NOT specified on the program invocation
                        line.  If the 'P' option is specified, Pij is the sum
                        over tasks of the proportions of the raw task responses,
                        MIN(Ri,Rj)/MAX(Ri,Rj), where Ri and Rj are the raw
                        responses of cases i and j to a task.  If the 'A' option
                        is used, the raw responses are replaced by user-specified
                        values before the ratios are computed.
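
As an illustration of the Pij definition above under the 'P' option, the following Python sketch sums MIN(Ri,Rj)/MAX(Ri,Rj) over tasks. It is not OVRLAP's own code. The function name and sample responses are hypothetical, and the sketch assumes that a raw response of zero means no response, so a task contributes nothing when either case has a zero.

```python
def proportional_sum(raw_i, raw_j):
    """Sum MIN(Ri,Rj)/MAX(Ri,Rj) over tasks (Pij under the 'P' option).

    Assumption: a raw response of 0 means "no response", so a task where
    either case has 0 contributes nothing (this also avoids 0/0).
    """
    if len(raw_i) != len(raw_j):
        raise ValueError("both cases must cover the same tasks")
    total = 0.0
    for ri, rj in zip(raw_i, raw_j):
        if ri > 0 and rj > 0:
            total += min(ri, rj) / max(ri, rj)
    return total


# Hypothetical raw responses of two cases to four tasks.
case_i = [2, 4, 0, 5]
case_j = [4, 4, 3, 0]
print(proportional_sum(case_i, case_j))  # 0.5 + 1.0 + 0 + 0 = 1.5
```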

Mixing options and similarity measures yields many possible equations.

Equations

This section outlines the equations OVRLAP uses to compute the similarity measures that GROUP will cluster.

	CASEi = The task values for case i
	CASEj = The task values for case j
	Ni = Number of tasks responded to by case i
	Nj = Number of tasks responded to by case j
	ntask = number of tasks in job inventory
	k = The constant set by numeric measure k (0-9)
	Nij = Number of common tasks responded to by both i and j
	Pij = Sum of raw task response proportions
	Aij = Sum of adjusted raw task response proportions

Time Spent Equation (' ')

This is the most commonly used equation. For each task it takes the smaller of the two cases' percent time spent, then sums these minimums over all tasks.

   ovrlap = sum ( MIN (CASEi(task),CASEj(task)) )
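
A minimal Python sketch of the time-spent equation follows. The function name and sample data are hypothetical, and the sketch assumes each case is a list of percent-time-spent values, one per task, in the same task order.

```python
def time_spent_overlap(case_i, case_j):
    """Sum over tasks of the smaller percent time spent by the two cases."""
    if len(case_i) != len(case_j):
        raise ValueError("both cases must cover the same tasks")
    return sum(min(a, b) for a, b in zip(case_i, case_j))


# Hypothetical percent time spent on four tasks (each case totals 100).
case_i = [50.0, 30.0, 20.0, 0.0]
case_j = [40.0, 0.0, 20.0, 40.0]
print(time_spent_overlap(case_i, case_j))  # 40 + 0 + 20 + 0 = 60.0
```

Two cases with identical time-spent profiles that each total 100 would overlap at 100, and two cases with no task in common would overlap at 0.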

Common Task Equation (k, where k can range from 0 - 9)

   ovrlap = (Nij/(Ni+k) + Nij/(Nj+k)) * 50.0
   ovrlap = (Pij/(Ni+k) + Pij/(Nj+k)) * 50.0  [ P option ]
   ovrlap = (Aij/(Ni+k) + Aij/(Nj+k)) * 50.0  [ A option ]
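
The first form of the common task equation can be sketched in Python as below. This is an illustration, not the OVRLAP source. The function names and data are hypothetical, and the sketch assumes a task counts as "responded to" when its value is non-zero.

```python
def task_counts(case_i, case_j):
    """Return (Ni, Nj, Nij), treating a non-zero value as a response."""
    ni = sum(1 for a in case_i if a != 0)
    nj = sum(1 for b in case_j if b != 0)
    nij = sum(1 for a, b in zip(case_i, case_j) if a != 0 and b != 0)
    return ni, nj, nij


def common_task_overlap(ni, nj, numerator, k=0):
    """(numerator/(Ni+k) + numerator/(Nj+k)) * 50.0

    Pass Nij as the numerator for the plain form, or Pij / Aij for the
    'P' and 'A' option forms.
    """
    if ni + k == 0 or nj + k == 0:
        raise ValueError("Ni+k and Nj+k must be non-zero")
    return (numerator / (ni + k) + numerator / (nj + k)) * 50.0


# Hypothetical responses of two cases to five tasks.
case_i = [3, 0, 5, 2, 0]
case_j = [1, 4, 5, 0, 0]
ni, nj, nij = task_counts(case_i, case_j)      # (3, 3, 2)
print(common_task_overlap(ni, nj, nij, k=1))   # (2/4 + 2/4) * 50 = 50.0
```

A larger k lowers the overlap more sharply for cases that responded to only a few tasks, since k is a larger share of a small Ni or Nj.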

Average Common Task Overlap Equation (A)

This similarity measure was designed to cluster a group with its best remaining subset.

   ovrlap = (Nij / (Ni+Nj-Nij)) * 100.0
   ovrlap = (Pij / (Ni+Nj-Nij)) * 100.0    [ P option ]
   ovrlap = (Aij / (Ni+Nj-Nij)) * 100.0    [ A option ]
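
The three forms of measure 'A' share one denominator, Ni+Nj-Nij, which is the number of tasks responded to by at least one of the two cases. The Python sketch below is illustrative only, with a hypothetical function name and invented counts.

```python
def measure_a(ni, nj, nij, numerator=None):
    """numerator / (Ni + Nj - Nij) * 100.0

    The numerator defaults to Nij; pass Pij or Aij for the 'P' and 'A'
    option forms. The denominator always uses the task counts.
    """
    union = ni + nj - nij
    if union == 0:
        raise ValueError("neither case responded to any task")
    if numerator is None:
        numerator = nij
    return numerator / union * 100.0


# Hypothetical counts: each case responded to 3 tasks, 2 of them shared.
print(measure_a(3, 3, 2))                 # 2 / 4 * 100 = 50.0
print(measure_a(3, 3, 2, numerator=1.5))  # 'P' form with Pij = 1.5 -> 37.5
```

With the plain Nij numerator, this is the count of shared tasks divided by the count of tasks either case responded to, scaled to 100.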

Overlap Common Tasks Zeroes Meaningful (B)

This equation considers zero to be a meaningful value. An appropriate use would be for a scale such as training emphasis, where a zero is not equivalent to a non-response.

   ovrlap = ( 1 - (Ni+Nj-Nij)/ntask ) * 100.0
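
Measure 'B' is the only equation that uses ntask. A short Python sketch, with a hypothetical function name and invented counts, evaluates the formula exactly as written above. How Ni, Nj, and Nij are counted for a scale with meaningful zeroes is not specified here, so the sketch takes the counts as inputs.

```python
def measure_b(ni, nj, nij, ntask):
    """( 1 - (Ni + Nj - Nij) / ntask ) * 100.0"""
    if ntask <= 0:
        raise ValueError("ntask must be positive")
    return (1 - (ni + nj - nij) / ntask) * 100.0


# Hypothetical counts for an 8-task inventory.
print(measure_b(3, 3, 2, ntask=8))  # (1 - 4/8) * 100 = 50.0
```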

Similarity measure 'C'

   ovrlap = 200 * Nij / (Ni + Nj)
   ovrlap = 200 * Pij / (Ni + Nj)      [ P option ]
   ovrlap = 200 * Aij / (Ni + Nj)      [ A option ]
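
Measure 'C' divides by the sum of the two task counts rather than by their union. The Python sketch below is a hypothetical illustration of the three forms above, using invented counts.

```python
def measure_c(ni, nj, numerator):
    """200 * numerator / (Ni + Nj)

    Pass Nij for the plain form, or Pij / Aij for the 'P' and 'A' forms.
    """
    if ni + nj == 0:
        raise ValueError("neither case responded to any task")
    return 200 * numerator / (ni + nj)


# Hypothetical counts: Ni = 3, Nj = 5, two tasks shared.
print(measure_c(3, 5, 2))    # 200 * 2 / 8 = 50.0
print(measure_c(3, 5, 1.5))  # 'P' form with Pij = 1.5 -> 37.5
```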

Similarity measure 'D'

   ovrlap = 100 * Nij / SQRT( Ni * Nj )
   ovrlap = 100 * Pij / SQRT( Ni * Nj )   [ P option ]
   ovrlap = 100 * Aij / SQRT( Ni * Nj )   [ A option ]
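
Measure 'D' divides by the geometric mean of Ni and Nj. The following Python sketch is illustrative only. The function name and counts are hypothetical.

```python
import math


def measure_d(ni, nj, numerator):
    """100 * numerator / SQRT(Ni * Nj)

    Pass Nij for the plain form, or Pij / Aij for the 'P' and 'A' forms.
    """
    if ni <= 0 or nj <= 0:
        raise ValueError("both cases must have responded to at least one task")
    return 100 * numerator / math.sqrt(ni * nj)


# Hypothetical counts: Ni = 4, Nj = 9, three tasks shared.
print(measure_d(4, 9, 3))  # 100 * 3 / sqrt(36) = 50.0
```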

Adjustment values

If the 'A' option is specified on the program invocation line, the user must enter a replacement value for each valid task response (up to a maximum of 25).


<ival>

ival:   cc 01-04   The replacement value
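
The effect of the adjustment values on the Aij term can be sketched in Python as follows. This is an illustration under stated assumptions, not OVRLAP's implementation: the function name, the replacement table, and the sample responses are hypothetical. The sketch assumes the table maps each valid raw response to its replacement value, and that a raw zero (no response) stays zero.

```python
def adjusted_sum(raw_i, raw_j, replacements):
    """Sum MIN/MAX ratios after replacing raw responses (Aij, 'A' option).

    Assumptions: replacements[r] is the user value for raw response r,
    and a raw response of 0 (no response) stays 0 and contributes nothing.
    """
    if len(replacements) > 25:
        raise ValueError("at most 25 replacement values are allowed")
    total = 0.0
    for ri, rj in zip(raw_i, raw_j):
        ai = replacements.get(ri, 0)
        aj = replacements.get(rj, 0)
        if ai > 0 and aj > 0:
            total += min(ai, aj) / max(ai, aj)
    return total


# Hypothetical table: raw responses 1, 2, 3 are replaced by 2, 4, 8.
replacements = {1: 2, 2: 4, 3: 8}
case_i = [1, 2, 3, 0]
case_j = [2, 2, 3, 1]
print(adjusted_sum(case_i, case_j, replacements))  # 0.5 + 1.0 + 1.0 + 0 = 2.5
```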

Examples

@codap ovrlap - ssss cd100 om100
OVRLAP ssss

This example will compute an OVRLAP Matrix (om100) using the standard time-spent similarity measure.

@codap ovrlap - ssss cd100 om101
OVRLAP ssss 1

This example will compute the output matrix (om101) using the common-task similarity measure with k = 1 rather than the default time-spent measure.
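
Because the control card is column-sensitive, it can help to generate it rather than type it. The Python sketch below builds a card with the literal in cc 01-06, the study number in cc 08-11, and the measure in cc 13. The helper name and the study number 1234 are hypothetical. The sketch requires a study number of exactly four characters, since the padding convention for shorter numbers is not stated here.

```python
VALID_MEASURES = "ABCD0123456789 "


def ovrlap_control_card(study, measure=" "):
    """Build an OVRLAP control card in the documented column layout."""
    if len(study) != 4:
        raise ValueError("study number must be exactly 4 characters")
    if len(measure) != 1 or measure not in VALID_MEASURES:
        raise ValueError("measure must be A, B, C, D, 0-9, or a blank")
    # cc 01-06 literal, cc 07 blank, cc 08-11 study, cc 12 blank, cc 13 measure
    return "OVRLAP" + " " + study + " " + measure


card = ovrlap_control_card("1234", "1")
print(card)  # OVRLAP 1234 1
```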
