From time to time I have had clients wanting a quick assessment of the “correlation” between two Categorical variables.  For example, after producing an orthogonal design for a conjoint analysis, a client might decide that he does not find the low level of one variable to be credible when combined with the high level on another and wants to know the implication of altering this one card without re-creating the design.  I have known researchers to approach this by creating a correlation matrix in excel.  However, using a correlation on categorical data can be troublesome.  A less wrong way of approaching this would be to create a Cramer’s V matrix.  However, this is a function that does not exist in Excel.  I programmed a function to do just this and am putting it out on the internet.  You may use this function however you would like.  If you notice any errors in the function or program any improvement, I would ask you to notify L4 enterprises at: info@L4Statistics.com.

To utilize this function please download the excel file by clicking HERE.  Upon opening the file and ensuring that macros are enabled you may use this sheet itself or copy the function into your own sheet by going to “Developer” and opening visual basic:


Open Module 1 and copy the function into the “Developer” of your own spreadsheet

Statistical Consulting

Excel Formula for Cramer's V

L4 Enterprises LLC 

If macros are enabled on your spreadsheet, you may use =Cramv(field1,field2) Just like you would use =Correl(field1, field2).

Example 1 demonstrates the weakness of using a correlation to assess the relationship of two categorical variables.  In this case a three level variable is perfectly related to a two level variable in such a way that whenever the three level variable is 1 or 3 the two level variable is 0 otherwise the two level variable is 1.  If we examine this relationship using correlations, we will find no correlation which would suggest no relationship.  The perfect relationship is captured in the Cramer's V Matrix.

The second example demonstrates various equivalencies between Cramer’s V and a correlation when the variables are two-level.  Var1 and Var2 are correlated at 0.1.  Because, they are dummy coded as "0" and "1" we would expect the same relationship in the Cramer's V.  Var3 is perfectly but negatively correlated with Var2 producing a correlation of -1.  The same strength is captured in Cramer's V except because the order of a categorical variable is meaningless, the negative is lost.  Finally, Var4 and Var3 are correlated at -0.3.  The Cramer's V relationship is of the same strength but looses the negative value.

 If you distrust opening my macros, the code for the function is:

Function Cramv(field1 As Range, field2 As Range)

    Dim arr1() As Variant 'First Array

    Dim arr2() As Variant 'Second Array

    Dim lCtr As Long, lCount As Long 'Indexs

      Dim iCtr As Integer 'index

    Dim col1 As New Collection 'Collection to collect unique values of array1

    Dim col2 As New Collection 'Collection to collect unique values of array1

    Dim observed() As Variant

    Dim Rowtotal() As Variant

    Dim Columntotal() As Variant

    Dim nvans1 As Long

    Dim nvans2 As Long

    Dim exptected() As Variant

    Dim Xsq As Double

    Dim k As Long

 

'Finding length of arrays

    Dim lStartPoint As Long

    Dim lEndPoint As Long

    arr1 = field1.Value

    arr2 = field2.Value

    lStartPoint = LBound(arr1)

    lEndPoint = UBound(arr1)

    mStartPoint = LBound(arr2)

    mendpoint = UBound(arr2)

    ReDim arr1(1 To lEndPoint)

    ReDim arr2(1 To lEndPoint)

    arr1 = field1.Value

    arr2 = field2.Value

    

'Testing to ensure equal size ranges

     If lEndPoint <> mendpoint Then

          MsgBox "Please ensure there are an equal number of cells in both ranges"

          Exit Function

     End If

   

   

'Find the inique valurs for the first array

    For lCtr = lStartPoint To lEndPoint

       vItem = arr1(lCtr, 1)

   

       sIndex = CStr(vItem)

 

    'first element, add automatically

    If lCtr = lStartPoint Then

        col1.Add vItem, sIndex

        ReDim vAns1(lStartPoint To lStartPoint) As Variant

        vAns1(lStartPoint) = vItem

    Else

        On Error Resume Next

        col1.Add vItem, sIndex

      

        If Err.Number = 0 Then

            lCount = UBound(vAns1) + 1

            ReDim Preserve vAns1(lStartPoint To lCount)

            vAns1(lCount) = vItem

        End If

    End If

    Err.Clear

Next lCtr

   

'Find the inique valurs for the second array

    For lCtr = mStartPoint To mendpoint

        vItem = arr2(lCtr, 1)

   

    sIndex = CStr(vItem)

 

    'first element, add automatically

    If lCtr = mStartPoint Then

        col2.Add vItem, sIndex

        ReDim vAns2(mStartPoint To mStartPoint) As Variant

        vAns2(mStartPoint) = vItem

    Else

       On Error Resume Next

       col2.Add vItem, sIndex

       

        If Err.Number = 0 Then

            lCount = UBound(vAns2) + 1

            ReDim Preserve vAns2(mStartPoint To lCount)

            vAns2(lCount) = vItem

        End If

    End If

    Err.Clear

Next lCtr

   

'finding length of vAns

nvans1 = UBound(vAns1)

nvans2 = UBound(vAns2)

 

'finding ovserved cells of for chisq

 

ReDim observed(1 To nvans2, 1 To nvans1) As Variant

ReDim Expected(1 To nvans2, 1 To nvans1) As Variant

 

     For lCount = 1 To nvans1

          For iCtr = 1 To nvans2

               observed(lCount, iCtr) = 0

          Next iCtr

     Next lCount

 

For lCtr = 1 To mendpoint

     For lCount = 1 To nvans2 'rows

          For iCtr = 1 To nvans1 'columns

               If arr1(lCtr, 1) = vAns1(iCtr) Then

                    If arr2(lCtr, 1) = vAns2(lCount) Then

                         observed(lCount, iCtr) = observed(lCount, iCtr) + 1

                    End If

               End If

          Next iCtr

     Next lCount

Next lCtr

   

'finding row and column totals

 

ReDim Rowtotal(nvans1) As Variant

ReDim Columntotal(nvans2) As Variant

 

For lCtr = 1 To nvans1

     Rowtotal(lCtr) = 0

Next lCtr

For lCtr = 1 To nvans2

     Columntotal(lCtr) = 0

Next lCtr

 

For lCtr = 1 To nvans1

     For lCount = 1 To nvans2

          Rowtotal(lCtr) = Rowtotal(lCtr) + observed(lCount, lCtr)

     Next lCount

     Rowtotal(lCtr) = Rowtotal(lCtr) / mendpoint

Next lCtr

         

For lCtr = 1 To nvans2

     For lCount = 1 To nvans1

          Columntotal(lCtr) = Columntotal(lCtr) + observed(lCtr, lCount)

     Next lCount

 Next lCtr

    Cramv = Columntotal(1)

   

'finding expected values and Chisq

For lCtr = 1 To nvans2

     For lCount = 1 To nvans1

          Expected(lCtr, lCount) = Columntotal(lCtr) * Rowtotal(lCount)

          Xsq = Xsq + (((observed(lCtr, lCount) - Expected(lCtr, lCount)) ^ 2) / Expected(lCtr, lCount))

     Next lCount

Next lCtr

   

'Finding value of k

k = nvans1

If nvans2 < k Then k = nvans2

Cramv = (Xsq / (mendpoint * (k - 1))) ^ 0.5

 

End Function