  # Excel Formula for Cramer's V

L4 Enterprises LLC

Open Module 1 and copy the function into the “Developer” of your own spreadsheet

Statistical Consulting

From time to time I have had clients wanting a quick assessment of the “correlation” between two Categorical variables.  For example, after producing an orthogonal design for a conjoint analysis, a client might decide that he does not find the low level of one variable to be credible when combined with the high level on another and wants to know the implication of altering this one card without re-creating the design.  I have known researchers to approach this by creating a correlation matrix in excel.  However, using a correlation on categorical data can be troublesome.  A less wrong way of approaching this would be to create a Cramer’s V matrix.  However, this is a function that does not exist in Excel.  I programmed a function to do just this and am putting it out on the internet.  You may use this function however you would like.  If you notice any errors in the function or program any improvement, I would ask you to notify L4 enterprises at: info@L4Statistics.com.

To utilize this function please download the excel file by clicking HERE.  Upon opening the file and ensuring that macros are enabled you may use this sheet itself or copy the function into your own sheet by going to “Developer” and opening visual basic:

If macros are enabled on your spreadsheet, you may use =Cramv(field1,field2) Just like you would use =Correl(field1, field2).

Example 1 demonstrates the weakness of using a correlation to assess the relationship of two categorical variables.  In this case a three level variable is perfectly related to a two level variable in such a way that whenever the three level variable is 1 or 3 the two level variable is 0 otherwise the two level variable is 1.  If we examine this relationship using correlations, we will find no correlation which would suggest no relationship.  The perfect relationship is captured in the Cramer's V Matrix.

The second example demonstrates various equivalencies between Cramer’s V and a correlation when the variables are two-level.  Var1 and Var2 are correlated at 0.1.  Because, they are dummy coded as "0" and "1" we would expect the same relationship in the Cramer's V.  Var3 is perfectly but negatively correlated with Var2 producing a correlation of -1.  The same strength is captured in Cramer's V except because the order of a categorical variable is meaningless, the negative is lost.  Finally, Var4 and Var3 are correlated at -0.3.  The Cramer's V relationship is of the same strength but looses the negative value.

If you distrust opening my macros, the code for the function is:

Function Cramv(field1 As Range, field2 As Range)

Dim arr1() As Variant 'First Array

Dim arr2() As Variant 'Second Array

Dim lCtr As Long, lCount As Long 'Indexs

Dim iCtr As Integer 'index

Dim col1 As New Collection 'Collection to collect unique values of array1

Dim col2 As New Collection 'Collection to collect unique values of array1

Dim observed() As Variant

Dim Rowtotal() As Variant

Dim Columntotal() As Variant

Dim nvans1 As Long

Dim nvans2 As Long

Dim exptected() As Variant

Dim Xsq As Double

Dim k As Long

'Finding length of arrays

Dim lStartPoint As Long

Dim lEndPoint As Long

arr1 = field1.Value

arr2 = field2.Value

lStartPoint = LBound(arr1)

lEndPoint = UBound(arr1)

mStartPoint = LBound(arr2)

mendpoint = UBound(arr2)

ReDim arr1(1 To lEndPoint)

ReDim arr2(1 To lEndPoint)

arr1 = field1.Value

arr2 = field2.Value

'Testing to ensure equal size ranges

If lEndPoint <> mendpoint Then

MsgBox "Please ensure there are an equal number of cells in both ranges"

Exit Function

End If

'Find the inique valurs for the first array

For lCtr = lStartPoint To lEndPoint

vItem = arr1(lCtr, 1)

sIndex = CStr(vItem)

If lCtr = lStartPoint Then

ReDim vAns1(lStartPoint To lStartPoint) As Variant

vAns1(lStartPoint) = vItem

Else

On Error Resume Next

If Err.Number = 0 Then

lCount = UBound(vAns1) + 1

ReDim Preserve vAns1(lStartPoint To lCount)

vAns1(lCount) = vItem

End If

End If

Err.Clear

Next lCtr

'Find the inique valurs for the second array

For lCtr = mStartPoint To mendpoint

vItem = arr2(lCtr, 1)

sIndex = CStr(vItem)

If lCtr = mStartPoint Then

ReDim vAns2(mStartPoint To mStartPoint) As Variant

vAns2(mStartPoint) = vItem

Else

On Error Resume Next

If Err.Number = 0 Then

lCount = UBound(vAns2) + 1

ReDim Preserve vAns2(mStartPoint To lCount)

vAns2(lCount) = vItem

End If

End If

Err.Clear

Next lCtr

'finding length of vAns

nvans1 = UBound(vAns1)

nvans2 = UBound(vAns2)

'finding ovserved cells of for chisq

ReDim observed(1 To nvans2, 1 To nvans1) As Variant

ReDim Expected(1 To nvans2, 1 To nvans1) As Variant

For lCount = 1 To nvans1

For iCtr = 1 To nvans2

observed(lCount, iCtr) = 0

Next iCtr

Next lCount

For lCtr = 1 To mendpoint

For lCount = 1 To nvans2 'rows

For iCtr = 1 To nvans1 'columns

If arr1(lCtr, 1) = vAns1(iCtr) Then

If arr2(lCtr, 1) = vAns2(lCount) Then

observed(lCount, iCtr) = observed(lCount, iCtr) + 1

End If

End If

Next iCtr

Next lCount

Next lCtr

'finding row and column totals

ReDim Rowtotal(nvans1) As Variant

ReDim Columntotal(nvans2) As Variant

For lCtr = 1 To nvans1

Rowtotal(lCtr) = 0

Next lCtr

For lCtr = 1 To nvans2

Columntotal(lCtr) = 0

Next lCtr

For lCtr = 1 To nvans1

For lCount = 1 To nvans2

Rowtotal(lCtr) = Rowtotal(lCtr) + observed(lCount, lCtr)

Next lCount

Rowtotal(lCtr) = Rowtotal(lCtr) / mendpoint

Next lCtr

For lCtr = 1 To nvans2

For lCount = 1 To nvans1

Columntotal(lCtr) = Columntotal(lCtr) + observed(lCtr, lCount)

Next lCount

Next lCtr

Cramv = Columntotal(1)

'finding expected values and Chisq

For lCtr = 1 To nvans2

For lCount = 1 To nvans1

Expected(lCtr, lCount) = Columntotal(lCtr) * Rowtotal(lCount)

Xsq = Xsq + (((observed(lCtr, lCount) - Expected(lCtr, lCount)) ^ 2) / Expected(lCtr, lCount))

Next lCount

Next lCtr

'Finding value of k

k = nvans1

If nvans2 < k Then k = nvans2

Cramv = (Xsq / (mendpoint * (k - 1))) ^ 0.5

End Function