# A Guide to Chi-Squared Tests for Inference

## Differences Between the Tests

| Category | Goodness of Fit | Test for Independence | Test for Homogeneity |
|---|---|---|---|
| Number of Categorical Variables | 1 | 2 | 2 |
| Number of Samples | 1 | 1 | 2+ |
| Data for Input | Observed + Expected | Observed Only | Observed Only |
| Null Hypothesis | H₀: Observed values equal the expected values | H₀: Variables are independent | H₀: There is no difference in the distributions |
| Alternative Hypothesis | Hₐ: Observed values do not equal the expected values | Hₐ: Variables are not independent | Hₐ: There is a difference in the distributions |

## “State” Step

In the state step, you must include the null hypothesis, the alternative hypothesis, and the alpha (significance) level. The alpha level works just as in other significance tests, typically ranging from 0.01 to 0.10. The hypotheses, however, differ depending on which chi-squared test you are performing, so refer to the table above.

## “Plan” Step

In the plan step, you must first state the name of the test you are performing. Then, you must check all of the conditions necessary for the test in order to proceed to the next steps.

- Random: The data must be collected randomly, whether through a random sample or random assignment.
  - Note: If the data come from random assignment, the test is a chi-squared test for homogeneity, because the treatment groups count as separate samples.

- Independence (10%): Each sample/group must be less than 10% of the population it is drawn from. This ensures the observations can be treated as independent when sampling without replacement.
- Normal: Each expected count must be at least 5.
  - Goodness of Fit: The expected counts are calculated by multiplying the total number of observations by the hypothesized proportion for each category.
  - Independence/Homogeneity: The expected counts are calculated by multiplying the row total by the column total and dividing by the total number of observations.
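As a sketch of the condition checks above (the counts below are made up for illustration), the expected counts for both kinds of test can be computed like this:

```python
# Hypothetical observed counts for a two-way table (rows = groups, columns = categories)
observed = [[30, 14, 6],
            [22, 18, 10]]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
grand_total = sum(row_totals)

# Independence/Homogeneity: expected = row total * column total / grand total
expected = [[rt * ct / grand_total for ct in col_totals] for rt in row_totals]

# Normal condition: every expected count must be at least 5
assert all(e >= 5 for row in expected for e in row), "an expected count is below 5"

# Goodness of Fit: expected = total observations * hypothesized proportion
gof_observed = [45, 25, 30]
hypothesized = [0.5, 0.25, 0.25]
gof_expected = [sum(gof_observed) * p for p in hypothesized]  # [50.0, 25.0, 25.0]
```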

## “Do” Step

In the do step, you calculate the test statistic, the degrees of freedom, and the p-value.

### Degrees of Freedom (df):

- Goodness of Fit: `df = k - 1`, where `k` is the number of categories
- Independence/Homogeneity: `df = (r - 1)(c - 1)`, where `r` is the number of rows and `c` is the number of columns
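With illustrative table sizes (not taken from the guide), the two formulas look like this:

```python
# Degrees of freedom for the two chi-squared settings (illustrative sizes)
k = 4                            # goodness of fit: number of categories
df_gof = k - 1

r, c = 2, 3                      # two-way table: rows and columns
df_two_way = (r - 1) * (c - 1)

print(df_gof, df_two_way)  # 3 2
```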

### Test Statistic:

To get the test statistic, subtract the expected value from the observed value in each cell, square the difference, and divide by the expected value. Then sum these quantities across all cells: χ² = Σ (observed − expected)² / expected.
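The sum above can be written in one line; the counts here are hypothetical, with expected values coming from the null hypothesis:

```python
# Chi-squared statistic: sum of (observed - expected)^2 / expected over all cells
observed = [45, 25, 30]          # hypothetical counts
expected = [50.0, 25.0, 25.0]    # expected counts under the null hypothesis

chi_sq = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
print(chi_sq)  # 1.5
```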

### P-Value:

To get the p-value, use the degrees of freedom and the test statistic in a chi-squared distribution table or a calculator.

- Calculator: Use the chi-squared cdf function. On TI calculators, for example, χ²cdf takes a lower bound, an upper bound, and the degrees of freedom; use the test statistic as the lower bound and a very large number (such as 10^99) as the upper bound.
- Table: Find the row for your degrees of freedom, then locate where your test statistic falls among the values in that row; the column headings give the bounds on the p-value.
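Python's standard library has no chi-squared survival function (in practice, `scipy.stats.chi2.sf` provides one), but as a self-contained sketch the p-value can be computed from the regularized incomplete gamma function; the name `chi2_sf` is ours:

```python
import math

def chi2_sf(stat, df, terms=500):
    """P(X >= stat) for a chi-squared distribution with df degrees of freedom,
    i.e. the regularized upper incomplete gamma function Q(df/2, stat/2),
    computed via the series expansion of the lower incomplete gamma."""
    s, x = df / 2.0, stat / 2.0
    if x <= 0:
        return 1.0
    # Series: P(s, x) = x^s e^{-x} / Gamma(s) * sum_{n>=0} x^n / (s (s+1) ... (s+n))
    term = 1.0 / s
    total = term
    for n in range(1, terms):
        term *= x / (s + n)
        total += term
        if term < total * 1e-15:
            break
    lower = total * math.exp(s * math.log(x) - x - math.lgamma(s))
    return max(0.0, 1.0 - lower)

# p-value for a test statistic of 3.84 with df = 1
print(round(chi2_sf(3.84, 1), 3))  # about 0.05
```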

To see an interactive chi-squared distribution graph, you can visit Stapplet’s Chi-Squared Distribution Graph that allows you to visualize what the graph looks like with different degrees of freedom.

## “Conclude” Step

To conclude the test, you must compare your p-value to your alpha much like in other tests. Reject the null if the p-value is lower than the alpha, and fail to reject it if the p-value is higher than the alpha. Make sure you state your conclusion in context.
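The decision rule above reduces to a single comparison (the values here are illustrative):

```python
# Compare the p-value to alpha and state the decision (illustrative values)
p_value = 0.032
alpha = 0.05

decision = ("Reject the null hypothesis." if p_value < alpha
            else "Fail to reject the null hypothesis.")
print(decision)  # Reject the null hypothesis.
```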

## “Follow-Up” Step

If you reject the null hypothesis, you must state the cell that contributed the most to your chi-squared test statistic. You must also state the observed value of this cell and whether it was higher or lower than the expected value. **This is not necessary if you fail to reject the null hypothesis.**

- Example:
*“The cell for “Most Likely” in the “United States” sample contributed the greatest to the chi-squared statistic because the observed value of 87 was much greater than the expected value of 21.”*
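Since each cell's contribution is its (observed − expected)²/expected term, the follow-up can be sketched like this (with made-up counts, not those from the example above):

```python
# Follow-up sketch: find the cell that contributed the most to the statistic
observed = [[30, 14, 6],
            [40, 30, 20]]        # hypothetical two-way counts

row_totals = [sum(r) for r in observed]
col_totals = [sum(c) for c in zip(*observed)]
total = sum(row_totals)
expected = [[rt * ct / total for ct in col_totals] for rt in row_totals]

# Each cell's contribution to the chi-squared statistic
contrib = {(i, j): (observed[i][j] - expected[i][j]) ** 2 / expected[i][j]
           for i in range(len(observed)) for j in range(len(observed[0]))}
i, j = max(contrib, key=contrib.get)
direction = "greater" if observed[i][j] > expected[i][j] else "less"
print(f"Cell ({i}, {j}): observed {observed[i][j]} "
      f"was {direction} than expected {expected[i][j]:.1f}")
```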