# 26 Mantel-Haenszel Test of Linear Trend

The Mantel-Haenszel Test is a method for testing independence of categorical variables on an ordinal scale. See Agresti (1996) for more discussion.

Let $$X$$ be a categorical variable of ordinal type with $$R$$ levels.

Let $$Y$$ be a categorical variable of ordinal type with $$C$$ levels.

Suppose we take a sample of size $$n$$ and measure each item in the sample with respect to both $$X$$ and $$Y$$. The presence of a linear trend between $$X$$ and $$Y$$ can be tested using the correlation coefficient $$\rho$$ (Mantel 1963). We begin with the usual estimate of $$\rho$$:

\begin{aligned} r &= \frac{\widehat{Cov}(X,Y)}{\sqrt{s_X^2 s_Y^2}} \\ &= \frac{\sum\limits_{i=1}^{n}\sum\limits_{j=1}^{n}(x_i-\bar x)(y_j-\bar y)p(x_i,y_j)} {\sqrt{\sum\limits_{i=1}^{n}(x_i-\bar x)^2p(x_i) \sum\limits_{j=1}^{n}(y_j-\bar y)^2p(y_j)}} \end{aligned}

But since $$X$$ and $$Y$$ are categorical, arithmetic on their values (differences, means, products) is not meaningful. Instead, we define the variables $$U$$ and $$V$$ to be the ordinal scores of $$X$$ and $$Y$$, respectively. In other words, $$U_i$$ is the score for the category of $$X_i$$ and $$V_i$$ is the score for the category of $$Y_i$$. Using this replacement we get

$r = \frac{\sum\limits_{i=1}^{n}\sum\limits_{j=1}^{n}(u_i-\bar u)(v_j-\bar v)p(u_i,v_j)} {\sqrt{\sum\limits_{i=1}^{n}(u_i-\bar u)^2p(u_i) \sum\limits_{j=1}^{n}(v_j-\bar v)^2p(v_j)}}$
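As a minimal sketch of the scoring step, the snippet below maps ordinal categories to numeric scores and computes the correlation of the scored observations. The category labels, the equally spaced integer scores, the sample itself, and the helper `pearson_r` are all hypothetical and chosen only for illustration; other score choices are possible.

```python
from math import sqrt

# Hypothetical ordinal categories; the integer scores 1, 2, 3 are an assumption.
x_scores = {"low": 1, "medium": 2, "high": 3}
y_scores = {"none": 1, "mild": 2, "severe": 3}

# Hypothetical sample: one (X, Y) pair per sampled item.
sample = [
    ("low", "none"), ("low", "none"), ("low", "mild"),
    ("medium", "none"), ("medium", "mild"), ("medium", "severe"),
    ("high", "mild"), ("high", "severe"), ("high", "severe"),
]

# Replace each category by its score: U_i for X_i and V_i for Y_i.
u = [x_scores[x] for x, _ in sample]
v = [y_scores[y] for _, y in sample]


def pearson_r(a, b):
    """Sample correlation of two equal-length lists of numeric scores."""
    n = len(a)
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    cov = sum((ai - mean_a) * (bi - mean_b) for ai, bi in zip(a, b))
    var_a = sum((ai - mean_a) ** 2 for ai in a)
    var_b = sum((bi - mean_b) ** 2 for bi in b)
    # The 1/(n-1) factors in the covariance and variances cancel.
    return cov / sqrt(var_a * var_b)


r = pearson_r(u, v)
print(round(r, 4))
```

Working on per-item scores like this is equivalent to working on the cross-classified counts; the table form is usually more convenient when only the counts are available.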

To obtain the values of $$\bar u$$ and $$\bar v$$, we consider the following table. Recall that there are $$R$$ levels of the variable $$X$$ and $$C$$ levels of the variable $$Y$$.

| Category of $$U$$ | Category of $$V$$: 1 | Category of $$V$$: 2 | Category of $$V$$: $$C$$ | Total | $$U$$ |
|---|---|---|---|---|---|
| 1 | $$n_{1,1}$$ | $$n_{1,2}$$ | $$n_{1,c}$$ | $$n_{1,+}$$ | $$U_1$$ |
| 2 | $$n_{2,1}$$ | $$n_{2,2}$$ | $$n_{2,c}$$ | $$n_{2,+}$$ | $$U_2$$ |
| $$R$$ | $$n_{r,1}$$ | $$n_{r,2}$$ | $$n_{r,c}$$ | $$n_{r,+}$$ | $$U_r$$ |
| Total | $$n_{+,1}$$ | $$n_{+,2}$$ | $$n_{+,c}$$ | $$n_{+,+}$$ | |
| $$V$$ | $$V_1$$ | $$V_2$$ | $$V_c$$ | | |

In the table, $$n_{rc}$$, for $$r=1,2,\ldots,R$$ and $$c=1,2,\ldots,C$$, is the number of observations in the sample with scores $$u_r$$ and $$v_c$$, and $$n = n_{+,+}$$ is the total sample size. From the table we can read off the marginal distributions of $$U$$ and $$V$$: for $$r=1,2,\ldots,R$$ and $$c=1,2,\ldots,C$$,

\begin{aligned} p(u_r) &= \frac{n_{r+}}{n} \\ \\ p(v_c) &= \frac{n_{+ c}}{n} \\ \\ p(u_r,v_c) &= \frac{n_{rc}}{n} \\ \\ \bar u &= \sum\limits_{r=1}^{R}u_r\frac{n_{r+}}{n} \\ \\ \bar v &= \sum\limits_{c=1}^{C}v_c\frac{n_{+ c}}{n} \end{aligned}
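The quantities above can be computed directly from a table of counts. The following sketch assumes a hypothetical $$3 \times 3$$ table and equally spaced scores; the variable names (`table`, `p_u`, `u_bar`, and so on) are illustrative only.

```python
# Hypothetical counts n_rc: rows are levels of X, columns are levels of Y.
table = [
    [2, 1, 0],
    [1, 1, 1],
    [0, 1, 2],
]
u = [1, 2, 3]  # assumed row scores u_r
v = [1, 2, 3]  # assumed column scores v_c

n = sum(sum(row) for row in table)               # total sample size
row_totals = [sum(row) for row in table]         # n_{r+}
col_totals = [sum(col) for col in zip(*table)]   # n_{+c}

# Marginal and joint relative frequencies.
p_u = [t / n for t in row_totals]                       # p(u_r)
p_v = [t / n for t in col_totals]                       # p(v_c)
p_uv = [[cell / n for cell in row] for row in table]    # p(u_r, v_c)

# Mean scores, weighting each score by its marginal frequency.
u_bar = sum(ur * p for ur, p in zip(u, p_u))
v_bar = sum(vc * p for vc, p in zip(v, p_v))

print(n, row_totals, col_totals, u_bar, v_bar)
```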

With these observations, we can derive the value of $$r$$ as

\begin{aligned} r &= \frac{\widehat{Cov}(U,V)}{\sqrt{s_U^2s_V^2}} \\
&= \frac{\frac{\sum\limits_{r=1}^{R}\sum\limits_{c=1}^{C} (u_r-\bar u)(v_c-\bar v)n_{rc}}{n-1}} {\sqrt{\frac{\sum\limits_{r=1}^{R}(u_r-\bar u)^2n_{r+}}{n-1} \frac{\sum\limits_{c=1}^{C}(v_c-\bar v)^2n_{+ c}}{n-1}}} \\
&= \frac{\frac{1}{n-1}\sum\limits_{r=1}^{R}\sum\limits_{c=1}^{C} (u_r-\bar u)(v_c-\bar v)n_{rc}} {\frac{1}{n-1}\sqrt{\sum\limits_{r=1}^{R}(u_r-\bar u)^2n_{r+} \sum\limits_{c=1}^{C}(v_c-\bar v)^2n_{+ c}}} \\
&= \frac{\sum\limits_{r=1}^{R}\sum\limits_{c=1}^{C}(u_r-\bar u)(v_c-\bar v)n_{rc}} {\sqrt{\sum\limits_{r=1}^{R}(u_r-\bar u)^2n_{r+} \sum\limits_{c=1}^{C}(v_c-\bar v)^2n_{+ c}}} \\
&= \frac{\sum\limits_{r=1}^{R}\sum\limits_{c=1}^{C} (u_rv_c-u_r\bar v-\bar uv_c+\bar u\bar v)n_{rc}} {\sqrt{\bigg(\sum\limits_{r=1}^{R}u_r^2n_{r+} - \frac{1}{n}\Big(\sum\limits_{r=1}^{R}u_rn_{r+}\Big)^2\bigg) \bigg(\sum\limits_{c=1}^{C}v_c^2n_{+ c} - \frac{1}{n}\Big(\sum\limits_{c=1}^{C}v_cn_{+ c}\Big)^2\bigg)}} \\
&= \frac{\sum\limits_{r=1}^{R}\sum\limits_{c=1}^{C} (u_rv_cn_{rc}-u_r\bar vn_{rc}-\bar uv_cn_{rc}+\bar u\bar vn_{rc})} {\sqrt{\bigg(\sum\limits_{r=1}^{R}u_r^2n_{r+} - \frac{1}{n}\Big(\sum\limits_{r=1}^{R}u_rn_{r+}\Big)^2\bigg) \bigg(\sum\limits_{c=1}^{C}v_c^2n_{+ c} - \frac{1}{n}\Big(\sum\limits_{c=1}^{C}v_cn_{+ c}\Big)^2\bigg)}} \\
&= \frac{\sum\limits_{r=1}^{R}\sum\limits_{c=1}^{C}u_rv_cn_{rc} - \sum\limits_{r=1}^{R}\sum\limits_{c=1}^{C}u_r\bar vn_{rc} - \sum\limits_{r=1}^{R}\sum\limits_{c=1}^{C}\bar uv_cn_{rc} + \sum\limits_{r=1}^{R}\sum\limits_{c=1}^{C}\bar u\bar vn_{rc}} {\sqrt{\bigg(\sum\limits_{r=1}^{R}u_r^2n_{r+} - \frac{1}{n}\Big(\sum\limits_{r=1}^{R}u_rn_{r+}\Big)^2\bigg) \bigg(\sum\limits_{c=1}^{C}v_c^2n_{+ c} - \frac{1}{n}\Big(\sum\limits_{c=1}^{C}v_cn_{+ c}\Big)^2\bigg)}} \\
&= \frac{\sum\limits_{r=1}^{R}\sum\limits_{c=1}^{C}u_rv_cn_{rc} - \bar v\sum\limits_{r=1}^{R}\sum\limits_{c=1}^{C}u_rn_{rc} - \bar u\sum\limits_{r=1}^{R}\sum\limits_{c=1}^{C}v_cn_{rc} + \bar u\bar v\sum\limits_{r=1}^{R}\sum\limits_{c=1}^{C}n_{rc}} {\sqrt{\bigg(\sum\limits_{r=1}^{R}u_r^2n_{r+} - \frac{1}{n}\Big(\sum\limits_{r=1}^{R}u_rn_{r+}\Big)^2\bigg) \bigg(\sum\limits_{c=1}^{C}v_c^2n_{+ c} - \frac{1}{n}\Big(\sum\limits_{c=1}^{C}v_cn_{+ c}\Big)^2\bigg)}} \\
&= \frac{\sum\limits_{r=1}^{R}\sum\limits_{c=1}^{C}u_rv_cn_{rc} - \bar v\sum\limits_{r=1}^{R}u_rn_{r+} - \bar u\sum\limits_{c=1}^{C}v_cn_{+ c} + \bar u\bar vn} {\sqrt{\bigg(\sum\limits_{r=1}^{R}u_r^2n_{r+} - \frac{1}{n}\Big(\sum\limits_{r=1}^{R}u_rn_{r+}\Big)^2\bigg) \bigg(\sum\limits_{c=1}^{C}v_c^2n_{+ c} - \frac{1}{n}\Big(\sum\limits_{c=1}^{C}v_cn_{+ c}\Big)^2\bigg)}} \\
&= \frac{\sum\limits_{r=1}^{R}\sum\limits_{c=1}^{C}u_rv_cn_{rc} - \frac{\sum\limits_{c=1}^{C}v_cn_{+ c}\sum\limits_{r=1}^{R} u_rn_{r+}}{n} - \frac{\sum\limits_{r=1}^{R}u_rn_{r+}\sum\limits_{c=1}^{C} v_cn_{+ c}}{n} + n\frac{\sum\limits_{r=1}^{R}u_rn_{r+}\sum\limits_{c=1}^{C} v_cn_{+ c}}{n^2}} {\sqrt{\bigg(\sum\limits_{r=1}^{R}u_r^2n_{r+} - \frac{1}{n}\Big(\sum\limits_{r=1}^{R}u_rn_{r+}\Big)^2\bigg) \bigg(\sum\limits_{c=1}^{C}v_c^2n_{+ c} - \frac{1}{n}\Big(\sum\limits_{c=1}^{C}v_cn_{+ c}\Big)^2\bigg)}} \\
&= \frac{\sum\limits_{r=1}^{R}\sum\limits_{c=1}^{C}u_rv_cn_{rc} - \frac{2\sum\limits_{r=1}^{R}u_rn_{r+}\sum\limits_{c=1}^{C} v_cn_{+ c}}{n} + \frac{\sum\limits_{r=1}^{R}u_rn_{r+}\sum\limits_{c=1}^{C} v_cn_{+ c}}{n}} {\sqrt{\bigg(\sum\limits_{r=1}^{R}u_r^2n_{r+} - \frac{1}{n}\Big(\sum\limits_{r=1}^{R}u_rn_{r+}\Big)^2\bigg) \bigg(\sum\limits_{c=1}^{C}v_c^2n_{+ c} - \frac{1}{n}\Big(\sum\limits_{c=1}^{C}v_cn_{+ c}\Big)^2\bigg)}} \\
&= \frac{\sum\limits_{r=1}^{R}\sum\limits_{c=1}^{C}u_rv_cn_{rc} - \frac{1}{n}\Big(\sum\limits_{r=1}^{R}u_rn_{r+}\Big) \Big(\sum\limits_{c=1}^{C}v_cn_{+ c}\Big)} {\sqrt{\bigg(\sum\limits_{r=1}^{R}u_r^2n_{r+} - \frac{1}{n}\Big(\sum\limits_{r=1}^{R}u_rn_{r+}\Big)^2\bigg) \bigg(\sum\limits_{c=1}^{C}v_c^2n_{+ c} - \frac{1}{n}\Big(\sum\limits_{c=1}^{C}v_cn_{+ c}\Big)^2\bigg)}} \end{aligned}
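The final expression uses only the cell counts, the marginal totals, and the scores. The sketch below evaluates it for a hypothetical $$3 \times 3$$ table with assumed integer scores and compares the result with the centered form from earlier in the derivation. The function names `r_closed_form` and `r_centered` are illustrative, not taken from any library.

```python
from math import sqrt


def r_closed_form(table, u, v):
    """Final line of the derivation: raw sums of scores times counts."""
    n = sum(sum(row) for row in table)
    row_totals = [sum(row) for row in table]         # n_{r+}
    col_totals = [sum(col) for col in zip(*table)]   # n_{+c}

    sum_uvn = sum(u[r] * v[c] * table[r][c]
                  for r in range(len(u)) for c in range(len(v)))
    sum_un = sum(ur * t for ur, t in zip(u, row_totals))
    sum_vn = sum(vc * t for vc, t in zip(v, col_totals))
    sum_u2n = sum(ur ** 2 * t for ur, t in zip(u, row_totals))
    sum_v2n = sum(vc ** 2 * t for vc, t in zip(v, col_totals))

    numerator = sum_uvn - sum_un * sum_vn / n
    denominator = sqrt((sum_u2n - sum_un ** 2 / n) * (sum_v2n - sum_vn ** 2 / n))
    return numerator / denominator


def r_centered(table, u, v):
    """Centered form: deviations from the mean scores, weighted by counts."""
    n = sum(sum(row) for row in table)
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    u_bar = sum(ur * t for ur, t in zip(u, row_totals)) / n
    v_bar = sum(vc * t for vc, t in zip(v, col_totals)) / n

    numerator = sum((u[r] - u_bar) * (v[c] - v_bar) * table[r][c]
                    for r in range(len(u)) for c in range(len(v)))
    ss_u = sum((ur - u_bar) ** 2 * t for ur, t in zip(u, row_totals))
    ss_v = sum((vc - v_bar) ** 2 * t for vc, t in zip(v, col_totals))
    return numerator / sqrt(ss_u * ss_v)


# Hypothetical counts and assumed equally spaced scores.
table = [
    [2, 1, 0],
    [1, 1, 1],
    [0, 1, 2],
]
u = [1, 2, 3]
v = [1, 2, 3]

r1 = r_closed_form(table, u, v)
r2 = r_centered(table, u, v)
print(round(r1, 4), round(r2, 4))
```

Both functions should agree up to floating-point rounding, since the closed form is only an algebraic rearrangement of the centered form.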

### References

Agresti, Alan. 1996. An Introduction to Categorical Data Analysis. John Wiley & Sons.

Mantel, Nathan. 1963. “Chi-Square Tests with One Degree of Freedom; Extensions of the Mantel-Haenszel Procedure.” Journal of the American Statistical Association 58 (303): 690–700.