26 Mantel-Haenszel Test of Linear Trend

The Mantel-Haenszel test of linear trend tests whether two categorical variables measured on an ordinal scale are independent, against the alternative of a linear association between their category scores. See Agresti (1996) for more discussion.

Let \(X\) be a categorical variable of ordinal type with \(R\) levels.

Let \(Y\) be a categorical variable of ordinal type with \(C\) levels.

Suppose we draw a sample of size \(n\) and record both \(X\) and \(Y\) for each item in the sample. The presence of a progressive (linear) association between \(X\) and \(Y\) can be tested using the correlation coefficient \(\rho\) (Mantel 1963). We begin with the usual sample estimate of \(\rho\),

\[\begin{aligned} r &= \frac{\widehat{Cov}(X,Y)}{\sqrt{s_X^2 s_Y^2}} \\ &= \frac{\sum\limits_{i=1}^{n}\sum\limits_{j=1}^{n}(x_i-\bar x)(y_j-\bar y)p(x_i,y_j)} {\sqrt{\sum\limits_{i=1}^{n}(x_i-\bar x)^2p(x_i) \sum\limits_{j=1}^{n}(y_j-\bar y)^2p(y_j)}} \end{aligned}\]

Because \(X\) and \(Y\) are categorical, the arithmetic in this formula (means, differences, products) is not defined for their values. Instead, we define the variables \(U\) and \(V\) to be ordinal scores assigned to the categories of \(X\) and \(Y\), respectively: \(U_i\) is the score for the category of \(X_i\), and \(V_i\) is the score for the category of \(Y_i\). Substituting the scores gives

\[ r = \frac{\sum\limits_{i=1}^{n}\sum\limits_{j=1}^{n}(u_i-\bar u)(v_j-\bar v)p(u_i,v_j)} {\sqrt{\sum\limits_{i=1}^{n}(u_i-\bar u)^2p(u_i) \sum\limits_{j=1}^{n}(v_j-\bar v)^2p(v_j)}} \]
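As a concrete check, the sketch below scores a small sample and computes \(r\) directly from the \(n\) paired scores \((u_i, v_i)\), with each observation carrying weight \(1/n\). The category labels, the scores 1, 2, 3 and 1, 2, the 15 observations, and the helper `pearson_r` are all hypothetical, chosen only for illustration.

```python
import math

# Hypothetical ordinal scores; the choice of scores is an assumption supplied by the analyst.
X_SCORES = {"low": 1, "mid": 2, "high": 3}
Y_SCORES = {"no": 1, "yes": 2}

def pearson_r(u, v):
    """Sample correlation of paired scores u_i, v_i (each observation weighted equally)."""
    n = len(u)
    u_bar = sum(u) / n
    v_bar = sum(v) / n
    cov = sum((ui - u_bar) * (vi - v_bar) for ui, vi in zip(u, v))
    su2 = sum((ui - u_bar) ** 2 for ui in u)
    sv2 = sum((vi - v_bar) ** 2 for vi in v)
    # The common 1/(n-1) (or 1/n) factors cancel between numerator and denominator.
    return cov / math.sqrt(su2 * sv2)

# Hypothetical sample of n = 15 (X, Y) observations.
sample = ([("low", "no")] * 4 + [("low", "yes")] * 1
          + [("mid", "no")] * 2 + [("mid", "yes")] * 3
          + [("high", "no")] * 1 + [("high", "yes")] * 4)

u = [X_SCORES[x] for x, _ in sample]  # U_i: score of the category of X_i
v = [Y_SCORES[y] for _, y in sample]  # V_i: score of the category of Y_i
r = pearson_r(u, v)
print(f"r = {r:.4f}")  # r = 0.4910
```

Different score assignments generally give different values of \(r\), so the scores should be fixed before looking at the data.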

To obtain \(\bar u\) and \(\bar v\), we arrange the sample in an \(R \times C\) contingency table. Recall that the variable \(X\) has \(R\) levels and the variable \(Y\) has \(C\) levels.

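Rows are the categories of \(U\), columns are the categories of \(V\), and the last column and last row give the scores.

\[
\begin{array}{c|cccc|c|c}
U \setminus V & 1 & 2 & \cdots & C & \text{Total} & \text{Score} \\ \hline
1 & n_{11} & n_{12} & \cdots & n_{1C} & n_{1+} & u_1 \\
2 & n_{21} & n_{22} & \cdots & n_{2C} & n_{2+} & u_2 \\
\vdots & \vdots & \vdots & \ddots & \vdots & \vdots & \vdots \\
R & n_{R1} & n_{R2} & \cdots & n_{RC} & n_{R+} & u_R \\ \hline
\text{Total} & n_{+1} & n_{+2} & \cdots & n_{+C} & n_{++} & \\
\text{Score} & v_1 & v_2 & \cdots & v_C & &
\end{array}
\]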
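A minimal sketch of how the cell counts \(n_{rc}\) and the margins \(n_{r+}\), \(n_{+c}\), \(n_{++}\) could be tallied from raw observations. The ordered labels, the sample, and the helper `contingency_table` are hypothetical.

```python
from collections import Counter

# Hypothetical ordered category labels for X (rows, R = 3) and Y (columns, C = 2).
x_levels = ["low", "mid", "high"]
y_levels = ["no", "yes"]

def contingency_table(pairs, x_levels, y_levels):
    """Return the R x C counts n_rc and the margins n_{r+}, n_{+c}, n_{++}."""
    counts = Counter(pairs)  # missing (x, y) cells count as 0
    table = [[counts[(x, y)] for y in y_levels] for x in x_levels]
    row_totals = [sum(row) for row in table]          # n_{r+}
    col_totals = [sum(col) for col in zip(*table)]    # n_{+c}
    return table, row_totals, col_totals, sum(row_totals)  # n_{++} = n

# Hypothetical sample of n = 15 (X, Y) observations.
sample = ([("low", "no")] * 4 + [("low", "yes")] * 1
          + [("mid", "no")] * 2 + [("mid", "yes")] * 3
          + [("high", "no")] * 1 + [("high", "yes")] * 4)

table, n_r, n_c, n = contingency_table(sample, x_levels, y_levels)
print(table)         # [[4, 1], [2, 3], [1, 4]]
print(n_r, n_c, n)   # [5, 5, 5] [7, 8] 15
```

Passing the level lists explicitly keeps the rows and columns in their ordinal order rather than in order of first appearance.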



In the table, \(n_{rc}\) (\(r=1,2,\ldots,R,\ c=1,2,\ldots,C\)) is the number of observations in the sample whose \(X\) falls in category \(r\) and whose \(Y\) falls in category \(c\), and \(n = n_{++}\) is the total sample size. The margins give the marginal distributions of \(U\) and \(V\), so for \(r=1,2,\ldots,R,\ c=1,2,\ldots,C\)

\[\begin{aligned} p(u_r) &= \frac{n_{r+}}{n} \\ \\ p(v_c) &= \frac{n_{+c}}{n} \\ \\ p(u_r,v_c) &= \frac{n_{rc}}{n} \\ \\ \bar u &= \sum\limits_{r=1}^{R}u_r\frac{n_{r+}}{n} \\ \\ \bar v &= \sum\limits_{c=1}^{C}v_c\frac{n_{+c}}{n} \end{aligned}\]
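The marginal proportions and score means above can be computed exactly with Python's `fractions.Fraction`, which avoids rounding in the intermediate steps. The \(3 \times 2\) table and the scores are hypothetical.

```python
from fractions import Fraction

# Hypothetical 3 x 2 table of counts n_rc with assumed scores u_r and v_c.
table = [[4, 1], [2, 3], [1, 4]]
u = [1, 2, 3]
v = [1, 2]

n = sum(sum(row) for row in table)
row_tot = [sum(row) for row in table]          # n_{r+}
col_tot = [sum(col) for col in zip(*table)]    # n_{+c}

p_u = [Fraction(nr, n) for nr in row_tot]                  # p(u_r) = n_{r+} / n
p_v = [Fraction(nc, n) for nc in col_tot]                  # p(v_c) = n_{+c} / n
p_uv = [[Fraction(x, n) for x in row] for row in table]    # p(u_r, v_c) = n_rc / n
u_bar = sum(ur * pr for ur, pr in zip(u, p_u))             # sum_r u_r n_{r+} / n
v_bar = sum(vc * pc for vc, pc in zip(v, p_v))             # sum_c v_c n_{+c} / n
print(u_bar, v_bar)  # 2 23/15
```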

Using these quantities, and writing the sample covariance and variances with divisor \(n-1\), we can express \(r\) in terms of the table counts:

\[\begin{aligned} r &= \frac{\widehat{Cov}(U,V)}{\sqrt{s_U^2s_V^2}} \\
&= \frac{\frac{\sum\limits_{r=1}^{R}\sum\limits_{c=1}^{C} (u_r-\bar u)(v_c-\bar v)n_{rc}}{n-1}} {\sqrt{\frac{\sum\limits_{r=1}^{R}(u_r-\bar u)^2n_{r+}}{n-1} \frac{\sum\limits_{c=1}^{C}(v_c-\bar v)^2n_{+c}}{n-1}}} \\
&= \frac{\frac{1}{n-1}\sum\limits_{r=1}^{R}\sum\limits_{c=1}^{C} (u_r-\bar u)(v_c-\bar v)n_{rc}} {\frac{1}{n-1}\sqrt{\sum\limits_{r=1}^{R}(u_r-\bar u)^2n_{r+} \sum\limits_{c=1}^{C}(v_c-\bar v)^2n_{+c}}} \\
&= \frac{\sum\limits_{r=1}^{R}\sum\limits_{c=1}^{C}(u_r-\bar u)(v_c-\bar v)n_{rc}} {\sqrt{\sum\limits_{r=1}^{R}(u_r-\bar u)^2n_{r+} \sum\limits_{c=1}^{C}(v_c-\bar v)^2n_{+c}}} \\
&= \frac{\sum\limits_{r=1}^{R}\sum\limits_{c=1}^{C} (u_rv_c-u_r\bar v-\bar uv_c+\bar u\bar v)n_{rc}} {\sqrt{\bigg(\sum\limits_{r=1}^{R}u_r^2n_{r+} - \frac{1}{n}\Big(\sum\limits_{r=1}^{R}u_rn_{r+}\Big)^2\bigg) \bigg(\sum\limits_{c=1}^{C}v_c^2n_{+c} - \frac{1}{n}\Big(\sum\limits_{c=1}^{C}v_cn_{+c}\Big)^2\bigg)}} \\
&= \frac{\sum\limits_{r=1}^{R}\sum\limits_{c=1}^{C} (u_rv_cn_{rc}-u_r\bar vn_{rc}-\bar uv_cn_{rc}+\bar u\bar vn_{rc})} {\sqrt{\bigg(\sum\limits_{r=1}^{R}u_r^2n_{r+} - \frac{1}{n}\Big(\sum\limits_{r=1}^{R}u_rn_{r+}\Big)^2\bigg) \bigg(\sum\limits_{c=1}^{C}v_c^2n_{+c} - \frac{1}{n}\Big(\sum\limits_{c=1}^{C}v_cn_{+c}\Big)^2\bigg)}} \\
&= \frac{\sum\limits_{r=1}^{R}\sum\limits_{c=1}^{C}u_rv_cn_{rc} - \sum\limits_{r=1}^{R}\sum\limits_{c=1}^{C}u_r\bar vn_{rc} - \sum\limits_{r=1}^{R}\sum\limits_{c=1}^{C}\bar uv_cn_{rc} + \sum\limits_{r=1}^{R}\sum\limits_{c=1}^{C}\bar u\bar vn_{rc}} {\sqrt{\bigg(\sum\limits_{r=1}^{R}u_r^2n_{r+} - \frac{1}{n}\Big(\sum\limits_{r=1}^{R}u_rn_{r+}\Big)^2\bigg) \bigg(\sum\limits_{c=1}^{C}v_c^2n_{+c} - \frac{1}{n}\Big(\sum\limits_{c=1}^{C}v_cn_{+c}\Big)^2\bigg)}} \\
&= \frac{\sum\limits_{r=1}^{R}\sum\limits_{c=1}^{C}u_rv_cn_{rc} - \bar v\sum\limits_{r=1}^{R}\sum\limits_{c=1}^{C}u_rn_{rc} - \bar u\sum\limits_{r=1}^{R}\sum\limits_{c=1}^{C}v_cn_{rc} + \bar u\bar v\sum\limits_{r=1}^{R}\sum\limits_{c=1}^{C}n_{rc}} {\sqrt{\bigg(\sum\limits_{r=1}^{R}u_r^2n_{r+} - \frac{1}{n}\Big(\sum\limits_{r=1}^{R}u_rn_{r+}\Big)^2\bigg) \bigg(\sum\limits_{c=1}^{C}v_c^2n_{+c} - \frac{1}{n}\Big(\sum\limits_{c=1}^{C}v_cn_{+c}\Big)^2\bigg)}} \\
&= \frac{\sum\limits_{r=1}^{R}\sum\limits_{c=1}^{C}u_rv_cn_{rc} - \bar v\sum\limits_{r=1}^{R}u_rn_{r+} - \bar u\sum\limits_{c=1}^{C}v_cn_{+c} + \bar u\bar vn} {\sqrt{\bigg(\sum\limits_{r=1}^{R}u_r^2n_{r+} - \frac{1}{n}\Big(\sum\limits_{r=1}^{R}u_rn_{r+}\Big)^2\bigg) \bigg(\sum\limits_{c=1}^{C}v_c^2n_{+c} - \frac{1}{n}\Big(\sum\limits_{c=1}^{C}v_cn_{+c}\Big)^2\bigg)}} \\
&= \frac{\sum\limits_{r=1}^{R}\sum\limits_{c=1}^{C}u_rv_cn_{rc} - \frac{\sum\limits_{c=1}^{C}v_cn_{+c}\sum\limits_{r=1}^{R} u_rn_{r+}}{n} - \frac{\sum\limits_{r=1}^{R}u_rn_{r+}\sum\limits_{c=1}^{C} v_cn_{+c}}{n} + n\frac{\sum\limits_{r=1}^{R}u_rn_{r+}\sum\limits_{c=1}^{C} v_cn_{+c}}{n^2}} {\sqrt{\bigg(\sum\limits_{r=1}^{R}u_r^2n_{r+} - \frac{1}{n}\Big(\sum\limits_{r=1}^{R}u_rn_{r+}\Big)^2\bigg) \bigg(\sum\limits_{c=1}^{C}v_c^2n_{+c} - \frac{1}{n}\Big(\sum\limits_{c=1}^{C}v_cn_{+c}\Big)^2\bigg)}} \\
&= \frac{\sum\limits_{r=1}^{R}\sum\limits_{c=1}^{C}u_rv_cn_{rc} - \frac{2\sum\limits_{r=1}^{R}u_rn_{r+}\sum\limits_{c=1}^{C} v_cn_{+c}}{n} + \frac{\sum\limits_{r=1}^{R}u_rn_{r+}\sum\limits_{c=1}^{C} v_cn_{+c}}{n}} {\sqrt{\bigg(\sum\limits_{r=1}^{R}u_r^2n_{r+} - \frac{1}{n}\Big(\sum\limits_{r=1}^{R}u_rn_{r+}\Big)^2\bigg) \bigg(\sum\limits_{c=1}^{C}v_c^2n_{+c} - \frac{1}{n}\Big(\sum\limits_{c=1}^{C}v_cn_{+c}\Big)^2\bigg)}} \\
&= \frac{\sum\limits_{r=1}^{R}\sum\limits_{c=1}^{C}u_rv_cn_{rc} - \frac{1}{n}\Big(\sum\limits_{r=1}^{R}u_rn_{r+}\Big) \Big(\sum\limits_{c=1}^{C}v_cn_{+c}\Big)} {\sqrt{\bigg(\sum\limits_{r=1}^{R}u_r^2n_{r+} - \frac{1}{n}\Big(\sum\limits_{r=1}^{R}u_rn_{r+}\Big)^2\bigg) \bigg(\sum\limits_{c=1}^{C}v_c^2n_{+c} - \frac{1}{n}\Big(\sum\limits_{c=1}^{C}v_cn_{+c}\Big)^2\bigg)}} \end{aligned}\]
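The last line of the derivation needs only the cell counts, the margins, and the scores, so \(r\) can be computed without first forming \(\bar u\) and \(\bar v\). The sketch below evaluates that expression and then forms \(M^2 = (n-1)r^2\), the statistic Agresti (1996) uses for this test; under independence \(M^2\) is approximately chi-square with one degree of freedom for large \(n\). The function name `linear_trend_test`, the table, and the scores are hypothetical, and the p-value uses the identity \(P(\chi^2_1 > m) = \operatorname{erfc}(\sqrt{m/2})\) rather than a statistics library.

```python
import math

def linear_trend_test(table, u, v):
    """Hypothetical helper: r from the final line of the derivation, M^2 = (n-1) r^2, p-value.

    table[r][c] holds n_rc; u and v are analyst-chosen row and column scores.
    """
    n = sum(map(sum, table))
    row_tot = [sum(row) for row in table]          # n_{r+}
    col_tot = [sum(col) for col in zip(*table)]    # n_{+c}
    su = sum(ur * nr for ur, nr in zip(u, row_tot))          # sum_r u_r n_{r+}
    sv = sum(vc * nc for vc, nc in zip(v, col_tot))          # sum_c v_c n_{+c}
    suv = sum(u[i] * v[j] * table[i][j]
              for i in range(len(u)) for j in range(len(v)))  # sum_r sum_c u_r v_c n_rc
    suu = sum(ur ** 2 * nr for ur, nr in zip(u, row_tot)) - su ** 2 / n
    svv = sum(vc ** 2 * nc for vc, nc in zip(v, col_tot)) - sv ** 2 / n
    r = (suv - su * sv / n) / math.sqrt(suu * svv)
    m2 = (n - 1) * r ** 2
    # For 1 df, P(chi^2 > m2) = P(|Z| > sqrt(m2)) = erfc(sqrt(m2 / 2)).
    p_value = math.erfc(math.sqrt(m2 / 2))
    return r, m2, p_value

table = [[4, 1], [2, 3], [1, 4]]  # hypothetical counts n_rc
r, m2, p = linear_trend_test(table, u=[1, 2, 3], v=[1, 2])
print(f"r = {r:.4f}, M^2 = {m2:.3f}, p = {p:.3f}")
```

For this table the numerator is \(49 - 30 \cdot 23/15 = 3\) and the denominator is \(\sqrt{10 \cdot 56/15}\), so \(r^2 = 27/112\) and \(M^2 = 14 \cdot 27/112 = 3.375\). The value of \(r\) agrees with the correlation computed directly from the 15 paired scores behind the same table.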

References

Agresti, Alan. 1996. An Introduction to Categorical Data Analysis. John Wiley & Sons.

Mantel, Nathan. 1963. “Chi-Square Tests with One Degree of Freedom; Extensions of the Mantel-Haenszel Procedure.” Journal of the American Statistical Association 58 (303): 690–700.