# 8 Chebychev’s Theorem

## 8.1 Chebychev’s Theorem

In any finite set of numbers and for any real number $$h > 1$$, at least $$(1 - \frac{1}{h^2}) \cdot 100\%$$ of the numbers lie within $$h$$ standard deviations of the mean. In other words, they lie within the interval $$(\mu-h\cdot\sigma , \mu+h\cdot\sigma)$$.

Proof:

Consider a set $$\{x_1,x_2,\ldots,x_r,x_{r+1},\ldots,x_n\}$$ labeled so that $$\{x_1,x_2,\ldots,x_r\}$$ lie outside of $$(\mu-h\cdot\sigma , \mu+h\cdot\sigma)$$ and $$\{x_{r+1},\ldots,x_n\}$$ lie within the interval. Under these conditions we know

$|x_1-\mu| \geq h\sigma,\ |x_2-\mu| \geq h\sigma, \ldots,\ |x_r-\mu| \geq h\sigma$

Squaring gives

$(x_1-\mu)^2 \geq h^2\sigma^2,\ (x_2-\mu)^2 \geq h^2\sigma^2,\ldots,\ (x_r-\mu)^2 \geq h^2\sigma^2\\ \ \ \ \ \Rightarrow\sum\limits_{i=1}^{r}(x_i-\mu)^2 \geq \sum\limits_{i=1}^{r}h^2\sigma^2 = rh^2\sigma^2$

Since all $$(x_i-\mu)^2$$ are necessarily nonnegative,

$\begin{array}{rrcl} & \sum\limits_{i=1}^{r}(x_i-\mu)^2 &\leq& \sum\limits_{i=1}^{n}(x_i-\mu)^2 \\ \Rightarrow & rh^2\sigma^2 &\leq& \sum\limits_{i=1}^{n}(x_i-\mu)^2 \\ {}^{1} \Rightarrow & rh^2\sigma^2 &\leq& n\sigma^2 \\ \Rightarrow & rh^2 &\leq& n \\ \Rightarrow & \frac{r}{n} &\leq& \frac{1}{h^2} \end{array}$

1. $$\sigma^2 = \frac{1}{n}\sum\limits_{i=1}^{n}(x_i-\mu)^2$$
$$\ \ \ \ \Rightarrow n\sigma^2 = \sum\limits_{i=1}^{n}(x_i-\mu)^2$$

Now $$\frac{r}{n}$$ is the fraction of numbers outside $$(\mu-h\cdot\sigma , \mu+h\cdot\sigma)$$. By the law of complements, the fraction of numbers inside the interval is $$1 - \frac{r}{n}$$, which satisfies $$1 - \frac{r}{n} \geq 1 - \frac{1}{h^2}$$. Thus, at least $$(1-\frac{1}{h^2})\cdot 100\%$$ of the points lie within $$h$$ standard deviations of the mean, or within the interval $$(\mu-h\cdot\sigma , \mu+h\cdot\sigma)$$.
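As a quick numerical sanity check (a sketch, not part of the original text; the helper name `chebyshev_check` and the sample data are illustrative), the following Python snippet computes the population standard deviation exactly as in footnote 1 and confirms that the fraction of points inside $$(\mu-h\cdot\sigma , \mu+h\cdot\sigma)$$ never falls below $$1-\frac{1}{h^2}$$, even for a heavily skewed sample:

```python
import math
import random

def chebyshev_check(data, h):
    """Return (fraction of data within h std devs of the mean, the 1 - 1/h^2 bound)."""
    n = len(data)
    mu = sum(data) / n
    # Population standard deviation, matching footnote 1: sigma^2 = (1/n) * sum (x_i - mu)^2
    sigma = math.sqrt(sum((x - mu) ** 2 for x in data) / n)
    inside = sum(1 for x in data if abs(x - mu) < h * sigma)
    return inside / n, 1 - 1 / h**2

random.seed(0)
# A deliberately skewed (exponential) sample: the bound is distribution-free.
data = [random.expovariate(1.0) for _ in range(10_000)]
for h in (1.5, 2, 3):
    frac, bound = chebyshev_check(data, h)
    assert frac >= bound
```

Because the bound makes no distributional assumptions, the assertions hold for any finite data set, not just this one.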

## 8.2 Alternate Proof of Chebychev’s Theorem

In any finite set of numbers and for any real number $$h>1$$, at least $$(1-\frac{1}{h^2})\cdot 100\%$$ of the numbers lie within $$h$$ standard deviations of the mean. In other words, they lie within the interval $$(\mu-h\cdot\sigma,\mu+h\cdot\sigma)$$.

Proof:

The proof here is done for the discrete case, but it also applies in the continuous case by replacing the summations with integrals (with limits from $$-\infty$$ to $$\infty$$).

$\begin{array}{rrcl} & \sigma^2 &=& E(Y-\mu)^2 \\ & &=& \sum\limits_{y=0}^{\infty}(y-\mu)^2p(y) \\ & &=& \sum\limits_{y=0}^{\mu-h\sigma}(y-\mu)^2p(y) + \sum\limits_{y=\mu-h\sigma+1}^{\mu+h\sigma-1}(y-\mu)^2p(y) + \sum\limits_{y=\mu+h\sigma}^{\infty}(y-\mu)^2p(y) \\ {}^{1} \Rightarrow & \sigma^2 &\geq& \sum\limits_{y=0}^{\mu-h\sigma}(y-\mu)^2p(y) + \sum\limits_{y=\mu+h\sigma}^{\infty}(y-\mu)^2p(y)\\ \end{array}$

1. Since all the terms $$(y-\mu)^2 p(y)$$ are nonnegative, removing the middle summation can only decrease the right-hand side, which gives this inequality.

In both of these summations $$y$$ is outside the interval $$(\mu-h\cdot\sigma , \mu+h\cdot\sigma)$$, so

$\begin{array}{rrcl} & |y-\mu| &\geq& h\sigma \\ \Rightarrow & (y-\mu)^2 &\geq& h^2\sigma^2 \\ \Rightarrow & \sigma^2 &\geq& \sum\limits_{y=0}^{\mu-h\sigma}h^2\sigma^2 p(y) + \sum\limits_{y=\mu+h\sigma}^{\infty}h^2\sigma^2 p(y) \\ \Rightarrow & \sigma^2 &\geq& h^2\sigma^2\Big[\sum\limits_{y=0}^{\mu-h\sigma} p(y) + \sum\limits_{y=\mu+h\sigma}^{\infty} p(y)\Big] \end{array}$

The first summation is the sum of the probabilities of all $$y$$ in the lower tail, i.e. $$P(y-\mu \leq -h\sigma)$$. Likewise, the second summation is the upper-tail probability $$P(y-\mu \geq h\sigma)$$.

$\begin{array}{rrcl} \Rightarrow & \sigma^2 &\geq& h^2\sigma^2[P(y-\mu \leq -h\sigma) + P(y-\mu \geq h\sigma)] \\ \Rightarrow & \sigma^2 &\geq& h^2\sigma^2\, P(|y-\mu| \geq h\sigma) \\ \Rightarrow & \frac{1}{h^2} &\geq& P(|y-\mu| \geq h\sigma) \\ \Rightarrow & 1-\frac{1}{h^2} &\leq& P(|y-\mu| < h\sigma) \end{array}$

That is, the probability that $$y$$ lies within $$h$$ standard deviations of the mean is at least $$1-\frac{1}{h^2}$$.
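The probabilistic form of the bound can likewise be checked exactly for a discrete distribution. In this sketch (the helper names `binom_pmf` and `tail_mass` are illustrative, not from the text), the two tails $$|y-\mu| \geq h\sigma$$ of a binomial pmf are summed exactly and compared against $$\frac{1}{h^2}$$:

```python
import math

def binom_pmf(n, p, k):
    """P(Y = k) for Y ~ Binomial(n, p), computed exactly from the definition."""
    return math.comb(n, k) * p**k * (1 - p) ** (n - k)

def tail_mass(n, p, h):
    """Exact P(|Y - mu| >= h * sigma) for Y ~ Binomial(n, p)."""
    mu = n * p
    sigma = math.sqrt(n * p * (1 - p))
    return sum(binom_pmf(n, p, k) for k in range(n + 1) if abs(k - mu) >= h * sigma)

# Chebyshev: the combined two-tail probability can never exceed 1/h^2.
for h in (1.5, 2, 3):
    assert tail_mass(50, 0.3, h) <= 1 / h**2
```

The binomial is only a convenient test case; by the theorem, the same assertion would pass for any distribution with finite variance.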

## 8.3 Chebychev’s Theorem for Absolute Deviation

This theorem is due to Brunette (Brunette 2003–2007b).

In any finite set of numbers, and for any real number $$h > 1$$, at least $$1 - \frac{1}{h}$$ of the numbers lie within $$h$$ absolute deviations of the mean, where the absolute deviation is defined as $$Ab = \frac{1}{n}\sum\limits_{i=1}^{n}|x_i-\bar x|$$. In other words, $$1-\frac{1}{h}$$ of the numbers are in the interval $$(\bar x-h\cdot Ab , \bar x+h\cdot Ab)$$.

Proof:

Consider a set $$\{x_1,x_2,\ldots,x_r,x_{r+1},\ldots,x_n\}$$ labeled so that $$\{x_1,x_2,\ldots,x_r\}$$ lie outside of $$(\bar x-h\cdot Ab , \bar x+h\cdot Ab)$$ and $$\{x_{r+1},\ldots,x_n\}$$ lie within the interval. Accordingly,

$h \cdot Ab \leq |x_1-\bar x| ,\ h \cdot Ab \leq |x_2-\bar x| ,\ldots ,\ h \cdot Ab \leq |x_r-\bar x|$

$\begin{array}{rrcl} \Rightarrow & r \cdot h \cdot Ab &\leq& \sum\limits_{i=1}^{r}|x_i-\bar x| \\ \Rightarrow & r \cdot h \cdot Ab &\leq& \sum\limits_{i=1}^{n}|x_i-\bar x| \\ {}^{1} \Rightarrow & r \cdot h \cdot Ab &\leq& n \cdot Ab\\ \Rightarrow & \frac{r}{n} &\leq& \frac{1}{h}\\ \Rightarrow & -\frac{r}{n} &\geq& -\frac{1}{h}\\ \Rightarrow & 1-\frac{r}{n} &\geq& 1-\frac{1}{h} \end{array}$

1. $$Ab = \frac{1}{n}\sum\limits_{i=1}^{n}|x_i-\bar x|$$
$$\Rightarrow n \cdot Ab = \sum\limits_{i=1}^{n}|x_i-\bar x|$$

Now $$\frac{r}{n}$$ is the fraction of numbers outside the interval. So $$1-\frac{r}{n}$$, which is at least $$1-\frac{1}{h}$$, is the fraction of numbers within $$h$$ absolute deviations of the mean, or within the interval $$(\bar x-h\cdot Ab , \bar x+h\cdot Ab)$$.
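The absolute-deviation variant can be verified the same way. In this sketch (the helper name `mad_check` and the data are illustrative; it assumes $$Ab > 0$$), the fraction of points within $$h \cdot Ab$$ of the mean is compared against the $$1-\frac{1}{h}$$ bound:

```python
def mad_check(data, h):
    """Return (fraction of data within h absolute deviations of the mean, the 1 - 1/h bound)."""
    n = len(data)
    xbar = sum(data) / n
    ab = sum(abs(x - xbar) for x in data) / n  # Ab = (1/n) * sum |x_i - xbar|
    inside = sum(1 for x in data if abs(x - xbar) < h * ab)
    return inside / n, 1 - 1 / h

# One extreme outlier: the bound still holds for every h > 1.
data = [1, 2, 2, 3, 3, 3, 50]
for h in (1.5, 2, 4):
    frac, bound = mad_check(data, h)
    assert frac >= bound
```

Note that $$1-\frac{1}{h}$$ is a weaker bound than $$1-\frac{1}{h^2}$$ for the same $$h$$, reflecting that only first absolute moments, not second moments, are used.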

### References

Brunette, John. 2003–2007b. “Chebychev’s Theorem for Absolute Deviation.” Lecture Notes.