
I'm reading the section on high dimensional geometry in this book. The author presents the law of large numbers like this, around page 14. Given a random variable $x$ with expected value $E(x)$, $$Prob\Big(\Big|\frac{x_1 + \ldots + x_n}{n} - E(x)\Big| \geq \epsilon\Big) \leq \frac{Var(x)}{n \epsilon^2}$$
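(Not from the book, but as a quick numerical sanity check of this inequality, the snippet below, using NumPy, estimates the left-hand side by simulation for $x \sim \mathrm{Uniform}(0,1)$, which has $E(x) = \frac12$ and $Var(x) = \frac1{12}$, and compares it to the right-hand side.)

```python
import numpy as np

# Empirical check of Prob(|mean of n samples - E(x)| >= eps) <= Var(x)/(n*eps^2)
# for x ~ Uniform(0, 1): E(x) = 1/2, Var(x) = 1/12.
rng = np.random.default_rng(0)
n, eps, trials = 100, 0.05, 20_000

samples = rng.random(size=(trials, n))          # "trials" independent runs of n samples each
deviation = np.abs(samples.mean(axis=1) - 0.5)  # |sample mean - E(x)| for each run
lhs = np.mean(deviation >= eps)                 # empirical probability of a large deviation
rhs = (1 / 12) / (n * eps ** 2)                 # the bound Var(x)/(n * eps^2)
print(lhs, "<=", rhs)                           # roughly 0.08 <= 0.33
```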

As an application, they write "Let z be a $d$ dimensional random point whose coordinates are chosen from a zero mean, $\frac{1}{2\pi}$ variance Gaussian... By the law of large numbers, the square of the distance of z to the origin will be $\Theta(d)$ with high probability. In particular, this means there is a vanishingly small probability that such a random point z will lie in the unit ball..."

I cannot understand this statement about "the square of the distance of z to the origin will be $\Theta(d)$ with high probability". I've tried fiddling with the expression of the law of large numbers and making substitutions using algebraic rules for variance and expectation, with no luck. Can you explain this calculation in gory detail?

  • Coincidence of the year: I sat in (one of the authors) Kannan's class a few years ago, and his first example in his first class of high dimensional probability was this one! Anyway, more than gory detail you should be looking for intuition with his first examples. Commented Jan 25, 2021 at 17:53
  • What a coincidence! I'm extremely new to probability. I didn't even know what variance or expectation were before reading this chapter. I want to see the gory detail so I can build intuition for how to work with these kinds of expressions.
    – Square
    Commented Jan 25, 2021 at 18:36
  • For context: when I took his course, I had ZERO background knowledge in probability, just like you. This example baffled me as well, but the gory detail is better left as a footnote; the calculations are very messy and lead away from the main point. I will try to write an answer emphasizing this, however I wish to spare you (I use those words for a reason) the details. May I do so? (Side note: I won't spare ALL the detail, but only that which I think is beyond a newbie in probability). Commented Jan 25, 2021 at 18:39

1 Answer


So the idea of the law of large numbers (the inequality you quoted) is this: sample averages concentrate around the mean.

What does that mean? Well, it means that if I had a distribution and wished to find its mean (for example, I had a coin and I wanted to find the probability with which it lands heads), then I should collect many samples from the distribution (by flipping that coin many times and noting whether I got heads or tails) and take their average (the number of flips which were heads, divided by the total number of flips). According to the law of large numbers, this number is, with high probability, close to the mean.
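To make that concrete, here is a tiny simulation (my own sketch, not from the book or the course, assuming a fair coin with heads probability $0.5$):

```python
import numpy as np

rng = np.random.default_rng(0)
p_heads = 0.5                              # assumed true probability of heads

for n in (10, 100, 10_000, 1_000_000):
    flips = rng.random(n) < p_heads        # n flips: True means heads
    print(f"{n:>9} flips: fraction of heads = {flips.mean():.4f}")
```

As $n$ grows, the printed fraction hugs $0.5$ more and more tightly, which is exactly the concentration the inequality in your question quantifies.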


Now, Kannan asks you to consider the following random variable: let $Z_1,Z_2,...$ be independent normal random variables with mean zero, variance $\frac 1{2 \pi}$. Then the random variable under consideration is $Z = Z_1^2 + Z_2^2 + ... + Z_d^2$. What is this? Well, if you put together the numbers $(Z_1,...,Z_d)$ as a tuple, you can think of it as a point in $\mathbb R^d$, or $d$-dimensional space. The quantity $Z$ is the squared distance of the random point $(Z_1,...,Z_d)$ from the origin $\mathbf{0} = (0,0,...,0)$.
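In code (again my own illustration, not the author's; note that a variance of $\frac{1}{2\pi}$ means a standard deviation of $\sqrt{1/(2\pi)}$), the random point and the quantity $Z$ look like this:

```python
import numpy as np

rng = np.random.default_rng(1)
d = 1000
sigma = np.sqrt(1 / (2 * np.pi))    # standard deviation of each coordinate

z = rng.normal(0.0, sigma, size=d)  # the random point (Z_1, ..., Z_d)
Z = np.sum(z ** 2)                  # its squared distance to the origin
print(Z)
```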

Now the law of large numbers applies to $Z$: the average $\frac{Z_1^2 + \cdots + Z_d^2}{d} = \frac{Z}{d}$ of the $d$ samples $Z_i^2$ concentrates around their common mean $E[Z_i^2]$. That is, if I sample a random point by picking coordinates with that normal distribution and consider the squared distance of this point from the origin, it should be close to the mean of $Z$, which is $d \cdot E[Z_i^2]$, with high probability.


Note: the mean and the expected value of a random variable are the same thing, denoted by $E[Z]$.

The mean of $Z$ can be calculated using linearity of expectation: $E[Z] = E[Z_1^2] + E[Z_2^2] + ... + E[Z_d^2]$. So, we need to know what $E[Z_i^2]$ is.

The point is, each coordinate is a zero mean, $\frac 1{2 \pi}$ variance Gaussian. Now, for any random variable $X$, the quantity $E[X]$ is called the mean, and the quantity $E[X^2] - E[X]^2$ is called the variance, denoted by $Var[X]$.

In particular, $E[X^2] = E[X]^2 + Var[X]$, so we have $E[Z_i^2] = E[Z_i]^2 + Var[Z_i]$. However, both these quantities are known: $E[Z_i] = 0$ and $Var[Z_i] = \frac 1{2 \pi}$, because the coordinates are given to have that mean and that variance.
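This variance identity is nothing specific to this problem; it is just the definition of variance expanded with linearity of expectation (using that $E[X]$ is a constant):

$$Var[X] = E\big[(X - E[X])^2\big] = E[X^2] - 2E[X]\,E[X] + E[X]^2 = E[X^2] - E[X]^2.$$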

Therefore, $E[Z_i^2] = \frac {1}{2 \pi}$ for each $i$, and thus $E[Z] = \frac{d}{2 \pi}$.

Note: I can understand that the main point of confusion may be in the paragraphs above and below; if that is so, kindly inform me so I can elaborate.

Adding it all up, $E[Z] = \frac{d}{2 \pi}$. This is a constant multiple of $d$, so the $\Theta$-notation explained here (check in and around the table for examples) tells you that $E[Z] = \Theta(d)$. Basically, it means that as $d$ grows, $E[Z]$ grows as a constant multiple of $d$, which is obviously true.
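Concretely, taking $c_1 = c_2 = \frac{1}{2\pi}$ in the definition of $\Theta$, we have for every $d \geq 1$ $$c_1\, d \;\leq\; E[Z] = \frac{d}{2\pi} \;\leq\; c_2\, d,$$ which is all that $E[Z] = \Theta(d)$ asserts.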

Now, by the law of large numbers, $Z$ will be $\Theta(d)$ with high probability, which explains all but the last line.


Now, the author says that the probability that such a point will lie in the unit ball is vanishingly small. And now we see why: a point in the unit ball has squared distance at most $1$. For example, in dimension $1000$, the squared distance to the origin of a random point will, by the law of large numbers, lie close to the mean, which is $E[Z] = \frac{1000}{2 \pi} \approx 159$. So, only with very low probability can the point be at squared distance less than $1$ from the origin. As $d$ goes up, the mean goes up with $d$, so the chance of this point being contained in the unit ball is vanishingly small.
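If you want to see this numerically (once more, my own sketch and not part of the book), sample many such points in $d = 1000$ dimensions and look at their squared distances:

```python
import numpy as np

rng = np.random.default_rng(2)
d = 1000
sigma = np.sqrt(1 / (2 * np.pi))        # each coordinate is N(0, 1/(2*pi))
trials = 10_000

points = rng.normal(0.0, sigma, size=(trials, d))
sq_dist = np.sum(points ** 2, axis=1)   # squared distance of each point to the origin

print("average squared distance:", sq_dist.mean())               # close to d/(2*pi) ≈ 159.15
print("smallest squared distance:", sq_dist.min())                # still far above 1
print("fraction inside the unit ball:", np.mean(sq_dist <= 1.0))  # 0.0 in practice
```

Every one of the $10{,}000$ sampled points has squared distance around $159$, so none of them comes anywhere near the unit ball.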

  • This is an amazing answer. Thank you. However, I don't understand why $Z_i^2$ should have the 'same distribution and hence the same mean'. Are the mean and expected value interchangeable and that's why? Why should the expected value for the squared variable be the same as the original? Why is it the variance in this case? Is there a formula I'm missing?
    – Square
    Commented Jan 25, 2021 at 20:51
  • I will explain that. Commented Jan 26, 2021 at 3:42
  • @Square I have provided some details, I hope it is useful. Thank you for asking for clarification. Commented Jan 26, 2021 at 5:26
  • @Square If you are satisfied, kindly upvote the answer to show this; otherwise you can clarify further with me. You can also accept the answer if you feel it is complete. I don't see you accepting answers too often; please do this to ensure that questions are closed and do not remain in the unanswered queue. Commented Jan 26, 2021 at 9:37
  • @Square Thanks! Listen, that book you are holding is a gold mine: go through it slowly, read it at bedtime, give any excuse you can to peruse it, because it is the Bible of data science and probabilistic intuition. Commented Jan 26, 2021 at 17:22
