
Why Does Pi Show up in the Normal Distribution?


Understand how bell curves are formed and their counterintuitive relationship to the number Pi

While recently looking through an old stats textbook, I came across the familiar equation for the normal distribution:

The equation for the normal distribution: f(x) = (1 / (σ√(2π))) e^{-(x - μ)² / (2σ²)}

Anyone who has taken a statistics course in university has come across this equation. I had seen it many times myself, but looking at it fresh this time, two questions immediately came to mind:

  1. How exactly does this thing form a normal distribution?
  2. What is π doing in there?

The first question seemed simple enough to figure out: I would just have to trace back the history of the equation and put it together piece by piece. But the second question absolutely stumped me: what in the world does a bell curve have to do with a circle?

I read through all of the Math Stack Exchange solutions, searched around, and asked on Twitter, but never felt like any of the answers gave me the intuition I was looking for. They relied too heavily on analytical solutions, or when visual techniques were employed, the connections felt hand-wavy to me. After doing a bit of my own research, here’s my attempt at explaining the connection without resorting to any advanced math.

First, what exactly is a bell curve?

Before we get to the π part, it helps to gain some insight into how exactly a bell curve is formed. Let’s start with the exponential function, which you can see within the equation above. Here it is standing on its own:

The function f(x) = e^{x}.
A chart showing the exponential function going up and to the right.
Image by Author

If we square the value of x, it turns into something that looks kind of like a quadratic, but isn’t one. Instead, it’s a function that grows much faster than a quadratic, but has some similar properties such as being symmetric about its lowest point. Adding it to the plot above for comparison, you can see that they have the same value at x=0 and x=1:

The function f(x) = e^{x²}.
A chart showing the exponential function and the exponential function raised to x squared.
Image by Author

Finally, let’s make the exponent negative, and like magic, we get the bell curve shown in red below:

The function f(x) = e^{-x²}.
A chart showing the exponential function, Euler’s number function raised to x-squared, and Euler’s number raised to negative x-squared.
Image by Author

This function, f(x) = e^{-x²}, is just one particular bell curve of an infinite number of possibilities. In general, you can raise e to any quadratic you like. However, it is only when that quadratic is concave (that is, it “opens” downwards) that you get a bell curve. Above, that quadratic was -x², which does indeed open downwards.
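The three curves discussed so far can be tabulated with a minimal Python sketch. The function names (`exp_x`, `exp_x_squared`, `bell`) are illustrative choices for this example, not anything from the original charts:

```python
import math


def exp_x(x):
    """The plain exponential function, e^x."""
    return math.exp(x)


def exp_x_squared(x):
    """e raised to x squared: symmetric about x = 0 and growing very quickly."""
    return math.exp(x ** 2)


def bell(x):
    """e raised to minus x squared: the bell curve."""
    return math.exp(-(x ** 2))


# Tabulate the three functions at a few sample points.
for x in (-2.0, -1.0, 0.0, 1.0, 2.0):
    print(
        f"x={x:+.1f}  e^x={exp_x(x):9.4f}  "
        f"e^(x^2)={exp_x_squared(x):9.4f}  e^(-x^2)={bell(x):.4f}"
    )
```

In the printed table, the first two columns agree at x = 0 and x = 1, and the last column peaks at x = 0 and falls away symmetrically on both sides.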

For example, the equation f(x) = x² + x + 2 plotted in blue below is not concave, and when e is raised to it, you get the green curve, which is obviously not a bell curve:

A graph of a convex quadratic and e raised to the power of this quadratic, both opening upwards.
Image by Author

If we switch the equation to be f(x) = -2x² + 3x + 2, though, we get a concave function, and e raised to that forms the bell curve shape:

A graph of a concave quadratic, and the bell curve that results from raising e to the power of this quadratic.
Image by Author
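To check the two examples above numerically, here is a small sketch that raises e to each quadratic and evaluates the result around the quadratic's turning point. The helpers `exp_of_quadratic` and `vertex_x` are hypothetical names introduced for this illustration:

```python
import math


def exp_of_quadratic(a, b, c):
    """Return the function f(x) = e^(a*x^2 + b*x + c)."""
    return lambda x: math.exp(a * x ** 2 + b * x + c)


def vertex_x(a, b):
    """x-coordinate of the turning point of a*x^2 + b*x + c."""
    return -b / (2 * a)


convex_curve = exp_of_quadratic(1, 1, 2)    # e^(x^2 + x + 2), opens upwards
concave_curve = exp_of_quadratic(-2, 3, 2)  # e^(-2x^2 + 3x + 2), a bell curve

peak = vertex_x(-2, 3)  # where the concave quadratic is highest
for x in (-10.0, peak - 1.0, peak, peak + 1.0, 10.0):
    print(f"x={x:+7.2f}  convex: {convex_curve(x):.3e}  concave: {concave_curve(x):.3e}")
```

The concave version peaks at the turning point and shrinks toward zero in both directions, while the convex version blows up in both directions, which is why only concave quadratics give bell curves.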

For this reason, the general equation of a bell curve is e raised to a quadratic:

The function e to the power of a general quadratic.

To help constrain it to only concave quadratics, you can perform the following replacements:

Various terms that can be substituted into the general quadratic to produce the final form of the normal distribution equation.

After you substitute these in and rearrange, you’ll find that you get the following, which is almost exactly the equation we started with at the top, only with an a in front of it:

The final normal equation produced by performing the mentioned substitutions and the original normal equation from the beginning of the article.

The leading factor, 1/(σ√(2π)), is chosen in the equation on the right so that no matter what shape the bell curve takes, the area underneath it is always exactly 1. This is because for a statistical distribution, 1 is equivalent to 100% of the possible outcomes, and the area should always sum to that value.
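The claim that the area is always 1 can be checked numerically. The sketch below assumes the standard form of the normal distribution with mean μ (`mu`) and standard deviation σ (`sigma`), and approximates the area with a simple midpoint rule; `area_under` and `normal_area` are helper names made up for this example:

```python
import math


def normal_pdf(x, mu, sigma):
    """Standard normal distribution formula with mean mu and std dev sigma."""
    coefficient = 1.0 / (sigma * math.sqrt(2.0 * math.pi))
    return coefficient * math.exp(-((x - mu) ** 2) / (2.0 * sigma ** 2))


def area_under(f, lo, hi, steps=20_000):
    """Midpoint-rule approximation of the area under f between lo and hi."""
    width = (hi - lo) / steps
    return sum(f(lo + (i + 0.5) * width) for i in range(steps)) * width


def normal_area(mu, sigma):
    """Area under the bell curve, measured 10 standard deviations either side."""
    return area_under(
        lambda x: normal_pdf(x, mu, sigma), mu - 10.0 * sigma, mu + 10.0 * sigma
    )


# Narrow, wide, and shifted bell curves should all enclose the same area.
for mu, sigma in ((0.0, 1.0), (3.0, 0.5), (-2.0, 4.0)):
    print(f"mu={mu:+.1f}  sigma={sigma:.1f}  area ~ {normal_area(mu, sigma):.6f}")
```

Cutting the curve off at 10 standard deviations is an approximation, but the tails beyond that point are far too small to affect the result at this precision.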

So, in other words, the connection between the bell curve and that π term must have something to do with the area of the curve itself. But what exactly is that connection?

How Pi is related to the bell curve

Let’s take stock of what just happened there. We took a transcendental number, e, and raised it to the power of a quadratic. When we calculate the area under that curve, the answer involves another transcendental number, π.

It turns out that these two numbers are related in a few ways, including their relationship in the complex number system via one of the most beautiful equations in math: e^{iπ} + 1 = 0. But that equation doesn’t play a role here.

Instead, as we will see, π comes out of the way that we have to go about calculating the area. In a roundabout way, we can get this area by working with the square of the area under e^{-x²}, and then taking the square root. In other words:

Area = √( (∫ e^{-x²} dx) × (∫ e^{-x²} dx) )

The reason we have to do this has to do with the calculus technique that we need to employ to get the area. There are plenty of examples online that show how to do this, but I want to instead provide the visual intuition that these analytic solutions don’t necessarily convey.

Since the variable we use to calculate the area is arbitrary, we can just as easily represent the above equation as the following, where we replaced the second x with a y:

Area = √( (∫ e^{-x²} dx) × (∫ e^{-y²} dy) )

You can now think of this as putting one of these bell curves on the x-axis and the other on the y-axis, and then getting all combinations of their heights and plotting it in 3 dimensions:

A 3d chart showing a bell curve along the x and y axes, forming a uniform hill along the z-axis.
Image by Author

To get the area of one of the curves, you just need to get the volume of the “hill” that forms, and then take the square root of that value. An analogy to this with fewer dimensions is knowing the area of a square, and then getting its side length by taking the square root.

Note: This trick will not help for all types of functions. If you try this with a quadratic (say, -x² + 9), the volume still equals the square of the area, but the radial shortcut used in the rest of this article no longer applies. The reason is that the shortcut only works for functions that are rotationally symmetric when they are multiplied together in this way. While the Gaussian is, you can see from a similar plot of the quadratic below that it is “boxy” and is not symmetric through rotation the way that the curve above is.

A plot of a “squared quadratic” shows that it is not rotationally symmetric. Image by Author.

But, how do we get the volume? One way would be to chunk up the hill into squares like above, and then get the height of each in the middle of the square. You could then calculate the volume of these square pillars as (Area of Each Square)(Height) and then add up all those smaller volumes. The smaller you make the squares, the better the approximation.
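The square-pillar approach can be sketched in a few lines of Python. This is only an illustration: the grid is cut off at a distance of 6 from the centre (an assumption made here, since the hill is essentially flat beyond that), and the names `hill_height` and `hill_volume_squares` are invented for the example:

```python
import math


def hill_height(x, y):
    """Height of the hill: one bell curve along x times one along y."""
    return math.exp(-(x ** 2)) * math.exp(-(y ** 2))


def hill_volume_squares(half_width=6.0, squares_per_side=300):
    """Add up (area of each square) * (height at the middle of the square)."""
    side = 2.0 * half_width / squares_per_side
    total = 0.0
    for i in range(squares_per_side):
        x = -half_width + (i + 0.5) * side  # middle of the square along x
        for j in range(squares_per_side):
            y = -half_width + (j + 0.5) * side  # middle of the square along y
            total += side * side * hill_height(x, y)
    return total


volume = hill_volume_squares()
print("volume of the hill      ~", volume)
print("pi                      =", math.pi)
print("area of one bell curve  ~", math.sqrt(volume))
```

The volume comes out very close to π, and its square root is the area under a single e^{-x²} curve. The number π appears in the answer, but nothing in this calculation hints at why.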

However, this hides where the π comes from. So imagine that, rather than using squares, we divide the hill up radially. In this diagram, we are looking down from the top and we see the contour lines of the hill:

A top down view of a 3d chart of a bell curve, showing a small chunk of area of a concentric circle.
Image by Author

Here, you divide up the hill into “slices” represented by the black dotted lines. Those slices are further divided into pieces as highlighted in blue. As above, you multiply the area of each of these blue pieces by the height of the hill at that point to get the volume:

r times delta r times height

In this case, though, you repeat this along the “slice” to get the volume of the entire slice, and then multiply that by the total number of slices to get the entire volume of the hill.

If you make the angle 𝜃 small enough so that it’s barely a sliver, then for all intents and purposes, you can multiply the volume of a slice by 2π radians, the number of radians in a circle.

If you actually do this math (again, the calculus is covered here for those that want to see it in action), you’ll find that each slice contributes a volume of exactly 0.5 per radian of angle. Multiply that by 2π radians and you get a volume that exactly equals π.
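For readers who would rather see numbers than calculus, here is a rough Python sketch of the radial version. It adds up r × Δr × height along one slice, treating the slice as one radian wide, and then multiplies by 2π. The cutoff radius of 8 and the name `slice_volume_per_radian` are assumptions made for this example:

```python
import math


def slice_volume_per_radian(max_radius=8.0, pieces=100_000):
    """Sum r * delta_r * height along one slice, per radian of angle."""
    delta_r = max_radius / pieces
    total = 0.0
    for k in range(pieces):
        r = (k + 0.5) * delta_r    # middle of this piece of the slice
        height = math.exp(-(r ** 2))  # hill height depends only on the radius
        total += r * delta_r * height
    return total


per_radian = slice_volume_per_radian()
hill_volume = 2.0 * math.pi * per_radian  # stitch the slices around the circle

print("one slice, per radian ~", per_radian)
print("whole hill            ~", hill_volume)
print("pi                    =", math.pi)
```

The slice total settles at 0.5, and the only place π enters is the final multiplication by 2π, the step that sweeps the slice around the full circle.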

So there you have it: Pi comes out of the fact that we find the volume by making radial slices, and then stitching them all together around a circle.

As it turns out, anything that is symmetric through rotation can be thought of as involving circles, and naturally, circles imply that π is lurking somewhere in the math.

While this isn’t a rigorous proof and I skipped over a lot of details (e.g. slicing the 3D plot radially doesn’t generally work for all functions, but it does for the ones we used), I hope that this gives readers an intuition for why π seems to show up out of nowhere in a curve that has seemingly little to do with it.


Senior Data Scientist at Wealthsimple. Previously at Shopify. Writing on data science, bayesian statistics, maps and math. @Brideau on Twitter.