The Gaussian

Submitted by Leo on Sun, 04/20/2008 - 23:01

I did not go deep enough in my math studies as an undergrad to get a glimpse of calculus of variations (link to notes from cam math dept, chapter 8). It looks fascinating, and can be used to find functions that minimize (or maximize) some quantity subject to constraints.

For example, calculus of variations can be used to show why the Gaussian is interesting. It is the limiting distribution of several families, and a sum of many IID variables with finite variance is approximately Gaussian. But it is also the distribution that conveys our ignorance of the data. Given that some unknown distribution has mean $\mu$ and variance $\sigma^2$, the normal distribution is the one that conveys the least extra information, i.e., it has the maximum entropy among all possible densities $p(x)$ with that mean and variance.
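
A quick sanity check of the maximum-entropy claim, as a minimal sketch: it compares the closed-form differential entropies of three distributions that share the same variance. The choice of Laplace and uniform as competitors, and the function names, are mine and purely illustrative. The mean does not matter here, since shifting a density does not change its entropy.

```python
import math

def gaussian_entropy(sigma):
    # Differential entropy of N(mu, sigma^2): 0.5 * ln(2 * pi * e * sigma^2)
    return 0.5 * math.log(2 * math.pi * math.e * sigma ** 2)

def laplace_entropy(sigma):
    # Laplace with scale b has variance 2 * b^2 and entropy 1 + ln(2 * b)
    b = sigma / math.sqrt(2)
    return 1 + math.log(2 * b)

def uniform_entropy(sigma):
    # Uniform on an interval of width w has variance w^2 / 12 and entropy ln(w)
    width = sigma * math.sqrt(12)
    return math.log(width)

sigma = 1.0
for name, h in [("gaussian", gaussian_entropy(sigma)),
                ("laplace", laplace_entropy(sigma)),
                ("uniform", uniform_entropy(sigma))]:
    print(f"{name:8s} entropy = {h:.4f} nats")
```

With unit variance the Gaussian comes out on top (about 1.419 nats, against about 1.347 for the Laplace and 1.242 for the uniform). Three examples prove nothing in general, of course, but it is nice to see the ordering.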

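The way to show it involves calculus of variations and Lagrange multipliers. We encode the 3 conditions on the distribution function $p(x)$ (integrates to 1, mean $\mu$ and variance $\sigma^2$ given), and combine them with the entropy in the Lagrangian:

$$L[p] = -\int p(x) \ln p(x)\,dx + \lambda_0 \left( \int p(x)\,dx - 1 \right) + \lambda_1 \left( \int x\,p(x)\,dx - \mu \right) + \lambda_2 \left( \int (x-\mu)^2 p(x)\,dx - \sigma^2 \right)$$

Now differentiating with respect to $p(x)$ (differentiating w.r.t. a function! :), and setting the derivative to zero to find the maximum, we get

$$-\ln p(x) - 1 + \lambda_0 + \lambda_1 x + \lambda_2 (x-\mu)^2 = 0$$

from where we instantly get the form of the Gaussian:

$$p(x) = \exp\left( \lambda_0 - 1 + \lambda_1 x + \lambda_2 (x-\mu)^2 \right)$$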

Completing the square and solving for the Lagrange multipliers using the constraints for mean and variance, we arrive at the Gaussian distribution. This holds similarly for the multivariate case.
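
Here is a sketch of that last step, assuming the stationary density is written as $p(x) = \exp(\lambda_0 - 1 + \lambda_1 x + \lambda_2 (x-\mu)^2)$, where $\lambda_0, \lambda_1, \lambda_2$ are my labels for the multipliers of the normalization, mean and variance constraints. For $p$ to be integrable we need $\lambda_2 < 0$.

```latex
% Complete the square in the exponent
\lambda_1 x + \lambda_2 (x-\mu)^2
  = \lambda_2 \left( x - \mu + \frac{\lambda_1}{2\lambda_2} \right)^2
    + \lambda_1 \mu - \frac{\lambda_1^2}{4\lambda_2}

% So p is a Gaussian centred at \mu - \lambda_1 / (2\lambda_2)
% with variance -1 / (2\lambda_2). Matching the constraints:
\mu - \frac{\lambda_1}{2\lambda_2} = \mu
  \;\Rightarrow\; \lambda_1 = 0,
\qquad
-\frac{1}{2\lambda_2} = \sigma^2
  \;\Rightarrow\; \lambda_2 = -\frac{1}{2\sigma^2}

% Normalization fixes the remaining constant
e^{\lambda_0 - 1} = \frac{1}{\sqrt{2\pi\sigma^2}}
\quad\Rightarrow\quad
p(x) = \frac{1}{\sqrt{2\pi\sigma^2}}
       \exp\left( -\frac{(x-\mu)^2}{2\sigma^2} \right)
```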

So modelling the data as a Gaussian is equivalent to saying that we know nothing about the data except its mean and variance. Once again - cool :)