Gaussian Distributions vs Power Laws: Your Ultimate Guide to Making Sense of Natural and Social Phenomena

1. The World According to Carl Friedrich Gauss

We seem heavily inclined to apply Gaussian (or normal) distributions when making sense of various phenomena around us. But Gaussian distributions are not the only statistical models available, so why do we favour them so readily?

A conjectured reason might be their intuitiveness and reassurance; normal distributions seem to convey a sense of stability and predictability that helps us cope with future uncertainty. Let’s use a classic example to illustrate the idea.

Imagine we wish to examine the properties of a random variable, say the height of individuals in a group. We observe the following:

  • The data points are centred around an average value (or mean)
  • It is equally likely to find people on either side of the average (the distribution is symmetrical around the mean)
  • Finding data points very far from the mean is extremely unlikely (close to zero).
  • By observing a (relatively) small sample of the population, we can safely infer the properties (mean, standard deviation, and higher moments) of the random variable.

None of the above applies to power laws, which makes phenomena governed by them hard to work with, at least in the traditional ways (estimating their mean and variance, for instance).

Gaussian distributions accurately describe reality when the variable of interest arises from a large collection of independent processes whose variations cancel out over the long run. As soon as agent interactions become non-trivial, processes become interdependent, complex behaviour emerges, normal distributions subside, and power laws prevail.

Phenomena governed by normal distributions are safe. The future resembles the past and can be reasonably predicted from historical data (provided that the randomness-generating mechanisms are static or slow-moving). However, the phenomena of interest to us, such as the behaviour of social networks like markets, economies, or communities, are complex. The same applies to biological (and most natural) systems.

Most natural and social phenomena exhibit complex behaviour and are governed by power laws; therefore, understanding power laws is vital to making sense of and acting in those environments.

So, what are power laws and power law dynamics, and how do they differ from Gaussians?

2. Gaussian (Normal) Distributions

The Gaussian distribution, also known as the normal distribution, is a mathematical model used to describe data where values cluster around a central mean with a symmetrical bell-shaped curve. In a Gaussian distribution, most data points are close to the mean, and the number of data points decreases as you move further away from the mean in either direction.

This distribution is commonly encountered in natural phenomena like the heights of people, scores on standardized tests, and measurement errors. Mathematically, it is represented as:

f(x) = \dfrac {1} {\sigma\sqrt{2\pi}} e^{- \dfrac{ (x-\mu)^2 }{2 \sigma^2}}

Where:

  • f(x) is the probability density function.
  • μ is the mean.
  • σ is the standard deviation.
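The formula translates directly into code. Below is a minimal sketch (the helper name gaussian_pdf is our own), checked against scipy's reference implementation:

```python
import numpy as np
from scipy.stats import norm

def gaussian_pdf(x, mu=0.0, sigma=1.0):
    """Direct implementation of the normal density formula above."""
    return np.exp(-((x - mu) ** 2) / (2 * sigma**2)) / (sigma * np.sqrt(2 * np.pi))

# Sanity check against scipy's implementation of the same density
x = np.linspace(-4, 4, 9)
assert np.allclose(gaussian_pdf(x), norm.pdf(x))
```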

The Gaussian distribution has the familiar bell curve that we see below.

A normal distribution with zero mean and unit standard deviation, N(0, 1).

The figure above shows the standard normal distribution, N(0, 1), with zero mean and unit standard deviation. The following can be inferred:

  • The distribution is symmetrical around the mean; the density depends only on the distance from the mean
  • The probability of observing a value above 1 (a 1-σ event) is 0.1587
  • The probability of observing a value above 3 (a 3-σ event) is 0.0013
  • The probability of observing a value above 6 (a 6-σ event) is practically zero (about 10⁻⁹), as the quick check below confirms
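These probabilities are easy to verify with scipy's survival function, where sf(k) returns P(X > k) for the standard normal:

```python
from scipy.stats import norm

# P(X > k) for a standard normal, k = 1, 3, 6
for k in (1, 3, 6):
    print(f"P(X > {k}) = {norm.sf(k):.4g}")
# P(X > 1) = 0.1587
# P(X > 3) = 0.00135
# P(X > 6) = 9.866e-10  (practically zero)
```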

3. Power Law Distribution

A power law distribution is a mathematical model describing phenomena where a few events are very large (or very frequent) while the vast majority are small. In a power law distribution, the probability of an event decays as a power of its magnitude: large events are rare, but never negligibly so.

Mathematically, a power law distribution can be represented as:

P(X \geq x) \propto x^{-\alpha}

Where:

  • P(X≥x) is the probability that a random variable X is greater than or equal to x.
  • α is a parameter that determines the steepness of the power law curve.

Inverse power law distributions with α values of 0.5, 1, and 1.5, respectively.

Power laws have special properties, which we shall explore later. For now, let’s focus closely on how P(X > k) changes with k. Take α = 1 and assume that P(X > k) = 1/n. The following is true:

  • P(X > 2k) = 1/(2n)
  • P(X > 4k) = 1/(4n)
  • And so on.

What is interesting to observe here is that, while for the Gaussian the ratio P(X > 2k) / P(X > k) decreases rapidly as k grows, for power laws it is constant! In our example above, this means that P(X > 2k) / P(X > k) = P(X > 4k) / P(X > 2k) = … = 0.5. In general, the ratio equals 2^{-α} regardless of k: doubling the threshold always scales the tail probability by the same fixed factor. The snippet below makes the contrast concrete.
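Here is a minimal numerical sketch of that contrast, treating the power law tail as exactly x^{-α} with α = 1:

```python
import numpy as np
from scipy.stats import norm

alpha = 1.0  # power law tail exponent: P(X > x) ~ x**(-alpha)

for k in (1, 2, 4, 8):
    gauss_ratio = norm.sf(2 * k) / norm.sf(k)       # shrinks fast as k grows
    power_ratio = (2 * k) ** -alpha / k ** -alpha   # always 2**(-alpha) = 0.5
    print(f"k={k}: Gaussian ratio={gauss_ratio:.3g}, power law ratio={power_ratio:.3g}")
```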

Both of these properties make the implications of power laws fascinating, as we will endeavour to show in the next sections.

4. Examples of Power Laws

4.1 A Historical Overview

  • Pareto Distribution (1890s): Vilfredo Pareto, an Italian economist, is credited with discovering the Pareto distribution, which is a specific type of power law distribution. Pareto observed that a small percentage of the population in Italy owned a large percentage of the wealth. This led to the “80/20 rule,” where approximately 80% of the effects come from 20% of the causes. This distribution is commonly encountered in economics and wealth distribution.
  • Zipf’s Law (1940s): The law formulated by linguist George Zipf describes the distribution of word frequencies in written and spoken language. It states that the frequency of a word is inversely proportional to its rank in the frequency table. For example, the most common word in English (e.g., “the”) occurs much more frequently than the second most common word (e.g., “of”). Zipf’s law was one of the earliest documented power laws.
  • Complex Systems and Self-Organized Criticality (20th Century): The study of complex systems in the 20th century, especially in physics and the earth sciences, revealed the prevalence of power law behaviour. Physicist Per Bak and colleagues introduced the concept of self-organized criticality in the 1980s to explain phenomena like sandpile avalanches and earthquakes. These systems exhibited power law distributions of event sizes.
  • Renewed Interest and Applications (Late 20th Century – Present): Power laws gained renewed interest in various scientific disciplines, including physics, biology, social sciences, and network theory. Researchers recognized that power laws could be found in diverse natural and man-made systems, such as the distribution of city populations, the sizes of earthquakes, the connectivity of the internet, and the distribution of wealth.

Today, power laws continue to be a subject of study and fascination in many fields. They provide insights into the underlying principles governing the distribution of phenomena in complex systems, and their applications range from linguistics and economics to physics and network theory. The study of power laws has also expanded to address more complex distributions beyond simple scaling laws, such as fat-tailed distributions and multifractals.

4.2. Growth of Cities (Urbanization)

Power laws provide a framework for understanding the population distribution in cities. One of the most famous applications of power laws in this context is Zipf’s law.

Zipf’s law states that, in many countries, a city’s population is inversely proportional to its rank in the population hierarchy. In simpler terms, the second-largest city will have approximately half the population of the largest city, the third-largest city approximately one-third, and so on.

Mathematically, Zipf’s law can be expressed as:

P(r) \propto \dfrac{1}{r^s}

Where:

  • P(r) is the population of the city ranked r.
  • s is an exponent that characterizes the distribution (s ≈ 1 in Zipf’s classical formulation).

Zipf’s law suggests that a few cities will have the highest populations while most cities will have smaller populations. This concept helps explain the unequal distribution of urban populations and has implications for urban planning, resource allocation, and infrastructure development.
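A small illustration of the rank-size rule, using a hypothetical ten-million-inhabitant top city and the classical exponent s = 1 (real data will only follow the law approximately):

```python
largest = 10_000_000  # hypothetical population of the top-ranked city
s = 1.0               # Zipf's classical exponent

# Predicted population of the city ranked r: P(r) = largest / r**s
for r in range(1, 6):
    print(f"rank {r}: ~{largest / r**s:,.0f} inhabitants")
# rank 1: ~10,000,000, rank 2: ~5,000,000, rank 3: ~3,333,333, ...
```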

4.3 Sandpile Crashes (Self-Organized Criticality)

Sandpile crashes are the canonical example of self-organized criticality, observed in systems where particles are added one by one until the system reaches a critical state in which a single addition can trigger a cascade of events, often leading to avalanches. This concept is a manifestation of power law behaviour.

In sandpile models, grains of sand are added to a pile, and when the pile reaches a certain height or angle, it becomes unstable and collapses. Interestingly, the size and frequency of these collapses follow a power law distribution. This means that small collapses occur frequently, medium-sized ones less often, and very large collapses (rare but significant) also occur.

Mathematically, the distribution of avalanche sizes can be described by a power law:

P(s) \propto s^{-\tau}

Where:

  • P(s) is the probability of an avalanche of size s.
  • τ is an exponent that characterizes the distribution.

Sandpile models illustrate how complex systems can exhibit self-organized criticality, leading to power law behaviour.
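Below is a minimal sketch of the classic Bak-Tang-Wiesenfeld sandpile model; the grid size and number of grains are arbitrary choices, and avalanche size is counted as the number of topplings:

```python
import numpy as np

rng = np.random.default_rng(0)
N = 20                       # grid side length (arbitrary)
pile = np.zeros((N, N), dtype=int)
sizes = []

for _ in range(20_000):      # drop grains one at a time at random positions
    i, j = rng.integers(0, N, size=2)
    pile[i, j] += 1
    topplings = 0
    # A cell holding 4 or more grains topples, sending one grain to each neighbour
    while (unstable := np.argwhere(pile >= 4)).size:
        for i, j in unstable:
            pile[i, j] -= 4
            topplings += 1
            for di, dj in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                ni, nj = i + di, j + dj
                if 0 <= ni < N and 0 <= nj < N:  # grains falling off the edge are lost
                    pile[ni, nj] += 1
    if topplings:
        sizes.append(topplings)

# Avalanche sizes span several orders of magnitude, a power-law signature
print(max(sizes), np.mean(sizes))
```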

5. Scale Invariance and Power Laws

5.1 The Pareto Principle Revisited

Power laws are a manifestation of scale invariance, which means that a system’s behaviour remains similar or self-similar regardless of the scale at which you observe it. This property is often found in natural systems due to the hierarchical and self-organizing nature of many processes.

To understand scale invariance in power laws, consider the diagram below. What it tells us is that if we take any partition at any level and closely inspect its properties, we find that it has the same structure as its parents and children (down to the smallest scale).

A demonstration of scale invariance in power laws.

For example, Vilfredo Pareto found that land ownership in Italy followed a simple law (later known as the Pareto Principle): 80% of the land was owned by 20% of the people. Scale invariance tells us that the rule applies within the top 20% as well: 80% of their land is owned by the top 20% of that group, so 64% of all land ends up with just 4% of the people, as the short iteration below shows.
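A three-line iteration makes the self-similarity concrete:

```python
# Iterating the 80/20 rule: at each level, 20% of the group holds 80% of the share
share, people = 1.0, 1.0
for level in range(1, 4):
    share, people = share * 0.8, people * 0.2
    print(f"level {level}: {people:.1%} of people own {share:.1%} of the land")
# level 1: 20.0% own 80.0%
# level 2:  4.0% own 64.0%
# level 3:  0.8% own 51.2%
```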

5.2 Why Big Cities Get Bigger — Explaining Power Law Dynamics

Why do big cities get bigger, rich people richer, and popular books more popular? Let’s try to answer that with an example.

  • Imagine a small town that starts with ten grocery stores of equal size. Let’s call them stores 1-10.
  • After a few years, one or two lucky store owners (say 1 and 2) will achieve a slight edge over the rest, allowing them to expand their brand offerings, open new branches, or even acquire one of the other small stores.
  • Having doubled their size, stores 1 and 2 are now even more affluent and can afford to expand their businesses even further, attracting clients who have historically shopped at stores 3-10.
  • In the long run, stores 1 and 2 (roughly 20%) will have captured most of the business in town (say 80%).

Power law dynamics explain the accumulation of power, wealth, and status in societies, economies, and ecologies. They also explain the 80/20 rule.
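This rich-get-richer mechanism is known as preferential attachment. A minimal simulation sketch (each new customer picks a store with probability proportional to its current size) shows how an initially equal market concentrates:

```python
import numpy as np

rng = np.random.default_rng(42)
sizes = np.ones(10)              # ten stores of equal size, as in the example

for _ in range(100_000):         # each new customer favours already-big stores
    p = sizes / sizes.sum()
    sizes[rng.choice(10, p=p)] += 1

sizes.sort()
top2_share = sizes[-2:].sum() / sizes.sum()
# In a typical run, the two leading stores capture far more than their
# initial 20% share of the market
print(f"top 2 stores hold {top2_share:.0%} of the business")
```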

The 80/20 rule (sometimes also called the “Pareto Principle” or “Law of the Vital Few”) is quite famous, stating that a minority of employees generate most of the business value, a minority of people control the majority of wealth, few large cities exist alongside many small ones, and the list goes on.

6. Final Words

Statistics is mainly concerned with understanding the properties of a population from a random sample drawn from that population. This includes estimating the mean, variance, and higher moments and fitting a distribution law to the sample data. These properties allow us to understand the population and estimate event probabilities.

Below is a list of where things can go wrong with power laws. (Throughout this section, α denotes the exponent of the density, f(x) ∝ x^{-α} for x ≥ k, which is one greater than the tail exponent of Section 3.)

  • Mean (Expectation): A power law distribution’s mean (μ) may not exist if the exponent α is at most 2. In this case, the distribution has an infinite mean. If α is greater than 2, the mean exists and is given by:

\mu = k \dfrac{\alpha - 1}{\alpha - 2}

  • Variance: A power law distribution’s variance (σ²) may not exist if α is at most 3. In such cases, the distribution has an infinite variance. If α is greater than 3, the variance exists and is given by:

\sigma^2 = k^2 \dfrac{\alpha - 1}{(\alpha - 3)(\alpha - 2)^2}
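Here is a sketch of what a non-existent mean looks like in practice; we sample via the inverse CDF, X = k·U^{-1/(α-1)}, which matches the density convention used above:

```python
import numpy as np

rng = np.random.default_rng(1)
k = 1.0

def pareto_sample(alpha, n):
    """Inverse-CDF sampling for f(x) proportional to x**(-alpha), x >= k."""
    return k * rng.random(n) ** (-1.0 / (alpha - 1.0))

for alpha in (2.5, 1.8):               # the mean exists only for alpha > 2
    x = pareto_sample(alpha, 1_000_000)
    print(f"alpha={alpha}: sample mean={x.mean():.2f}")
# For alpha=2.5 the sample mean settles near k(alpha-1)/(alpha-2) = 3.0;
# for alpha=1.8 it keeps growing with the sample size and never converges.
```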

Deriving statistical inferences from power law distributions can be more challenging than doing so from Gaussian (normal) distributions for several reasons:

  • Finite Moments: Gaussian distributions have finite mean and variance, which makes them amenable to classical statistical techniques. In contrast, power law distributions may have infinite moments for certain parameter values, such as a mean and variance that don’t exist when the exponent α is less than critical values (2 for the mean and 3 for the variance). This lack of finite moments complicates standard statistical analysis.
  • Sample Size Sensitivity: Power law distributions are highly sensitive to sample size. Small samples may not accurately represent the true distribution, and power laws may appear as something else due to limited data. In contrast, Gaussian distributions are more forgiving of small sample sizes.
  • Noisy Data: Power laws often require larger datasets with less noise to identify accurately. Noise can lead to misinterpretation and make distinguishing a power law distribution from other heavy-tailed distributions or data with outliers challenging. Gaussian distributions are more robust to noise.
  • Parameter Estimation: Estimating the parameters of a power law distribution, namely the scaling constant (k) and the exponent (α), can be challenging. Naive methods, such as fitting a straight line to a log-log plot, produce biased estimates, and maximum-likelihood techniques are needed for accurate estimation (see the sketch after this list).
  • Statistical Tests: Common statistical tests and techniques are designed for Gaussian distributions. Adapting or developing new tests may be necessary when dealing with power law data. This task can be complex and may not yield as straightforward results as in Gaussian analysis.
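For the continuous case, the maximum-likelihood estimate of the density exponent has a closed form, α̂ = 1 + n / Σ ln(xᵢ/x_min), often called the Hill estimator. Below is a sketch that recovers a known exponent from synthetic data (x_min is the assumed cutoff):

```python
import numpy as np

def fit_alpha(x, x_min):
    """MLE (Hill estimator) for f(x) proportional to x**(-alpha), x >= x_min."""
    tail = x[x >= x_min]
    return 1.0 + tail.size / np.log(tail / x_min).sum()

# Recover a known exponent from synthetic power law data
rng = np.random.default_rng(7)
true_alpha, x_min = 2.5, 1.0
x = x_min * rng.random(100_000) ** (-1.0 / (true_alpha - 1.0))
print(fit_alpha(x, x_min))   # should land close to 2.5
```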

In summary, mistaking power laws for Gaussians has two costly consequences: A) relying too heavily on past observations to predict the future, and B) underestimating the probability of thick-tail events (or Black Swans). Both make us overconfident about the risks we choose to take.
