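Generalized linear models (GLMs) are widely used in the “big data” era. They extend linear regression to response variables that are not normally distributed (for example counts, proportions, or positive, skewed quantities) by linking the expected value of the response to a linear combination of predictors, which makes them adaptable to many different data distributions. Importantly, they allow us to predict how the expected value of a variable of interest changes as its explanatory variables change.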
For several decades, scientists from different fields have recognized that many features of the natural and human world do not follow Gaussian distributions, i.e., they do not cluster neatly around a mean. On the contrary, quantities such as the magnitude of earthquakes, the income of individuals, the number of Facebook friends, or the frequency of words have “heavy-tailed” distributions. This means that while there are many weak earthquakes and many poor people, there are occasionally a few extremely devastating earthquakes and a few billionaires. It is unclear how informative GLMs are for these phenomena. GLMs are very useful for understanding the central behavior of a distribution, since they model its mean, but they tell us little or nothing about its tails.
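To make the contrast concrete, here is a minimal sketch comparing the tail probability P(X > x) of a Pareto (power-law) distribution with that of a Gaussian that has exactly the same mean and standard deviation. The shape parameter alpha = 3 and scale x_min = 1 are hypothetical values chosen for illustration, not estimates for any of the quantities mentioned above. Closed-form survival functions are used, so the comparison is exact rather than simulated.

```python
import math

def pareto_tail(x, alpha, x_min=1.0):
    """Survival function P(X > x) of a Pareto distribution with shape alpha and scale x_min."""
    if x < x_min:
        return 1.0
    return (x_min / x) ** alpha

def normal_tail(x, mean, sd):
    """Survival function P(X > x) of a Gaussian, computed with the complementary error function."""
    return 0.5 * math.erfc((x - mean) / (sd * math.sqrt(2.0)))

def pareto_moments(alpha, x_min=1.0):
    """Mean and standard deviation of a Pareto distribution (the variance is finite only for alpha > 2)."""
    if alpha <= 2:
        raise ValueError("variance is infinite for alpha <= 2")
    mean = alpha * x_min / (alpha - 1)
    var = x_min ** 2 * alpha / ((alpha - 1) ** 2 * (alpha - 2))
    return mean, math.sqrt(var)

# Hypothetical illustrative shape parameter, not fitted to real data.
alpha = 3.0
mean, sd = pareto_moments(alpha)  # moment-matched Gaussian: same mean and standard deviation

for x in (2, 5, 10):
    print(f"x={x:>2}  Pareto P(X>x)={pareto_tail(x, alpha):.1e}  "
          f"Gaussian P(X>x)={normal_tail(x, mean, sd):.1e}")
```

Close to the mean the two distributions assign probabilities of a similar order of magnitude, but far from it the Gaussian tail decays so quickly that it treats an event the Pareto model expects roughly once in a thousand draws (x = 10) as essentially impossible. A summary based only on the mean and spread cannot tell these two situations apart.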
We want to tackle this problem by identifying which human activities produce heavy-tailed outcomes, assessing the impact of these rare events, and modifying existing empirical models so that they also give us information about them.