Prediction Revisited. Mark P. Kritzman
The average fit across a set of prediction tasks, weighted by the informativeness of each prediction circumstance. For a full sample of observations, it may be computed as the average alignment of pairwise relevance and outcomes, and it is equivalent to the classical R-squared statistic.
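To make the pairwise computation concrete, the following is a minimal numerical sketch of this equivalence. The variable names and the (N − 1)-squared normalization are our own choices for illustration, with relevance measured against the sample covariance of the prediction circumstances; it is a check under those assumptions, not a full specification.

```python
# Illustrative sketch: the classical R-squared of a linear regression equals
# the average alignment of pairwise relevance and standardized outcomes,
# summed over all pairs and divided by (N - 1)^2.
import numpy as np

rng = np.random.default_rng(0)
N, K = 200, 3
X = rng.normal(size=(N, K))                               # prediction circumstances
y = X @ np.array([0.5, -1.0, 2.0]) + rng.normal(size=N)   # outcomes

# Classical R-squared from ordinary least squares with an intercept.
Xc, yc = X - X.mean(axis=0), y - y.mean()
beta = np.linalg.lstsq(Xc, yc, rcond=None)[0]
r2_classical = 1 - np.sum((yc - Xc @ beta) ** 2) / np.sum(yc ** 2)

# Pairwise view: relevance of observation i to observation t, measured
# against the sample covariance of the circumstances.
inv_cov = np.linalg.inv(np.cov(X, rowvar=False))
relevance = Xc @ inv_cov @ Xc.T                  # N x N pairwise relevance
z = yc / yc.std(ddof=1)                          # standardized outcomes
r2_pairwise = z @ relevance @ z / (N - 1) ** 2

print(r2_classical, r2_pairwise)                 # agree up to rounding
```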
Complexity: The presence of nonlinearities or other conditional features that undermine the efficacy of linear prediction models. The conventional approach for addressing complexity is to apply machine learning algorithms, but one must counter the tendency of these algorithms to overfit the data. In addition, it can be difficult to interpret the inner workings of machine learning models. A simpler and more transparent approach to complexity is to filter observations by relevance. The two approaches can also be combined.
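As a rough illustration of filtering by relevance, the sketch below predicts an outcome from only the most relevant past observations. The function name `filtered_prediction` and the `keep` parameter are ours, chosen for illustration; this is a simplified stand-in under those assumptions, not a full specification of the approach.

```python
# Illustrative sketch of filtering by relevance: predict the outcome for new
# circumstances x_t using a relevance-weighted average of only the most
# relevant past observations.
import numpy as np

def filtered_prediction(X, y, x_t, keep=0.5):
    """Relevance-weighted prediction from the top `keep` fraction of observations."""
    x_bar = X.mean(axis=0)
    inv_cov = np.linalg.inv(np.cov(X, rowvar=False))
    # Relevance of each past observation i to the current circumstances x_t.
    r = (X - x_bar) @ inv_cov @ (x_t - x_bar)
    idx = np.argsort(r)[-max(2, int(keep * len(y))):]   # most relevant subset
    # With keep=1.0 this reproduces the full-sample linear regression forecast.
    return y.mean() + r[idx] @ (y[idx] - y.mean()) / (len(y) - 1)

rng = np.random.default_rng(1)
X = rng.normal(size=(300, 2))
y = np.where(X[:, 0] > 0, 2.0, -0.5) * X[:, 1] + rng.normal(scale=0.3, size=300)
print(filtered_prediction(X, y, np.array([1.0, 1.0])))
```

The appeal of this design is that the model itself stays linear and transparent; any nonlinearity enters only through which observations are deemed relevant to the current circumstances.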
Preface
The path that led us to write this book began in 1999. We wanted to build an investment portfolio that would perform well across a wide range of market environments. We quickly came to the view that we needed more reliable estimates of volatilities and correlations—the inputs that determine portfolio risk—than the estimates given by the conventional method of extrapolating historical values. Our thought back then was to measure these statistics from a subset of the most unusual periods in history. We reasoned that unusual observations were likely to be associated with material events and would therefore be more informative than common observations, which probably reflected useless noise. We had not yet heard of the Mahalanobis distance, nor were we aware of Claude Shannon's information theory. Nonetheless, as we worked on our task, we derived the same formula Mahalanobis originated to analyze human skulls in India more than 60 years earlier.
As we extended our research to a broader set of problems, we developed a deep appreciation of the versatility of the Mahalanobis distance. In a single number, his distance measure tells us how dissimilar two items are from each other, accounting not only for the size and alignment of their many features, but also for the typical variation and covariation of those features across a broader sample. We applied the method first to compare periods in time, each characterized by its economic circumstances or the returns of financial assets, and this led to other uses. We were impressed by the method's potential to tackle familiar problems in new ways, often leading to new paths of understanding. This eventually led to our own discovery that the prediction from a linear regression equation can be equivalently expressed as a weighted average of the values of past outcomes, in which the weights are the sum of two Mahalanobis distances: one that measures unusualness and the other similarity. Although we understood intuitively why unusual observations are more informative than common ones, it was not until we connected our research to information theory that we fully appreciated the nuances of the inverse relationship of information and probability.
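Here is a minimal numerical sketch of that equivalence. The code and variable names are ours, and relevance is measured against the sample covariance of the predictors; it is a check under those assumptions rather than a definitive implementation.

```python
# Illustrative check: the prediction from a linear regression equals a
# relevance-weighted average of past outcomes, with relevance built from
# squared Mahalanobis distances (unusualness plus similarity).
import numpy as np

rng = np.random.default_rng(2)
N, K = 100, 3
X = rng.normal(size=(N, K))                       # past circumstances
y = X @ np.array([1.0, -0.5, 0.3]) + rng.normal(size=N)
x_t = rng.normal(size=K)                          # current circumstances

# Ordinary least squares prediction with an intercept.
A = np.column_stack([np.ones(N), X])
coef = np.linalg.lstsq(A, y, rcond=None)[0]
pred_ols = coef[0] + x_t @ coef[1:]

# The same prediction as a relevance-weighted average of past outcomes.
x_bar = X.mean(axis=0)
inv_cov = np.linalg.inv(np.cov(X, rowvar=False))
d = lambda u, v: (u - v) @ inv_cov @ (u - v)      # squared Mahalanobis distance
# Relevance of x_i to x_t: unusualness of each (distance from the average),
# less their mutual distance (which measures similarity).
r = np.array([0.5 * (d(xi, x_bar) + d(x_t, x_bar) - d(xi, x_t)) for xi in X])
pred_weighted = y.mean() + r @ (y - y.mean()) / (N - 1)

print(pred_ols, pred_weighted)                    # identical up to rounding
```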
Our focus on observations led us to the insight that we can just as well analyze data samples as collections of pairs rather than distributions of observations around their average. This insight enabled us to view variance, correlation, and R-squared through a new lens, which shed light on statistical notions that are commonly accepted but not so well understood. It clarified, for example, why we must divide by N – 1 instead of N to compute a sample variance. It gave us more insight into the bias of R-squared and suggested a new way to address this bias. And it showed why we square distances in so many statistical calculations. (It is not merely because unsquared deviations from the mean sum to zero.)
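As one concrete instance of the pairwise lens, consider the sample variance. In the sketch below (our own illustration), the N – 1 divisor emerges naturally: the sample variance is exactly half the average squared difference across all distinct pairs of observations, and the count of distinct pairs supplies the N – 1.

```python
# Illustrative check of the pairwise view: the sample variance, with its
# N - 1 divisor, is exactly half the average squared difference across all
# distinct pairs of observations. No mean appears, and the divisor is simply
# tied to the number of pairs, N * (N - 1) / 2.
import numpy as np

rng = np.random.default_rng(3)
x = rng.normal(size=50)
N = len(x)

var_classical = x.var(ddof=1)                      # divides by N - 1

diffs = np.subtract.outer(x, x)                    # all ordered pairs (i, j)
mean_sq_pair = np.sum(diffs ** 2) / (N * (N - 1))  # mean over distinct pairs
var_pairwise = mean_sq_pair / 2

print(var_classical, var_pairwise)                 # agree exactly
```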
But our purpose goes beyond illuminating vague notions of statistics, although we hope that we do this to some extent. Our larger mission is to enable researchers to deploy data more effectively in their prediction models. It is this quest that led us down a different path from the one selected by the founders of classical statistics. Their purpose was to understand the movement of heavenly bodies or games of chance, which obey relatively simple laws of nature. Today's most pressing challenges deal with esoteric social phenomena, which obey a different and more complex set of rules.
The emergent approach for dealing with this complexity is the field of machine learning, but more powerful algorithms introduce complexities of their own. By reorienting data-driven prediction to focus on observation, we offer a more transparent and intuitive approach to complexity. We propose a simple framework for identifying asymmetries in data and weighting the data accordingly. In some cases, traditional linear regression analysis gives sufficient guidance about the future. In other cases, only sophisticated machine learning algorithms offer any hope of dealing with a system's complexity. However, in many instances the methods described in this book offer the ideal blend of transparency and sophistication for deploying data to guide us into the future.
We should acknowledge upfront that our approach to statistics and prediction is unconventional. Though we are versed, to some degree, in classical statistics and have a deep appreciation for the insights gifted to us by a long line of scholars, we have found it instructive and pragmatic to reconsider the principles of statistics from a fresh perspective—one that is motivated by the challenge we face as financial researchers and by our quest for intuition. But mostly we are motivated by a stubborn refusal to stop asking the question: Why?
Practitioners have difficult problems to solve and often too little time. Those on the front lines may struggle to absorb everything that technical training has to offer. And there are bound to be many useful ideas, often published in academic articles and books, that are widely available yet seldom used, perhaps because they are new, complex, or just hard to find.
Most of the ideas we present in this book are new to us, meaning that we have never encountered them in school courses or publications. Nor are we aware of their application in practice, even though investors clearly depend on the quality of their predictions. But we are not so much concerned with precedence as we are with gaining and sharing a better understanding of the process of data-driven prediction. We would, therefore, be pleased to learn of others who have already come to the insights we present in this book, especially if they have advanced them further than we have.
1 Introduction
With the notable exception of religion, we rely on experience to shape our view of the unknown; for most practical purposes, experience is our guide through an uncertain world. We process experiences both naturally and statistically; however, the way we naturally process experiences often diverges from the methods that classical statistics prescribes. Our purpose in writing this book is to reorient common statistical thinking to accord with our natural instincts.
Let us first consider how we naturally process experience. We record experiences as narratives, and we store these narratives in our memory or in written form. Then when we are called upon to decide under uncertainty, we recall past experiences that resemble present circumstances, and we predict that what will happen now will be like what happened following similar past experiences. Moreover, we instinctively focus more on past experiences that were exceptional rather than ordinary because they reside more prominently in our memory.
Now, consider how classical statistics advises us to process experience. It tells us to record experiences not as narratives, but as data. It suggests that we form decisions from as many observations as we can assemble or from a subset of recent observations, rather than focus on observations that are like current circumstances. And it advises us to view unusual observations with skepticism. To summarize:
Natural Process
Records experiences as narratives.
Focuses on experiences that are like current circumstances.
Focuses on experiences that are unusual.
Classical Statistics
Records experiences as data.
Includes observations irrespective of their similarity to current circumstances.
Treats unusual observations with skepticism.
The advantage of the natural process is that it is intuitive.