The Signal and the Noise: The Art and Science of Prediction was written by Nate Silver, a consultant-turned-poker-player-turned-political-analyst who is most famous for the election forecasting website fivethirtyeight.com. It is one of a small number of books – along with Philip Tetlock’s Superforecasting – that seriously assess how predictable the future is, and how people can systematically improve their ability to predict it. This question interests me a great deal: improving judgements about the future seems highly important in many areas (what will the effects of a policy be? When will different technological developments occur? How many people will die from COVID?), yet very little attention is paid to it. I found The Signal and the Noise thoughtful, and I learned a lot from it.
A running theme of this book is that humans don’t have very good track records predicting the outcomes of complex systems. But one domain where humans have excelled is weather forecasting. Weather forecasts are amazingly accurate relative to the complexity involved. In the mid-70s, the US National Weather Service was off by about 6 degrees (Fahrenheit) when trying to forecast three days in advance. This isn’t much more accurate than what you get if you look at long-term averages – as in, what temperature is most likely in this region at this time of year, not taking into account any specific information. Now, the average miss is 3.5 degrees. This is actually slightly less of an improvement than I would have guessed, although to reduce the error in a forecast by a factor of two requires way more than twice as much effort, since errors can compound.
I was surprised to learn how large a role humans still play in weather forecasting. Having a human expert use their judgement in assessing many computer-generated forecasts is better than any of the forecasts are by themselves. Humans make precipitation forecasts 25% more accurate than computers alone and temperature forecasts 10% more accurate. Moreover, the accuracy added by humans has not significantly changed over time, so humans have been getting better at the same rate as the machines (!). If you’re wondering why the weather forecasts you use don’t feel very accurate, it’s in part because weather services are private companies that tend to exaggerate forecasts for appeal; you won’t see this inaccuracy in government forecasts. In particular, meteorologists are known to have a “wet bias” – they forecast rain more often than it actually occurs.
There have been some pretty tremendous positive social externalities of commercial weather forecasting, most notably in creating sophisticated early warning systems for extreme weather. The ability to predict cyclones in India and Bangladesh, for instance, has probably saved many thousands of lives. Silver has a few stories in here about people who refuse to leave their homes during an evacuation because of an unjustified scepticism of the forecasts. There also appears to be an exposure effect going on: studies of hurricanes find that having survived a hurricane before makes you less likely to evacuate future ones.
The terms ‘fox’ and ‘hedgehog’ used in this book come from the Greek poet Archilochus, who wrote that “a fox knows many things, but a hedgehog knows one big thing”. Foxes are people who don’t have grand unified theories, who constantly revise their beliefs to account for new evidence, and who live with uncertainty. Hedgehogs are partisans, with overarching worldviews which they’ll contort the evidence to fit.
The legendary psychologist Philip Tetlock ran a forecasting tournament in which he tracked and graded the predictions of political experts – including professors and government officials – over nearly two decades, summarising the results in his book Expert Political Judgment. The main finding: experts are barely more accurate at prediction than chance, and usually perform worse than simple extrapolation algorithms (like “assume nothing will change”). There were too many hedgehogs and not enough foxes. The incentive for pundits and journalists is not to actually be accurate; it’s to appear reasonable while giving novel and entertaining predictions. Indeed, another of Tetlock’s major findings is that the more often an expert appeared on TV, the less accurate their predictions were.
Tetlock also found an overconfidence effect: when experts said something had no chance of happening, it happened about 15% of the time; when they said it was guaranteed to happen, it happened only about 75% of the time. While foxes get better at predicting with more information, hedgehogs get worse: if you have grand theories rather than partial explanations, more facts can make your worldview even less accurate. Partisan bias in prediction was not seen in general (people were relatively unbiased in guessing how many seats Republicans vs. Democrats would win), but it was marked in specific cases (a left-leaning pundit is much more likely to say that a specific Democrat will win). These predictions were graded using a Brier score.
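As a concrete illustration (my sketch, not from the book): the Brier score is just the mean squared difference between stated probabilities and what actually happened (1 if the event occurred, 0 otherwise), so lower is better.

```python
def brier_score(forecasts, outcomes):
    """Mean squared difference between forecast probabilities and 0/1 outcomes."""
    return sum((f - o) ** 2 for f, o in zip(forecasts, outcomes)) / len(forecasts)

# Made-up example: three forecasts and the eventual outcomes.
forecasts = [0.9, 0.4, 0.2]   # stated probabilities that each event would occur
outcomes = [1, 0, 0]          # 1 = it occurred, 0 = it did not
print(round(brier_score(forecasts, outcomes), 3))  # 0.07
```

A forecaster who always says 50% scores 0.25 regardless of outcomes; perfect foresight scores 0.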
(I wonder if this generalises? If we have some kind of broad philosophical or political worldview that biases us, we might actually see more bias the more we zero in on specific cases. Hence, while talking about specifics and partial explanations is usually the better way to get at the truth, doing so effectively might require some deconstruction of one’s prior beliefs.)
The woeful state of prediction might lead you to worry about climate science, where government policy is explicitly shaped by expert forecasts. Indeed, the magnitude of warming from climate change has historically been overestimated by scientists: the actual level of warming fell below even the most optimistic projection of the 1990 IPCC estimates. In response, the IPCC revised its models downward in 1995, and the observed outcomes now fall well within the confidence interval of the projections (albeit with warming still slightly less than predicted). You can certainly tell a story about bias here: scientists probably want to find a large warming effect, and they think (correctly) that we’re at far more risk of panicking too little than too much. However, these estimates assumed a “business as usual” case; one factor they did not adequately address was that Chinese industry caused an increase in sulphur dioxide concentrations starting around 2000, and sulphur dioxide has a cooling effect. People also forget about the other factors that contribute to the greenhouse effect – I was unaware that water vapour is actually the largest contributor! All of this seems genuinely hard to take into account, so the less-than-stellar predictive performance of climate scientists can probably be forgiven. They also seem to have humility: just 19% of climate scientists think that climate science can do a good job of modelling sea-level rise 50 years from now, for instance. At least as of the book’s publication (2012), the effect of climate change on most extreme weather events also appeared to be unclear – a level of uncertainty that the media definitely fails to communicate.
Notably, the global temperature record is spectacularly noisy, which is well known, but I think I had failed to appreciate just how noisy it is. Over the last 100 years, temperature has declined in a quarter of all decades – for instance, global temperatures fell from 2001 to 2011.
Another thing people seem to forget is how long we’ve known about the greenhouse effect. It was discovered by Fourier (of Fourier transform fame) in the 1820s, and Arrhenius in 1896 was the first to predict that industrial activity would lead to a warming effect.
The economist John Kenneth Galbraith famously said that “the only function of economic forecasting is to make astrology look respectable.” Indeed, at least in terms of asset pricing, we shouldn’t expect economics to be of any help at all, because of the efficient market hypothesis (EMH). This says that stocks and other financial products are priced in a way that encapsulates the sum total of the information available to the market, so that individual trader advantage is rare. There are two components to EMH, which Richard Thaler sometimes calls the No Free Lunch assumption and the Price is Right assumption. No Free Lunch – or, colourfully, the Groucho Marx theorem – says that you shouldn’t be willing to buy a stock from anyone willing to sell it to you; in other words, it is difficult if not impossible to consistently beat the market. The Price is Right says that assets are efficiently priced in a way that encapsulates all available information.
Thaler has made a career out of exposing the extent to which economic models do not take sufficient account of human irrationality, and he is the ideological arch-nemesis of Eugene Fama, the father of EMH (they’re also golfing partners, which I think is cute). Thaler has a famous paper in which he looks at the company 3Com, which created a separate stock offering for its subsidiary Palm. There was a scheme whereby 3Com stockholders were guaranteed to receive three shares in Palm for every two shares in 3Com that they held, which implied that it was mathematically impossible for Palm stock to trade at more than two thirds of the value of 3Com stock. Yet, for several months, Palm actually traded higher than 3Com, through a combination of hype and transaction costs.
The final point that Silver makes about EMH is that it’s in this fascinating epistemic state where, if people actually believed it was true, it would stop being true. The only reason people trade stocks is that they think they have better judgement than the market (if you invest in a portfolio that tracks the market average, you will by definition outperform roughly half of traders). This mirrors a lot of what people say about startups: if people actually believed that almost every possible great company idea has already been taken, then they wouldn’t start so many companies, undermining the process that made the original statement close to true.
Why does Silver talk about a theory of asset pricing so much? Because it’s epistemically important to forecasting. If there were an efficient market for ways to improve the world, then if something were a good idea, someone would already be doing it. If there were an efficient market for ideas, every good idea would already have been tried and risen to the level of scientific consensus. And yet science is subject to massive systemic flaws, and huge opportunities for improving the world remain untapped because of inertia and apathy. Improving our forecasts of the future is important. It seems like a lot of people stand to make a lot of money from doing this. It seems like a small community mostly consisting of nerds on the internet would not be able to massively advance this field. But this impression is wrong.
Silver points out that if you look at the predictions of the Blue Chip Economic Survey and the Survey of Professional Forecasters, the former has some forecasters who do consistently better than others over the long run, but the latter doesn’t. The reason is that Blue Chip isn’t anonymous, so forecasters have an incentive to make bold claims that would garner them a lot of esteem if they turned out to be true. One study found a “rational bias”: the lower the reputation of the institution someone was forecasting from, the bolder the claims they made. While considerations of esteem probably worsen forecasts overall, they lead some individuals to consistently outperform the crowd.
All of this should help us to understand bubbles. If EMH is true, how could outside observers notice massive market inefficiencies? Robert Shiller pointed out how the price-earnings ratio (share price divided by earnings per share) during the dot-com boom was unreasonably high, which was the sort of thing that had previously preceded a crash. One of the reasons why the bubble did not sort itself out despite people like Shiller pointing this out is the career incentives of traders: if you bet against the market and the market doesn’t crash, you look like an idiot, while going along with the herd won’t result in exceptionally bad personal outcomes. Silver says there is significant evidence that such herding behaviour exists.
Given all this volatility, it shocked me to learn that, over the long run, house prices in the US were remarkably stable until recently. In inflation-adjusted terms, $10,000 invested in a home in 1896 would be worth just $10,600 in 1996 (as measured by the Case-Shiller index). The value of such an investment then almost doubled between 1996 and 2006!
There are a lot of interesting applications of the lessons from the science of prediction. One of the most exciting to me is predicting which research will replicate. One of the key lessons we should take from The Signal and the Noise is that academics, like everyone else, have all sorts of motivations, including prestige. Through honest motivations, scientists might go along with results that conform to their expectations and worldview, but that a financial market wouldn’t price as being likely to actually be true. While markets have problems (see above), they’re a vast improvement over hearsay and surveys. A ‘prediction market’ works because it actually incentivises accuracy in a way that almost never happens in other domains. It also works in part because of the wisdom of crowds: aggregated group forecasts outperform individual ones by 15–20% on average.
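Part of the crowd effect for squared-error scores like the Brier score is just convexity: the error of the averaged forecast can never exceed the average of the individual errors. A toy sketch with invented numbers:

```python
# Three hypothetical forecasters give probabilities for one event that did occur.
forecasts = [0.9, 0.5, 0.4]
outcome = 1

individual_errors = [(f - outcome) ** 2 for f in forecasts]
mean_individual_error = sum(individual_errors) / len(individual_errors)

crowd_forecast = sum(forecasts) / len(forecasts)  # simplest aggregation: the mean
crowd_error = (crowd_forecast - outcome) ** 2

# ~0.207 vs 0.16: the aggregated forecast beats the average individual.
print(round(mean_individual_error, 3), round(crowd_error, 3))
```

This guarantee only covers beating the *average* forecaster, not the best one; the empirical 15–20% figure is about typical practice, not a theorem.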
Many of you will know this story: John Ioannidis publishes a paper with the provocative title Why Most Published Research Findings Are False, arguing that, given the number of researcher degrees of freedom and the variety of results that can be conjured with sophisticated statistics, most published research is probably wrong. More than a decade later, he seems to have been proven right: Bayer, for instance, found that it could not replicate roughly two thirds of published drug-target findings in its own labs. Hence, the possible gain from a prediction market in study replication is large. One such project is Replication Markets.
Scott Alexander criticises how people sometimes use the low total death tolls from terrorism as a way to mock conservatives, or people who are concerned about terrorism in general. Most years, lightning kills more people in the US than terrorism, so why worry? Well, here’s a graph of the number of people that atomic bombs have killed since WW2 compared to the number of people who die by lightning each year. Would this be a convincing argument for not worrying about nuclear war? The tail risks are the whole goddamn point.
If you’ve read The Black Swan, you’ll know that lots of things are like this, with ‘heavy-tailed’ risk, and that we sometimes try to shoehorn these into normal distributions.
Earthquakes are distributed according to one such heavy-tailed distribution – a power law – whereby for every one-point increase on the Richter scale, an earthquake is ten times less likely. So the bulk of the devastation comes from just a few earthquakes: the Chilean earthquake of 1960, the Alaskan earthquake of 1964, and the Great Sumatra Earthquake of 2004 accounted for half of all the energy released by all earthquakes in the world over the past century! What else is less like height and more like earthquakes?
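A rough sketch of what that power law implies (the ~×10^1.5 energy step per magnitude point is the standard seismological figure, not something from the book): each step up in magnitude makes an earthquake ten times rarer but about 31.6 times more energetic, so the expected energy contribution is dominated by the rare giants.

```python
# Relative frequency and energy of earthquakes, compared to a magnitude-5 quake.
# Frequency falls 10x per magnitude point; released energy rises ~10^1.5x.
for magnitude in range(5, 10):
    step = magnitude - 5
    rel_frequency = 10.0 ** -step              # how much rarer than a magnitude-5
    rel_energy = 10.0 ** (1.5 * step)          # how much more energetic
    expected_share = rel_frequency * rel_energy  # grows ~3.16x per magnitude point
    print(magnitude, rel_frequency, round(rel_energy, 1), round(expected_share, 1))
```

The last column is the point: even though a magnitude-9 is ten thousand times rarer than a magnitude-5, its expected contribution to total energy is a hundred times larger.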
In one of the book’s middle chapters, Silver uses terminology about infectious disease that many of us have become familiar with over the last couple of months, particularly SIR models. One interesting example is the failure of SIR models to predict that HIV would not re-emerge in the early 2000s among sexually active gay communities like San Francisco’s, even as unprotected sex and other STDs increased. Why it didn’t is still somewhat debated, but probably people began to “serosort” – that is, choose partners with the same HIV status as their own. This violates one of the major assumptions of the SIR model: that interactions among individuals are random.
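For reference, a minimal SIR model (a standard textbook construction, not Silver's code) makes the random-mixing assumption explicit in the S×I term; the parameter values below are purely illustrative.

```python
# Minimal SIR model with Euler integration. beta = infection rate,
# gamma = recovery rate; both values here are purely illustrative.
def run_sir(s, i, r, beta=0.3, gamma=0.1, dt=1.0, steps=300):
    for _ in range(steps):
        new_infections = beta * s * i * dt  # assumes S and I mix at random
        new_recoveries = gamma * i * dt
        s -= new_infections
        i += new_infections - new_recoveries
        r += new_recoveries
    return s, i, r

s, i, r = run_sir(s=0.99, i=0.01, r=0.0)
print(round(s, 2), round(i, 2), round(r, 2))  # most of the population ends up recovered
```

Serosorting breaks the `beta * s * i` term: infected and susceptible people no longer meet in proportion to their overall population shares, so the model overpredicts transmission.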
The next few pages blew my mind more than anything else I’d read in a while. I can’t believe I hadn’t heard of President Ford’s 1976 campaign to vaccinate 200 million people against a suspected H1N1 pandemic. The vaccine dramatically increased rates of the nerve-damaging Guillain-Barré syndrome, and the public turned against it, such that only 50% of people were willing to be vaccinated! The severity of the outbreak also turned out to have been exaggerated, so the government gave up after 25% of people were immunised. How have I not seen this brought up in the context of COVID?
I recommend this book, particularly if you’re not already familiar with Philip Tetlock, forecasting, and Bayesian statistics. For people who are already interested in that kind of thing, I can still recommend skimming. I’m sure I’ll write about forecasting again on this blog at some point – I didn’t even have time to talk about superforecasters, the people who can consistently outperform expert predictions. ★★★★☆
Afterword: Philosophical Pondering on the Problem of Prediction
One question Tetlock sometimes gets asked is whether it’s nonsensical to ascribe a probability to an event that only occurs once. If you think the universe is deterministic, you might say that the probability of a certain candidate winning an election is either 0% or 100%, but you simply do not know which. So in what sense can this be evaluated probabilistically? Does probability represent something metaphysical about what the outcomes would be if the trials were run infinitely many times? Or just a degree of belief? The former view is identified with the frequentist school and the latter with the Bayesian school. Regardless of one’s philosophical position, Tetlock’s approach is to just get on with it: if we look at the set of all supposedly unpredictable things, do the things you predicted would happen 10% of the time happen 10% of the time?
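Tetlock's "just get on with it" test can be made concrete (my sketch, with invented data): bucket predictions by the stated probability and check the observed frequency in each bucket.

```python
from collections import defaultdict

# Hypothetical (stated probability, outcome) pairs; 1 = the event happened.
predictions = [(0.1, 0), (0.1, 0), (0.1, 1), (0.9, 1), (0.9, 1), (0.9, 0)]

buckets = defaultdict(list)
for prob, outcome in predictions:
    buckets[prob].append(outcome)

# A well-calibrated forecaster's "10%" events should occur about 10% of the time.
for prob in sorted(buckets):
    observed = sum(buckets[prob]) / len(buckets[prob])
    print(f"said {prob:.0%}: happened {observed:.0%} of the time")
```

With this invented data the output shows Tetlock's overconfidence pattern in miniature: the “10%” events happened a third of the time, and the “90%” events only two thirds of the time.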
Viewing probability as just degree of belief is actually very counterintuitive. There are problems with this that I still haven’t figured out, like the distinction between external and internal ‘credences’, or degrees of belief. I may think there is a 50% chance that Trump will win re-election, but isn’t there some higher-order uncertainty I have about whether I’m using the correct mental model to assess this, or whether I’m actually in a computer simulation, or something? But doesn’t this eliminate the initial theoretical appeal of having all considerations cash out into a single credence? What if your credences hold some mathematically impossible property?
David Hume thought that, because we don’t have certainty, saying that the sun will rise tomorrow is inherently not any more rational than saying it won’t. More recently, probability-as-beliefs was famously opposed by the statistician Ronald Fisher. One of the main problems with the frequentist school is how much of forecasting and probability turns into a game of finding the correct reference class, or relevant comparison group. The reference class for the die you roll is fair six-sided dice, but what reference class would the 9/11 attacks be in, for instance? So the principal objection to the Bayesian approach – that it is too subjective – applies also to views of probability-as-frequency.