Lukas Püttmann

"Superforecasters", by Philip Tetlock and Dan Gardner

How good are people at forecasting political or economic events? Why are some people better than others?

Philip Tetlock and Dan Gardner have written “Superforecasting” based on a tournament started in 2011, in which they had 2,800 people predict a number of events. They then scored how the participants did and analyzed the results.

Tetlock is famous for his 2005 book “Expert Political Judgment”, in which he summarizes a 20-year study in which pundits, researchers, analysts and political experts forecast events. He finds overall disappointing forecasting performance, but is able to draw a clear line between “foxes” (who are good forecasters) and “hedgehogs” (who are not). For this metaphor, he draws on an essay by Isaiah Berlin, which refers to the ancient saying: “The fox knows many things, but the hedgehog knows one thing well.”

Hedgehogs are guided by the one thing they know – their ideology – and they form their forecasts to fit into their way of thinking. But foxes consider different possible explanations.

I was intrigued when I first read Tetlock’s 2005 book, because it seemed to play with the debate in economics on how “structural” vs. “reduced-form” our research should be. A structural model is backed by theory and tries to explain why things happen. A reduced-form model imposes less theory and tries to find patterns in data and predict what comes next, but it usually cannot explain why things happened.

Tetlock and Gardner’s new book does not resolve this conflict. They argue, again, that good forecasters produce good ballpark estimates (what they call “fermi-tizing”) and carefully adjust their probability estimates when new information becomes available. I liked this bit:

“Confidence and accuracy are positively correlated. But research shows we exaggerate the size of the correlation.” (p138)


“[…] granularity predicts accuracy […].” (p145)

They criticize after-the-fact explanations with: “Yeah, but any story would fit.” This is the basic criticism of structural models. Any number of models could fit your data points. How do we know which is right?

They say:

“Religion is not the only way to satisfy the yearning for meaning. Psychologists find that many atheists also see meaning in the significant events in their lives, and a majority of atheists said they believe in fate, defined as the view that “events happen for a reason and that there is an underlying order to life that determines how events turn out.” Meaning is a basic human need. As much research shows, the ability to find it is a marker of a healthy, resilient mind.” (p148)

In my opinion, the authors don’t take the necessity for models seriously enough: we need models and we want them. And, actually, we will always have a model in our mind, even if we don’t make it explicit and admit it. Even Nate Silver (who is famous for his accuracy in prediction) says:

“For example, I highly prefer […] regression-based modeling to machine learning, where you can’t really explain anything. To me, the whole value is in the explanation.”

And in fact the authors become more humble near the end:

“In reality, accuracy is often only one of many goals. Sometimes it is irrelevant. […] ‘kto, kogo?’” (p245)

This last reference is Lenin saying: “Who did what to whom?”

They describe how good forecasters average the estimates they derive from different methods. For example, taking the outside view “how likely is it that negotiations with terrorists ever work?” and then the inside view “what are the motivations of the Nigerian government and what drives Boko Haram?”.

But that only works because Tetlock’s forecasts are quite specific. They’re relevant, yet they exclude a large number of things. Off the top of my head, here’s a list of what they didn’t forecast:

  • Long-run events: “What will the Gini coefficient in the United States be in 2060?”, “Will China have at least 80% of the number of aircraft carriers of the United States in 2100?”, “Will the growth rate of German real GDP per capita be above 1% per annum from 2020-2060?”, “How likely is it that there will be a virus that kills at least 10% of the global population within 3 years from 2020-2150?”
  • High-frequency events: “How should we trade this stock in the next 10 seconds?”
  • Predictions or classifications involving a large number of objects: “Can we identify clusters in these 3 million product descriptions?”, “Do these 10 million pictures show a cat?”

The first kind of events might be the most relevant of all, but they are also the most difficult to form an expectation about. The questions are unanswerable if we don’t want to wait, and if we did wait, Tetlock’s superforecasters might well be good at forecasting them. So I have to grant them that.

The second kind (“high-frequency prediction”) I actually find the least relevant and interesting. I think this would really just be number-crunching and pattern-matching, so “reduced-form” in its purest form, and it means writing or applying algorithms to do the work. We don’t really learn anything about this kind of forecasting from Tetlock’s books, yet it’s what a lot of people in finance think of when they hear “prediction”.

The third kind has recently become more relevant, but mostly in the realm of machine learning analysts and statisticians. These are the kind of problems one might find on Kaggle. Again, they’re prediction problems, but Tetlock’s recipes don’t work here.

I like the idea of ballpark estimates and “fermitization”, but something there irritated me. Isn’t their whole point to take all information into account and, instead of sticking with narratives, to make careful probabilistic estimates? Tetlock and Gardner discuss the example of how many piano tuners there are in Chicago. They go through what is a textbook example of how to answer a consulting interview question. They come up with an estimate of 62 and present a highly dubious empirical estimate of 80. A number of things strike me as odd. First, they go on to say right afterwards that the empirical frequencies of events should be our baseline. So shouldn’t we first have googled for it and seen, “ok, there seem to be about 80 of these guys in Chicago”? Then, in the next step, we could think about where our estimate might have gone wrong. Maybe not every one of them has a website? Maybe we didn’t find them all? Maybe there’s double counting? And then we could adjust for that. Or you could do both, their Fermi “structural” estimate and the Googling “reduced-form” estimate, and then average the two using weights that depend on how relatively certain you are of each.
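The averaging idea at the end of the previous paragraph can be sketched in a few lines. This is only an illustration under my own assumptions, not something from the book: I weight each estimate by the inverse of its variance, which is one common way to combine independent estimates, and the standard deviations (30 and 15) are made-up numbers for how uncertain I might be about each figure. Only the two estimates, 62 and 80, come from the text.

```python
def combine_estimates(estimates, std_devs):
    """Inverse-variance weighted average of independent estimates.

    A smaller standard deviation means more confidence in an
    estimate, and therefore a larger weight on it.
    """
    weights = [1.0 / sd**2 for sd in std_devs]
    total = sum(weights)
    return sum(w * x for w, x in zip(weights, estimates)) / total


fermi_estimate = 62    # "structural" estimate from the Fermi calculation
googled_estimate = 80  # "reduced-form" estimate from counting search results

# Hypothetical uncertainties (assumptions for illustration only):
# I trust the googled count twice as much as the Fermi guess.
fermi_sd = 30.0
googled_sd = 15.0

combined = combine_estimates(
    [fermi_estimate, googled_estimate],
    [fermi_sd, googled_sd],
)
print(round(combined, 1))  # 76.4
```

With equal uncertainties the result would simply be the midpoint, 71. The more we trust one source, the closer the combined number moves towards it.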

Their repeated statement that we need to measure what we are talking about reminds me of the books by Thomas Piketty, by Abhijit Banerjee and Esther Duflo, and by Angus Deaton, who also spend large portions of their texts arguing that we need to have good data about the things we care about. I completely agree.

I also liked their discussion on how all human beings need narratives and how that might even be good for our mental health and resilience. And I do suppose I would be miserable as a superforecaster. I already devour large amounts of news, blogs and more every day, but I dread getting updates by Google News about all the topics that I would have to cover. In fact, I did consider taking part in Tetlock’s superforecasting experiment. Back in 2011, it went through the blogs and I came across it. I looked at it a bit and I thought I might enjoy it, but really I didn’t want to commit so much time to something like that. With hindsight, I’m glad I didn’t participate.

He also discusses the Keynes quote:

“When the facts change, I change my mind. What do you do, Sir?”

This sounds like a really foxish, Bayesian statement. I recently came across the assessment by Mark Blaug (in the introduction to his book) that Keynes was initially a fox and became a hedgehog. Tetlock then presents the nice twist that it’s unknown whether Keynes really said that. But he was ready to admit it, because it wasn’t fundamental to his (Tetlock’s) identity.

I also like the idea of a “pre-mortem” (p202), that is, thinking about reasons why my project might fail. (But as for research projects, maybe it’s better to actively resist this, otherwise you never get going.)

He ends with a plea for opposing parties to get together and use their different views to come up with a better forecast:

“The key is precision.” (p268)

The problem here is that we are talking about conditional vs. unconditional forecasts. Different groups want to change that condition. Also, some forecasts are political – such as those concerning GDP or population size – where the forecast itself might even have an impact on what will happen.

Last, I also agree with Tetlock’s thank-you note:

“[…] I thank my coauthor […] and editor […] who helped me tell my story far better than I could have – and who had to wrestle for two years against my professional propensity to “complexify” fundamentally simple points.” (p288)

When you compare Tetlock’s two books on this topic, this latest one is much more pleasant to read without losing accuracy or depth.