Comparison of midterm predictions from FiveThirtyEight, PredictIt and Futuur

Much attention was given to electoral forecasts in the weeks leading up to the 2018 US midterms.

Forecasting experts and platforms differed not only in their methods and the signals they took into account, but also in their accuracy once the electoral results were in.

This article takes a closer look at three such forecasting services (Futuur, PredictIt and FiveThirtyEight) in order to compare their performance, explore the main differences between their methods, and reveal which factors might be driving the surprising success of the newcomer, Futuur.

Prediction Markets

Prediction markets aim to be more accurate than surveys and polls through the use of incentives to collect truthful data and realistic beliefs from people. These beliefs are revealed in the form of bets that are made by participants on one of the potential outcomes of an event, knowing that they will be rewarded if their bets are correct.

Forecasts from prediction markets can only be as good as the quality of the data they receive from users. It is essential, therefore, that the incentives offered by these markets are perceived by information holders as attractive enough for them to reveal their positions. In that regard, PredictIt and Futuur have considerably different approaches.

PredictIt
PredictIt uses real money when taking bets and distributing rewards, making financial gain the main driver of engagement. While the utility of a financial reward is evident, it can also pose barriers to entry for some participants, either because the costs of participating might be steep or because such activity might be considered illegal gambling in some jurisdictions. PredictIt, a non-profit project operated by Victoria University of Wellington in New Zealand, obtained a special dispensation from the CFTC that allows it to operate legally in the US despite restrictions on gambling.

Prices to acquire shares of an outcome on PredictIt range from 0 to 1 dollar and are set by a continuous double auction among market participants. Ideally, the price in cents should be read as the market’s estimate, in percent, of the probability of a given outcome, although the prices across outcomes don’t always add up to 1 dollar because of order book variations.
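To make the point about prices not summing to 1 concrete, here is a minimal sketch of one common way to turn a set of share prices into probabilities: divide each price by the total. The function name and the prices are hypothetical, and proportional rescaling is only an assumption for illustration, not a method PredictIt itself applies or the one used for the comparison in this article.

```python
def implied_probabilities(prices):
    """Rescale share prices (in dollars) so they sum to 1.

    `prices` maps each outcome to its latest share price. Proportional
    rescaling is an illustrative assumption, not PredictIt's own method.
    """
    total = sum(prices.values())
    if total <= 0:
        raise ValueError("prices must sum to a positive amount")
    return {outcome: price / total for outcome, price in prices.items()}


# Hypothetical prices for a two-way race; they add up to 1.03 dollars.
example_prices = {"Candidate A": 0.62, "Candidate B": 0.41}
example_probabilities = implied_probabilities(example_prices)
```

After rescaling, the two hypothetical prices become probabilities that sum to 1 while keeping the same ratio to each other.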

Futuur
Unlike PredictIt, Futuur uses a kind of play money that represents a user’s social capital within the community of forecasters, making reputational gain, rather than financial gain, the main driver of participation. Futuur contends that, with the right approach to gamification and usability, reputational rewards can be as effective as real money in eliciting truthful information from participants. More importantly, by not relying solely on real money, the platform can also overcome the barriers to entry commonly associated with real-money markets, making it easier for more people to participate.

When it comes to pricing shares of an outcome, rather than a continuous double auction, the platform uses an automated market maker based on the logarithmic market scoring rule (LMSR) designed by Robin Hanson. Prices always add up to 1.
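For readers curious why prices under this kind of market maker always add up to 1, below is a minimal sketch of Hanson's logarithmic market scoring rule in its standard textbook form. The function names, the liquidity parameter `b` and the share quantities are assumptions chosen for illustration; Futuur's actual parameters and implementation are not described in this article.

```python
import math


def lmsr_cost(quantities, b):
    """LMSR cost function C(q) = b * ln(sum_i exp(q_i / b)).

    `quantities` holds the shares outstanding for each outcome and `b`
    is the liquidity parameter (an assumed value, not Futuur's).
    The maximum is subtracted before exponentiating for stability.
    """
    scaled = [q / b for q in quantities]
    peak = max(scaled)
    return b * (peak + math.log(sum(math.exp(s - peak) for s in scaled)))


def lmsr_prices(quantities, b):
    """Instantaneous prices p_i = exp(q_i / b) / sum_j exp(q_j / b)."""
    scaled = [q / b for q in quantities]
    peak = max(scaled)
    weights = [math.exp(s - peak) for s in scaled]
    total = sum(weights)
    return [w / total for w in weights]


def cost_to_buy(quantities, outcome_index, shares, b):
    """Amount a trader pays to buy `shares` of one outcome."""
    after = list(quantities)
    after[outcome_index] += shares
    return lmsr_cost(after, b) - lmsr_cost(quantities, b)


# Hypothetical two-outcome market with no shares sold yet.
start = [0.0, 0.0]
opening_prices = lmsr_prices(start, b=100.0)
trade_cost = cost_to_buy(start, outcome_index=0, shares=50.0, b=100.0)
prices_after_trade = lmsr_prices([50.0, 0.0], b=100.0)
```

Because each price is one term of a normalized sum, the prices add up to 1 by construction, and buying shares of an outcome pushes its price up while pushing the others down.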

Polling Aggregation

While prediction markets use the “wisdom of the crowd” to derive probabilities of future events, FiveThirtyEight takes a different approach, applying statistical analysis to historical data sets. For politics, this data is primarily from polls, but their model also takes into account “fundamentals” data like the state of the economy.

As with prediction markets, such estimates depend on the quality of the data that is brought to the system, as well as on the diversity of input sources, to avoid or cancel out potential biases. In addition to aggregating information from polls, FiveThirtyEight weights each poll according to the pollster’s reliability and its historical bias in one direction or the other.

The website published forecasts based on its model for all 435 House races, 35 Senate races, and 36 gubernatorial races, assigning each candidate a probability of victory that ranged from 0.01% to 99.9%. For the sake of the comparison made in this article, however, only the races that were also covered by the other services will be taken into account.

Comparative Performance

The following comparison is based on the accuracy rate of the forecasts, defined as the percentage of times in which the candidate with the highest estimated likelihood of being elected according to a platform was indeed the winner of the election. For the sake of clarity, the analysis has been broken down into three sections: Gubernatorial, Senate, and House races.
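As a rough illustration of this definition, the sketch below computes an accuracy rate from a set of forecasts and results. The function name, data layout, race labels and probabilities are all hypothetical; they are not taken from the data set behind this article.

```python
def accuracy_rate(forecasts, winners):
    """Percentage of races in which the favorite actually won.

    `forecasts` maps a race to a dict of candidate -> probability and
    `winners` maps a race to the winning candidate. Only races present
    in both inputs are scored.
    """
    shared = [race for race in forecasts if race in winners]
    if not shared:
        raise ValueError("no races in common to score")
    correct = 0
    for race in shared:
        odds = forecasts[race]
        favorite = max(odds, key=odds.get)  # highest estimated likelihood
        if favorite == winners[race]:
            correct += 1
    return 100.0 * correct / len(shared)


# Hypothetical forecasts and results for three races.
example_forecasts = {
    "Race 1": {"Candidate A": 0.70, "Candidate B": 0.30},
    "Race 2": {"Candidate C": 0.45, "Candidate D": 0.55},
    "Race 3": {"Candidate E": 0.60, "Candidate F": 0.40},
}
example_winners = {
    "Race 1": "Candidate A",
    "Race 2": "Candidate D",
    "Race 3": "Candidate F",
}
example_accuracy = accuracy_rate(example_forecasts, example_winners)
```

Note that this measure only checks whether the favorite won. It ignores how confident the forecast was, so a 51% call and a 99% call count equally when they are right.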

During the midterms, Futuur made forecasts for 36 gubernatorial races, 27 Senate races, and 43 of the most competitive House races, and reached the following accuracy figures:

Each of the platforms covered a somewhat different set of races. In order to provide a fair comparison, we’ve provided the results for all races that were featured on both Futuur and the other platform.

Here’s how each platform did when attributing odds to each of the winning candidates in the midterms, as of 9 a.m. on election day. (Click here to download the full data, or here to access our complete infographic.)

Head-to-head comparisons

Futuur vs. PredictIt
Out of the 106 races covered by Futuur, 77 were also covered by PredictIt (26 gubernatorial, 19 Senate, and 32 House races). Despite their different incentive systems, both platforms reached exactly the same accuracy rate in all three categories.

Futuur vs. FiveThirtyEight
All 106 races covered by Futuur were also covered by FiveThirtyEight (36 gubernatorial, 27 Senate, and 43 House races). Looking at each of those groups, both platforms reached exactly the same accuracy rate for gubernatorial and Senate races, with FiveThirtyEight taking the lead in the House by a margin of two seats.

PredictIt vs. FiveThirtyEight
FiveThirtyEight had a very slight edge over both PredictIt and Futuur in our analysis. For an additional perspective on how these forecasters compared during the midterms, please read Harry Crane’s paper from November 9th. (He comes to a somewhat different conclusion, in favour of prediction markets.)

Conclusion

A bright future for forecasting

As a member of Futuur’s team, I’m obviously happy to report that it performed on par with its counterparts, despite being a new platform and using play money rather than real money. But I’m even more excited for what the future has in store for Futuur and for the world of forecasting more generally.

The relative merits of forecasting methods are a topic of ongoing controversy, from Nate Silver calling PredictIt’s prices “dumb” to Harry Crane’s rebuttal in defence of prediction markets, and the debate is not likely to be resolved soon. But while it’s important to compare different approaches to forecasting, it’s also important to recognize that these platforms are complementary: each provides unique value, and together they provide a better overall picture of what the future will (probably) look like than any one of them alone.

Prediction markets like PredictIt and Futuur reflect an additional layer of abstraction on top of publicly available data such as the poll analysis published by FiveThirtyEight. Indeed, FiveThirtyEight was very likely one of the key inputs for many bettors making their buying or selling decisions in the markets. For the time being, whether the additional information the bettors bring to the table outweighs the quirks of a market-based approach remains an open question.

Beyond the platforms evaluated here, there is a lot of activity in the forecasting space. Blockchain projects like Augur, Gnosis and Stox have made decentralized prediction markets the core of their strategy, and a recent report from Circle Research covers much of the reasoning for why that could lower the costs of participating, bypass strict regulation and increase adoption. Futuur’s performance in the midterm elections, however, can be taken as a statement about how some of the same challenges could also be overcome by proper use of play money, while still preserving the structure of a centralized market with a respectable accuracy rate.

Futuur, with play money and a user interface favoring intuition over steep learning curves, has opened the door to players everywhere, regardless of how much money they have. By expanding its user base through lower barriers to entry, and creating greater diversity of thought and opinion, Futuur just might have been able to diminish some of the bias inherent in limited groups of individuals.

While it is clear that there is still room for improvement (Futuur’s Brier score for the midterms of 0.155, while solid, can be improved), Futuur has established itself as a useful complement to forecasts from the other platforms. If it continues to improve and expand its user base, it just may play a major role in broadening the reach of prediction markets and bringing them to their full potential.
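For context on the Brier score mentioned above, here is a minimal sketch of how such a score is commonly computed in its binary form: the mean squared difference between the forecast probability and what actually happened. The function name and sample numbers are hypothetical, and the article does not say exactly which variant was used to arrive at 0.155, so treat this as an assumption.

```python
def brier_score(probabilities, outcomes):
    """Mean squared error between forecast probabilities and outcomes.

    `probabilities` are the forecast chances that an event happens and
    `outcomes` are 1 if it happened, 0 otherwise. This is the binary
    form of the score, assumed here for illustration.
    """
    if len(probabilities) != len(outcomes) or not probabilities:
        raise ValueError("inputs must be non-empty and of equal length")
    squared_errors = [(p - o) ** 2 for p, o in zip(probabilities, outcomes)]
    return sum(squared_errors) / len(squared_errors)


# Hypothetical forecasts for four events and what actually happened.
example_score = brier_score([0.9, 0.7, 0.2, 0.6], [1, 1, 0, 0])
```

Lower is better: a perfect forecaster scores 0, and always answering 50% scores 0.25, which is a useful yardstick when reading a figure like the one above.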

I’m betting on it.