Lessons from Pandemic Models

Last year was so strange that it would be strange to call it just strange. As someone who works with data, I found one strain of strangeness particularly striking: data science concepts became unusually common in everyday conversation.

We saw people finally entertaining the idea that “exponential” might actually mean something other than a vague sense of bigness. We saw data visualization take centre stage with such force that “flatten the curve” became a common phrase. Then there were the likes of R-noughts, doubling times, testing rates, mortality rates, and positivity percentages, which probably helped raise statistical literacy by a tiny bit.

There are lessons for the practice of modeling too. This article, “The Hard Lessons of Modeling the Coronavirus Pandemic”, is a must-read for anyone creating or consuming forecasts. It talks about how forecasts were communicated and understood, and how that had real-world implications. Some thoughts from and around the article follow.

Purpose

Popular content today is biased towards a view of modeling that leans heavily on prediction. But prediction is not modeling’s only use. Modeling has been around far longer than the current hype cycle, and scientists have long used it to understand processes and how various assumptions affect them.

In the case of the pandemic, this means gaining insight into the disease and its transmission, and assessing the effectiveness of interventions. Models can tell us, for instance, just how much testing matters. Or they can be used to explore what-if scenarios: What happens if the lockdown is lifted only for schools? How effective does contact tracing have to be?
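
To make that concrete, here is a minimal sketch of the kind of what-if exploration described above, using a toy SIR model. It is not any of the models actually used for the pandemic; the parameters, and the crude way contact tracing is represented, are made-up assumptions purely for illustration.

```python
# Toy SIR model for exploring a what-if scenario: how effective does contact
# tracing have to be to blunt the peak? All parameters are illustrative.

def peak_infected(beta=0.3, gamma=0.1, tracing=0.0, days=180, n=1_000_000, i0=100):
    """Deterministic SIR where contact tracing removes a fraction of new infections."""
    s, i, r = n - i0, i0, 0.0
    peak = i
    for _ in range(days):
        new_infections = beta * (1 - tracing) * s * i / n
        recoveries = gamma * i
        s -= new_infections
        i += new_infections - recoveries
        r += recoveries
        peak = max(peak, i)
    return peak

baseline = peak_infected(tracing=0.0)
for eff in (0.1, 0.2, 0.3, 0.4):
    print(f"tracing effectiveness {eff:.0%}: peak is {peak_infected(tracing=eff) / baseline:.0%} of baseline")
```

The point is not the numbers it prints; it is that the same machinery answers what-if questions without claiming to predict case counts on any particular date.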

The purpose is not to predict the number of cases on a given date. This distinction is important but easily lost in discourse.

Projections

Model outputs can lead to action if projected cases are high and to complacency if they are low. This change in behavior makes the models look wrong, because the future they foresaw never comes to pass.

It’s a case of what the sociologist Robert Merton called reflexivity.

The usual accuracy metrics aren’t relevant here, because the outputs were not predictions, they were projections. Projections are contingent on what we believe the current conditions to be. Those conditions can change quickly, so the projections carry a lot of uncertainty.

“People … said the model’s wrong because the scenarios you explored aren’t what happened,” said James McCaw, a mathematical biologist and epidemiologist at the University of Melbourne in Australia. “That’s because the scenarios terrified us, and so we responded to avoid it. Which is not that the model’s wrong, it’s that we responded.”

Conflating projections with predictions is especially dangerous when decision-makers do it. At one point, the White House used projections to claim that the US had passed the peak of its outbreak. That claim held only if the US stayed under lockdown, but treating the projections as predictions led people to use them to justify easing it.

Probabilities

Not paying attention to the uncertainties is common. The probability distribution gets reduced to a single number, most often the mean, which is then used in subsequent calculations. This is as dangerous as crossing a river that is three feet deep on average.

So we end up with governments claiming to know the timing of peaks and waves, and being wrong more often than not. Decisions based on averages will be wrong on average. Not to mention the reflexivity we talked about.

Communication of uncertainty is a difficult problem. There is some interesting work around visualizing uncertainty. (My current favorites are quantile dotplots.) But even with such aids there are cultural and incentive issues at play. Decision-makers want answers; nuance is not in their interest. The doctor who checks a book during a consultation inspires less confidence. Scientists, though, are all about uncertainty, never believing in any one answer too much. This tension is captured by Harry Truman’s quip about wanting a one-handed economist who wouldn’t keep saying ‘on the other hand’.
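
For the curious, here is a rough sketch of how a quantile dotplot can be built from a forecast distribution. The forecast below is a made-up lognormal sample standing in for real model output, and the construction is a simplified version of the technique, not a reference implementation.

```python
# Minimal quantile dotplot: summarize a forecast distribution with 20 quantiles
# and stack them as dots, so each dot carries roughly 5% of the probability.
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(0)
forecast_samples = rng.lognormal(mean=4.0, sigma=0.4, size=10_000)  # hypothetical forecast

n_dots = 20
quantiles = np.quantile(forecast_samples, (np.arange(n_dots) + 0.5) / n_dots)

# Bin the quantiles and stack the dots within each bin.
bins = np.histogram_bin_edges(quantiles, bins=10)
bin_index = np.clip(np.digitize(quantiles, bins) - 1, 0, len(bins) - 2)
x = (bins[bin_index] + bins[bin_index + 1]) / 2                      # bin centres
y = np.concatenate([np.arange(np.sum(bin_index == b)) for b in sorted(set(bin_index))]) + 1

plt.scatter(x, y, s=200)
plt.xlabel("Forecast quantity (e.g. admissions)")
plt.ylabel("Dots (each ≈ 5% probability)")
plt.title("Quantile dotplot of a forecast distribution")
plt.show()
```

Because every dot carries the same amount of probability, “count the dots beyond the capacity line” becomes a natural way to read risk off the chart.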

The frustrating challenge is that researchers are often already offering these explanations, but the public and its representatives tend to want more certainty than science can provide. And when governments decide to disregard researchers’ best counsel and clutch instead at specious but popular policies, it isn’t clear what scientists can do about it.

Probabilities are useful, though. You just need to ask better questions. For instance, instead of asking for the number of hospital beds needed, ask for the probability that the beds needed in the next time period will exceed the beds available. With probabilities like that, decision-makers can weigh the costs against the benefits and decide. We ignore the possibility of such data-driven decisions at our peril.
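
As a sketch of what asking the better question might look like: the demand distribution below is invented (a negative binomial with a made-up mean and spread), but the point is the shape of the answer, a probability of exceeding capacity rather than a single expected number.

```python
# From a point forecast to a decision-relevant probability.
# The demand model and all numbers are illustrative assumptions, not real data.
import numpy as np

rng = np.random.default_rng(42)

capacity = 120  # hypothetical beds available
# Hypothetical forecast of beds needed tomorrow: mean around 100, with a heavy-ish tail.
beds_needed = rng.negative_binomial(n=20, p=20 / (20 + 100), size=100_000)

print(f"Expected beds needed: {beds_needed.mean():.0f}")                     # the usual single number
print(f"P(demand exceeds capacity): {(beds_needed > capacity).mean():.1%}")  # the useful number
```

The first line alone would suggest everything is fine; the second is the number a decision-maker can actually trade off against the cost of adding capacity.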

“But from my vantage point,” he continued, “much of the death and illness that we’ve seen in fact could have been prevented.” The models can warn us about fatalities to come, but we have to be willing to learn how to listen to them.

If all of that seems irrelevant for business use cases, it isn’t. A while back I was handed the problem of making hiring forecasts, and I ended up facing almost all of these issues: I was using discrete event simulations whose outputs were projections liable to be misunderstood, there was reflexivity, accuracy was needlessly scrutinized, and there were probabilities to be communicated.
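
For flavour, here is a heavily simplified stand-in for that kind of simulation: a Monte Carlo hiring funnel with made-up stage pass rates and durations. It is nothing like the actual model, but it shows why the honest output is a distribution of possible futures rather than a single hiring number.

```python
# Simplified hiring-funnel simulation. Stage pass rates, durations and candidate
# volumes are invented for illustration.
import numpy as np

rng = np.random.default_rng(7)

def hires_in_horizon(n_candidates=200, horizon_days=90):
    """One simulated future: offers accepted within the horizon."""
    hires = 0
    for _ in range(n_candidates):
        t = rng.uniform(0, horizon_days)                 # day the candidate appears
        for pass_rate, mean_days in [(0.5, 7), (0.4, 10), (0.6, 5)]:  # screen, interviews, offer
            if rng.random() > pass_rate:
                break                                    # dropped out at this stage
            t += rng.exponential(mean_days)              # time spent in the stage
        else:
            hires += t <= horizon_days                   # hired, if it happened in time
    return hires

# Many simulated futures: report the distribution, not one number.
outcomes = np.array([hires_in_horizon() for _ in range(2_000)])
print(f"Median hires in 90 days: {np.median(outcomes):.0f}")
print(f"P(at least 15 hires): {(outcomes >= 15).mean():.1%}")
```

From there, the same framing applies: report the probability of hitting the hiring target, not just a central forecast.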

By the way, isn’t it interesting that for all the hoopla around machine learning, the pandemic models are mostly statistical or simulation-based? There is something to be learnt about the choice of tools from that, but we will save it for another day.

Rithwik K
Data Scientist