Transcript of Currents 047: Samuel Scarpino on the Epidemiology of Covid-19

The following is a rough transcript which has not been revised by The Jim Rutt Show or Samuel Scarpino. Please check with us before using any quotations from this transcript. Thank you.

Jim: A little bit of filthy commerce before we get started today. If you haven’t checked out my new mobile game Network Wars, do it. I wrote it as an exercise, an exploration in heuristic induction. What the hell is heuristic induction, you might say? It’s the human superpower that enables us to create rules of thumb as ways to make sense of high-dimensional, complex situations. Don’t let those high concepts scare you. It’s actually an easy-to-learn fun game that takes a minute or two to learn, though getting good at it is pretty hard. A typical game lasts five minutes. It’s only 99 cents and there’s no in-game ads and no come-on. It’s available on the Apple App Store and Google Play for Android.

Jim: That’s Network Wars, two words, and you can learn more about it at Thanks. Now, onto our show. Today’s guest is Sam Scarpino. Sam is a complex systems scientist and the managing director of Pathogen Surveillance at the Rockefeller Foundation where he is a member of the Pandemic Prevention Institute to build surveillance systems that empower communities in this pandemic and prevent another. He’s also an affiliate assistant professor in the Network Sciences Institute at Northeastern University where he directs the Emergent Epidemics Lab. He’s an external faculty member at both the Santa Fe Institute and the Vermont Complex Systems Center. Welcome, Sam.

Samuel: Thanks so much for having me, Jim, really excited to be here.

Jim: This should be a lot of fun as I’ve been reading what you’ve written, et cetera. It looks like you’ve been deeply involved in building models of COVID-19 since early on, and now you’re deep into the data gathering surveillance side of things. Is that right?

Samuel: That’s right. Beginning during my time as an Omidyar fellow at the Santa Fe Institute, I’ve been constructing mathematical models of infectious disease outbreaks, but in particular, working to translate those models into public health and clinical decision making. So really trying to sit at that interface between improving our basic science understanding of how infectious diseases work and then, translating that into action that saves lives and economies.

Jim: Excellent. Good stuff. Back in May and June of 2020, I did several COVID-related episodes, some on science, some on policy, some on what the impact of COVID would be on the world, but I haven’t really revisited it since then; thought this was a good time to do so. So maybe I’ll start off with the perspective of, to your mind, what’s been the biggest surprise in what we know or think we know about COVID-19 today versus what we thought we knew in June of 2020?

Samuel: Well, certainly the understanding around how aerosol transmission works and the steps that need to go into preventing that, that’s a piece that we learned and really are still, not learning, but learning how to implement, and how important our lack of investment in ventilation and our lack of consideration around the importance of ventilation has been, especially in the United States. I think more broadly, it’s our ability to understand the evolutionary trajectory of the virus, and I actually wrote about this. The Santa Fe Institute put out a large volume, probably tome is the right word, collecting all the incredible interviews and work and stories centering around COVID-19. I wrote a short perspective where I highlighted that I think one of the biggest areas of opportunity, especially for complexity thinking, is in better understanding how these viruses are going to evolve.

Jim: We heard all kinds of talk, “Ah, this thing’s less evolutionarily virulent than flu, or this and that,” but it seems like these quite different strains, to some degree, caught people by surprise, at least from what people were thinking early on.

Samuel: The Omicron variant certainly caught a lot of people by surprise, if nothing else because it’s so diverged from anything else we’ve really seen out there. That means that we have a lot of work to do on this surveillance side. That’s one of the pieces that we’re working on at the Pandemic Prevention Institute. But, of course, we also saw a really problematic response happen, which was travel bans going into place that had almost no effect on the virus, but had huge economic consequences, as opposed to other kinds of measures rolling out more quickly. So I think to your point, we weren’t ready for a new variant and as a result, we overreacted in the wrong way and ended up having negative economic consequences without much in the way of positive consequences with respect to viral transmission.

Jim: Yeah. That has been so annoying that clearly, the powers that be just do not actually understand what’s going on and are not getting good advice. Any sense on why that is? Basically, everywhere except a few countries it seems like the powers that be are just not well-informed.

Samuel: Well, I think they probably are well-informed. The question is what’s happening on the decision side of things and, obviously, there’s no easy choices here. The countries that we know are doing well are countries that have really severe border restrictions. There are parts of Australia where you still can’t move from state to state within the country. So the consequences of acting in that direction are difficult, but there are things we could be doing, rapid testing, availability of masks, things that would help us get back to something that’s a little bit more normal that we’re just not doing and haven’t been doing. I don’t know that I have a good answer for why.

Jim: Yeah. That’s probably a problem above our pay grade for today. So let’s dig into some of the basics. I solicited some questions from some friends of mine, my wife, who’s very interested in this stuff. So let’s just dive in and ask some specific questions. One, that I think a lot of people scratch their head about is, what does your modeling tell us on why this epidemic, this particular epidemic, seems to oscillate. We get these peaks and valleys and all that, the old standard, simple-minded epidemiological model does not produce those kinds of oscillations. What is it that your modeling is showing that are the drivers of these oscillations?

Samuel: I would say it is still an open question around what’s driving the oscillations, but we’re learning more as time goes on. It’s clear, as I mentioned at the beginning, how important ventilation is and our ability to maintain good ventilation depends a lot on the weather, because we rely quite a bit on having windows open as our primary form of ventilation. So I think that’s part of the explanation for why we see surges in September in the Southern part of the United States and now, we’re entering into surges in the Northern part of the United States as we get into the winter, so that’s certainly part of it.

Samuel: Pieces that we don’t understand as well, there was a variant Delta that emerged about this time last year. Now, we’re dealing with the Omicron variant. There’s a pace and tempo of how the virus is evolving that’s also creating some of these cycles and that’s a piece that we really don’t have a good handle on. So some of it has to do with the weather, some of it has to do with the ebb and flow of measures that are in place as we relax measures, then cases start to creep back up, and then we put the measures back in. Then, some of it has to do with the evolution of the virus, which we just don’t understand very well.

Jim: Yeah. The one about our own interventions and maybe, even more than just the formal interventions like mask rules, et cetera, is I noticed certainly here in the town where I live is that people’s behavior seems to become loose and tight based on a lagging indicator around case rates and things of that sort. So one could imagine that if people’s individual agent-based behavior is loosening and tightening on, say, a three-week lag on perceived counts, et cetera, you might see this kind of oscillation. Have you guys played with that kind of thing?

Samuel: We look very closely at real-time mobility and social contact data as an indicator of how people are responding. That’s been one of the hallmarks of this pandemic. People do what they think is going to keep them safe while maximizing the things that we know are important economically, socially from our mental and physical health. So you get these oscillations in people doing more, doing less. They tend to not exactly correlate with when measures go into place. People tend to lock down a little bit earlier because they get scared and loosen up a little bit quicker. Really, to me, this is work that we’ve done trying to understand the informational dynamics of infectious disease outbreaks. It really means you need real-time data to include in your models so that you’ll constantly be aware of how people are responding, in part, because we don’t have good models of human behavior that we can use in a predictive sense.

Jim: Yep. That makes perfect sense. That’s a perfect transition to my next question, which is, as we often talk about at the Santa Fe Institute, models are good, but they’re only as good as the degree to which they can be validated against data and are driven by good data. So maybe you could talk to us a little bit about how the monitoring is structured in, let’s say, the United States to start, maybe elsewhere in the world, so that we’re able to get something like real-time data, so as to be able to improve our modeling?

Samuel: We know that there’s also one other piece that makes real-time important. The model structure itself changes and the physicists bristle a little bit at that because that’s not exactly the way it works, but that’s not a bad analogy for how it happens. The model structure changes: the causative reasons why people are behaving the way they are, the way the virus is evolving, those shift in time, which means we need to be updating our models; not just the parameters of the models, but the actual structure of the model itself. At the Pandemic Prevention Institute and The Rockefeller Foundation, we’re looking at this a couple of different ways.

Samuel: One is bringing in real-time mobility data, focusing on genome sequencing internationally and nationally. However, that still leaves pretty big gaps, and one way we’re really excited to try to fill those is with wastewater-based epidemiology. You might be seeing news about this all over the U.S. Our partners in Houston are using wastewater to identify places where there are Omicron variants and then surge resources. This has been a big part of the polio response for decades now, where you identify polio virus in wastewater and then use surge resources to contain.

Jim: Yeah, that’s clever. A question I had when I was thinking about this is, how much sequencing are we doing and how real time is that, because presumably for monitoring new variants and also the competition between variants, that would seem to be your most important data?

Samuel: It varies a lot internationally, and it varies a lot nationally. In the U.S. right now, we’re sequencing between say three and 10% of all the cases that happen every day. Now, in some states it’s a lot. In some cases it’s more like 10 to 20%, maybe in Massachusetts and then, in other states that’s under 1% or maybe more like two or 3%, and this is way up. So back in 2020, we were sequencing well under 1% of all the cases. Internationally, again, it depends on the country. UK, U.S. are doing a huge amount of sequencing. South Africa, obviously, we’ve seen India; however, then there are other countries that are doing much, much less sequencing. In fact, that’s modeling work we’ve been doing at the Pandemic Prevention Institute, trying to estimate what the detection thresholds are for different kinds of variants, given the sequencing volume.
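
The detection-threshold question Samuel describes can be illustrated with simple arithmetic. The sketch below is not the Pandemic Prevention Institute's model; it is a textbook binomial calculation that assumes the sequenced cases are an independent random sample of all cases (prioritized sequencing of suspected variant cases would break that assumption). The function names and the 1% prevalence figure are hypothetical, chosen only for illustration.

```python
import math


def detection_probability(prevalence, n_sequenced):
    """Probability that at least one variant genome shows up among
    n_sequenced randomly chosen cases, when the variant makes up
    `prevalence` (a fraction between 0 and 1) of all cases."""
    return 1.0 - (1.0 - prevalence) ** n_sequenced


def samples_needed(prevalence, confidence=0.95):
    """Smallest number of randomly sequenced cases that gives at least
    `confidence` probability of seeing the variant once or more."""
    return math.ceil(math.log(1.0 - confidence) / math.log(1.0 - prevalence))


if __name__ == "__main__":
    # Hypothetical scenario: a variant that is 1% of current cases.
    n = samples_needed(0.01, confidence=0.95)
    print("genomes needed:", n)
    print("detection probability:", round(detection_probability(0.01, n), 4))
```

Under these assumptions, the rarer the variant, the more genomes have to be sequenced before it is likely to be seen at all, which is why low sequencing volume translates into late detection.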

Samuel: Now, the other piece is the delay time, so it’s around 30 days on average or the median, I guess, between sample collection and when a genome is made available on a public database like GISAID so that the world can see it. This has created a pernicious data issue during Omicron, which is part of that 30 days, is it’s just there’s a lot of sequencing to get done and only so much capacity, so there’s a big queue. Well, we have a PCR test to generate probable cases, and then we jump the queue and are getting some of these sequences out in four or five days and that actually makes it look like it’s spreading faster. Now, it’s spreading plenty fast on its own, but you do get this sense that because some of these Omicron sequences are being prioritized, we are getting those more in four or five days instead of 30.

Jim: What would it cost to reduce that down to one day? Clearly, it would be a lot better if it was one or two or three days. Are we talking a few hundred million here, or a few billion? Presumably, if the throughput is being canned it’s a capital cost to essentially reduce the queue more than it is anything else.

Samuel: As I understand it, one of the biggest bottlenecks right now is the availability of sequencing reagents. This is one of those unintended consequences where there are places that have to make decisions, “Do we sequence another SARS-CoV-2 genome, or do we sequence the genome of cancer cells in a patient to try to see if there’s a therapy that might work?” So it’s the same reagents, it’s the same sequencing machines that get used for all of this, and that creates supply chain issues. There is a hard boundary in terms of prepping the sample and getting it loaded onto the machine, how long the machine takes to run. Same day is probably not really realistic. I think a few days is probably more likely, but it would be a big expense. That’s one reason why we’re really excited about wastewater is you can do more short-read sequencing. You can do more PCR testing and it can happen a lot faster and a lot cheaper.

Jim: Yep. That sounds good. That sounds like a real improvement in the ability to do near real-time surveillance. Let’s jump in a little bit into Omicron, obviously, one of the issues of the hour. From where you sit, what are you seeing about information with respect to its contagiousness and its seriousness? I will say, I pat myself on the back a little bit, the moment I heard about Omicron, I hypothesized, “I’ll bet this sucker will be quite a bit less serious in terms of illness. Because it’s such a large jump in genomic space, it’s really hard to win in all dimensions. If you jump this far and you pick up transmissibility, it seems to me, genetically improbable that you wouldn’t also lose out some in seriousness in this disease.” So from where you sit, what have you learned, or what are you hearing, or what data are you seeing specifically about Omicron?

Samuel: Well, we know that Omicron is spreading really fast in South Africa, the UK, and Denmark. These are the places where we have the best, most complete data. In all three of those countries, it’s doubling about every two days, which is almost twice as fast as Delta was spreading. Now, what we don’t know is how much of that is because of intrinsic properties that make it more likely to infect people versus immune evasion. So there’s evidence of higher rates, two to three times higher rates of reinfection following natural infection in South Africa. We’ve seen the neutralization studies. As you know, antibody neutralization is just one small part of the immune response, so it’s really hard to translate that neutralization change into some estimate of the vaccine effectiveness in terms of blocking transmission, although, it certainly looks like a pretty big component has to do with immune evasion and again, a big genetic change like that would be supportive.
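
To make the "doubling about every two days" figure concrete, here is a minimal sketch of what a fixed doubling time implies. It assumes pure exponential growth with no change in behavior, immunity, or testing, which only holds early in a wave. The starting count of 100 cases and the four-day comparison doubling time are hypothetical round numbers, not estimates from the conversation.

```python
import math


def daily_growth_rate(doubling_time_days):
    """Continuous per-day growth rate r such that cases grow as exp(r * t)."""
    return math.log(2) / doubling_time_days


def projected_cases(initial_cases, doubling_time_days, days):
    """Cases after `days`, assuming uninterrupted exponential growth."""
    return initial_cases * 2 ** (days / doubling_time_days)


if __name__ == "__main__":
    # Hypothetical: 100 cases today, followed for two weeks.
    for doubling_time in (2, 4):
        cases = projected_cases(100, doubling_time, 14)
        rate = daily_growth_rate(doubling_time)
        print(f"doubling every {doubling_time} days: "
              f"rate {rate:.3f}/day, about {cases:.0f} cases after 14 days")
```

Halving the doubling time does not double the two-week total; it squares the growth factor, which is why small differences in doubling time matter so much.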

Samuel: In terms of the severity side, it does look like the cases are less severe. There’s some caveats, though, which is it’s still early days and we know it takes time for cases to progress in severity, so we are wait-and-see on that. But because there’s a higher rate of reinfection, it might also be that some of that reduced severity is because the immune system is mounting a response and keeping the illness in check and clearing it faster. This is, actually, one of the things I’m really concerned about in the United States is, let’s say, you do have a higher rate of breakthrough infection, more transmission in vaccinated individuals, less severity in vaccinated individuals, we’ve still got a sizable portion of the country that’s unvaccinated and that could lead to a pretty big spike in severe disease. Even if the disease itself is less severe, it’s probably a linear reduction in severity and a non-linear increase in the transmission potential, which still could lead to overwhelmed hospitals.
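
The point about a linear reduction in severity set against a non-linear increase in transmission can be sketched with toy numbers. Everything in the example below is a hypothetical assumption made for illustration: both variants start from 100 cases, the slower one doubles every four days with 2% of cases becoming severe, and the faster one doubles every two days with half that severity. The sketch ignores the delay between infection and hospitalization, immunity, and any behavioral response.

```python
def severe_cases(initial_cases, doubling_time_days, severe_fraction, days):
    """Severe cases on a given day under uninterrupted exponential growth,
    with a fixed fraction of all cases becoming severe."""
    total_cases = initial_cases * 2 ** (days / doubling_time_days)
    return total_cases * severe_fraction


if __name__ == "__main__":
    for day in (0, 4, 8, 14):
        slower = severe_cases(100, 4, 0.02, day)  # hypothetical slower, more severe variant
        faster = severe_cases(100, 2, 0.01, day)  # hypothetical faster, half as severe variant
        print(f"day {day:2d}: slower variant {slower:7.1f} severe, "
              f"faster variant {faster:7.1f} severe")
```

With these made-up parameters the faster variant starts with half as many severe cases, draws level at day four, and pulls ahead from then on, because the one-time halving of severity is quickly outrun by the extra doublings.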

Jim: Ah, that makes a lot of sense. As I like to tell people, “The way to understand the modern world is to think in exponentials. Think in exponentials and understand the phenomena of fat tails.” I like that, actually. A linear decrease in severity, but an exponential increase in transmissibility and you run the numbers, and I’m presuming you guys are running the numbers every day, or at least the modelers are running numbers every day, and that could produce some big hospital surges.

Samuel: Well, and again, one of the challenges, and this is something that many people have learned during the course of COVID-19, our hospitals run at 95% capacity during normal operations; they may have been optimized for that, because even the non-profit hospitals still have revenue targets that they have to hit that involve the beds being full. The challenge, then, is that doesn’t leave very much flex space for surges of things like SARS-CoV-2. Of course, we know there’s also influenza this season in a way that there hasn’t been and the same beds, the same ventilators, the same nurses are there.

Samuel: That’s the challenge, is that you have this exponential process, not very much wiggle room in terms of our healthcare capacity and that quickly leads to overwhelmed hospitals. Then you start to get this feedback loop, like what we saw in India, where you have death rates increasing, severity increasing because we don’t have the resources we need to treat. This is really the hallmark of it; it’s why so much of the important work that’s been done in infectious disease dynamics and linking to economic modeling and impact on the healthcare has come out of places like the Santa Fe Institute, because this is really, truly a complex adaptive system.

Jim: Indeed. Now with respect to the intersection of data and modeling, when we get to the issue of breakthrough infections from Omicron and Delta before that, is our data being captured at sufficient granularity that we’re able to map up, at statistically significant levels, cases with previous vaccination status, i.e., “Okay, this person had J&J eight months ago, and we have a cohort of people from eight months ago and we know this about how many breakthroughs we’re getting.” Do we have data at that level of granularity to help drive our models, or is that still a hole in our data collection?

Samuel: It’s a big hole in the data collection in the United States. This is an area where the Pandemic Prevention Institute and the Rockefeller Foundation have been investing heavily, linking metadata, especially on vaccination status, to the genomes that are sequenced for SARS-CoV-2. There’s some is there because the medical records are often what contain the data on vaccination status, and it’s typically public health that’s doing the genome sequencing. In fact, in the U.S., if you got your SARS-CoV-2 genome sequenced, you probably could not get access to it with it labeled with your personally identifiable information, because that might be considered a diagnostic and it’s not been approved by the FDA.

Samuel: So you can sequence the genomes for public health surveillance, but not for medical decision making, and that makes it pretty challenging to get those links together. So we’re working on it. We have a lot more work to do. The best places for that are going to be the UK and Israel, and really that’s where everybody’s looking. Also, South Africa has really good data on prior infections, because they’ve been running serosurveys to look for antibodies and they understand vaccination status. That’s partly why when I say there’s this two to three X increase in reinfection, it’s because South Africa has a pretty sophisticated surveillance system around prior infection. Again, that’s something that we’re woefully lacking in the United States.

Jim: Now, in medical technology, I’ve often run into HIPAA as a problem that makes everything exponentially fucked up. Is HIPAA a part of the problem here with integrating our data across sources?

Samuel: Sort of, and again, like everything, the answer is always, “It’s complicated.” So how does this come into play? Well, I can get a waiver for a limited data set that contains some pieces of HIPAA-protected data, but not everything, say, like the zip code of an individual or the date they were vaccinated and date they got infected, for public health use and link that to the genome. The challenge is how do we decide what is public health use versus research? Because I can’t do that for research, and so that’s where the barrier for HIPAA comes in, is not having really clear guidelines on what is public health versus what is research. Clearly, if the CDC is doing it, it’s probably public health. But one of the things that’s happened is lots of researchers, including researchers at the Santa Fe Institute, are really doing public health work as a part of the response to SARS-CoV-2, and we don’t have really good legal guidelines around what that waiver applies to. What is the legal test for research versus public health?

Jim: Yeah, this sounds like an area where, because one of the things we know for sure is this is not the last pandemic that’s going to hit humanity. So this sounds like an opportunity for learning and institutional change to get some of those things clarified so that when the next one comes, we don’t have this muddle.

Samuel: Exactly. Again, that’s one of the things that we’re working on at the Pandemic Prevention Institute and the Rockefeller Foundation is we want to be relevant for this current pandemic. We’ve been heavily engaged in the response with our partners all over the world, but we want to make sure what we’re building is the kind of data system that will be ready for the next pandemic. Ideally, there won’t be a next pandemic, because we will identify the cases when they’re spreading and still rare, we can deploy resources and contain before it grows into a regional epidemic and then a global pandemic.

Jim: I continue to remind people that in some sense, as horrible as it’s been and as much human tragedy as has come from the COVID-19 epidemic, there could have been, and probably will be in the future, variants that are much worse, not variants per se, but other pandemics. With a death rate on the order of something around 1%, most of that concentrated in highly comorbid victims, there’s plenty of room for worse pandemics than this one, and we need to be ready.

Samuel: No, this is not the big one that people have been worried about. This was not a 1918 flu situation. One of the things with SARS-CoV-2 is maybe not so much anymore, but it’s still pretty heavily reliant on super spreading events, especially during establishment. This is some of the mathematical modeling work that my colleagues and I did back in 2020 in January and February. The establishment probability is pretty low, because there’s a big effect of random chance though [inaudible 00:24:20] on the spread of SARS-CoV-2. With influenza, influenza is much more deterministic. There’s very little that we can do besides lock down and wait for a vaccine, and that’s not the case with SARS-CoV-2. So we really are concerned that the big one is still coming and we know we’re not ready for this one, which was the warmup and what’s that going to look like, not if, but when, the big one shows up?

Jim: Yeah. Make it as contagious as flu and as lethal as MERS or something like that, and you’re on the verge of a societal meltdown if we don’t act in a much more disciplined and intelligent fashion than we did this time around.

Samuel: It might be even more dangerous than that. We were right on the edge of societal meltdown and, arguably, there was some localized meltdown from this one. As you said, it’s that problem of fat tails and exponential growth, which is even if the case fatality rate’s not that high in terms of what it sounds like in percentage terms, if you infect hundreds of millions of people, billions of people, perhaps, and so you multiply a big number by a medium-sized number, you end up with a lot of deaths. So, if we have something that spreads like influenza but is twice as lethal as SARS-CoV-2, which would still be almost a factor of 10 less than MERS, we’re in for a world of hurt and really risking societal meltdown. So it may not even be that we need to get all the way to the super bug, the Andromeda Strain. If we just get a little bit worse, it might be enough.

Jim: Yeah. That’s to your point earlier about the fact that, well, unfortunately, our science and our policy making tends to be siloed. In the real world, all these networks are coupled, right? The illness is coupled to the economy. The economy is coupled to politics. Politics is coupled to war, and so the potential cascades across the networks are something that our society, so far at least, seems incredibly inept at even contemplating. Of course, that’s the role of complexity science, to bring that lens to this problem.

Samuel: That’s one of the most important parts about places like the Santa Fe Institute, bringing together people that can think about these problems from lots of different angles. If you remember back in 2020, I think it was May, we convened a rapid response meeting. It was remote, of course, at the Santa Fe Institute that brought together our economist colleagues and social scientists, physicists, the mathematical modelers, computer scientists, people that think about policy, all under one Zoom roof to talk about, not just the current response to SARS-CoV-2, but what we needed to be doing on the scientific side, what gaps were exposed, what we needed to be doing on the policy side of things, what gaps were exposed.

Samuel: Again, I think what we really, and we’ve seen this play out in terms of, major investments in the Santa Fe Institute and the work that has come out of there, the importance of these interdisciplinary complex system centers and our ability to respond to these societal problems that we’re faced with. It’s only going to get harder, climate change, et cetera. I always say that it’s probably not climate change that’s going to get us; it’s an infectious disease that emerges and spreads because of climate change that’ll get us first.

Jim: Yeah, and maybe throw in a famine on top of it, right?

Samuel: Yeah.

Jim: … so everyone’s health is weakened or much of the world’s health is weakened and a pandemic, and, and, and, right? One of my little areas of hobby interest is studying catastrophic failures, and one of the things that we find again and again and again, is that most modern systems are relatively robust against single failures. But when you get multiple failures coming simultaneously, that’s when you get catastrophes, that’s when you get nuclear power plants melting down, that’s when you get wide area grid collapses, et cetera. This could well be the case with respect to multiple factors at the intersection of famine, climate change, and a pandemic.

Samuel: That was a big factor in 1918, and I remember it was probably five or 10 years ago now I was at a conference. One of the things they asked the speakers was to comment on what keeps you up at night. I said, “Well, what keeps me up at night is that as we understand it, most of the fatalities during 1918 were due to opportunistic bacterial infection because of a weakened immune system fighting off influenza. We were pre-antibiotic, so imagine a situation with antimicrobial resistance, where we’re facing a flu pandemic and our antibiotics aren’t working, and we’re dealing with antibiotic-resistant bacterial pneumonia.”

Samuel: Again, that’s the kind of thing that you start to layer two or three or four of these failures on top of each other, and you go from something that is manageable to something that is completely unmanageable. Again, that’s one of the things we’re working on at the Pandemic Prevention Institute is yes, we can use wastewater-based epidemiology to respond to COVID, but we can also use it to monitor antimicrobial-resistant genes in the wastewater and try to inform decision making around development of next generation antibiotics, the development around vaccines for certain bacterial pathogens, try and fight complex systems with complex systems.

Jim: I’m glad to hear someone’s doing that. Very good. Let’s get back to now some of the basics of the modeling and the public discussion around COVID. Back in 2020, say June, there was a lot of talk about herd immunity, and you saw people doing calculations on the back of the napkin that said, “When we get 72% of people who’ve been vaccinated or had the disease, then we’ll have herd immunity.” Then when Delta came along, they raised the number a little bit, but I think it’s turning out to be a hell of a lot more complicated than that. What can you tell us about even the concept of herd immunity in this context and where are we in that space?

Samuel: Herd immunity is a very difficult term. We wrote this paper talking about the difficulty interpreting the basic reproductive number, the R0 or R naught, and how challenging it is to translate that single number into an estimate of risk. The herd immunity threshold, basically, relies on that calculation layered on top of it and is even more difficult to unpack. So you have to make so many assumptions about the way immunity is going to work, the way social contacts work, the way behavior works, right? So the only reason that you would get a potential benefit in herd immunity is if you have a heavy-tailed contact distribution, and the same people always have a lot of contacts. You don’t have shifting numbers of contacts.

Samuel: We’ve seen that where some people can lock down, some people can’t. We talked about the shifting behavior and all of that affects our herd immunity threshold calculations. Now, of course, we’re faced with this Omicron variant that clearly has a higher rate of breakthrough infections. We’re in a situation where there’s not a lot of stability in that herd immunity in the population. The second piece is none of this is evenly distributed, so we’ve got what, 70% vaccination in the state of Massachusetts, and that’s because we have some cities with close to a hundred percent vaccination and other cities like the capital, which, until very recently, was under 40% vaccinated.

Samuel: So, as we’ve seen with lots of infectious diseases, measles is a great example, even though we’ve got over 95% vaccinated with measles, you end up with huge outbreaks, because there’s what we refer to as homophily in the social network: people tend to have the same vaccination status if they’re close to each other in the network. So maybe to put a finer point on it, the details matter, and they don’t matter in a linear way, they matter in a non-linear way. So if you’re a little bit wrong about something, then you’re a lot wrong about what the consequences are going to be.
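
For reference, the back-of-the-napkin calculation Jim mentions is usually the textbook threshold 1 - 1/R0. The sketch below shows that formula and nothing more. It assumes a homogeneously mixed population, a single fixed R0, and immunity that does not wane, which are exactly the simplifications Samuel cautions against. The function names and the example values of R0 and vaccine effectiveness are hypothetical.

```python
def herd_immunity_threshold(r0):
    """Textbook threshold: fraction of the population that must be immune
    so that each case infects fewer than one other person on average.
    Assumes homogeneous mixing and lasting, complete immunity."""
    if r0 <= 1.0:
        return 0.0  # an outbreak with R0 <= 1 dies out without added immunity
    return 1.0 - 1.0 / r0


def required_coverage(r0, effectiveness):
    """Vaccination coverage needed when the vaccine blocks transmission in
    only a fraction `effectiveness` of recipients. A value above 1.0 means
    the threshold cannot be reached by vaccination alone in this model."""
    return herd_immunity_threshold(r0) / effectiveness


if __name__ == "__main__":
    # Hypothetical values chosen only to show how sensitive the answer is.
    for r0 in (2.5, 3.5, 6.0):
        print(f"R0 = {r0}: threshold {herd_immunity_threshold(r0):.0%}, "
              f"coverage needed at 70% effectiveness "
              f"{required_coverage(r0, 0.7):.0%}")
```

Even inside this oversimplified model, modest changes in R0 or in how well immunity blocks transmission move the answer a long way, and it says nothing about clustering of unvaccinated people in the contact network.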

Jim: Now with all that said with what we know today, which, of course, is not everything we’d like to know, what do the models say about the future progress of the pandemic? Let’s say, just for the United States for the time being?

Samuel: Well, this Omicron variant moves really fast and that’s a bad thing because that means hospitals are going to get overwhelmed, huge numbers of cases simultaneously. That also means it’ll be over faster. We saw how long it took the pre-Delta variants to ramp up over the course of the fall, back in 2020 from September. So what I thought was going to happen with Delta, if we hadn’t had Omicron, is there would be a wave in the U.S. and by February we’d be through it, and then we’d have a pretty normal 7, 8, 9 months before we get back into respiratory season again, and then we’d probably have another wave that maybe it would be smaller and then, it would settle into something that was a little bit more like a normal coronavirus.

Samuel: With Omicron, what we have to find out is how much immune evasion is there on the vaccine side? Can that be mitigated with boosters? What is the effect in terms of severity, but then importantly, what does this mean for the evolution of the virus? Does another variant come along that’s going to break through the immunity again, and are we going to be in this influenza cycle where you have these seasonal outbreaks of SARS-CoV-2 every single year? That might mean that we have to have rolling mask requirements, that we need to do much better with testing, to include PCR testing and rapid antigen testing, and then the wastewater surveillance and all sorts of other things.

Samuel: So part of it is that we have some agency: the more we do now to shore up our data systems and implement the kinds of things that help us detect early and respond, the better we’re going to be. But there’s a really big question mark around what’s going to happen with Omicron, how serious and severe it’s going to be and then, whether this is just the first of many variants that are coming or not. Maybe it’s not a foregone conclusion that this will evolve the way influenza does. So it could be that there’s not that many more variants coming after Omicron, but it might also mean that we’re going to be stuck in this cycle for years and years and years.

Jim: So it says that out very far, the error bars are still damn large?

Samuel: Yes, and that’s an unfortunate piece. It’s happened a few times during this pandemic. It happened, if you remember back at the beginning of the summer, if it hadn’t been for the Delta variant, we wouldn’t be talking about this anymore. This would all be behind us.

Jim: Our family’s been very COVID strict, but there was that window, June and July, 2021, it seemed like we could open up a little bit, but nope, Delta came and whacked us in the ass.

Samuel: We’ve gotten a little bit of a reprieve a few times, but it keeps coming back. Again, I think we just don’t know whether we’re going to be in this cycle of variant after variant after variant. But we do know there are things we can do to improve our surveillance system so we catch things faster, and can respond more quickly. We do know that masking, testing, contact tracing, ventilation work against all the variants and investing in those will give us a better chance of avoiding lockdowns and of having a more normal existence with this virus.

Jim: That’s a perfect lead to my next question, which is, if you were the God of the budget, what would you spend money on with respect to gathering data to be able to build better models that we’re not doing today? Let’s assume that your budgetary Godhood was damn close to unlimited, DOD-level budgets.

Samuel: One of the things that we know is that the number of social contacts matters in terms of disease risk. There’ve been a couple of studies showing that if you just have social contact data, you don’t need to know who’s contacting who, just how many social contacts people have. That’s highly predictive of the rate of infection in different parts of the world. So having detailed contact data, not on who’s contacting whom, but just how many contacts people are having, what duration, et cetera, that kind of mobility and social contact data is so important. We’ve learned that during this past pandemic, and it is expensive to collect, to analyze, to maintain. It’s probably not quite DOD-level expensive, but it’s a big cost. I would put wastewater-based surveillance into enough airports to cover 95% of all commercial air travel in any given day for a whole host of pathogens. Then, also I would be doing short-read metagenome sequencing to look for the unknown unknowns that might be present and lurking.

Samuel: Then, I would invest heavily in how we can gather health data from people in such a way that we can link these genomes to health status, but also to all sorts of other pieces of metadata that we might need. We’ve been in some conversation with the folks about how we might be able to leverage a public health coin or cryptocurrency to incentivize people to share their health data and leverage something to keep it secure, but also to monitor how it’s being used. So I think that’s the kind of thing we need to figure out: how to unlock these health data sets that are spread out all over 25 different electronic health record systems, and on pieces of paper and otherwise. So those would be the three big things that I would be investing in, in terms of data: real-time mobility and social contact information; real-time molecular data on pathogens, especially what’s moving around in airports; and figuring out how to unlock these health records and get them linked to things like SARS-CoV-2 genomes in a privacy-preserving way.

Jim: Yeah, that last one’s a big one, right? Once you get dealing into those goddamn distributed systems and trying to do metadata equivalence and all this stuff, not easy, but it is doable, and the stakes are so large. What has been the social cost of this pandemic certainly in the many trillions of dollars. So even DOD-level expenditures strike me as the socially responsible thing to do so long as we do the right things. But back to your first one, social contact measuring, for a while there, there was a lot of noise about using cell phone data. How has that ended up working out as a surrogate for social contact, not necessarily call data, but position data?

Samuel: There’s been a huge amount of work done on that, and a lot of this we did in the United States out of Northeastern University and the lab of Professor Alessandro Vespignani, where we were getting data from a company called Cuebiq that has a social good function and processing that to understand the number of social contacts, people commuting for work, people moving around regionally for non-work-related reasons, and that’s been recapitulated internationally. We had individual-level data from Baidu Service in China looking at people moving out of Wuhan in response to the cordon sanitaire and then across the country as different measures went into place. So I think if there’s really one thing I would point out as being critical during this pandemic, it’s been that individual-level data coming off of the cell phones, and that’s a piece that we need to make sure we’re investing in.

Jim: Is there anything that could be done to make it better, or is it more an issue being able to process the data expeditiously?

Samuel: It’s the processing piece. It takes a lot of work, both in terms of person-hours, but also just computational time to actually crunch all those numbers. You think about having individual-level data every three minutes on somebody’s location, and you have to aggregate it up, ensure it’s anonymized, and translate it into some measure that is actually meaningful. So it’s just a huge amount of work and has real costs in terms of cloud storage and computation. Again, that’s the kind of thing where it’s a real dollar amount that people care about, and that means you have to have someone willing to pay for it. What happens with pandemics is that as soon as things start to get a little bit better, people just completely stop paying attention and we’ve seen that, right?

Jim: Do you have an order-of-magnitude sense of what it would cost to take the cell phone data that’s available and process it expeditiously and comprehensively for the purposes that you have in mind there? Any gross sense of what that costs?

Samuel: If you had to pay full price, now we got to assume that you can get the data for free. The data’s obviously really valuable. So, let’s pretend you get the data for free. You’d have, let’s say, $10,000 per city, per month times the number of cities on the planet, plus the FTE software engineers to maintain it.

Jim: So a few billion a year, chump change.

Samuel: That’s probably about right.

Jim: Yeah, because you take, say, a thousand cities at $10,000, gross it up for government incompetence, and we’re talking a billion or two a year. That’s, essentially, the croissant budget for the Department of Defense, so definitely doable.
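The back-of-the-envelope estimate in this exchange can be sketched as follows. Only the $10,000 per city per month figure and the thousand-city example come from the conversation; the function name, the `overhead_multiplier` parameter, and the ten-thousand-city scenario are hypothetical, and engineering salaries are left out because no figure was given.

```python
def annual_processing_cost(
    cities: int,
    cost_per_city_month: int = 10_000,  # figure quoted in the conversation
    overhead_multiplier: int = 1,       # hypothetical gross-up factor
) -> int:
    """Yearly cost in dollars of processing mobility data for `cities` cities."""
    months_per_year = 12
    return cities * cost_per_city_month * months_per_year * overhead_multiplier


# A thousand cities with no gross-up, then ten thousand cities (assumed scenario).
for n_cities in (1_000, 10_000):
    cost = annual_processing_cost(n_cities)
    print(f"{n_cities:>6,} cities: ${cost:,.0f} per year")
```

Under these assumptions a thousand cities comes to roughly $120 million a year before any gross-up, so reaching a billion or more a year implies either about ten times as many cities or a substantial overhead multiplier.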

Samuel: It’s doable. That’s the thing, right? I don’t know how the Defense budgets get set and I don’t know how they decide what is a national security problem. This certainly feels like one to me, and making these data available globally, since we know how connected we all are to each other, is so vital. This is certainly something we’re working on at the Pandemic Prevention Institute. We’re not in a position to take on billions of dollars a year in costs, but we are trying to figure out how we can lower the barriers, how we can bring together groups, technology companies, private sector to build something like this, because the world needs it. They need a bunch of other things, but this is a really key resource that the world needs.

Jim: Yeah. I’ve been, lately, campaigning for the U.S. government to establish a department of wicked risks, whose job is to understand the complex risks and their interrelationships, and to be able to deploy resources at mini-DOD scale where it passes a cost-benefit analysis, because the risk to our national security from this is probably realistically higher than it is in the defense regime. On a reasonable, probabilistically modeled basis, our really fucking up our society is more likely to come from a pandemic than it is from Costa Rica figuring out how to overthrow the United States by military means or something crazy like that. Hence, we should be putting, let’s say, we don’t need 700 billion a year, but if we put 70 billion a year, a tenth of our defense budget, into studying and working to ameliorate, in exactly the ways you’ve laid out here, the complex systems risks of which pandemic is one of the greatest, we’d probably be much safer as a society.

Samuel: I agree. We can do a back-of-the-envelope calculation. There’s a lot of comparison between weather forecasting and infectious disease modeling. In some ways it’s a good analogy, in other ways it’s a terrible analogy. But one way that it’s a good analogy is that NOAA spends about 4 billion a year gathering data to power the forecasting models for daily weather forecasts, thunderstorms, hurricanes across the U.S. For CDC, it’s a little bit hard: they don’t have a specific line item for data gathering, but it’s probably somewhere around 400 million a year.

Samuel: That gap is a big part of the reason why we’re in this mess in the first place, is that we’re spending 4 billion a year. We know how hard weather forecasting is, and this is even harder than weather forecasting, provably, and we’re off by at least a factor of 10. But as you said, maybe it’s more like a factor of 100 and that’s still a factor of 10 or 100 less than the defense budget. We need to invest not just hundreds of millions, but billions of dollars a year in gathering the data that we need to power these models and prevent pandemics.
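A minimal sketch of the ratios behind this comparison, using only the rough figures quoted in the conversation (about 4 billion for NOAA data gathering, about 400 million for CDC, about 700 billion for defense, and the proposed 70 billion). The variable names are hypothetical and the amounts are the speakers’ approximations, not audited budget numbers.

```python
# Approximate annual figures in dollars, as quoted in the conversation.
NOAA_DATA_BUDGET = 4_000_000_000       # weather data gathering
CDC_DATA_BUDGET = 400_000_000          # rough guess; no specific line item exists
DEFENSE_BUDGET = 700_000_000_000       # figure quoted by Jim
PROPOSED_RISK_BUDGET = 70_000_000_000  # a tenth of the defense figure

noaa_to_cdc = NOAA_DATA_BUDGET / CDC_DATA_BUDGET
defense_to_noaa = DEFENSE_BUDGET / NOAA_DATA_BUDGET
proposed_to_cdc = PROPOSED_RISK_BUDGET / CDC_DATA_BUDGET

print(f"NOAA vs CDC data spending: {noaa_to_cdc:.0f}x")
print(f"Defense vs NOAA data spending: {defense_to_noaa:.0f}x")
print(f"Proposed risk budget vs CDC data spending: {proposed_to_cdc:.0f}x")
```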

Jim: All right. Well, I think I’m going to wrap it up there, because I think that’s goddamn close to the punchline, people. So those of you who vote, those of you who talk to elected officials, get this story out that we are being grossly irresponsible as a society by not making investments on the side of defense against pandemics, and there’s some other risks too, which we talk about on some of our other shows, that are at least within an order of magnitude of the amount of money we’re spending on defense. So Sam, I’d like to thank you for an extraordinarily deep, interesting, and rich conversation here today.

Samuel: Thanks so much for having me, Jim. It’s really been a pleasure.