Quality Data Saves Lives: A Roadmap for Public Health Data Collection in the Time of COVID-19

Introduction

In late 2019, an outbreak of the novel coronavirus originated in Wuhan City, China. By March 11, 2020, the World Health Organization (WHO) had declared COVID-19 a global pandemic. Although most countries have since endured multiple waves of COVID-19, the U.S. continues to lead the world in both cases and deaths. Recently, however, there have been glimmers of hope in the U.S.: more than 50 percent of the population has received at least one dose of the Pfizer–BioNTech, Moderna, or Johnson & Johnson vaccines, cases and deaths are declining rapidly, and the economy is rebounding.

Yet almost a year and a half into the pandemic, challenges remain for protecting the U.S. and world populations. With numerous continuing COVID-19 deaths in Asia, South America, and Africa as of June 2021 and ongoing problems with vaccine hesitancy, variants, and potentially waning immunity in the U.S., several improvements to U.S. public health systems must be made to meet the continued demands of managing 21st-century pandemics successfully.

In particular, numerous lives were lost and still more people became chronically ill with “long-haul COVID” due in part to gaps in U.S. public health data systems. Shortcomings with these data systems directly underlay the need to shut down key sectors of the economy, causing numerous lost jobs and bankruptcies. This sobering reality underscores the significance of data quality in managing public health emergencies, to save lives and improve quality of life for millions of people.

A Roadmap for Improving the Quality of Public Health Data

In our first three points below, we recommend measures to improve existing data systems. Next, we propose two new, interrelated technology solutions for staying ahead of the COVID-19 crisis. The final two points propose more general steps to enhance COVID-19 leadership and better prepare the U.S. to rapidly handle the next pandemic, bioterror strike, or uncontrolled drug-resistant pathogen. Implementing the solutions proposed here will save lives and will also help reduce the national security risks we now face.

1. The federal government must increase the supply of COVID-19 testing and make genetic sequencing widely available. During the pandemic, there has been an absence of community prevalence data, which stemmed in part from testing failures that were particularly pronounced in about one-third of states. Disease modelers need reliable data from surveillance systems to construct useful models. With syndromic surveillance being used only in a limited way by federal and state agencies, modelers have had to rely on confirmed-case data, whose completeness in capturing disease spread varies by state. The result has been inconsistent estimates of disease impact that have hampered public planning. As a start to solving these problems, the American Rescue Plan Act of 2021 provided $46 billion to expand federal, state, and local COVID-19 testing, improve contact tracing, increase laboratory capacity, and set up mobile testing units. However, supply-chain and other critical issues affecting testing remain to be solved. And as coronavirus variants become increasingly important in determining the shape of the next waves of the pandemic, genetic sequencing of the virus in individuals who test positive must also be performed on a far wider scale than the current 1.6 percent of positive tests. Coronavirus variants capable of immune escape are rapidly evolving, as partly evidenced by the Brazilian, South African, and Indian variants, threatening the progress brought about by vaccinations and post-infection immunity. These variants must be closely monitored in this country so that carriers can be isolated and their contacts traced.

2. The U.S. must resume its position as a leader in the collection of public health data. Public health measures to contain an outbreak include surveillance, rapid case identification, contact tracing, and communication to the public, all of which depend on reliable data. The pandemic revealed serious vulnerabilities in the data systems needed to carry out these functions. The shortage of community prevalence data and reliable contact tracing, as well as other information gaps, has created the need to take action based on the notion that “everyone is a threat,” forcing containment measures such as stay-at-home orders and the accompanying economic shutdowns across large areas.¹ With more accurate data, the responsibility for closures can be shifted to local governments, which can take targeted actions to mitigate disease spread, thus sparing most of the population and preserving the national economy.

3. The federal government must lead the standardization of state and territorial systems for reporting COVID-19 incidence and mortality. The pandemic highlighted the need for standardized methodologies for calculating infection and mortality rates, as states varied in how they tabulated cases and deaths. For example, whereas most coroners and medical examiners count any COVID-19 death in a state as a COVID-19 death associated with that state, Florida counts only state residents who die in Florida, leaving the pandemic deaths of vacationers, part-time residents, undocumented immigrants, and migrant workers in Florida unaccounted for. With such gaps in U.S. public health data, and delays before official U.S. statistics were made public, universities such as Johns Hopkins and an array of companies, such as Safegraph, that mine cell phone data and social media posts rushed to fill the gap.² The resulting patchwork of data contributed to widely varying predictions for disease impact and left hospitals and other providers unsure of how to allocate resources.
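To make the counting discrepancy concrete, the following minimal Python sketch applies the two tabulation rules described above to the same set of death records. The `DeathRecord` fields, the function names, and the five example records are hypothetical and made up purely for illustration; they are not real statistics or any state's actual schema.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass(frozen=True)
class DeathRecord:
    """One COVID-19 death; both fields are hypothetical, simplified stand-ins."""
    state_of_death: str
    state_of_residence: Optional[str]  # None when residence is unknown


def count_by_place_of_death(records: Iterable[DeathRecord], state: str) -> int:
    """Count every death that occurred in `state`, regardless of residence."""
    return sum(1 for r in records if r.state_of_death == state)


def count_residents_only(records: Iterable[DeathRecord], state: str) -> int:
    """Count only deaths of `state` residents that occurred in `state`."""
    return sum(
        1 for r in records
        if r.state_of_death == state and r.state_of_residence == state
    )


# Made-up records for illustration only; these are not real statistics.
EXAMPLE_RECORDS = [
    DeathRecord("FL", "FL"),
    DeathRecord("FL", "FL"),
    DeathRecord("FL", "FL"),
    DeathRecord("FL", "NY"),  # e.g. a vacationer or part-time resident
    DeathRecord("FL", None),  # e.g. residence not recorded
]

if __name__ == "__main__":
    print(count_by_place_of_death(EXAMPLE_RECORDS, "FL"))  # 5
    print(count_residents_only(EXAMPLE_RECORDS, "FL"))     # 3
```

The same five records yield a total of five under one rule and three under the other, which is the kind of gap a standardized federal methodology would close.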

Two New Cloud-Based Technologies for Better Public Health Data

In the following points, we propose two new technologies for population health monitoring in the time of COVID-19: the cloud-based Nationwide Reportable Conditions Data System and the COVID Repository. These surveillance, research, and population health management tools will put us ahead of the game in gathering and making use of COVID-19 data, greatly improving public planning and decision-making and saving lives.

1. The U.S. must implement centralized reporting of all infectious diseases and vaccinations to a flexible, cloud-based system within 24 hours of test results. When a new case of COVID-19 or a case of any of the other 120 “reportable” conditions is diagnosed, the testing entity (such as Quest or CVS) or provider (such as a doctor’s office or hospital) is required to notify local and state health agencies and the CDC within 24 hours. Currently, testing entities and providers perform this reporting separately to the nation’s 2,300 public health agencies, creating a significant reporting burden and making it hard to compile and share the data. Reporting to a single, centralized, cloud-based system would relieve the reporting burden on testing entities and providers and would facilitate aggregating and sharing the data. The cloud-based data system could then notify the appropriate health agencies within the stipulated timeframe.

We have named this cloud-based platform the Nationwide Reportable Conditions Data System (NRCDS). Initially, its data would comprise the 31 data elements mandated under the CARES Act Section 18115 reportable-condition stipulations for COVID-19. However, the CDC has expressed interest in expanding such a platform in the future to include the other 120 reportable conditions as well as further data streams including immunizations, admission-discharge-transfer (ADT) events, electronic case reporting (eCR), and electronic health record (EHR) data. We think this is an excellent idea that would facilitate the construction of a rich longitudinal record of infectious disease diagnoses, immunizations, EHR data about chronic health conditions, and health care utilization in the U.S., COVID-related and otherwise. It would also lead to a database from which it could be immediately determined who was vulnerable to particular infections due to lack of vaccination, for public health case management and education efforts. Finally, the NRCDS would allow the monitoring of infectious diseases, including COVID-19, at all geographical scales, a capability that has been lacking during the pandemic.

Prior to the introduction of COVID-19 vaccines, vaccinations against other infectious diseases prevented two to three million deaths globally each year. The NRCDS would help keep public health departments on top of vaccinations in the U.S., among many other benefits.
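The report-once, notify-many flow proposed for the NRCDS can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `CaseReport` fields, the agency identifier strings, and the function names are hypothetical and do not reflect the 31 mandated data elements or any existing CDC interface. The only value taken from the text is the 24-hour reporting window.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import List

# The 24-hour reporting window stated in the text.
REPORTING_WINDOW = timedelta(hours=24)


@dataclass(frozen=True)
class CaseReport:
    """A single report submitted once by a lab or provider (hypothetical fields)."""
    condition: str
    state: str            # assumed two-letter state code
    county: str
    resulted_at: datetime  # when the test result became available


def agencies_to_notify(report: CaseReport) -> List[str]:
    """Fan one report out to the local, state, and federal recipients."""
    return [
        f"local:{report.state}/{report.county}",  # hypothetical identifier format
        f"state:{report.state}",
        "federal:CDC",
    ]


def notification_deadline(report: CaseReport) -> datetime:
    """Latest time by which the agencies should have been notified."""
    return report.resulted_at + REPORTING_WINDOW


def is_on_time(report: CaseReport, notified_at: datetime) -> bool:
    """True if the notification went out within the reporting window."""
    return notified_at <= notification_deadline(report)
```

In this sketch the testing entity submits one `CaseReport`, and the central system, rather than the submitter, works out which agencies to notify and by when.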

2. An important outgrowth of the centralized NRCDS reporting would be to establish a national, longitudinal database of all residents’ COVID-19 medical histories, a “COVID Repository.” The database would serve as a master registry of whether individuals were functionally immune to COVID-19 through an established prior infection, vaccination, or positive antibody test. Where possible, this record should contain the strain of each infection, obtained with genetic sequencing, and the vaccination received, along with its effectiveness vis-à-vis the COVID variants. The longitudinal nature of the record would rectify problems such as proof of immunity needing to be time-dependent, a distinct possibility because immunity to most coronaviruses diminishes in several months. (The immunity window for SARS-CoV-2 is still being established.) It would also provide a straightforward way to monitor reinfections, which are a growing concern at least with the Brazilian variant; most states have not been tracking them because they lack a longitudinal record of infection history. Data analytics could be employed to provide real-time dashboarding on cases, deaths, the proportion of the population with functional immunity, reinfection rates, and many other measures of interest, at any geographical and temporal scale.

Figure 1 presents the cloud-based Nationwide Reportable Conditions Data System and the associated master COVID Repository, in identifiable and de-identified form, that would be developed from it.

A government agency or contractor permitted to handle personally identifiable health information would maintain the database. Information would be shared with the Federal Aviation Administration (FAA), the Federal Emergency Management Agency (FEMA), the Department of Defense, intelligence agencies, and several other government stakeholders.

Most proposed or actual COVID registries include only whether patients had a confirmed diagnosis or vaccine. This is not enough information. The database needs to be a longitudinal record of when patients were infected, with which strain, when they were vaccinated, and which vaccine they received. This way, for each individual, the window during which immunity is likely to last and the strains to which they are immune or still vulnerable can be monitored in real time.
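As a rough illustration of why a dated, per-person event history matters, the Python sketch below models a longitudinal record and derives two quantities from it: whether a person falls inside an immunity window on a given date, and how many reinfections the record shows. All names here are hypothetical, and the `window` and `min_gap` durations are caller-supplied placeholders, since the immunity window for SARS-CoV-2 is still being established.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import Optional, Sequence


@dataclass(frozen=True)
class ImmunityEvent:
    """One dated entry in a person's longitudinal record (hypothetical schema)."""
    kind: str                      # "infection", "vaccination", or "antibody_positive"
    on: date
    strain: Optional[str] = None   # from genetic sequencing, where available
    vaccine: Optional[str] = None  # which vaccine was received, if any


def is_functionally_immune(
    events: Sequence[ImmunityEvent], as_of: date, window: timedelta
) -> bool:
    """True if any event on or before `as_of` falls within `window` of `as_of`.

    `window` is an assumed placeholder, not an established clinical value.
    """
    return any(timedelta(0) <= as_of - e.on <= window for e in events)


def count_reinfections(events: Sequence[ImmunityEvent], min_gap: timedelta) -> int:
    """Count infections occurring at least `min_gap` after the last counted one.

    `min_gap` is an assumed threshold for treating two positive results as
    distinct infections rather than one prolonged episode.
    """
    infection_dates = sorted(e.on for e in events if e.kind == "infection")
    reinfections = 0
    last_counted = None
    for d in infection_dates:
        if last_counted is None:
            last_counted = d
        elif d - last_counted >= min_gap:
            reinfections += 1
            last_counted = d
    return reinfections
```

A registry that stores only a yes/no flag for diagnosis or vaccination cannot answer either question, because both depend on the dates of the individual events.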

Ideally, the COVID-19 registry data would be combined with the other longitudinal health data in the NRCDS so the long-term effects of COVID-19 and its effect on other chronic conditions could be established. The data set would assist U.S. agencies with managing the long-haul COVID-19 caseload, allowing insights that could reduce the burden for U.S. health care and social systems. The data would additionally be maintained and made available in de-identified form to research agencies, nongovernmental organizations, and actuarial and other health care companies, making it an invaluable resource for research.

Final Recommendations

The U.S.’s COVID-19 experience has provided lessons beyond the data and technology realm, particularly about the federal leadership needed to ensure we are ready for continuing COVID-19 challenges as well as for another pandemic or a bioterrorism event. Our last points discuss two broader recommendations.

1. The federal government should lead the development of public-private partnerships to solve “impossible” challenges. Public-private partnerships rapidly solved several urgent yet intractable problems during the pandemic, such as the shortage of personal protective equipment (PPE) for health care workers. Perhaps the most remarkable public-private partnership success with COVID-19 thus far was the development of several highly effective COVID vaccines, some relying on the new and better messenger RNA (mRNA) approach, in less than a year; vaccines had been expected to take 10 years to produce.

A pressing biological threat that will almost certainly be realized is widespread illness caused by an antibiotic-resistant pathogen. Similar to the situation with COVID-19 vaccine development in early 2020, it could take 10 to 15 years and over $1 billion to develop an antibiotic not based on a currently available drug.³ In the future, a public-private partnership could be tasked with developing a new antibiotic to treat a drug-resistant infection, or with solving other formidable problems that may face us. As with COVID-19 vaccine development, these efforts should be international to maximize the odds of success, involving scientists from multiple countries facing a similar threat.

2. The U.S. should be proactive, not reactive. The U.S. needs to invest in long-term solutions rather than reacting to a predictable crisis. In the past year, the U.S. spent over $7 trillion on the pandemic. Strategic investment could be more cost-effective.⁴ In recent years, the world has had several close calls with pandemics, from HIV across much of the world to the Ebola outbreak in West Africa to the rapid spread of the Zika virus in the Americas, the Caribbean, and elsewhere, all with long-term health consequences for their victims and societal costs. And as mentioned above, one of the many drug-resistant pathogens that exist now will likely pose a crisis soon. U.S. resource allocation for pandemic preparedness and biodefense has been as little as $6 billion annually in some recent years. Part of the investment could be used to create a Defense Advanced Research Projects Agency (DARPA)-like advanced government institute⁵ to study and recommend solutions to these looming threats so that expert guidance is ready to go when the time comes.

Conclusions

COVID-19 poses grave national security risks. The U.S., and nations generally, are not secure in the face of widespread diseases or pandemics. Medical and economic effects can be far-reaching and long-lasting. Life expectancy, particularly for minorities, fell in the last year and can drop further. Jobs were lost and may not be regained, and the standard of living may erode nationally, as it already has for many U.S. workers.

The nation’s youth, who over the past year lost as much as nine months in reading skill and a full year in math, may experience prolonged deficits in education and later employability. Furthermore, our global competitiveness has suffered: In 2020, the U.S. economy shrank, whereas that of China, which controlled the pandemic well after initial struggles, expanded. For the U.S.’s security and indeed the world’s, the health and viability of nations must be on a par, without vast disparities. While there is still time, the U.S. can and must assume leadership in this endeavor, partly to ensure that democratic institutions are also not victims of the pandemic and its aftermath.

The time to improve the nation’s data systems and implement the technology solutions proposed here is now. By taking these measures, the U.S. can minimize the number of further lives lost. It can also create an invaluable research resource that could propel the U.S. to the forefront of understanding the long-term public health consequences of COVID-19. Finally, it is time to make investments in pandemic and bioterror preparedness that are commensurate with the threat. As leaders, practitioners, and community members, we can take the necessary steps to ensure that quality data saves lives.

Takeaways for Federal Contractors

In the wake of COVID-19, the federal government is interested in reshaping its public health data collection. However, federal contractors should be careful in proposals not to ask the CDC or any other agency to violate its standards for data collection and sharing. For example, the CDC abides by a principle of collecting the minimum data needed to serve a given public health function. This includes not collecting personally identifiable information if it is possible to avoid it and still accomplish the mission.

Additionally, the federal government wants to avoid anything like a federal mandate to show COVID-19 vaccination proof for participation in public activities. Thus, although the COVID Repository presented here would be an excellent source of data for a COVID-19 vaccine passport, it should not be used for that purpose. Contractors proposing similar systems should be aware of this federal government sensitivity.

Footnotes
3 https://wellcome.org/news/why-is-it-so-hard-develop-new-antibiotics
4 D. A. Disparte, Brookings Institution, 2021.
5 T. Ridge & D. A. Disparte, Harvard Business Review, 2017.

This article appears in the Spring 2021 Service Contractor.