Synthetic Data Engine to Support NIH’s COVID-19 Research-Driving Effort


It’s all part of a new partnership the agency is embarking on with Syntegra and the Bill and Melinda Gates Foundation.

An artificial intelligence-enabled synthetic data generator that converts clinical data of any kind into equivalent, mock versions that don't expose sensitive patient-identifying details is being put to use as a component of the National Institutes of Health-steered National COVID Cohort Collaborative, or N3C.

“The NIH’s N3C initiative is a result of the urgent need for understanding of COVID both to develop better patient care and understand the impacts on individuals and the health system as a whole,” Dr. Michael D. Lesh told Nextgov this week. Lesh—the co-founder and CEO of Syntegra, the company behind the synthetic data engine—shed light on how the tool works, and a new partnership between the business, NIH and the Bill and Melinda Gates Foundation that underpins this fresh endeavor.

In June 2020, not long after the novel coronavirus pandemic disrupted nearly every aspect of American life, NIH launched N3C to accelerate COVID-19 research and new medical breakthroughs. The collaborative pursuit, according to a June press release, intends to systematically capture relevant data from participating health care providers across the country, aggregate that data into accessible formats, and in turn help approved users harness research insights from that harmonized information via the NCATS N3C Data Enclave.

Lesh noted that “a broad and clinically deep database for this type of research did not exist,” back then, creating the need for a massive data collection that pulls from many contributors. However, the life-saving insights such data access might offer are limited if large numbers of researchers can’t dig into it, which Lesh deemed “a proposition made difficult based on the need to maintain the privacy of the patients within this broad dataset.”

His company homes in on one possible solution to that challenge.

“Synthetic data solves this issue, thus becoming a key pillar of the overall N3C initiative,” Lesh said. “By creating a synthetic version of the dataset with validated privacy and accuracy to the underlying data, Syntegra allows this groundbreaking dataset to get into the hands of more potential innovators, thus increasing the potential for society to benefit from its use.”

Through the initiative, N3C is responsible for aggregating data collected by more than 70 contributing sites, which at this point amounts to almost 3 billion rows. That number will continue to grow as new health systems join and new patient data flows in from those already on board. 

“Syntegra is in the process of creating a synthetic version of the entire N3C COVID database, including potentially all values for the entire patient population, currently at more than 2.6 million patients,” Lesh said. “It is our understanding that the N3C Enclave contains all relevant data about COVID including the care trajectories of all treatments, vaccinations, etc.”

The company’s ultimate role here is to produce synthetic versions of any data in the Enclave, and “provide rapid, widespread access without violating privacy,” Lesh added. Syntegra has already created synthetic versions of test sets as it prepares to roll out large-scale COVID synthetic data. Down the line, that synthetic data could enable more rapid data-driven analysis and help physicians and researchers uncover new insights around racial and ethnic disparities in spread and risk, predictors of hospitalization, long-term adverse effects and the impact of COVID-19 on hospitals, among other topics.

With help from AI, Syntegra’s synthetic data engine essentially extracts the relationships among all variables within any medical dataset. That process produces about a billion parameters, which “accurately reflect the underlying medical patterns in the data, and are subsequently used to generate brand new synthetic medical records,” Lesh noted. The resulting synthetic dataset maintains all of the statistical properties and patterns of the original data, without any of the original patient identities leaking into the newly created dataset.

“In other words, no one could work backward from the synthetic data to discover the original patients,” Lesh explained. “Two key elements in particular of this process are that it is done over an entire dataset learning from the data itself, rather than being limited to specific question-based cohorts, and that the output is accompanied by full validation metrics for both the accuracy and privacy against the original dataset.”
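
To make the general idea concrete, here is a toy sketch in Python. It is not Syntegra's engine, whose internals are not described here; it assumes a deliberately simple stand-in model (two numeric columns treated as jointly Gaussian, summarized by five parameters rather than a billion) and invented data. Every name and number in it is hypothetical. It fits parameters to "real" records, generates new records from those parameters alone, and then reports one simple accuracy check (how closely the correlation is preserved) and one simple privacy check (whether any synthetic record copies a real one).

```python
import math
import random


def fit_parameters(rows):
    """Estimate means, standard deviations and correlation of two columns."""
    n = len(rows)
    mean_x = sum(x for x, _ in rows) / n
    mean_y = sum(y for _, y in rows) / n
    sd_x = math.sqrt(sum((x - mean_x) ** 2 for x, _ in rows) / (n - 1))
    sd_y = math.sqrt(sum((y - mean_y) ** 2 for _, y in rows) / (n - 1))
    cov = sum((x - mean_x) * (y - mean_y) for x, y in rows) / (n - 1)
    return mean_x, mean_y, sd_x, sd_y, cov / (sd_x * sd_y)


def generate(params, count, rng):
    """Draw brand new records from the fitted parameters only."""
    mean_x, mean_y, sd_x, sd_y, rho = params
    residual = math.sqrt(max(0.0, 1 - rho ** 2))
    records = []
    for _ in range(count):
        z1, z2 = rng.gauss(0, 1), rng.gauss(0, 1)
        x = mean_x + sd_x * z1
        # Mixing z1 into y reproduces the fitted correlation between columns.
        y = mean_y + sd_y * (rho * z1 + residual * z2)
        records.append((x, y))
    return records


rng = random.Random(0)

# Hypothetical "real" records: (age in years, length of stay in days).
real = []
for _ in range(500):
    age = rng.uniform(20, 90)
    real.append((age, 0.1 * age + rng.gauss(0, 1)))

real_params = fit_parameters(real)
synthetic = generate(real_params, 500, rng)
synthetic_params = fit_parameters(synthetic)

# Accuracy check: is the relationship between the columns preserved?
correlation_gap = abs(real_params[4] - synthetic_params[4])

# Privacy checks: no synthetic record duplicates a real one, and the
# closest synthetic-to-real pair is still some distance apart.
exact_copies = len(set(real) & set(synthetic))
closest = min(math.dist(s, r) for s in synthetic for r in real)

print(f"correlation gap: {correlation_gap:.3f}")
print(f"exact copies of real records: {exact_copies}")
print(f"closest synthetic-to-real distance: {closest:.4f}")
```

A production system would need a far richer model and much stronger privacy tests than a duplicate check, but the shape is the same: learn parameters from the whole dataset, sample new records from them, and publish validation metrics alongside the output.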

Before this new engagement with NIH, Syntegra signed a research contract with the Bill and Melinda Gates Foundation, centered on a similar goal of driving forward large-scale COVID-19 research.

“The Gates Foundation, however, found a similar issue to the NIH that a single sufficient dataset did not currently exist, leading to Syntegra and the Gates Foundation choosing to bring our existing partnership together into the NIH’s N3C initiative,” Lesh said. “With its unique focus on global health, the Gates Foundation expects Syntegra’s technology to provide a mechanism for cross-border COVID datasets to become widely available for research.”

Outside of this pursuit, the company is also engaging with the Food and Drug Administration regarding the role of synthetic data in regulatory decisions.

“The FDA is exploring with us several aspects of drug approval, including synthetic control arms, improved trial design and ‘what if’ analysis, ongoing drug safety monitoring, and approval of new indications for small sub-populations and rare diseases,” Lesh said.
