NIH Partners with Israeli Startup to Generate Synthetic COVID-19 Data


The ultimate aim is to garner new, evidence-based insights to support the fight against the novel coronavirus.

Through a new partnership unveiled Tuesday, the National Institutes of Health will use Israeli startup MDClone’s platform to generate computationally derived synthetic data directly from clinical health data aggregated across the U.S., to underpin and advance research in the fight against COVID-19.

The work is a notable piece of the NIH’s new National COVID Cohort Collaborative, or N3C, effort, through which the federal agency is offering approved users an opportunity to access “different levels” of COVID-19 patients’ clinical data (a limited data set, or LDS; a de-identified data set derived from that LDS; and a synthetic data set) to quickly transform captured medical information into insights that can bolster research supporting the nation’s pandemic response.

MDClone confirmed it’s providing the enabling technology for the synthetic workstream in N3C.

“Patient privacy is always a concern when managing healthcare data,” MDClone’s Chief Commercial Officer Josh Rubel told Nextgov via email Wednesday. “In the context of COVID-19, preserving privacy and allowing for research is of value to the NIH and its academic partners. MDClone has the unique ability to securely manage original patient data and convert it to synthetic data, which has the shape and structure of the original data but contains NO actual patient data.”

Information contained in COVID-19 patients’ electronic health records, when securely anonymized and combined in colossal volumes representing many individuals, could yield new insights needed to combat the ongoing health crisis. To produce and refine models and approaches that swiftly make use of those massive amounts of data, the data must also be safely accessible across many sites. Through N3C, NIH aims to create a centralized data enclave that can be tapped by carefully selected experts to improve pandemic-driven research. Rubel explained that the strategic partnership and use of synthetic data will create a platform where critical information can be shared by many, “increasing both the depth and breadth of data that is accessible to researchers.”

“The end result of leveraging MDClone is access to a wider, more useful dataset without privacy risk,” he said, adding that by “preserving distributions and correlations inside the data, synthetic data makes it possible to conduct complex statistical analyses.”

For this work, the synthetic data will be used to drive new insights into priorities including clinical best practices that improve mortality outcomes, predictive models that forecast the disease’s severity, potential responses to certain therapies, and more.

Founded in 2016, MDClone is headquartered in Beer Sheba, Israel. In addition to working with leading Israeli health care providers since its founding, the company also partnered with Canada this year, and in 2018 it launched its first major project in the U.S. with the Washington University School of Medicine in St. Louis. Rubel noted that the tech startup was introduced to the NIH’s National Center for Advancing Translational Sciences, which is spearheading the N3C effort, through “standard professional networking based on MDClone’s experiences at two existing” NCATS partner sites: its first U.S. partner, Washington University, and the Regenstrief Institute in Indiana.

“The discussion started in February 2020 as part of market intelligence gathering around NIH efforts at creating usable nationwide research data sets,” Rubel explained. “As part of the focus created by the pandemic, the discussion shifted to COVID-19 research and we contracted in May of 2020.” 

The company’s platform was deployed and installed on the NIH network in early June, he noted, and the rest of NCATS’ multifaceted COVID-19 data environment will go live between June and July. From there, the initiative will be ongoing.

“NIH aggregates the data into the NCATS environment and it will be available in the MDClone platform,” Rubel said. “The process for data access and research includes offering academic researchers access to MDClone synthetic data through the MDClone application, existing NIH analytics tools, and/or a synthetic file portal.”  

Amid the pandemic, MDClone’s platform is also being used by organizations outside NIH. Israel’s Sheba Medical Center, for instance, has used it to track and trace real-time data to predict case severity, better inform treatment results, and explore trends associated with the reported cases of infection.

“Best case, MDClone will give the NIH and its academic network a research platform to quickly generate synthetic data sets for NIH and external research teams to find insights to understand COVID-19 behavior and explore best practices derived from real-world evidence,” Rubel said. “Additionally, the synthetic data will show that we can safely aggregate and make data broadly available, enabling scientists and healthcare professionals to have access to rich data to power discovery and solutions for a variety of healthcare challenges.”