Real data, especially from different states, can be messy.
The Centers for Medicare and Medicaid Services is using the Amazon Web Services cloud to run analytics on 74 million Americans’ Medicaid data.
The move is a key part of CMS’ effort to get better information about the more than $500 billion in Medicaid benefits the government doles out annually. That information could help crack down on waste, fraud and abuse and deliver other insights, according to Jessica Kahn, director of the data and systems group at CMS.
CMS is porting five or more years of records from 50 states, totaling 72 terabytes of data, to the cloud. Handling that volume would have caused a “log jam” and been an impossibility for the agency’s aging internal data centers. Kahn said California alone produces some 900 billion records per month.
“We’re looking at it for fraud and abuse, and giving data to a technical experts’ panel to dig in and understand states’ data quality,” said Kahn, speaking Tuesday at the AWS Public Sector Summit. “To be clear, what we’ve built with AWS is an operational data set. We need to use it as it comes in, and it will never be perfect because it is real [data].”
Real data can be messy, Kahn said.
Because Medicaid is federally funded but state administered, gathering data from each state is like collecting it from “51 different Medicaids.” Data coming into the system is also cumulative, which means there is a “constant influx of complex data from 51 different paths” that can affect the overall data set retroactively.
Billing delays, for example, can affect earlier data sets going back months or years. In addition, CMS attempts to run the data it collects from the states through a data dictionary, which Kahn said “is like match-making times 51.”
Because Medicaid data is highly sensitive, it also goes through a scrubbing process to reduce the risk of personally identifiable information becoming public through a hack or misstep.
“Every time we ingest data, it has to go through a complex set of business rules,” Kahn said.
Medicaid data sets are the first CMS plans to analyze in the cloud, but they won’t be the last. Kahn said the agency is working to put more data in the AWS cloud, with the expectation of creating a “shared data model” across various stakeholders, including other federal agencies. Kahn said CMS has shared some of its Medicaid data with the Congressional Budget Office and U.S. Census Bureau, getting better value across government for taxpayer dollars.
Aside from delivering better value, Kahn said analyzing a better collection of Medicaid data will help the government address serious national health care issues. Medicaid covers more than half of all childbirths in the United States and more than half the long-term care costs for elderly Americans and those with disabilities.
“We can’t have any compromise in our ability to answer questions,” Kahn said. “Who is getting prenatal care? How many are vaccinated? What are the highest costs of drugs? These are important questions.”