The National Institutes of Health plans to recruit a new associate director to examine the potential of vast new troves of biomedical research data related to genomics, imaging, and electronic health records, the agency said Thursday.
NIH’s associate director for data science would be one of only a few high-ranking government officials tasked solely with managing how an agency handles data.
The government has been trying to capitalize on advances in so-called big data analysis for the past several years, including through a $200 million investment in data research. Several agencies have also implemented their own big data programs, such as a Medicare program that mines millions of claims documents to spot common fraud patterns and several intelligence and law enforcement programs aimed at spotting signals of terrorism and crime.
The Federal Communications Commission appointed a chief data officer in 2010 and named chief data officers at each of its divisions, but the position hasn’t caught on widely in government. An October report from the industry group TechAmerica urged the government to appoint chief data officers at most agencies as well as a federal chief data officer to assess the use of big data governmentwide.
Big data refers generally to the mass of new information created by the Internet, by scientific tools such as the Large Hadron Collider and by digital sensors such as water quality meters. Big data analysis is the process of matching those troves of information with complex computer systems to gather intelligence and spot patterns.
“NIH aims to play a catalytic lead role in addressing these complex issues -- not only internally, but also with stakeholders in the research community, other government agencies, and private organizations involved in scientific data generation, management, and analysis,” NIH Director Francis Collins said in a statement.
Collins has asked National Human Genome Research Institute Director Eric Green to fill the associate director position on a temporary basis, the agency said.
As part of the government’s big data initiative, NIH plans to put a data set from the human genome project in Amazon’s EC2 computing cloud, along with tools to make the information easily accessible to researchers. The 200 terabytes of data the project currently stores would fill about 16 million file cabinets or 30,000 DVDs, making it difficult to share, Collins said during a March event.