KBase harnesses gene data for researchers

The new open software and data platform developed by scientists at DOE labs frees research biologists from having to act as programmers.

The Energy Department's national research laboratories are developing a large-scale computational platform to help research scientists crunch the massive amounts of data produced by new generations of genomics-based research technologies.

The Systems Biology Knowledgebase (KBase) is an open software and data platform that simplifies and streamlines database searches that previously took research biologists months to complete. With the new tool, researchers don't have to be programmers to answer big computational questions.

The KBase effort is led by scientists at DOE's Lawrence Berkeley, Argonne, Brookhaven and Oak Ridge national laboratories. The program also has partners at a number of leading universities, Cold Spring Harbor Laboratory, the Joint Genome Institute, the Environmental Molecular Sciences Laboratory and DOE's Bioenergy Research Centers.

Computational biologist Sergei Maslov, principal investigator for Brookhaven's role in the effort and associate chief science officer for the overall project, said the team is moving KBase from its scientific pilot phase into production and will gradually expand beyond the limited functionality available now.

According to Brookhaven, the platform pulls available data on plants, microbes and other biological entities -- and some of their complex interactions -- into a centralized repository that researchers can access more easily. The pooled data, computational tools and other shared resources will let researchers develop and propose new theories, predictions and other hypotheses about the organisms they study without having to learn how to write computer code, according to lab officials.

Before the tool was developed, biologists who wanted to find relationships between separate pieces of information -- such as how a particular gene variant might increase a plant's yield for biofuel production -- had to pull data from several databases and cross-reference it using complex computer code. The process could take months.

Now the labs are encouraging researchers to upload their data and programs to KBase so other users can mine them. The cooperative environment facilitates sharing and feedback among researchers so the programs, tools and annotation of datasets can improve with other users' input.

Ultimately, the project seeks to make complex analysis more efficient and less time-consuming, according to the lab.

To that end, lab officials said researchers with coding experience can use KBase's IRIS Web-based terminal to run programs on their own. Researchers without coding skills can instead use KBase's Narrative Interface, which lets them upload their data to precoded programs backed by a programmer who can help them interpret and filter the output.

"Quantitative approaches to biology were significantly developed during the last decade, and for the first time, we are now in a position to construct predictive models of biological organisms," Maslov said. "KBase allows research groups to share and analyze data generated by their project, put it into context with data generated by other groups, and ultimately come to a much better quantitative understanding of their results."