Agencies Try a Tool That Digs Up What Google Doesn't


Parts of the government are using a deep web search engine that teaches itself to return better search results.

Gary Shiffman thinks his search technology is smarter than Google's.

That's the principle behind his Arlington-based company Giant Oak, which sells a deep web search engine product to federal agencies including Immigration and Customs Enforcement. The search tool, Giant Oak Search Technology, or GOST, uses behavioral science to help its customers track down individuals (potential drug traffickers, money launderers and other persons of interest), all while using machine learning to improve its search results.

GOST and other deep web products, including LexisNexis, scan parts of the internet that a simple Google search doesn't return: sites not traditionally indexed by search engines. Giant Oak's product is also "domain specific," meaning the tool returns different results depending on whether customers are searching for money launderers or drug traffickers.


It works like this: Just as they might in Google, investigators type a name into GOST, which surfaces a series of results ranked by likely relevance to those investigators' specialty. GOST is designed to look beyond keywords commonly associated with crime; subject matter experts have taught the algorithm to consider less-obvious factors that could signal a person's criminal history.

Instead of searching for a potential drug trafficker's name placed next to the word "cocaine," the algorithm might learn that traffickers tend to visit the same kinds of websites, or travel to the same kinds of places, even if those activities aren't directly relevant to their crime, Shiffman, Giant Oak's CEO and founder, explained.

After the query, investigators are asked to give their search results a thumbs up or down, so the algorithm learns what's relevant.
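The article does not describe GOST's internals, so the following is only a minimal sketch of the general idea: results are ranked per domain, and thumbs-up or thumbs-down feedback nudges the weights of whatever signals a result carried. Every name here (`FeedbackRanker`, the feature labels, the `drug_trafficking` domain key, the learning rate) is a hypothetical illustration, not Giant Oak's design.

```python
from collections import defaultdict


class FeedbackRanker:
    """Toy domain-specific ranker (hypothetical; not GOST's actual method)."""

    def __init__(self, learning_rate=0.5):
        self.learning_rate = learning_rate
        # weights[domain][feature] -> how strongly a feature signals relevance
        self.weights = defaultdict(dict)

    def score(self, domain, features):
        # Sum the learned weights of the features attached to one result.
        domain_weights = self.weights[domain]
        return sum(domain_weights.get(f, 0.0) for f in features)

    def rank(self, domain, results):
        # results is a list of (title, set_of_features); highest score first.
        return sorted(results, key=lambda r: self.score(domain, r[1]), reverse=True)

    def feedback(self, domain, features, thumbs_up):
        # A thumbs up raises each feature's weight; a thumbs down lowers it.
        delta = self.learning_rate if thumbs_up else -self.learning_rate
        for f in features:
            current = self.weights[domain].get(f, 0.0)
            self.weights[domain][f] = current + delta


# Made-up results with made-up feature labels, for illustration only.
results = [
    ("news article", {"keyword:cocaine"}),
    ("travel forum post", {"site:travel_forum", "route:border_crossing"}),
    ("recipe blog", {"site:cooking"}),
]

ranker = FeedbackRanker()
ranker.feedback("drug_trafficking", {"site:travel_forum", "route:border_crossing"}, thumbs_up=True)
ranker.feedback("drug_trafficking", {"site:cooking"}, thumbs_up=False)

trafficking_order = [title for title, _ in ranker.rank("drug_trafficking", results)]
laundering_order = [title for title, _ in ranker.rank("money_laundering", results)]
```

In this sketch the same three results come back in a different order for each domain, because feedback given by one group of investigators only changes the weights for their own specialty. A production system would presumably use a far richer model than per-feature counters.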

ICE's Homeland Security Investigations uses the technology to identify potential visa violators, a spokesperson told Nextgov. HSI scans hundreds of thousands of leads, trawling through government databases, social media sites and public indices, and tools including GOST can help "determine if an individual who overstayed has departed the United States, adjusted status, or would be appropriate for law enforcement action." Broadly, GOST helps "more quickly review public-facing websites when conducting research on these cases."

Today, most search tools are trained to figure out "[d]oes this person show up on this list, does this person show up with the word 'cocaine' or 'terrorism' next to it," Shiffman said. "It’s very cool, but it’s very rudimentary. Especially if you’re interested in finding Keyser Soze [a character in The Usual Suspects] ... You're not going to find him on a matching list. ... The clever people are going to behave in ways that are understandable and predictable."

The 24-person company, based in Arlington, was founded in 2012 shortly after Shiffman was asked to work on a Defense Advanced Research Projects Agency program using big data to find foreign fighters in Afghanistan. That process was known as "quantitative counterinsurgency," he said.

Giant Oak, whose commercial customers are largely in the financial sector, sells the Drug Enforcement Administration another product, one that sifts through large sets of documents to surface the most relevant ones. For the future, Shiffman said the company is interested in technology that could allow investigators to query the system using an image instead of a name.

Just as Google and Amazon are harvesting information about internet users to draw conclusions about their behavior, and using those conclusions to guess what products they'll like, "let's let analysts in the federal government use the technology ... the patterns of human behavior as reflected in open source data, and let's figure out who's in the domain of interest to me," Shiffman said.