An interagency collaboration within the Department of Homeland Security is tapping machine learning capabilities to combat international child sexual abuse.
The Department of Homeland Security’s lead criminal investigative office is exploring how artificial intelligence and machine learning technologies can help law enforcement discover, stop and prosecute child sexual abuse cases at a faster rate.
Two systems jointly developed by the agency’s Homeland Security Investigations department and Science and Technology Directorate are being upgraded with new technology to aid in the agency’s undercover operation Corregidor, in which HSI special agents infiltrate online chat groups where human traffickers sell explicit digital material from around the world.
“What brought us to S&T was HSI’s need to reconcile huge amounts of data which would come in our investigative process,” Will Crogan, the assistant special agent in charge of HSI’s North Division in New England, told Nextgov/FCW.
To analyze and consolidate this data in a shorter amount of time, HSI and S&T jointly developed their StreamView and SpeechView tools in 2022. After determining both programs’ utility in analyzing the data flow that comes in through Corregidor operations, the DHS offices are planning to advance the emerging systems powering both tools.
“We can basically go from two weeks to identifying a criminal customer that's driving this abuse of a small child to basically before lunchtime, basically half of a work day,” Crogan said.
Shane Cullen, S&T’s forensics and criminal investigations program manager helming the development and deployment of both systems, said that his office is looking to outfit SpeechView in particular with new software to advance what demographic characteristics it can gauge from a speaker’s voice found in some of the explicit audio files HSI receives.
“It's a very advanced language investigation tool,” Cullen told Nextgov/FCW. “We've got some way-out efforts looking at the literal physical size of speakers.”
SpeechView’s current algorithms can evaluate a given speaker’s gender, sentiment and origin, and translate what is said. Cullen said that S&T is aiming to improve how the system evaluates speaker traits and to build a search function that lets law enforcement better wade through the vast amounts of audio data in child exploitation cases.
“Imagine a laptop full of data or a return with raw files from a livestream event,” Cullen said. “You might have hundreds of hours of audio and you want to look for a baby that's been abused. So you can give a command line…saying ‘hey, I want to look for female, small person in distress.’”
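The command-line search Cullen describes can be pictured as a filter over audio segments that upstream speech-analysis models have already tagged with estimated speaker traits. The sketch below is purely illustrative — the field names, values and `search` function are assumptions for the sake of the example, not SpeechView’s actual interface.

```python
from dataclasses import dataclass

@dataclass
class Segment:
    """One stretch of audio, tagged by hypothetical upstream models."""
    file: str
    start_sec: float
    gender: str      # model-estimated speaker gender
    size: str        # model-estimated physical size ("small", "adult", ...)
    sentiment: str   # model-estimated emotional state

def search(segments, **criteria):
    """Return only the segments whose tags match every requested trait."""
    return [s for s in segments
            if all(getattr(s, k) == v for k, v in criteria.items())]

corpus = [
    Segment("stream_01.wav", 12.0, "male",   "adult", "neutral"),
    Segment("stream_01.wav", 84.5, "female", "small", "distress"),
    Segment("stream_02.wav",  3.2, "female", "adult", "calm"),
]

# Roughly the query from the quote: "female, small person in distress".
hits = search(corpus, gender="female", size="small", sentiment="distress")
```

In practice the hard part is the tagging models themselves; once segments carry trait estimates, narrowing hundreds of hours of audio to a handful of candidate moments is a straightforward query like this one.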
Artificial intelligence and other machine learning systems are also in further development within S&T. Cullen said that for still, pixelated images, AI and ML tech can help group similar pictures and learn how to identify which images likely showcase abuse.
“At 10,000 pictures, I'm looking for those six pictures where a child might be exploited, and we can train that tool to recognize the pixel presentation that's consistent with child exploitation,” he said. “And now you're down to those six pictures that you're interested in, you don't have to look at 10,000 extraneous images.”
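The triage workflow Cullen outlines — review six flagged pictures instead of 10,000 — amounts to thresholding a trained classifier’s per-image scores and ranking the results. The snippet below is a minimal sketch of that idea with simulated scores standing in for real model output; nothing here is DHS code.

```python
def triage(scored_images, threshold=0.9):
    """scored_images: list of (image_id, classifier_score) pairs.
    Keep images scoring at or above the threshold, highest first,
    so reviewers see the most likely hits early."""
    flagged = [(img, s) for img, s in scored_images if s >= threshold]
    return sorted(flagged, key=lambda pair: pair[1], reverse=True)

# Toy stand-in for classifier output over a large collection:
# 10,000 images, six of which a hypothetical model scores highly.
scores = [(f"img_{i:05d}.jpg", 0.01) for i in range(10_000)]
for i in (17, 402, 999, 4242, 7071, 9000):
    scores[i] = (scores[i][0], 0.97)

review_queue = triage(scores)
# review_queue holds 6 images out of 10,000
```

The training step — teaching the model which "pixel presentation" is consistent with exploitation — is where the real effort lies; the payoff is that human review shrinks from the whole collection to the queue above.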
These forthcoming AI capabilities will ideally expand into identifying the geography of an image where abuse may be occurring. Cullen said that using machine learning algorithms can help give investigators broader context into the location of the image, ranging from a continent to a city depending on the data.
“We get images without context of a child being abused in a room, say, and we don't know anything else about that encounter other than this image. And so we're working on capabilities that will assess indoor rooms to find out where that abuse might have taken place in a geographic sense,” he said.
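The "continent to a city depending on the data" framing suggests coarse-to-fine geolocation: narrow the prediction one geographic level at a time, stopping when the model is no longer confident. The sketch below invents all labels, scores and the confidence cutoff to illustrate that control flow; it says nothing about S&T’s actual models.

```python
def narrow(scores_by_level, min_conf=0.6):
    """Walk geographic levels (continent -> country -> city) and keep
    narrowing only while the top prediction clears the confidence bar."""
    best = []
    for level, scores in scores_by_level:
        label, conf = max(scores.items(), key=lambda kv: kv[1])
        if conf < min_conf:
            break  # too uncertain to narrow further; stop here
        best.append((level, label, conf))
    return best

# Invented per-level scores for one image.
prediction = narrow([
    ("continent", {"Europe": 0.92, "Asia": 0.05}),
    ("country",   {"Germany": 0.71, "France": 0.20}),
    ("city",      {"Berlin": 0.34, "Hamburg": 0.30}),
])
# Here the city-level scores are too uncertain, so the prediction
# stops at the country level.
```

The design choice is that a wrong city guess is worse than an honest "somewhere in Germany," which matches the article’s point that the achievable precision depends on the data in the image.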
Cullen said that SpeechView should be available by early 2025. StreamView has already been implemented in Corregidor. He also noted that, in keeping with the Biden administration’s focus on rights-preserving AI technologies, HSI and S&T submit their system prototypes for a “rigorous legal review” to ensure that the handling of minor victims’ privacy is consistent with current law.
To train SpeechView and StreamView, Cullen said that S&T relies on premade, public datasets from universities or other government agencies.
“We're trying to put a fine tune on by automatically identifying that critical data for our end users,” he said.
Beyond expediting the speed at which HSI investigators can police child abuse cases, Crogan added that these systems help parse out relevant data to build a catalog of victims for rescue operations while ensuring DHS officers are not subjected to excessively traumatizing content.
“Law enforcement is increasingly concerned about mental health, well-being in the veins of casual trauma, and that's a big concern for folks that work in this realm,” Crogan said.
Editor's note: This article has been updated to reflect Will Crogan's position.