US Spies Train Computers to Spot Suspicious Activity in Live Videos


A research project will attempt to automatically detect suspicious activities, with the help of live video pouring in through multiple camera feeds.

Story has been updated with comments from an ODNI official. 

The intelligence community is working on amping up people-recognition power to spot, in live videos, shooters and potential terrorists before they have a chance to attack.

Part of the problem with current video surveillance techniques is the difficulty of recognizing objects and people, simultaneously, in real-time.

But Deep Intermodal Video Analytics, or DIVA, a research project out of the Office of the Director of National Intelligence, will attempt to automatically detect suspicious activities, with the help of live video pouring in through multiple camera feeds. 

ODNI’s Intelligence Advanced Research Projects Agency is gathering academics and private sector experts for a July 12 "Proposers’ Day," in anticipation of releasing a work solicitation.

“The DIVA program will produce a common framework and software prototype for activity detection, person/object detection and recognition across a multicamera network,” IARPA officials said in a synopsis of the project published June 3. “The impact will be the development of tools for forensic analysis, as well as real-time alerting for user-defined threat scenarios.”


In other words, the tech would scour incoming video surveillance and body-camera imagery from areas of interest for people and objects that could present a threat, or individuals and items that might have been involved in a past crime.

This is the type of video-recognition system that might have been used for identifying would-be suicide bombers before the Paris and Brussels attacks, some video analytics experts say.

Privacy laws in the United States and Europe differ, so it's tricky to know whether such activity-recognition software would have been legal to use on video around the time of the 2013 Boston Marathon bombings.

On Tuesday, an ODNI official said on background the regulations governing video capture in open areas, stateside, are complicated.

"The legalities of how video recorded from public spaces is used is complex, depends on both federal and local statutes, and may change over time," he said. "Technology from DIVA should be used for legal purposes, and in these cases, it is intended to enhance national security and public safety."

The official added that IARPA is not an authority on how Boston or other jurisdictions may use video recorded in public spaces. The new program's goals are "aiding a timely review of video footage after an attack, or by detecting evidence of a planned attack, and tipping further investigation," he said.

What Is Sticking out of That Spectator’s Backpack?

The envisioned system will provide multiple levels of granularity, according to IARPA. 

One perspective would flag "primitive activity," like people getting into or out of a car, or someone carrying an object, the synopsis states. In this experimental scenario, the video would be collected from security cameras. 

Another feature would key in on complex activities: someone carrying a firearm or two people exchanging an object, according to IARPA.

The most sophisticated capability would recognize people and things in live footage from many angles, showing different perspectives of areas of interest.

This pinnacle of the program involves “person and object detection and recognition across multiple overlapping and non-overlapping camera viewpoints,” IARPA officials said.

These last two experiments will take advantage of body-cam video feeds and handheld video camera images, the synopsis states. Some of the sensors also might capture infrared data and video from other portions of the electromagnetic spectrum not visible to the human eye. 

Participating teams are expected to consist of experts from many technical disciplines, including artificial intelligence, probability, person re-identification, and 3-D reconstruction from video.

The intelligence community expects academic institutions and private sector companies from around the world to join in.

Correctly identifying individuals before they can attack requires a system with not only keen recognition but also lots of video, data profiles and pictures of faces.

The technology must have access to a bad-guy database that already contains the would-be perpetrator’s face, cameras that can capture usable images of people approaching an area, and a way to signal guards or otherwise cut off access to the target, Defense One reported shortly after 32 people were killed in bombings on the Brussels metro and at the city's international airport March 22.

A failure to stop the same extremists behind last November's deadly terrorist attacks in Paris from carrying out the Brussels bloodshed highlighted inadequate government intelligence on terrorists and communities’ lack of trust in police.

Tagging Faces and Things in a Crowd

Last year, a facial recognition system was used on live video from surveillance cameras at the European Games, in Baku, Azerbaijan, according to the tool's developer. During the June 2015 event, organizers watched a webpage that could issue an alert if a face in the crowd matched that of an individual on a watch list, explained John Waugaman, president of Tygart Technology, the company that deployed the technology. 

A match that scores above a certain level of confidence will generate an alert.
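
That alerting rule can be sketched in a few lines of Python. This is only an illustration, not Tygart Technology's implementation: the `FaceMatch` type, the 0-to-1 score scale and the 0.9 threshold are all assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass
class FaceMatch:
    """Hypothetical result of comparing one face in the crowd to a watch list."""
    watchlist_id: str
    score: float  # assumed similarity score between 0.0 and 1.0


def matches_to_alert(matches, threshold=0.9):
    """Return only the matches whose score is above the confidence threshold."""
    return [m for m in matches if m.score > threshold]


# Example: two candidate matches, only one clears the assumed threshold.
candidates = [FaceMatch("subject-17", 0.95), FaceMatch("subject-42", 0.60)]
for match in matches_to_alert(candidates):
    print(f"ALERT: possible match for {match.watchlist_id} ({match.score:.2f})")
```

Raising the threshold would produce fewer alerts but risk missing true matches; lowering it would do the opposite.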

Sometimes, the vast number of faces in a highly populated area can bog down the scanning process, so high-performance computers are available for support on the back end, said Waugaman, whose customers include the U.S. intelligence community, Pentagon and law enforcement agencies.

If an agency needs to screen more faces per minute, for example, it can grab more computing bandwidth from cloud computing providers.

The IARPA research will focus on creating a scalable framework that can function in an open cloud environment, the synopsis states.

The scope of the intelligence project -- the intertwining of real-time person identification, object recognition and activity detection -- is the next wave of video surveillance, Waugaman said. 

"Easily within the next two years, you’ll see pairing of facial and object recognition in operational use," he said. 

With "accurate object detection capabilities, you can broaden the use cases from known subjects to just people that are behaving oddly," Waugaman said. Instead of the software program "being trained to find faces, it’s trained to find people with backpacks or it’s training to find people carrying guns."

There could be backlash surrounding the use of activity recognition software from privacy or gun rights groups, Waugaman acknowledged.

But perhaps paradoxically, combining all the identification modes could cut the number of false alarms, also called “false positives.”

"We might know that the person is a threat," Waugaman said. "They have an object that might be a threat and they are acting in a manner that appears to be threatening. That will really help reduce the false positive rates in these systems."
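
A rough back-of-the-envelope calculation shows why requiring several signals to agree can lower false alarms. The rates below are invented for illustration, and the calculation assumes the face, object and activity detectors make their mistakes independently of one another, which real systems may not.

```python
import math


def combined_false_positive_rate(rates):
    """False positive rate when an alert requires every detector to fire.

    Assumes the detectors err independently, so the rates multiply.
    """
    return math.prod(rates)


# Hypothetical per-detector false positive rates (not from IARPA or any vendor).
face_rate = 0.01      # face recognition wrongly flags 1 in 100
object_rate = 0.05    # object detection wrongly flags 1 in 20
activity_rate = 0.10  # activity detection wrongly flags 1 in 10

combined = combined_false_positive_rate([face_rate, object_rate, activity_rate])
print(f"Combined false positive rate: {combined:.6f}")
```

Under these assumptions, an alert that requires all three detectors to agree would fire falsely about 5 times in 100,000, compared with 1 in 100 for face recognition alone. The trade-off is that demanding agreement could also cause the system to miss real threats that trip only one detector.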