DHS updates on data mining

Analytical tools support efforts at the Department of Homeland Security to detect terrorist plots, money laundering, illegal trade schemes and more.

Shutterstock image (by Andrii_M): computer binary code.
 

The Department of Homeland Security uses software tools to extract insights from its vast troves of data. Under federal law, DHS must report annually to Congress on its use of data mining, a requirement meant to allay concerns about possible privacy violations.

The latest report, released publicly April 20, said that "no decisions about individuals are made based solely on data mining results" and that DHS investigators "apply their own judgment and expertise to bear in making determinations about individuals initially identified through data mining activities."

The data mining report, the associated privacy reports and record system notices together provide updates on how DHS is integrating its data systems across all its component agencies and its progress on its strategy to create a centralized "data lake" for investigators.

As of October 2016, DHS had wrangled 17 datasets into the DHS Data Framework. These include some of the large travel and immigration databases, such as the I-94 system for foreign visitors, the Electronic System for Travel Authorization and the Passenger Name Record system.

The Framework is divided into two related systems -- a data lake called Neptune and a classified query system called Cerberus, which is used for counterterrorism probes. In 2016, according to the report, DHS tapped Cerberus to "facilitate bulk information sharing with U.S. government partners." In this context, "bulk" refers to data that isn't selected based on specific identifiers or other search terms "reasonably likely to exclude any intelligence or information not relevant to the need giving rise to the recipient's request."

DHS also noted that it was looking to replace an interim solution that lets Framework users run classified queries to identify terror suspects linked to ISIS, al-Qaida and their affiliates. The interim process is intended to address the risk of "foreign fighters" entering the U.S. According to the report, DHS "defined a set of operational requirements that the Data Framework must meet in order to fully replace the interim process."

A key goal of the Framework was to apply the "One DHS" policy to integrate and manage data across all sources. However, familiar issues of interoperability hamper the integration of systems. One planned feature -- keeping the data in the Framework coordinated with the data in the source systems -- had to be postponed. DHS "discovered that the source IT systems are not always able to accommodate" delete notifications from source systems, "due to a number of constraints, such as resources, legacy systems, and disruptions to operational support."

As a result, the report said, an update to the Framework's data retention policies will be addressed in a forthcoming privacy assessment.

The report also identified two new data mining systems: the Socrates pilot, administered by Customs and Border Protection, and the Fraud Detection and National Security Data System, under the control of U.S. Citizenship and Immigration Services. The Socrates pilot is being operated in conjunction with the Johns Hopkins University Applied Physics Laboratory and involves analyzing large international trade datasets to identify patterns of tariff avoidance, importation of counterfeit merchandise and other illicit trade activity. The longstanding Fraud Detection and National Security Data System, which tracks fraud in immigration applications, has added analytical capacity.