Trusted data is the heart of trustworthy AI


COMMENTARY | AI systems are only as good as the data they are trained on -- that’s why a proper data management strategy is the key to success.

Amid large-scale artificial intelligence implementations and initiatives across the federal government, NIST released a publication in January warning of privacy and security challenges arising from rapid AI deployment.

According to NIST, AI models are now facing several threats, including corrupted training data, security flaws, supply chain weaknesses and privacy breaches. 

Unsecured data used to train AI models is at the heart of these emerging threats, leading to poor outcomes and results that decision makers can't act on. Given these risks, it's vital that federal agencies have a data strategy that safeguards sensitive information. Moving forward, investment in trusted data will be vital for the progress of AI in the public sector.

Security, governance, and trusted data

Trustworthiness and reliability of AI solutions require a holistic approach that addresses the key areas of security and governance of attributable data. A breach or a compromised AI system can have severe consequences, exposing sensitive citizen data or even disrupting critical government services.

Beyond robust security mechanisms around AI solutions, AI systems must operate within a framework that promotes ethical practices, transparency, and accountability. The federal government is working to establish clear guidelines and regulations for the use of AI, including the recent executive order on the Safe, Secure, and Trustworthy Development and Use of AI, to ensure that algorithms are fair, unbiased, and respectful of privacy rights.

These initiatives center on establishing trustworthy and secure AI systems, but trusted data is at the heart of any trusted AI solution. AI systems are only as good as the data they are trained on; that's why a proper data management strategy is the key to success.

Creating a proper data management strategy

Functional AI relies on clean, secure data. One way to achieve secure data is through open data lakehouses, which facilitate data literacy and data-driven operations by building trust in data through governance.

Open data lakehouses are centralized repositories for storing and distributing data. They increase the flexibility to expand AI and analytics while making data more accessible, providing self-service analytics, ensuring data quality and simplifying data security. Data lakehouses also provide end-to-end management and control capabilities throughout the data lifecycle.
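To make the governance idea concrete, the short Python sketch below illustrates how a lakehouse-style catalog might attach ownership and classification metadata to each table and check a user's clearance before releasing data for analytics or model training. The names (GovernedTable, Catalog, the clearance levels) are hypothetical illustrations of the principle, not any specific product's API.

from dataclasses import dataclass, field

# Hypothetical sketch: a catalog that tracks governance metadata
# (owner, classification) alongside each table and enforces an
# access check before data is released for analytics or training.

@dataclass
class GovernedTable:
    name: str
    owner: str              # accountable data steward
    classification: str     # e.g. "public", "sensitive", "restricted"
    rows: list = field(default_factory=list)

@dataclass
class Catalog:
    tables: dict = field(default_factory=dict)

    def register(self, table: GovernedTable) -> None:
        self.tables[table.name] = table

    def read(self, table_name: str, user_clearance: str) -> list:
        """Return rows only if the user's clearance covers the table's classification."""
        order = ["public", "sensitive", "restricted"]
        table = self.tables[table_name]
        if order.index(user_clearance) < order.index(table.classification):
            raise PermissionError(f"{table_name} requires {table.classification} clearance")
        return table.rows

catalog = Catalog()
catalog.register(GovernedTable("benefits_claims", owner="agency-data-office",
                               classification="sensitive",
                               rows=[{"claim_id": 1, "status": "approved"}]))

# A caller with only "public" clearance would be refused access,
# while a properly cleared analyst can read the governed data.
print(catalog.read("benefits_claims", user_clearance="sensitive"))

In practice this role falls to the lakehouse's own catalog and access-control layer; the point is that governance metadata travels with the data throughout its lifecycle rather than being bolted on afterward.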

AI requires detailed knowledge of the data needed to support complex analyses, and the government must know where that essential data is located and stored, whether on premises, off premises or in the cloud, in order to move forward. Knowing where and how your data is stored is key to mission success.

Data is the foundation of any AI solution. To properly support agency missions, vector databases and language models require trusted data that can be used appropriately. If the data can't be trusted, AI can't progress to provide trustworthy, actionable results. Amid rising security threats to AI models, it's important for federal agencies to maintain data integrity, privacy and compliance with regulatory requirements.

Open data initiatives and data lakehouses give the government the means to consolidate and securely manage its data assets, ensuring they remain available for AI applications while maintaining privacy and compliance. Although agencies are receiving an abundance of recommendations and guidance around AI implementation, much of it can be distilled to the key principle of any trustworthy AI solution: trusted data.

Editor's note: This article was updated on March 1, 2024.