Taking open source risks seriously


COMMENTARY | Software bills of materials don't address what tech leaders should actually do to make sure open source components are safe to use.

Cultural change across the public sector with respect to emerging digital threats has taken some interesting turns in the last few years. The prevalence of software supply chain concerns and the emergence of generative artificial intelligence have prompted many changes within the Department of Homeland Security. As open-source software has taken a bigger role across the government, we are seeing the Cybersecurity and Infrastructure Security Agency and the National Telecommunications and Information Administration work to produce actionable guidance for implementing software supply chain security.

CISA is developing and maintaining a catalog of actively exploited vulnerabilities, NTIA is leading the charge in shaping several of the recent initiatives around software supply chain security, and a host of other efforts across a swath of agencies aim to turn that guidance into something organizations can actually implement.
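To make that first effort concrete: CISA publishes its Known Exploited Vulnerabilities catalog as a machine-readable JSON feed, which means build and procurement tooling can consume it directly. The sketch below shows one way a team might do so; the feed URL and field names reflect the public feed as commonly documented, and should be verified against CISA's site before anything depends on them.

```python
# Sketch: pull CISA's Known Exploited Vulnerabilities (KEV) catalog and
# index it by CVE ID so tooling can flag known-exploited components.
# Assumption: the feed URL and JSON field names below match the current
# public feed; verify against cisa.gov before relying on this.
import json
import urllib.request

KEV_URL = "https://www.cisa.gov/sites/default/files/feeds/known_exploited_vulnerabilities.json"

def load_kev_index() -> dict:
    with urllib.request.urlopen(KEV_URL) as resp:
        catalog = json.load(resp)
    # Each entry carries the CVE ID, the affected vendor/product, and key dates.
    return {entry["cveID"]: entry for entry in catalog.get("vulnerabilities", [])}

if __name__ == "__main__":
    kev = load_kev_index()
    print(f"KEV catalog currently tracks {len(kev)} exploited CVEs")
```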

While this is certainly a step in the right direction and represents a big cultural evolution across multiple agencies in managing security risks, some evolution is still required. Many software supply chain initiatives fall short of providing real value in implementation. Additionally, they’re still missing quite a bit of the current attack surface — this includes whole classes of overt, directed, software supply chain attacks (as opposed to just focusing in on vulnerabilities), and they are still behind the curve in ability to address some of the emerging threats expressed by the recent evolutions in the generative AI and ML fields.

To understand where the gaps are, it helps to first look at what regulators and agencies have focused on to date. The centerpiece of the current landscape is the Software Bill of Materials, or SBOM: a sort of "ingredients list" for the software an organization consumes. An SBOM catalogs what is contained within a set of software deliverables, along with a small amount of optional, amplifying context.
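To see how thin that ingredients list is, consider what a single entry looks like. The sketch below builds a minimal component record in the shape of CycloneDX, one of the two widely used SBOM formats alongside SPDX; the component itself is purely illustrative.

```python
# Sketch: the shape of a minimal CycloneDX-style SBOM entry. Real SBOMs are
# generated by build tooling, not written by hand; the component named here
# is hypothetical and for illustration only.
import json

sbom = {
    "bomFormat": "CycloneDX",
    "specVersion": "1.5",
    "components": [
        {
            "type": "library",
            "name": "example-http-client",  # hypothetical component
            "version": "2.4.1",
            # purl: a "package URL" identifying the ecosystem and package
            "purl": "pkg:pypi/example-http-client@2.4.1",
        }
    ],
}

print(json.dumps(sbom, indent=2))
```

Note what is absent: nothing here says who built the component, how it was built, or what it does at runtime. That gap is where the actionability problem begins.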

The unfortunate thing about this approach, however, is that it leaves actionability as an exercise for the reader. What should an organization do with the SBOMs it receives? How should it ensure the software they describe is free of issues? And if we attest to and sign these software lists, how do we even know what they contain?
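One partial answer to the first question is to walk a received SBOM and query each component against a vulnerability database. The sketch below assumes a CycloneDX JSON file and uses the public query API at OSV.dev; the exact request shape should be checked against the current API documentation before use.

```python
# Sketch: triage a received CycloneDX SBOM by asking OSV.dev about each
# component's package URL. Assumption: the OSV query endpoint accepts a
# purl (including version) as documented; the file name is hypothetical.
import json
import urllib.request

OSV_QUERY_URL = "https://api.osv.dev/v1/query"

def known_vulns_for_purl(purl: str) -> list:
    body = json.dumps({"package": {"purl": purl}}).encode()
    req = urllib.request.Request(
        OSV_QUERY_URL, data=body, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(req) as resp:
        # OSV returns an empty object when no known vulnerabilities match.
        return json.load(resp).get("vulns", [])

def triage_sbom(path: str) -> None:
    with open(path) as f:
        sbom = json.load(f)
    for component in sbom.get("components", []):
        purl = component.get("purl")
        if not purl:
            continue  # a gap worth flagging: no identifier means no lookup
        vulns = known_vulns_for_purl(purl)
        if vulns:
            ids = ", ".join(v["id"] for v in vulns)
            print(f"{purl}: {ids}")

# triage_sbom("received-sbom.cdx.json")  # hypothetical file name
```

Useful as this is, it only covers known vulnerabilities, which is exactly the limitation the next point gets at.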

Current SBOM standards lack the fields necessary to make them useful, let alone actionable, for all end consumers. At best, the standards provide very surface-level insight into the software libraries being used. Even more advanced frameworks that put additional focus on the concept of provenance, such as SLSA (slsa.dev), lack the granularity to identify emerging threats and modern software supply chain attacks such as build tool compromises, account compromises and similar issues. (Some of these risks are captured in the Secure Supply Chain Consumption Framework from Microsoft and the Open Source Security Foundation.)
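To see both what provenance frameworks add and where they stop, consider the skeleton of a SLSA build attestation, expressed as an in-toto statement. The sketch below uses illustrative values; see slsa.dev for the full predicate definition. It records who built what from which source, but a compromised builder or maintainer account can still sign clean-looking provenance.

```python
# Sketch: the skeleton of a SLSA v1 build provenance attestation (an
# in-toto Statement). All values are illustrative; consult slsa.dev for
# the authoritative predicate schema.
import json

provenance = {
    "_type": "https://in-toto.io/Statement/v1",
    "subject": [
        {
            "name": "example-http-client-2.4.1.tar.gz",  # hypothetical artifact
            "digest": {"sha256": "<artifact hash>"},
        }
    ],
    "predicateType": "https://slsa.dev/provenance/v1",
    "predicate": {
        "buildDefinition": {
            "externalParameters": {
                "repository": "https://example.com/org/example-http-client",
                "ref": "refs/tags/v2.4.1",
            }
        },
        # This identifies the build system, but cannot attest that the
        # builder itself, or the account driving it, was not compromised.
        "runDetails": {"builder": {"id": "https://example.com/ci/builder"}},
    },
}

print(json.dumps(provenance, indent=2))
```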

These emerging threats can only be countered by more advanced capabilities that take a far more holistic view of the software in use and apply more conventional supply chain risk management concepts to the process of software development.

Viewed through this lens, generative AI and ML make the problem worse, because they enable attackers to automate their efforts to a greater degree than previously possible. In many cases, a payload targeting a developer or build infrastructure does not even require the full library to be functional in order to detonate. In effect, attackers can publish malicious libraries with diversified payloads in an automated fashion, achieving full coverage of whatever library they happen to be targeting and making defense that much more difficult.
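One common automated pattern is mass-publishing packages whose names sit one typo away from legitimate ones. The sketch below shows a crude defensive heuristic against this: flag any declared dependency whose name is a near-miss of a package the organization actually sanctions. The allowlist and threshold are illustrative; real tooling would combine this with registry metadata, publish dates and maintainer history.

```python
# Sketch: flag dependency names that are suspiciously close to a sanctioned
# package name, a basic check against automated typosquatting campaigns.
# The allowlist below is an example; populate it from your own approved list.
from difflib import SequenceMatcher

SANCTIONED = {"requests", "urllib3", "cryptography", "numpy"}

def suspicious_match(name: str, threshold: float = 0.85) -> str | None:
    if name in SANCTIONED:
        return None  # exact match to an approved package: fine
    for known in SANCTIONED:
        if SequenceMatcher(None, name, known).ratio() >= threshold:
            return known  # near-miss of a sanctioned name: worth review
    return None

for dep in ["reqeusts", "numpy", "left-pad"]:
    hit = suspicious_match(dep)
    if hit:
        print(f"'{dep}' looks like a near-miss of '{hit}': review before install")
```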

Additionally, generative AI itself creates new classes of issues when generated source code is used in sensitive settings. What it produces is effectively an amalgam of the software it was trained on, and it may produce hallucinations, meaning plausible-looking output containing objectively false information.
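One concrete form this takes: generated code can declare a dependency that does not exist on any package index, and an attacker who notices the pattern can register that name with a malicious payload. A minimal mitigation is to check every dependency name before first use, as in the sketch below, which queries PyPI's public JSON API; a 404 means "does not exist today," not a permanent verdict.

```python
# Sketch: verify that dependencies named in generated code actually exist
# on PyPI before anyone installs them. Uses PyPI's public JSON API; the
# second package name below is deliberately made up for illustration.
import urllib.error
import urllib.request

def exists_on_pypi(package: str) -> bool:
    url = f"https://pypi.org/pypi/{package}/json"
    try:
        with urllib.request.urlopen(url) as resp:
            return resp.status == 200
    except urllib.error.HTTPError as err:
        if err.code == 404:
            return False  # not registered today; it could appear tomorrow
        raise

for name in ["requests", "totally-made-up-http-lib"]:
    status = "found" if exists_on_pypi(name) else "NOT FOUND: verify before use"
    print(f"{name}: {status}")
```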

So how should regulators think about addressing these concerns? One of the biggest challenges to date is helping the organizations now responsible for receiving SBOMs operationalize them and understand what they actually mean. Beyond that, more focus on provenance and functionality is absolutely key, because those give insight into the behaviors and privileges that should be expected from software as it is being used.
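Even before richer standards arrive, received SBOMs can support one simple expectation check: nothing new should appear in a release without review. The sketch below diffs the component sets of two CycloneDX SBOMs to surface new or changed dependencies; the file names are hypothetical.

```python
# Sketch: diff the component sets of two releases' SBOMs and force review
# of anything new or changed. Assumes CycloneDX JSON; file names are
# hypothetical placeholders.
import json

def component_set(path: str) -> set:
    with open(path) as f:
        sbom = json.load(f)
    return {(c.get("name"), c.get("version")) for c in sbom.get("components", [])}

old = component_set("release-1.2.cdx.json")
new = component_set("release-1.3.cdx.json")

for name, version in sorted(new - old):
    print(f"new or changed dependency: {name} {version}: review before accepting")
```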