What’s powering the next wave of government AI

COMMENTARY | Open source technology is accelerating government AI adoption by solving compute challenges and bringing AI closer to where it’s needed.
Recent federal initiatives, including the White House’s “AI Action Plan,” underscore the urgency of accelerating AI adoption across government. While the policy lays the foundation, open source technology is already overcoming the biggest barriers, offering scalable, efficient and more secure tools that agencies can deploy today.
Open source AI: Accelerating government adoption
A key aspect of the action plan is its call to “Encourage Open-Source and Open-Weight AI,” a philosophy that aligns with Red Hat's views on open source AI. The White House action plan states that “Open-source and open-weight AI models are made freely available by developers for anyone in the world to download and modify,” offering significant value to startups, businesses and governments by reducing their reliance on large cloud providers.
It notes that both businesses and governments often have “sensitive data that they cannot send to closed model vendors.” Open-weight models reduce dependency on proprietary cloud services, allowing organizations to build AI workflows tailored to their specific missions. The plan also suggests that open source and open-weight models could become global standards. That future is already taking shape.
Compute: The AI barrier to entry
The White House accurately identifies that the high cost of compute and specialized accelerators prevents governments and startups from fully leveraging AI's power today. The plan recommends policy actions to address this barrier, including making compute resources more financially accessible to startups, partnering with leading technology companies to increase access to computing and models, and significantly supporting the National AI Research Resource pilot. These recommendations are vital, and the endorsement of integrated research and pilots between academia, industry and government is particularly encouraging.
Large language models (LLMs) consume substantial compute resources, both during training and when generating responses. Simply put, an LLM learns through mathematical calculations during training, and when generating answers it makes probability calculations based on what it has already learned. Both phases are highly math- and compute-intensive, and when scaled to millions of prompts daily, the resource demand adds up fast. Sam Altman, CEO of OpenAI, has stated that training GPT-4 alone cost over $100 million.
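To make those probability calculations a bit more concrete, here is a toy illustration only: a real model scores tens of thousands of candidate tokens using billions of parameters, while this sketch hand-picks three made-up scores and converts them into a next-token probability distribution.

```python
import math

# Hypothetical scores ("logits") for three candidate next tokens.
logits = {"policy": 2.0, "budget": 1.0, "weather": 0.1}

# Softmax turns raw scores into probabilities that sum to 1.
total = sum(math.exp(v) for v in logits.values())
probs = {tok: math.exp(v) / total for tok, v in logits.items()}

for tok, p in sorted(probs.items(), key=lambda kv: -kv[1]):
    print(f"{tok}: {p:.2f}")

# A model repeats a calculation like this for every token it generates,
# which is why serving millions of prompts a day is so compute-intensive.
```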
An innovative open source solution
Still, the open source community isn’t waiting for policy to catch up. Rather than focusing solely on expanding infrastructure, these pioneers are working to make existing LLMs more efficient. For example, in June 2023, the Sky Computing Lab at UC Berkeley announced vLLM, an inference server designed to help LLMs perform calculations more efficiently at scale. In other words, vLLM speeds up generative AI output by optimizing GPU memory usage.
What's an inference server?
An inference server helps an LLM draw new conclusions based on its existing training. It allows the language model to respond to new inputs quickly without needing to "learn" new data. Think of it like inferring fire from smoke: you don't see the fire directly, but you can deduce its presence from the smoke. In essence, inference is the doing phase of AI.
The research from the Sky Computing Lab at UC Berkeley has yielded significant benefits. The vLLM implementation provides a high-throughput generative AI inference server that is broadly supported across clouds, models and hardware accelerators. It features multi-GPU support and batch processing, improving the utilization of very expensive specialized hardware. These efficiency gains lead to greater scalability and enhanced data privacy, which is especially important for government missions where data cannot be sent to closed model vendors.
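As a rough sketch of what that batching looks like in practice, the snippet below uses vLLM's offline batch interface; the model name, prompts and sampling settings are illustrative assumptions, not recommendations.

```python
from vllm import LLM, SamplingParams

# Hypothetical prompts an agency workload might batch together.
prompts = [
    "Summarize the purpose of the National AI Research Resource pilot.",
    "List three benefits of open weight models for government agencies.",
]
sampling = SamplingParams(temperature=0.7, max_tokens=128)

# vLLM batches the prompts and manages GPU memory so the accelerator
# stays busy instead of idling between requests.
llm = LLM(model="mistralai/Mistral-7B-Instruct-v0.2")  # example open weight model
outputs = llm.generate(prompts, sampling)

for out in outputs:
    print(out.outputs[0].text)
```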
With its significant contributions from academia and industry, vLLM provides a clear example of open innovation advancing performance, lowering costs and improving control over AI workflows.
Smaller models and edge computing
There's also a promising trend toward smaller AI models. Industry and academia increasingly recognize that the age of giant AI models is already behind us. Many in open source communities are focusing on developing smaller, faster AI models to replace the LLMs of the past few years.
These small language models (SLMs) are proving to be more economical, flexible and capable, especially for agentic AI systems designed for specialized, repetitive tasks. This shift allows for more efficient and adaptable AI systems, often outperforming larger models in real-world deployments.
When these smaller models are combined with vLLM technology, they can run in previously unimaginable places — from laptops and cell phones to drones and smart devices at the edge. Technologies like vLLM will be crucial for bringing AI closer to where decisions are made without requiring massive data and compute resources to be sent to a centralized datacenter.
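For instance, a small open weight model served locally with vLLM exposes an OpenAI-compatible endpoint that applications can query without data ever leaving the device. The model name, port and prompt below are illustrative assumptions only, e.g. a server started with `vllm serve microsoft/Phi-3-mini-4k-instruct`.

```python
import json
import urllib.request

# Query the locally running, OpenAI-compatible vLLM server (default port 8000).
payload = {
    "model": "microsoft/Phi-3-mini-4k-instruct",
    "messages": [
        {"role": "user",
         "content": "Summarize today's fire-weather outlook in two sentences."}
    ],
    "max_tokens": 100,
}
req = urllib.request.Request(
    "http://localhost:8000/v1/chat/completions",
    data=json.dumps(payload).encode("utf-8"),
    headers={"Content-Type": "application/json"},
)
with urllib.request.urlopen(req) as resp:
    body = json.load(resp)

# The response never traverses an external cloud service.
print(body["choices"][0]["message"]["content"])
```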
This approach has broad applications across government, including homeland security, forest firefighting, weather prediction and forecasting, and defense operations. Open source tools like vLLM and emerging SLMs will be key to making AI impactful where it's needed most.




