Red Hat has rolled out Red Hat AI 3, the latest version of its enterprise AI platform. It's built to help organizations move beyond experimentation and run AI workloads in production at scale. The release brings Red Hat OpenShift AI, Red Hat Enterprise Linux AI, and Red Hat AI Inference Server together in one platform.
“As enterprises scale AI from experimentation to production, they face a new wave of complexity, cost and control challenges,” said Joe Fernandes, VP and GM of Red Hat’s AI business unit. “With Red Hat AI 3, we are providing an enterprise-grade, open source platform that minimizes these hurdles.”
At the core of Red Hat AI 3 is inference, the stage where a trained model is put to work generating predictions or responses. Inference is compute-intensive and its demand can be hard to predict, which is why Red Hat is putting so much emphasis here.
Distributed inference and model-as-a-service
One of the biggest updates is llm-d, a system that spreads the work of running large language models across many servers. Built on Kubernetes and the open-source vLLM project, it helps organizations handle the heavy, unpredictable demands of AI inference without leaving expensive hardware idle.
“With llm-d, customers can adopt an intelligent AI platform that integrates seamlessly with Kubernetes,” said Steven Huels, VP of AI engineering, Red Hat. “Kubernetes scheduling helps maximize model performance and utilization of the underlying GPU hardware so they’re not sitting there idle.”
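The scheduling logic inside llm-d isn't described here, so the following is only a minimal sketch of one common idea behind spreading inference work across servers: send each new request to the replica with the fewest requests currently in flight. The class and replica names (`LeastLoadedRouter`, `gpu-node-a`) are hypothetical, and a production scheduler would likely weigh additional signals beyond a simple request count.

```python
from dataclasses import dataclass


@dataclass
class Replica:
    """One model server instance (hypothetical) and its current load."""
    name: str
    in_flight: int = 0


class LeastLoadedRouter:
    """Hypothetical router: picks the replica with the fewest in-flight requests."""

    def __init__(self, names: list[str]):
        self.replicas = {n: Replica(n) for n in names}

    def acquire(self) -> str:
        # Choose the least-busy replica; break ties by name so results are deterministic.
        replica = min(self.replicas.values(), key=lambda r: (r.in_flight, r.name))
        replica.in_flight += 1
        return replica.name

    def release(self, name: str) -> None:
        # Call when a request completes so that replica's load drops again.
        self.replicas[name].in_flight -= 1


router = LeastLoadedRouter(["gpu-node-a", "gpu-node-b"])
first = router.acquire()   # gpu-node-a (tie broken by name)
second = router.acquire()  # gpu-node-b (now the less busy one)
router.release(first)
third = router.acquire()   # gpu-node-a again, since it finished its request
```

The point of the sketch is that no single server sits idle while another is saturated: load, not a fixed assignment, decides where each request goes.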
Alongside distributed inference, Red Hat AI 3 introduces Model-as-a-Service (MaaS). IT teams can make models available as flexible, on-demand services within their own infrastructure. That helps them keep costs in check, track usage, and stay compliant, while avoiding the risks of depending on outside AI providers.
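The article doesn't say how Red Hat implements usage tracking for MaaS. As a hypothetical illustration of what "track usage and keep costs in check" can mean for an internally hosted model service, here is a minimal per-team token meter that rejects requests once a team's quota is spent. The `UsageMeter` class, team names, and quota numbers are all assumptions made for this sketch.

```python
from collections import defaultdict


class UsageMeter:
    """Hypothetical per-team token metering for an internally hosted model service."""

    def __init__(self, quota: dict[str, int]):
        self.quota = quota               # tokens each team may consume per billing period
        self.used = defaultdict(int)     # tokens consumed so far, per team

    def record(self, team: str, prompt_tokens: int, completion_tokens: int) -> bool:
        """Record a request's tokens; return False and record nothing if it would exceed quota."""
        total = prompt_tokens + completion_tokens
        if self.used[team] + total > self.quota.get(team, 0):
            return False
        self.used[team] += total
        return True

    def remaining(self, team: str) -> int:
        return self.quota.get(team, 0) - self.used[team]


meter = UsageMeter({"search-team": 1000})
ok_first = meter.record("search-team", 300, 200)   # fits: 500 of 1000 used
ok_second = meter.record("search-team", 400, 200)  # would reach 1100, so it is rejected
ok_unknown = meter.record("unknown-team", 1, 1)    # teams without a quota get nothing
```

Keeping the meter next to the model endpoint, rather than in an outside provider's billing system, is what lets the owning team decide the quotas and audit the numbers themselves.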
Flexibility for agents and open standards
Red Hat AI 3 is also preparing for what comes next: agent-based AI systems. These are more complex, autonomous applications that will push inference demands higher still. To support them, the platform includes a Unified API layer built on Llama Stack. Red Hat AI 3 is also among the first platforms to adopt the Model Context Protocol (MCP), which standardizes how AI models connect to external tools.
“AI platforms aren’t going to run a single model on a single inference server on a single machine,” Fernandes said. “You’re going to have multiple models across multiple inference servers across a distributed environment.”
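To make the MCP point concrete: MCP messages are JSON-RPC 2.0, and a client discovers a server's tools with the `tools/list` method and invokes one with `tools/call`, passing the tool's `name` and its `arguments`. The sketch below only builds those request payloads. The tool name `lookup_order` and its argument are hypothetical, and a real client would first complete MCP's `initialize` handshake and send the messages over a transport such as stdio.

```python
import json
from itertools import count
from typing import Optional

_request_ids = count(1)  # JSON-RPC requests need an id so responses can be matched up


def mcp_request(method: str, params: Optional[dict] = None) -> str:
    """Build a JSON-RPC 2.0 request string in the shape MCP uses."""
    message = {"jsonrpc": "2.0", "id": next(_request_ids), "method": method}
    if params is not None:
        message["params"] = params
    return json.dumps(message)


# Ask the server which tools it exposes.
list_tools = mcp_request("tools/list")

# Invoke one of them (hypothetical tool name and arguments).
call_tool = mcp_request(
    "tools/call",
    {"name": "lookup_order", "arguments": {"order_id": "A-1001"}},
)
```

Because every MCP server accepts the same message shapes, an agent can use a new tool without custom integration code for each one, which is the standardization the protocol is meant to provide.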
Built for collaboration and control
Beyond raw performance, Red Hat AI 3 is designed to unify the entire AI lifecycle. The platform includes:
- An AI hub for lifecycle management and governance
- A generative AI studio for experimenting with models and prototyping applications
- A catalog of tested, optimized models, including Whisper for speech-to-text and Voxtral Mini for voice-driven agents
By centralizing tools, workflows, and governance, Red Hat AI 3 gives platform engineers and AI developers a common foundation. The result is a more predictable, cost-effective way to operationalize AI across data centers, public clouds, and edge environments.
Red Hat's push into distributed inference comes alongside similar moves elsewhere in the ecosystem. Pure Storage recently announced updates involving Azure, Portworx, and NVIDIA, including a Key Value Accelerator that the company says speeds up inference by up to 20x while cutting costs and energy use. It is another sign that AI platforms and infrastructure are evolving with efficiency front and center.