Gimlet Labs Targets AI’s Inference Cost Problem

Gimlet Labs raises $80M in a Menlo Ventures-led round to tackle costly AI inference, using flexible, multi-chip systems to improve performance at scale.

Mar 25, 2026
Channel Insider content and product recommendations are editorially independent. We may make money when you click on links to our partners. Learn More

Gimlet Labs is going after a part of AI that isn't exactly a household name but shows up quickly in production: inference.

Menlo Ventures-led Series A targets inference bottlenecks in AI deployment

The startup raised $80 million in a Series A round led by Menlo Ventures to dig into a part of AI that tends to get overlooked. The model might be trained, but running it smoothly at scale is where many teams hit a wall.

Most of the attention still goes to training; bigger models, more power, more compute. But once those models are out in the world, inference is what actually runs all day. Every prompt, API call, and workflow runs through that layer, and if it’s not efficient, you feel it pretty quickly in the wallet.
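To see why the always-on inference layer can dwarf a one-time training run, a back-of-envelope calculation helps. All of the figures below are made-up assumptions for illustration only; none come from Gimlet or any vendor.

```python
# Hypothetical numbers: a one-time training run vs. continuous inference spend.
TRAINING_COST = 5_000_000        # one-time training run, USD (assumed)
COST_PER_1K_TOKENS = 0.002       # inference cost per 1,000 tokens (assumed)
TOKENS_PER_REQUEST = 1_000       # average prompt + response size (assumed)
REQUESTS_PER_DAY = 10_000_000    # a busy production service (assumed)

daily_inference = REQUESTS_PER_DAY * (TOKENS_PER_REQUEST / 1_000) * COST_PER_1K_TOKENS
days_to_match_training = TRAINING_COST / daily_inference

print(f"Inference spend: ${daily_inference:,.0f}/day")
print(f"Matches the training bill in {days_to_match_training:,.0f} days")
```

Under these assumed numbers, inference overtakes the entire training budget in well under a year, and the bill keeps growing with traffic, which is the dynamic the article describes.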


Gimlet is betting on multiple chips and flexible systems as GPU constraints worsen

Gimlet is trying to address that by spreading inference across multiple chips rather than relying on tightly coupled hardware. That approach lines up with what most teams are actually running, which is usually a mix of different systems pieced together over time.

That flexibility is part of the pitch.

“We’ve entered a fundamentally new era of computing where the speed of intelligence has become the critical bottleneck,” said Zain Asgar, co-founder and CEO of Gimlet Labs.

“In order to unlock the next 10-100X performance increases needed in use cases like coding agents, we’ve identified how to leverage heterogeneous hardware for faster, more efficient inference. At Gimlet, we’re seeing this approach deliver an order of magnitude better performance per watt for our customers which is critical for anyone operating at scale given today’s datacenter capacity bottlenecks.”

Not being locked into a single class of specialized hardware carries significant weight right now.

GPU constraints haven’t let up, and even when capacity is available, it’s still pretty pricey. Software that can extract more performance from existing infrastructure is getting a closer look.


Why inference is such an expensive component of AI deployment

Inference has quietly become one of the most expensive layers in AI deployment. It’s also one of the hardest to ignore, since it runs continuously.

That is part of why startups like Gimlet are coming out of the woodwork.

A different kind of scaling problem

What Gimlet is trying to address now is less about raw compute and more about coordination. Getting multiple chips to work together efficiently, especially across distributed systems, has been a sticking point. That gets harder in environments that weren’t built to work this way.
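The coordination problem is easier to see with a toy scheduler that routes requests across a mixed pool of accelerators. The sketch below is purely illustrative: the device specs and the greedy routing policy are assumptions for the sake of the example, not Gimlet's actual system.

```python
# Toy router for a heterogeneous accelerator pool. Device specs and the
# greedy "finish soonest" policy are illustrative assumptions only.
from dataclasses import dataclass

@dataclass
class Device:
    name: str
    tokens_per_sec: float      # throughput spec (assumed)
    watts: float               # power draw (assumed)
    queue_tokens: float = 0.0  # work already assigned to this device

    def finish_time(self, tokens: float) -> float:
        # Seconds until this request would complete, given the current queue.
        return (self.queue_tokens + tokens) / self.tokens_per_sec

def route(pool: list[Device], tokens: float) -> Device:
    # Greedy policy: send the request to whichever device finishes it soonest.
    best = min(pool, key=lambda d: d.finish_time(tokens))
    best.queue_tokens += tokens
    return best

pool = [
    Device("gpu-a", tokens_per_sec=900, watts=700),
    Device("gpu-b", tokens_per_sec=400, watts=300),
    Device("npu-c", tokens_per_sec=250, watts=75),
]
for req in [1200, 300, 800, 500]:
    chosen = route(pool, req)
    print(f"{req:>5} tokens -> {chosen.name}")
```

Even in this toy version, the fast device does not win every request once its queue builds up, which hints at why coordinating dissimilar chips is a scheduling problem rather than a raw-compute one. A production system would also weigh performance per watt, memory limits, and network placement.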

“Heterogeneity is inevitable, and Gimlet Labs is ahead of it,” said Tim Tully, partner at Menlo Ventures.

“Most infrastructure was built for a homogeneous world — and the industry is paying hundreds of billions in CapEx for it. Gimlet built the only infrastructure designed from the ground up to embrace heterogeneity, purpose-built for agentic AI at scale. The research pedigree and deployment experience this team brings is unmatched.”

Investors appear to see that as a meaningful opportunity. The funding signals that inference is moving from the background to the foreground for many organizations. 

For teams deploying AI in production, this is where the conversation tends to land. Models matter. But how they run, how fast they respond, and how much they cost to operate tend to matter more once they are actually in use.

And that is exactly where Gimlet is trying to sit.

Dell, NVIDIA, and Elastic have been building infrastructure to support the full AI lifecycle, including real-time inference and retrieval across massive datasets. It’s the same pressure Gimlet is going after from a different angle. Once AI moves into production, the bottleneck isn’t just the model; it’s how fast and efficiently everything runs behind the scenes.

Allison Francis

Allison is a contributing writer for Channel Insider, specializing in news for IT service providers. She has crafted diverse marketing, public relations, and online content for top B2B and B2C organizations through various roles. Allison has extensive experience with small to midsized B2B and channel companies, focusing on brand-building, content and education strategy, and community engagement. With over a decade in the industry, she brings deep insights and expertise to her work. In her personal life, Allison enjoys hiking, photography, and traveling to the far-flung places of the world.
