Databricks’ Acquisition of Lilac Enhances Data Quality for GenAI Apps

Discover how Databricks’ acquisition of data quality tool developer Lilac will enhance the quality of future LLMs.

Mar 25, 2024

Prominent data intelligence platform Databricks has announced the acquisition of Boston-based applied research startup Lilac, a developer of tools that enhance the quality of data used in developing large language models (LLMs). The integration of Lilac’s technology into the Databricks Data Intelligence Platform is a major step forward in the company’s mission to provide the industry with comprehensive generative AI solutions.

Rapid expansion through a robust series of acquisitions

Over the past year, the company has made several significant acquisitions to solidify its product portfolio. In February, it acquired Einblick, a pioneer in natural language processing. In May, it acquired Okera, a provider of data governance technology. In July, it acquired MosaicML, a leading generative AI platform known for its state-of-the-art MPT LLMs, and most recently in October, it acquired Arcion, a startup specializing in data replication.

News of the Lilac acquisition surfaced shortly after Databricks and NVIDIA announced an expanded alliance that deepens the technical integration between their technologies, with the aim of optimizing AI workloads on the Databricks platform.

Addressing AI developer challenges

Data lies at the heart of all LLM-based systems, whether in preparing datasets for training models, evaluating model outputs, or filtering retrieval-augmented generation (RAG) data. Exploring and understanding complex datasets is critical for building high-quality GenAI apps, but the sheer number of tasks involved often slows time to value.

Analyzing unstructured text data, for instance, can be cumbersome and inefficient. Historically, developers and data scientists have relied on labor-intensive manual processes that do not scale.

Lilac’s scalable, open-source architecture provides an efficient way to sidestep many of these challenges. With an intuitive user interface and AI-driven features, its tools let users analyze, understand, and modify unstructured text data at scale.

Researchers and data scientists can easily cluster and categorize documents, conduct semantic and keyword searches, detect personal information and duplicates, and make necessary edits to customize their datasets.
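To make these capabilities concrete, here is a minimal sketch of three of them (duplicate detection, personal-information detection, and keyword search) using only the Python standard library. It is an illustration of the general techniques, not Lilac's API or implementation: the sample records, the function names, and the simple email pattern are all assumptions made for this example, and a production tool would use far more robust methods such as near-duplicate hashing and trained PII detectors.

```python
import hashlib
import re

# Hypothetical sample records, invented for this sketch.
DOCS = [
    "Contact me at jane.doe@example.com about the invoice.",
    "The model output looks fine.",
    "the  model output looks fine. ",
    "Quarterly revenue grew in the cloud segment.",
]

# Deliberately simple email pattern (an assumption; real PII detection is broader).
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


def normalize(text):
    """Lowercase and collapse whitespace so trivially different copies match."""
    return " ".join(text.lower().split())


def find_duplicates(docs):
    """Return indices of documents whose normalized text was already seen."""
    seen = set()
    dupes = []
    for i, doc in enumerate(docs):
        digest = hashlib.sha256(normalize(doc).encode("utf-8")).hexdigest()
        if digest in seen:
            dupes.append(i)
        else:
            seen.add(digest)
    return dupes


def find_emails(docs):
    """Map document index to the email-like strings found in that document."""
    hits = {}
    for i, doc in enumerate(docs):
        found = EMAIL_RE.findall(doc)
        if found:
            hits[i] = found
    return hits


def keyword_search(docs, keyword):
    """Return indices of documents containing the keyword, ignoring case."""
    needle = keyword.lower()
    return [i for i, doc in enumerate(docs) if needle in doc.lower()]


if __name__ == "__main__":
    print("duplicates:", find_duplicates(DOCS))
    print("emails:", find_emails(DOCS))
    print("keyword 'model':", keyword_search(DOCS, "model"))
```

Exact-match hashing after normalization only catches copies that differ in case or spacing. Semantic search and clustering, which the article also mentions, would additionally require text embeddings and are left out of this sketch.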

Empowering the channel with enhanced tools

The integration of Lilac’s tech stack into the Databricks Data Intelligence Platform promises to simplify data tailoring and give users improved capabilities to evaluate and monitor LLM outputs, as well as to prepare datasets for RAG, fine-tuning, and pre-training.

Databricks executives have praised Lilac’s product offering for its proficiency in analyzing model outputs for bias or toxicity and in preparing data for LLMs.

By giving organizations powerful data intelligence tools and partnering with industry leaders to drive innovation and efficiency in AI development, Databricks is positioning itself as an innovative provider of generative AI solutions.

To learn how you can leverage the benefits of AI in your organization and create an effective strategy for providing valuable business intelligence, read our article on building a foolproof AI strategy.

Pamela Winikoff

Pamela Winikoff is an award-winning corporate communications and writing professional with extensive experience creating marketing, publicity, thought leadership, and other content that enhances public perception and accelerates business growth. She has also ghostwritten hundreds of articles for subject matter experts across numerous industries.


Channel Insider combines news and technology recommendations to keep channel partners, value-added resellers, IT solution providers, MSPs, and SaaS providers informed on the changing IT landscape. These resources provide product comparisons, in-depth analysis of vendors, and interviews with subject matter experts to provide vendors with critical information for their operations.

Property of TechnologyAdvice. © 2025 TechnologyAdvice. All Rights Reserved
