Prominent data intelligence platform Databricks has announced the acquisition of Boston-based applied research startup Lilac, a developer of tools that improve the quality of data used to develop large language models (LLMs). Databricks plans to integrate Lilac's technology into the Databricks Data Intelligence Platform, a move that advances the company's goal of offering the industry comprehensive generative AI solutions.
Rapid expansion through a robust series of acquisitions
Over the past year, the company has made several significant acquisitions to solidify its product portfolio. In February, it acquired Einblick, a pioneer in natural language processing. In May, it acquired Okera, a provider of data governance technology. In July, it acquired MosaicML, a leading generative AI platform known for its state-of-the-art MPT LLMs, and most recently in October, it acquired Arcion, a startup specializing in data replication.
News of the Lilac acquisition came shortly after Databricks and NVIDIA announced an expanded partnership to deepen the technical integration between their technologies and optimize AI workloads on the Databricks platform.
Addressing AI developer challenges
Data lies at the heart of every LLM-based system, whether developers are preparing datasets for training models, evaluating model outputs, or filtering data for retrieval-augmented generation (RAG). Exploring and understanding complex datasets is critical to building high-quality GenAI applications, but the sheer number of tasks involved often delays time to value.
Analyzing unstructured text data, for instance, can be cumbersome and slow. Developers and data scientists have historically relied on labor-intensive manual processes that do not scale well.
Lilac's scalable, open-source tools offer a way around many of these challenges. Through an intuitive user interface and AI-driven features, they let users analyze, understand, and modify unstructured text data at scale.
Researchers and data scientists can cluster and categorize documents, run semantic and keyword searches, detect personal information and duplicates, and make the edits needed to customize their datasets.
Empowering the channel with enhanced tools
Integrating Lilac's technology into the Databricks Data Intelligence Platform promises to simplify dataset curation and give users better ways to evaluate and monitor LLM outputs, as well as to prepare datasets for RAG, fine-tuning, and pre-training.
Databricks executives have praised Lilac's tools for analyzing model outputs for bias and toxicity and for preparing data for LLMs.
By giving organizations powerful data intelligence tools and partnering with industry leaders to make AI development faster and more efficient, Databricks is positioning itself as a leading provider of generative AI solutions.
To learn how you can leverage the benefits of AI in your organization and create an effective strategy for providing valuable business intelligence, read our article on building a foolproof AI strategy.