June 2025: $30M Series A, Multimodal Lakehouse Launch & Product Updates
LanceDB is Now a Series A Company
Over the past year, we have witnessed the Lance columnar format become the new standard for multimodal data. As of June 2025, Lance remains the fastest-growing format across the data ecosystem. During this period, our open source packages have been downloaded more than 20 million times.
This milestone represents a significant validation of our vision to democratize multimodal AI development. The $30M Series A funding will accelerate our mission to build the most efficient and scalable data platform for AI applications. This investment will fuel our continued innovation in multimodal data processing, expand our enterprise offerings, and strengthen our global community of developers and data scientists.
Blog: Announcement from our Cofounder & CEO Chang She
Introducing the LanceDB Multimodal Lakehouse
As of June 24th, 2025, alongside our Series A announcement, we are introducing the Multimodal Lakehouse suite of products in LanceDB Enterprise.
The LanceDB Enterprise offering now consists of four features: Search, Exploratory Data Analysis, Feature Engineering, and Training.
The Multimodal Lakehouse represents a breakthrough in unified data management for AI applications, seamlessly handling text, images, audio, and video data in a single platform. This comprehensive solution eliminates the complexity of managing multiple data silos and provides enterprises with the tools they need to build, train, and deploy multimodal AI models at scale.
With built-in support for the latest AI frameworks and optimized performance for large-scale datasets, the Multimodal Lakehouse is designed to accelerate the development of next-generation AI applications.
Blog: What is the LanceDB Multimodal Lakehouse?
Product News: New Enterprise Features
Lance & LanceDB OSS Releases:
Events and Community Recap
From Text to Video: A Unified Multimodal Data Lake for Next-Generation AI
Ryan Vilim from Character AI shares how their Data & AI Platform team builds self-service tools and infrastructure to power LLM training and research. He explains how they leverage data lakes, Spark, Trino, Kubernetes, and Lance to prepare, annotate, and serve massive multimodal datasets—while keeping workflows fast and researcher-friendly.
The talk also covers why Lance’s open multimodal lakehouse format fits their needs, enabling unified storage, search, and analytics at scale. Packed with practical insights on managing AI data pipelines, this session is perfect for anyone building or scaling AI systems.
Building a Data Foundation for Multimodal Foundation Models
Ethan Rosenthal from Runway delivers an in-depth exploration of the unsung heroes behind generative models: data pipelines. Skipping over models and flashy applications, he dives into the nuts and bolts of handling massive, unstructured datasets—video, text, embeddings, and metadata—used to train and iterate on state-of-the-art generative video and image systems.
Drawing on his experience at Square and Runway, Ethan outlines the evolution from structured fraud-detection data to the complexities of multimodal AI, demonstrating why robust data infrastructure is the true backbone of model performance.
Throughout the talk, Ethan shares pragmatic lessons on system design, covering topics like columnar storage formats (moving beyond tarballs and Parquet), schema evolution with LanceDB, efficient video decoding, asynchronous data loading to eliminate GPU bottlenecks, and on-the-fly augmentation for flexible experimentation.
With candid insights and practical trade-offs, this session is essential for engineers and researchers building scalable, researcher-friendly pipelines for the next generation of generative AI.
Thank You to Our Valued Contributors:
A heartfelt thank you to the community contributors to the Lance and LanceDB OSS projects this past month:
@renato2099, @wojiaodoubao, @Jay-ju, @b4l, @yanghua, @HaochengLIU, @ddupg, @bjurkovski, @kilavvy, @leaves12138, @majin1102, @leopardracer, @luohao, @KazuhitoT, @frankliee