MotherDuck

Data Infrastructure and Analytics

A data warehouse that scales in the cloud to make big data feel small, using the speed, efficiency, and ergonomics of DuckDB.

About us

Making analytics fun, frictionless and ducking awesome with a cloud data warehouse based on DuckDB's efficiency, ergonomics and performance in collaboration with the folks at DuckDB Labs.

Website: https://motherduck.com
Industry: Data Infrastructure and Analytics
Company size: 11-50 employees
Headquarters: Seattle
Type: Privately Held
Founded: 2022

Updates

  • If you think setting up an OLAP cache requires a massive infrastructure overhaul, think again. Simon Späti breaks down exactly how to leverage DuckDB community extensions to dramatically accelerate your dashboards with almost zero setup cost. A perfect example of how lightweight tools can solve heavy data problems.

    OLAP isn't dead, right? 😉 Not to me; it has never been more alive than now. But what if OLAP itself isn't fast enough, or you use DuckDB and want to speed up caching? Do you need a full-fledged OLAP system and to re-ingest all your analytics data? Maybe not. I went through OLAP caches for DuckDB in my recent deep dive and uncovered ways of simply adding two lines of code with community extensions:

    ```
    SET GLOBAL cache_path = '/tmp/my_duckdb_cache.bin';
    SET GLOBAL cache_enabled = true;
    ```

    And you are good to go: an instant speed-up with almost no setup cost. Have you tried it already? In this piece, I also elaborate on «The History of Caching BI Workloads» and the «Different Levels and Kinds of Caches» we have and are using today. Furthermore, I explore the typical obstacles when building a cache yourself. If that interests you, please read the attached essay with practical examples you can try out immediately. I hope you enjoy it.

    • Simplicity of a Database, but the Speed of a Cache: OLAP Caches for DuckDB
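    A minimal end-to-end sketch of the setup quoted above. The post does not name which community extension exposes these settings, so the extension name below (cache_httpfs) and the queried file are assumptions; only the two SET lines are taken from the post:

    ```
    -- Assumed community extension name; the two SET lines are quoted from the post above.
    INSTALL cache_httpfs FROM community;
    LOAD cache_httpfs;

    -- Point the cache at a local file and switch it on.
    SET GLOBAL cache_path = '/tmp/my_duckdb_cache.bin';
    SET GLOBAL cache_enabled = true;

    -- Repeated reads of the same remote data should now be served from the local cache.
    SELECT count(*) FROM read_parquet('https://example.com/events.parquet');
    ```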

  • The hardest data problems aren't always "big data" problems. They are complex data problems. Lucie T., Data Engineer at Opto Investments, shared her thoughts on why private markets need a different kind of Modern Data Stack, highlighting four specific challenges that standard tools often miss:
    🌀 Schema Chaos: No CUSIP-like standardization means every fund structure is different.
    ⏳ Temporal Complexity: You need to track when you knew something, not just what changed.
    📖 Context: A number (like IRR) means nothing without the story behind it.
    ⚖️ Regulatory Speed: The need to move fast without breaking fiduciary duty.
    She also details why MotherDuck has become their new standard for solving this. By moving away from massive clusters and Spark jobs, they get the power of a cloud warehouse with the simplicity of a local database. This allows for much faster iteration. Really appreciate the mention! Check out the full post here:
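    To make the "when you knew it vs. when it was true" point concrete, here is a minimal bitemporal sketch in DuckDB-style SQL. The schema and values are hypothetical, not Opto's actual model:

    ```
    -- Hypothetical bitemporal table: one interval for when the value applied
    -- in the real world, one for when we believed it.
    CREATE TABLE fund_metrics (
        fund_id        TEXT,
        irr            DOUBLE,
        valid_from     DATE,       -- when the value applies
        valid_to       DATE,
        recorded_from  TIMESTAMP,  -- when we started believing it
        recorded_to    TIMESTAMP   -- when it was superseded
    );

    -- "What did we think this fund's Q2 IRR was, as of the end of March?"
    SELECT irr
    FROM fund_metrics
    WHERE fund_id = 'FUND-12'
      AND DATE '2024-06-30' BETWEEN valid_from AND valid_to
      AND TIMESTAMP '2024-03-31 00:00:00' BETWEEN recorded_from AND recorded_to;
    ```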

  • Your boss calls you in. "Simple question," they say. "Is our business in good shape?" And you freeze. Because you know the data. You know that "good" depends on so many variables. It depends on what you mean by good. It depends on what kind of business you're referring to. What's an active user? We have conservative definitions and aggressive ones. Blah, blah, blah. So you give a 5-minute explanation full of footnotes. And your boss looks at you like you are crazy. This was the uncomfortable truth Benn Stancil laid out at Small Data SF. He pointed out that the modern data stack is built on a lot of faith. Companies hire expensive teams to look at esoteric numbers. And how do we know if any of it works? We hire more of them. They tell us. That faith is fragile. And it might be fracturing. Because if your stakeholder has to choose between a nuanced answer that takes weeks... or a directional "vibe" they get right now... they will take the vibe. They will choose the system that reads the support tickets and gives them a pulse check. Over the analyst who writes a giant paragraph explaining why the revenue numbers are technically complicated. Benn's not saying this is how it should be. He's saying this is what's happening. And the generation coming up behind us? They might not share our faith in quantification at all. Watch Benn's full talk from this year's Small Data SF here:

  • For years, doing geospatial analysis meant slow desktop software and complex infrastructure. We are excited to see Qiusheng Wu's new book, Spatial Data Management with DuckDB, clarifying the new blueprint for spatial analytics. It is not just a technical guide. It is a detailed look at how the industry is moving away from the "download and process" cycle toward high-performance, in-process SQL querying. The book explores how modern stacks can leverage out-of-core processing to handle massive datasets like global building footprints or national wetlands inventories without the traditional memory bottlenecks. This is a fantastic resource for anyone looking to modernize their geospatial workflows. Huge congratulations to Qiusheng on the launch! It is a massive contribution to the open-source and geospatial communities. Check out the book here: https://duckdb.gishub.org/
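    As a flavor of the in-process, out-of-core style the book covers, here is a minimal sketch using DuckDB's spatial extension. The Parquet path and column names are placeholders, not examples from the book:

    ```
    INSTALL spatial;
    LOAD spatial;

    -- Scan a large set of building-footprint Parquet files without loading them
    -- fully into memory, aggregating footprint area per region.
    SELECT region,
           count(*)                               AS buildings,
           sum(ST_Area(ST_GeomFromWKB(geometry))) AS total_area
    FROM read_parquet('buildings/*.parquet')
    GROUP BY region
    ORDER BY total_area DESC;
    ```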

  • Tomorrow (Wednesday PST), Jacob Matson and Cody Peterson will host a Hands-On Lab on Agentic Data Engineering! Join us (RSVP using the link in the comments) for a 45-minute hands-on lab where you'll combine MotherDuck and Ascend to build a complete data pipeline while leveraging AI agents that handle operational work for you. You'll experience first-hand how high-performing data teams are using these technologies to deliver trusted data faster and more efficiently. Together we'll:
    - build end-to-end data pipelines on MotherDuck
    - deploy agents to automate pipeline operations and workflows
    - implement best practices for deploying pipelines and agentic workflows in production
    By the end, you'll have a working pipeline and hands-on experience with both MotherDuck and Ascend.

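    If you want to warm up before the lab, here is a minimal sketch of connecting a local DuckDB session to MotherDuck. This is not the lab's actual material; the database, table, and file names are placeholders:

    ```
    -- The token can also be supplied via the MOTHERDUCK_TOKEN environment variable.
    SET motherduck_token = 'YOUR_TOKEN';

    -- Attach a MotherDuck-hosted database and land some local data in it.
    ATTACH 'md:my_db';
    CREATE TABLE my_db.raw_events AS
    SELECT * FROM read_csv_auto('events.csv');
    ```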
    "We have great datasets." The data: 47 variations of "St. Albans" in one column. This post from Adam Sroka hit r/dataengineering hard because every DE has lived it (https://lnkd.in/grJH7ak9). And the frustration goes deeper than messy string, it's that nobody wants to invest in fixing it. Everyone wants clean data, but no one wants to pay for it. We asked four senior data engineers how they deal with it. The consensus? Stop trying to fix everything at once. 📋 Use the WAP technique (Write, Audit, Publish): Mehdi Ouazza advises writing data to a staging area and auditing it before publishing. Better to have no data than bad data. 🔄 Start small and iterate: Julien Hurault suggests shipping pipelines with basic tests and adding more as they fail. Avoid over-engineering from day one. 💬 Learn through exposure: Simon Späti notes that you have to see "really bad" data to understand what data quality even means. Talk to business experts to learn which data actually matters. 🎯Focus on key datasets: Benjamin Rogojan warns against alert fatigue - "don't boil the ocean." If you alert on everything, people ignore everything. Fix the critical data first. Data quality is just one of 10 top-upvoted r/dataengineering questions we tackled with our expert panel. Read the full blog here: https://lnkd.in/gBJg-wTX

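    A minimal sketch of the Write-Audit-Publish pattern in plain SQL. Table, column, and file names are placeholders, and the production table is assumed to already exist:

    ```
    -- 1) WRITE: land the new batch in a staging table, never directly in production.
    CREATE OR REPLACE TABLE orders_staging AS
    SELECT * FROM read_parquet('new_batch.parquet');

    -- 2) AUDIT: check basic invariants; better no data than bad data.
    SELECT count(*) AS bad_rows
    FROM orders_staging
    WHERE order_id IS NULL OR amount < 0;

    -- 3) PUBLISH: only if bad_rows = 0, swap the audited batch into production atomically.
    BEGIN TRANSACTION;
    DELETE FROM orders WHERE batch_date = current_date;
    INSERT INTO orders SELECT * FROM orders_staging;
    COMMIT;
    ```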

  • Your pipeline just corrupted customer records. You need to roll back. But you can't just revert one table. Sales depends on customers, which depends on products, which changed two hours ago. This is what "Git for Data" promises to solve. But how do we efficiently fork data without waiting forever or spending a fortune? In part 1 of our deep dive with Simon Späti, we laid out a spectrum of data movement efficiency, ordered from least to most efficient:
    1️⃣ Full 1:1 copying - Simple to understand but slow and expensive, especially at scale.
    2️⃣ Delta-based changes - Only store what changed. Revert by pointing to the previous state.
    3️⃣ Zero-copy virtualization - Share data between systems without serialization overhead.
    4️⃣ Metadata/catalog-based versioning - Create logical versions with just pointer changes. No data duplication.
    The key insight behind #4: when you change one row in a million-row table, you only need to update that chunk. Everything else is shared between versions. Diffing scales with what changed, not total dataset size. Read the full blog:
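    To illustrate approach #4, here is a toy sketch of catalog-based versioning in SQL. The tables, chunk IDs, and values are hypothetical; real systems track far more metadata:

    ```
    -- Immutable data files, and a manifest that records which chunks each version sees.
    CREATE TABLE chunks   (chunk_id INTEGER, file_path TEXT);
    CREATE TABLE manifest (version INTEGER, chunk_id INTEGER);

    -- Version 2 reuses every chunk of version 1 except chunk 7,
    -- which was rewritten as chunk 7001; no data is copied.
    INSERT INTO manifest
    SELECT 2, chunk_id FROM manifest WHERE version = 1 AND chunk_id <> 7;
    INSERT INTO manifest VALUES (2, 7001);

    -- Diffing scales with what changed: compare pointers, not data.
    SELECT chunk_id FROM manifest WHERE version = 2
    EXCEPT
    SELECT chunk_id FROM manifest WHERE version = 1;
    ```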

  • We’re incredibly grateful for our flock of users, partners, and the whole data community. Whether you're feasting or coding today, we appreciate you being on this journey with us. Happy Thanksgiving from the MotherDuck team! 💛


Funding

MotherDuck: 3 total rounds
Last round: Series B (US$ 52.5M)

See more info on Crunchbase