
Mar 26, 2026

Snowflake Stores Your Data. It Doesn’t Understand Your History.

Most teams that rely on Snowflake believe their data problem is already solved. Walk into a modern company—especially anything even slightly data-driven—and you’ll find a familiar stack: product events flowing through Segment or RudderStack, landing in S3, modeled through dbt, and ultimately queried in Snowflake. Dashboards sit on top in tools like Tableau or Looker. Analysts write SQL, product managers check funnels, and leadership reviews weekly metrics.

From the outside, it looks like a complete system. Data is centralized, structured, and queryable. The company feels “data-driven.”

But if you zoom in on how decisions actually get made—and more importantly, how AI systems get trained—you start to see the gap.

Snowflake doesn’t store how the company actually evolved. It stores a cleaned, modeled version of what someone decided was important to track.

And that distinction matters more than most teams realize.

The Problem: You Only Keep the Final Tables, Not the Journey

Take a typical SaaS company.

Your Snowflake warehouse has:

  • A users table

  • An events table

  • A few derived models like daily_active_users or conversion_funnel

Maybe you’ve layered in some reverse ETL, syncing key metrics back into Salesforce or HubSpot. Maybe you’re running experiments and logging results into separate tables.

But now consider everything that isn’t in Snowflake:

  • Raw application logs before they were filtered

  • Support tickets and their full resolution threads

  • Historical versions of your database before schema changes

  • Internal Slack discussions about why a feature was shipped or rolled back

  • Failed experiments that never made it into a dashboard

  • Edge cases, bugs, and one-off incidents that were “fixed” and forgotten

All of that data exists. It just doesn’t live in Snowflake.

It lives in backups.

And today, backups are treated as something you only touch when something breaks.

That’s the structural issue.

Snowflake gives you a snapshot of the business as it is modeled today. Backups contain the full timeline of how the business actually behaved over time.

Why This Breaks Down for AI

This gap didn’t matter as much when the goal was dashboards.

If all you need is “What was conversion last week?” then a clean, modeled table is exactly what you want.

But AI systems don’t learn from summaries. They learn from sequences, context, and variation.

If you try to build anything more ambitious than a dashboard—say:

  • A model that predicts churn based on real user behavior

  • An agent that understands how support issues were actually resolved

  • A system that simulates how product changes affected retention over time

—you quickly run into a wall.

The data in Snowflake is too curated.

It’s missing the messy, high-signal details that actually explain why things happened.

So teams compensate in predictable ways:

  • They buy external datasets to fill in gaps

  • They spin up new pipelines to capture “better” logs going forward

  • They rebuild context manually, stitching together data from multiple systems

And despite all of that effort, they still can’t access the one dataset that is both complete and proprietary: their own history.

A Concrete Example

Imagine you’re a product lead trying to reduce churn.

Today, your workflow probably looks like this:

  1. Query Snowflake for churned users

  2. Join with product usage tables

  3. Build a model using a subset of tracked events

  4. Layer in some support ticket metadata if you have it structured
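The steps above collapse each user into a handful of aggregate feature counts. A minimal sketch of that reduction, using hypothetical event names and plain Python in place of a real warehouse query:

```python
from collections import Counter

# A curated, warehouse-style view: only a few tracked event types
# survive, and each user is reduced to flat counts. Names are illustrative.
tracked_events = [
    {"user_id": "a", "event": "login"},
    {"user_id": "a", "event": "login"},
    {"user_id": "a", "event": "export"},
    {"user_id": "b", "event": "login"},
]

def aggregate_features(events):
    """Collapse per-user behavior into flat feature counts,
    discarding ordering and context in the process."""
    features = {}
    for e in events:
        features.setdefault(e["user_id"], Counter())[e["event"]] += 1
    return features

feats = aggregate_features(tracked_events)
print(feats["a"])  # user "a": login counted twice, export once
```

The ordering of events, and everything that happened between them, is gone by the time the model sees the data.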

What you end up with is a model trained on a simplified version of reality.

Now compare that to what actually exists inside your backups:

  • Every raw event before it was filtered

  • Full support conversations, not just tags

  • Historical states of user accounts

  • Logs showing exactly where users got stuck or errored

  • Internal notes about why certain fixes were deployed

That dataset is not just larger. It’s qualitatively different.

It captures behavior, context, and decision-making—not just outcomes.

The problem is that it’s locked in a system designed for recovery, not analysis.

The Shift: From Warehouse-Centric to History-Centric

This is where the mental model needs to change.

Snowflake is optimized for structured querying. It is incredibly good at answering questions about the present state of your business.

But it was never designed to be a system of record for how your company actually evolved over time.

Backups already are.

Duplicati starts from that premise: the most valuable training data a company owns is not what it has modeled, but what it has retained.

Instead of treating backups as cold storage, it turns them into a continuously usable data layer.

Concretely, that looks like:

  • Extracting raw backup data into formats like Parquet or Delta Lake

  • Preserving time-based versions so you can reconstruct past states

  • Making that data queryable alongside Snowflake, not separate from it

  • Feeding it into tools teams already use, like MLflow or internal Python workflows
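The time-based versioning point is the core of it: once backup contents are extracted into tabular rows, reconstructing a past state becomes a query. A minimal sketch of the idea, with hypothetical field names and in-memory data standing in for backup-derived Parquet:

```python
from datetime import datetime

# Hypothetical backup-derived rows: each is a snapshot of a user
# account as it existed when a backup ran. Field names are illustrative.
snapshots = [
    {"user_id": 1, "backed_up_at": datetime(2025, 1, 1), "plan": "free"},
    {"user_id": 1, "backed_up_at": datetime(2025, 2, 1), "plan": "pro"},
    {"user_id": 1, "backed_up_at": datetime(2025, 3, 1), "plan": "churned"},
    {"user_id": 2, "backed_up_at": datetime(2025, 1, 15), "plan": "free"},
]

def state_as_of(rows, user_id, as_of):
    """Return the most recent snapshot of a user at or before `as_of`."""
    candidates = [
        r for r in rows
        if r["user_id"] == user_id and r["backed_up_at"] <= as_of
    ]
    return max(candidates, key=lambda r: r["backed_up_at"], default=None)

# Reconstruct user 1's state as it was on Feb 10, 2025.
past = state_as_of(snapshots, 1, datetime(2025, 2, 10))
print(past["plan"])  # the plan that was in effect on that date: "pro"
```

The same lookup works at warehouse scale once the snapshots live in Parquet or Delta Lake; the point is that the version history exists at all.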

The result is not a replacement for Snowflake, but a second layer that complements it.

Snowflake tells you what your business looks like today.

Duplicati lets you train on how it actually behaved over time.

What Changes in Practice

Going back to the churn example, the workflow becomes meaningfully different.

Instead of starting from a curated table, you can:

  1. Pull full historical user timelines directly from backup-derived datasets

  2. Incorporate raw support interactions and resolution paths

  3. Reconstruct the exact product state at the time of churn

  4. Train models on sequences of behavior, not just aggregated features
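Step 4 is the key shift: ordered sequences instead of flat aggregates. A minimal sketch of turning raw, unfiltered events into per-user timelines (event names and record structure are hypothetical):

```python
from collections import defaultdict

# Illustrative raw events as they might be extracted from backups:
# unfiltered, every event type retained, not necessarily in order.
raw_events = [
    {"user_id": "a", "ts": 3, "event": "error_shown"},
    {"user_id": "a", "ts": 1, "event": "signup"},
    {"user_id": "b", "ts": 2, "event": "signup"},
    {"user_id": "a", "ts": 5, "event": "cancel"},
]

def user_timelines(events):
    """Group raw events into per-user sequences ordered by timestamp."""
    by_user = defaultdict(list)
    for e in events:
        by_user[e["user_id"]].append(e)
    return {
        uid: [e["event"] for e in sorted(evts, key=lambda e: e["ts"])]
        for uid, evts in by_user.items()
    }

timelines = user_timelines(raw_events)
print(timelines["a"])  # ['signup', 'error_shown', 'cancel']
```

A sequence like `signup, error_shown, cancel` carries a story that a count of logins never will.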

This is the difference between predicting churn based on simplified signals and understanding it based on actual lived behavior.

The same pattern shows up across functions:

  • Product teams can simulate how past launches affected retention

  • Support teams can train systems on real resolution strategies

  • Growth teams can analyze how experiments actually unfolded, including failures

None of this requires replacing Snowflake.

It requires acknowledging that Snowflake only captures part of the picture.

The Real Replacement

The common instinct is to think this competes with Snowflake directly.

It doesn’t.

What it replaces is the growing layer of:

  • One-off pipelines built to recover historical data

  • External datasets purchased to compensate for missing context

  • Engineering time spent reconstructing what already exists internally

Those are symptoms of the same underlying problem: your most complete dataset is sitting in backups, but your systems aren’t designed to use it.

Duplicati changes that by making backups usable as infrastructure, not just insurance.

The Thesis

Every company already has the dataset it needs to build meaningful AI systems.

It’s just not stored where they’re looking.

Snowflake organizes what you’ve decided to measure.

Backups capture everything that actually happened.

The companies that win won’t just have better models. They’ll have access to their own history, structured in a way that makes it usable.

That’s the shift from data warehousing to training data infrastructure.

Get started for free

Pick your own backend and store encrypted backups of your files anywhere, online or offline. For macOS, Windows, and Linux.

