Data Architecture: Medallion and Data Mesh

In a data-driven economy, the architecture of data pipelines is decisive for the speed at which insights can be generated. We rely on structured concepts like the Medallion Architecture (Bronze, Silver, Gold) and decentralised approaches like Data Mesh.

The goal is to refine raw, often unstructured data into high-quality, business-critical information products that serve both classical analytics and Generative AI (RAG).

Anti-Pattern: The Data Swamp

Without a clear architecture, data initiatives often end up in a Data Swamp: vast amounts of unstructured, partially faulty data sit in a Data Lake with no clarity on which version is correct. The result is incorrect analyses, high storage costs, and AI that cannot be used effectively, because a model built on bad data produces bad output (garbage in, garbage out).

Structured Data Refinement

  1. Medallion Architecture:
    • Bronze (Raw): Storage of raw data 1:1, exactly as it comes from the source system. Historisation is key at this stage.
    • Silver (Validated): Cleansed, filtered, and standardised data. The foundation for cross-team analysis.
    • Gold (Enriched): Highly aggregated data optimised for specific business questions (Data Products).
  2. Data Mesh Principles: Data responsibility sits where the data originates (Domain Ownership). Data is offered as a product (Data as a Product) via standardised interfaces.
  3. Data Lakehouse: Combination of the flexibility of a Data Lake with the structure and performance of a Data Warehouse.
  4. Schema Enforcement: Technical enforcement of data structures in early stages to guarantee quality in the Silver and Gold layers.
  5. AI-Ready Layers: Dedicated provision of vector data (Embeddings) for RAG applications based on the Gold data.
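
The layered refinement and schema enforcement described above can be sketched in a few lines of plain Python. This is a minimal illustration under stated assumptions, not a reference implementation: the record fields (order_id, customer, amount), the SILVER_SCHEMA mapping, and all function names are hypothetical, and a real Lakehouse would typically leave this work to an engine that enforces schemas at table level.

```python
from collections import defaultdict

# Hypothetical Silver schema: field name -> type every value is cast to.
SILVER_SCHEMA = {"order_id": int, "customer": str, "amount": float}


def enforce_schema(row, schema):
    """Cast each field to its declared type; raise if a field is missing or invalid."""
    typed = {}
    for field, field_type in schema.items():
        value = row[field]  # KeyError if the field is missing
        if value is None:
            raise ValueError(f"{field} is null")
        typed[field] = field_type(value)  # ValueError if the cast fails
    return typed


def to_silver(bronze_rows):
    """Bronze -> Silver: validate, standardise, and separate out rejected rows."""
    silver, rejected = [], []
    for row in bronze_rows:
        try:
            clean = enforce_schema(row, SILVER_SCHEMA)
        except (KeyError, TypeError, ValueError):
            rejected.append(row)  # kept for inspection, not silently dropped
            continue
        clean["customer"] = clean["customer"].strip().lower()  # standardise names
        silver.append(clean)
    return silver, rejected


def to_gold(silver_rows):
    """Silver -> Gold: aggregate for one business question (revenue per customer)."""
    revenue = defaultdict(float)
    for row in silver_rows:
        revenue[row["customer"]] += row["amount"]
    return dict(revenue)


# Bronze: rows stored 1:1 as delivered by the (hypothetical) source system.
bronze = [
    {"order_id": "1", "customer": " Acme ", "amount": "10.50"},
    {"order_id": "2", "customer": "acme", "amount": "4.50"},
    {"order_id": "3", "customer": "Globex", "amount": "n/a"},  # invalid amount
    {"order_id": "4", "customer": "Globex"},                   # missing field
]

silver, rejected = to_silver(bronze)
gold = to_gold(silver)
print(gold)           # revenue per customer, from valid rows only
print(len(rejected))  # rows that failed schema enforcement
```

The two faulty rows never reach Silver or Gold, but they remain in Bronze and in the rejected list, so nothing is lost and the failure stays visible.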

The Advantage: Single Source of Truth

Through structured refinement, every user (human or AI) knows exactly which data layer they can trust and what quality to expect there.

FAQ

Isn't Data Mesh too complex for a mid-sized company?

You don't have to introduce all principles at once. The most important takeaway from Data Mesh for SMEs is Domain Ownership: the specialist department, not IT, must be responsible for the quality of its data.

Why do we store the raw data (Bronze) separately if we clean it anyway?

So that we can always go back to the data exactly as it was delivered. If the cleansing logic changes, we can rebuild the Silver layer completely from the unchanged Bronze data.
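
A small sketch of that rebuild, assuming an append-only Bronze log held in memory. The names bronze_log, ingest, rebuild_silver, cleanse_v1, and cleanse_v2 are hypothetical, as are the customer and country fields; they only illustrate the idea that Silver is a derived layer that can be recomputed at will.

```python
from datetime import datetime, timezone

bronze_log = []  # append-only: entries are never updated or deleted


def ingest(raw_row):
    """Store the source row unchanged, together with ingestion metadata."""
    bronze_log.append({
        "ingested_at": datetime.now(timezone.utc).isoformat(),
        "payload": dict(raw_row),
    })


def rebuild_silver(cleanse):
    """Recompute the entire Silver layer from Bronze using the given cleansing logic."""
    silver = []
    for entry in bronze_log:
        row = cleanse(entry["payload"])
        if row is not None:  # None means the logic filtered the row out
            silver.append(row)
    return silver


def cleanse_v1(payload):
    # First version of the logic: drop rows without a country.
    if not payload.get("country"):
        return None
    return {"customer": payload["customer"], "country": payload["country"]}


def cleanse_v2(payload):
    # Revised logic: keep such rows and mark the country as unknown.
    return {
        "customer": payload["customer"],
        "country": payload.get("country") or "UNKNOWN",
    }


ingest({"customer": "acme", "country": "DE"})
ingest({"customer": "globex", "country": ""})

silver_v1 = rebuild_silver(cleanse_v1)  # one row: globex was filtered out
silver_v2 = rebuild_silver(cleanse_v2)  # two rows: globex recovered from Bronze
```

Had the first cleansing step overwritten the raw data, the row dropped by cleanse_v1 could not have been recovered when the rule changed.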

Reference Guide

  • Medallion Architecture (Databricks): The standard for Lakehouse architectures. databricks.com
  • Data Mesh (Zhamak Dehghani): The foundational concept for decentralisation. martinfowler.com
  • The Data Warehouse Toolkit: Ralph Kimball on data modelling. kimballgroup.com
