GenAI and RAG
Generative AI (LLMs) only realise their full value in an organisation when they have access to internal knowledge. Retrieval-Augmented Generation (RAG) is the architecture that connects a language model to company-specific documents — without those documents being used to train the model.
This enables precise answers based on current facts while also meeting data privacy requirements and protecting business secrets.
Anti-Patterns: Hallucinations and Data Leakage
Public AI models (such as ChatGPT) tend to fabricate facts (hallucinations) when they don't know an answer. Beyond that, feeding sensitive company data directly into public clouds is often prohibited from a compliance perspective. Fine-tuning alone is too slow and too expensive for dynamic company data.
The RAG Workflow
- Ingestion: Documents (PDFs, wikis, code) are automatically ingested, split into small segments (chunks), and converted into vectors (embeddings).
- Retrieval: When a user submits a query, the system searches a Vector DB at lightning speed for the text passages that are semantically most relevant to the question.
- Augmentation: The retrieved facts are packaged together with the user's question into a prompt and sent to the language model.
- Generation: The model generates an answer that is based exclusively on the supplied facts and cites them as sources.
- Sovereign Infrastructure: The entire system (Vector DB and LLM) is operated on Swiss infrastructure or on-premises to ensure maximum data security.
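The workflow above can be sketched in a few lines of Python. This is a minimal illustration, not a production system: embeddings are simulated with a toy bag-of-words vectoriser, the in-memory dictionary stands in for a vector DB, and the final LLM call is omitted — in practice you would use a trained embedding model and a real model endpoint. All document names and helper functions here are invented for the example.

```python
import math
from collections import Counter

# --- Ingestion: chunk documents and "embed" them ---
# Toy embedding: bag-of-words term counts. A real system would use a
# trained embedding model (e.g. a sentence transformer) instead.
def embed(text):
    return Counter(text.lower().split())

def cosine(a, b):
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

# Example chunks, keyed by source (document + location) -- invented data.
documents = {
    "handbook.pdf#p12": "Employees may work remotely up to three days per week.",
    "wiki/security":    "All customer data must remain on Swiss infrastructure.",
    "handbook.pdf#p30": "Expense reports are submitted monthly via the portal.",
}
index = {src: embed(text) for src, text in documents.items()}  # the "vector DB"

# --- Retrieval: find the chunks most similar to the question ---
def retrieve(question, k=2):
    q = embed(question)
    ranked = sorted(index, key=lambda src: cosine(q, index[src]), reverse=True)
    return ranked[:k]

# --- Augmentation: build a prompt from question + retrieved facts ---
def build_prompt(question):
    sources = retrieve(question)
    context = "\n".join(f"[{s}] {documents[s]}" for s in sources)
    return (
        "Answer using ONLY the context below and cite the sources in brackets.\n"
        f"Context:\n{context}\n\nQuestion: {question}"
    )

# --- Generation: this prompt would now be sent to the language model ---
prompt = build_prompt("Where must customer data be stored?")
print(prompt)
```

Note how the prompt instructs the model to answer only from the supplied context and to cite sources — that constraint is what grounds the generation step.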
The Advantage: Verifiable Knowledge
RAG systems cite their sources. You can verify at any time which page in which document the information comes from, which builds trust and drastically reduces hallucinations.
FAQ
Is RAG better than training the model itself?
Yes, for the vast majority of use cases. RAG is cheaper, faster to update (seconds rather than weeks), and delivers far higher reliability through source attribution.
Will our data become visible to other OpenAI users through RAG?
Not with a sovereign architecture. We either use Enterprise APIs with opt-out from training, or we run Open Source models (such as Llama or Mistral) entirely within our own cloud environment.
Reference Guide
- Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks: The foundational research paper by Facebook AI Research. arXiv
- LangChain / LlamaIndex: Frameworks for building RAG applications. langchain.com
- Open Source LLMs (Hugging Face): Platform for Open Source models. huggingface.co