Sovereign AI
Sovereign AI: the model comes to the data, not the data to the model
Sovereign AI means running open models on in-house infrastructure in Switzerland or on premises, instead of sending data to a foreign cloud. It is the architecture that makes AI and data sovereignty compatible.
Many hosted AI services send every input to an API of an external provider. For a harmless question that does not matter. For a draft contract, a patient record or a fiduciary's client data, it is a data outflow that cannot be undone. Sovereign AI reverses the direction: the model comes to the data, not the data to the model. This page describes the problem this architecture solves, how it is built, what it costs, and where its honest limits lie. It is the conceptual anchor of the AI cluster; the other pages, from GenAI and RAG to the individual tools, play out the same thread from their respective angle.
The sovereignty problem
Anyone using a US provider's AI API takes on three dependencies at once, often without naming them:
- Data. Every request leaves the building. Where it flows, how long it is stored and whether it feeds training is set by the provider's terms, not the organisation's own.
- Jurisdiction. When data sits with a US provider, the US Cloud Act applies regardless of the server's physical location. A data centre in Zurich owned by a US corporation does not reliably protect against it. Where the system processes personal data, the revised Swiss Data Protection Act applies in parallel.
- Vendor lock-in. The model, the interface and the prices belong to the provider. A retired model version, a price increase or a changed usage agreement hit directly, with no fallback path.
These three points are the core of digital sovereignty applied to AI. They are not an argument against AI but one for a deliberate architectural decision about where the model and the data sit.
The self-hosting architecture
Sovereign AI rests on two building blocks: a model whose weights are openly available, and infrastructure that stays under the organisation's own control.
An open-weights model is one whose trained weights may be downloaded and run locally, often, though not always, under an open licence such as Apache 2.0; the licence must be checked per model. That lets it run on in-house hardware without any request ever leaving the network. The European provider camp supplies a number of open models under Apache 2.0 (see Mistral); this matters because an open licence and EU jurisdiction address two different sovereignty questions at the same time.
On the operations side, an inference layer serves the model as an API inside the organisation's own network. An open-source inference engine such as vLLM handles efficient delivery to many concurrent requests; it is the sovereign counterpart to the cloud API. In front of it sits an interface such as LibreChat, which offers the same access as a commercial chat front end, only against the in-house model. Where knowledge from internal documents is needed, GenAI and RAG is added, whose vector store likewise stays in house.
architecture-beta
group boundary(cloud)["Switzerland or on premise"]
group outside(cloud)["External cloud"]
service ui(cloud)["Chat interface"] in boundary
service inference(server)["Inference engine"] in boundary
service model(server)["Open weights model"] in boundary
service rag(database)["RAG knowledge base"] in boundary
service docs(database)["Documents and data"] in boundary
service uscloud(cloud)["US cloud API"] in outside
ui:R -- L:inference
inference:R -- L:model
ui:B -- T:rag
rag:B -- T:docs
ui:R -- L:uscloud
The box is the sovereignty boundary: request, model, knowledge base and data all sit inside the organisation's own control. The dashed line to the outside is the path that sovereign AI deliberately does not take. The ongoing scan of which open models are mature enough for which purpose is the work of Tech Radar and AI Governance.
What it costs, honestly reckoned
Sovereignty is not free. Anyone seriously weighing the architecture reckons with three items:
- Hardware. Usable inference needs GPUs. A single mid-sized open-weights model runs on a single capable GPU; larger models or high concurrency need more. That is a purchase or a rental from a Swiss provider, not a zero.
- Operations. A self-run model wants updating, monitoring and securing. That is an operating discipline of its own, described by LLMOps and MLOps. Underestimating operations merely shifts the risk from data protection to availability.
- Quality gap. This is the most honest item. The largest closed frontier models still lead on the hardest tasks, such as long reasoning or multilingual code. Open models have closed the gap noticeably and are good enough for many structured, tool-driven tasks; at the very top, however, the openly available ones are not on par with the strongest closed model. The question is not whether the open model is the best among the Language Models, but whether it is good enough for the specific use case.
These three items belong together in an honest calculation. Against them stand the eliminated data outflow, the absent vendor lock-in and predictable costs instead of a price per request.
When it pays off
Sovereign AI is not an end in itself and not the right choice for every use case. It pays off when at least one of these points applies:
- The data being processed is particularly sensitive, such as personal data, professional secrecy or trade secrets, so that an outflow into a foreign jurisdiction is ruled out.
- The volume is high enough that a cloud API's price per request exceeds the fixed cost of in-house hardware.
- Vendor lock-in is a real risk because the system is meant to run long term and a retirement would be expensive.
- A concrete use case, such as an internal knowledge base, can be solved with an open model at sufficient quality.
Where only harmless text is processed occasionally and the highest model quality counts, a cloud API may remain the more pragmatic choice. The clean decision between the two paths, including the question of who may run which model on which data, is settled by AI governance; as soon as models act autonomously in a loop, the blast radius shifts further, which the page on AI agents sets out. The verifiable build of a sovereign knowledge base is bundled by the Sovereign RAG Architectures service; a scoped first step is examined by the Enterprise RAG Proof of Concept.
References
- Towards Data Science The Infrastructure Behind Making Local LLM Agents Actually Useful. A field report on running open models on in-house hardware with the vLLM inference engine. (28.05.2026). towardsdatascience.com/the-infrastructure-behind-making-local-llm-agents-actually-useful/
- vLLM vLLM Documentation. Open-source inference and serving engine for running large language models efficiently in house. (2026). docs.vllm.ai/en/latest/
- Federal Chancellery Digital Switzerland Strategy. The guidelines binding for the Federal Administration on digital transformation and sovereignty. (2026). www.bk.admin.ch/en/digital-switzerland
- Mistral AI Mistral 7B. Announcement of a European open-weights model under the Apache 2.0 licence, without usage restrictions. (27.09.2023). mistral.ai/news/announcing-mistral-7b
- Cornell Law School 18 U.S. Code ยง 2713. The disclosure obligation of US providers regardless of storage location, the core of the US Cloud Act. (23.03.2018). www.law.cornell.edu/uscode/text/18/2713
Related topics
- GenAI and RAG, the most common application of sovereign AI on in-house knowledge.
- Digital Sovereignty, the strategic frame sovereign AI sits in.
- US Cloud Act, the jurisdiction question behind the data outflow.
- AI Governance, the control over who may run which models on which data.
- AI Agents and Agentic Systems, where the blast radius of autonomous models grows.
- LibreChat, the sovereign interface in front of the in-house model.
- Sovereign RAG Architectures, the commercial service counterpart.
Ask AI
These links open external AI services, the conversation and its content are sent to their providers.