Published: Last updated:

Prompt and Context Engineering

Prompt and context engineering: the biggest lever at the lowest cost

Shaping the model input, meaning instruction, retrieved context and tool schemas, is often the first and cheapest lever. It is the stage before fine-tuning, and context engineering generalises classic prompting, because RAG is only one of several context sources within it.

A language model sees nothing but what sits in its context window at the moment of the answer. The quality, reliability and cost of an answer are therefore decided not inside the model but beforehand, when that input is assembled. This page describes the input layer: what separates prompt from context engineering, what demonstrably works, how a single prompt becomes a maintainable system, and where the line runs to GenAI and RAG and to the fine-tuning decision.

Prompt engineering and context engineering

Prompt engineering is the writing and structuring of the instruction: what the model should do, in which role, with which examples and in which output format. It is the oldest and most immediate lever and remains the foundation.

Context engineering is the broader term. It curates the entire set of tokens present at answer time: system instruction, conversation history, retrieved documents, tool schemas and intermediate results. As soon as a language model works across several steps or calls tools, the task is no longer writing one prompt but managing a scarce attention budget. Prompt engineering is thus a special case of context engineering, namely tending the instructed part of the input. This shift is the same one that carries the move from a single call to AI agents and agentic systems, where history and tool outputs fill the budget fast.

What demonstrably works

The input layer has few but robust levers. They are useful across models but must be evaluated per model:

  • Structure and role. A clear task, a defined role and a binding output format beat any vague phrasing. Where the format is processed downstream, it belongs in the instruction as a schema, not as a request.
  • Examples. A few good examples show the model the desired shape more reliably than a description can. They are often the biggest jump in quality per token invested.
  • Retrieved context. The right evidence from internal sources grounds the answer in facts rather than model memory. That is exactly what RAG provides, and here it is one context source among others, not the topic itself.
  • Tool schemas. In agentic systems the description of the available tools is part of the input. Precise schemas decide whether the model calls a tool correctly or fails at it.
  • Context budget. The context window is finite, and more context is not automatically better. Too much or poorly ordered context dilutes attention and drives up cost, because the whole input is reprocessed at every step. Filling the budget deliberately, only the relevant in the right order, is the actual core of context engineering.

The input as layers

The model input is not free text but a deliberately assembled budget made of several layers. Context engineering decides what reaches the finite context window from each layer and in what order:

flowchart TD
    A["System instruction<br/>role, task, output format"] --> E["Context window<br/>finite token budget"]
    B["Examples<br/>desired shape"] --> E
    C["Retrieved context<br/>RAG, one source among several"] --> E
    D["Tool schemas<br/>available tools"] --> E
    H["History<br/>earlier steps"] --> E
    E --> M["Language model"]
    M --> O["Answer"]

The point of the diagram is the scarcity at the context-window node. Every layer competes for the same budget, and the answer is only as good as the selection that arrives there. Prompt engineering mostly tends the upper layers; context engineering manages the budget as a whole.

From prompt to system

As long as a prompt lives in a chat window, it is not reproducible. In production the input becomes code: versioned, tested and observed. Three practices make the difference:

  • Versioning. Prompts and context templates belong in version control, with a traceable change history, like any other part of the application.
  • Evaluation. A change to the input needs a test against a fixed set of cases, otherwise every improvement is a guess. Because the output is non-deterministic, systematic evaluation replaces gut feeling.
  • Observation. In production, cost, latency and hit quality per input version show whether a change works. This telemetry can stay in house, because the tooling is self-hostable.

This makes the input layer part of operations rather than a one-off tinkering. What this lifecycle of versioning, evaluating and observing looks like as a discipline is described by LLMOps and MLOps; open tools such as Agenta cover prompt management and evaluation in a self-hostable frame. This maturity is also what separates AI development from a polished demo.

Drawing the line to RAG and fine-tuning

Prompt and context engineering is the input layer and therefore the cheapest of the three levers. It pays to keep the three apart cleanly:

  • Prompt and context engineering shapes what the model sees at runtime. It does not change the model and costs the least.
  • RAG is a context source within this layer. It pulls current, verifiable facts from internal data into the input without altering the model, and is described in full in GenAI and RAG.
  • Fine-tuning changes the model weights themselves. It is the most expensive and slowest lever and only right where form or specialist knowledge should belong permanently to the model.

In practice the cheapest stage solves most problems. Only once prompt, context and retrieval are exhausted does the question of fine-tuning arise. That decision, when to prompt, when to use RAG and when to fine-tune, is its own trade-off and belongs on a page of its own, not in the input layer.

Inputs under in-house control

Because the input layer does not change the model, it can run entirely on in-house infrastructure. Prompts, retrieved context and the telemetry of evaluation stay in house when the tooling is self-hostable and the model runs locally. Data ownership is thus less a question of the model than a question of input architecture. This is exactly where the Sovereign RAG Switzerland service starts, bringing the organisation's own knowledge into the answer verifiably and without data leaving the country; the concrete entry through a measurable trial setup is the Enterprise RAG Proof-of-Concept. Which models and data flows are approved for this is settled by AI governance at the control level. For Swiss organisations this means that, with the model running under Swiss jurisdiction, the entire input stream stays bound to the revised FADP.

References


Related topics

Ask AI

These links open external AI services, the conversation and its content are sent to their providers.