
Caching and CDN

Speed depends on mastering cache eviction

Caching keeps a finished answer closer to the requester instead of recomputing or re-transmitting it every time. That cuts latency and load at once. The hard part is never the storing; it is the discarding. Cache invalidation decides whether a fast cache is also a correct one.

Every additional cache layer makes an application faster and at the same time harder to reason about. A stale answer from the cache is a defect from the user's point of view, even when it was delivered flawlessly. This page describes the staged cache layers from client to database, the role of content delivery networks at the network edge, and why invalidation is the real problem.

The cache layers

Caching is not a single place but a chain of layers between user and data source. Each layer intercepts a share of the requests before they reach the next:

  • Client cache. The browser holds static files and HTTP responses locally. Cache-Control sets how long they count as fresh, while ETag and Last-Modified let the browser revalidate them afterwards. A request the browser answers from a fresh entry in its own cache never reaches the network at all.
  • Reverse-proxy and edge cache. An upstream proxy such as Varnish or nginx, or a CDN node, answers recurring requests from many users without burdening the application. This is a shared cache and follows different rules than the private browser cache: it must not store responses marked Cache-Control: private, and it prefers s-maxage over max-age when both are set.
  • Application cache. Inside the application, in-memory stores such as Redis or Memcached hold computed results, session data or composed objects. This layer relieves the database by not repeating expensive work.
  • Database cache. The database itself keeps data and index pages in memory (the buffer pool) and, depending on the database, also caches execution plans. This layer is mostly transparent, but it becomes noticeable as soon as memory runs short.

The order is deliberate: a cache hit at an early layer is cheaper than one at a late layer. The browser cache saves a whole network round trip, the database cache only a disk read. Where these layers live and how they are operated is the subject of cloud-native architecture, in which stateless services are what make a shared cache practical in the first place.
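The difference between private and shared caches comes down to a few Cache-Control directives. The following Python sketch implements a deliberately simplified subset of the HTTP caching rules (RFC 9111): no-store forbids storing anywhere, private keeps a response out of shared caches, s-maxage overrides max-age for shared caches only, and a shared cache does not reuse a response to an authenticated request unless public, s-maxage or must-revalidate allows it. The function names are hypothetical, and details such as private with a field list, request directives and heuristic freshness are left out.

```python
from __future__ import annotations


def parse_cache_control(header: str) -> dict[str, str | None]:
    """Split a Cache-Control header value into lower-cased directives."""
    directives: dict[str, str | None] = {}
    for part in header.split(","):
        name, _, value = part.strip().partition("=")
        if name:
            directives[name.strip().lower()] = value.strip().strip('"') or None
    return directives


def may_store(header: str, shared: bool, has_authorization: bool = False) -> bool:
    """Simplified storability check for a response; not the full RFC 9111 logic."""
    d = parse_cache_control(header)
    if "no-store" in d:
        return False                      # no cache may keep it
    if shared and "private" in d:
        return False                      # only the browser cache may keep it
    if shared and has_authorization:
        # Responses to authenticated requests need explicit permission
        # before a shared cache may reuse them.
        return any(k in d for k in ("public", "s-maxage", "must-revalidate"))
    return True


def freshness_lifetime(header: str, shared: bool) -> int | None:
    """Seconds an entry counts as fresh; s-maxage wins in shared caches."""
    d = parse_cache_control(header)
    if shared and d.get("s-maxage"):
        return int(d["s-maxage"])
    if d.get("max-age"):
        return int(d["max-age"])
    return None                           # no explicit lifetime given
```

A response with Cache-Control: private, max-age=600 thus stays in the browser for ten minutes but is never stored by a proxy or CDN, which is what keeps user-specific responses out of shared caches.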

CDN: the cache at the network edge

A content delivery network distributes copies of content across many geographically spread nodes (points of presence). A request is routed to the nearest node, which either serves the answer from its cache or fetches it once from the origin server and then holds it itself. Two effects overlap here:

  • Proximity cuts latency. The physical distance between user and node sets the unavoidable minimum round-trip time. A node in the same region saves the packets a trip across continents.
  • Distribution cuts load. The origin server now sees only the requests no node could answer (a cache miss). For content that caches well, the CDN intercepts the bulk of the traffic.
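Both effects can be put as a short calculation. Assume a hit ratio h at the edge and R requests per second in total; the origin only sees the misses. If t_edge is the round trip to the nearest node and t_origin the extra time a miss spends fetching from the origin, the average response time follows the same split. The symbols and the numbers below are illustrative assumptions, not measurements.

```latex
R_{\text{origin}} = (1 - h)\, R
\qquad
\bar{t} = h\, t_{\text{edge}} + (1 - h)\,\bigl(t_{\text{edge}} + t_{\text{origin}}\bigr)
        = t_{\text{edge}} + (1 - h)\, t_{\text{origin}}
```

With R = 10,000 requests per second and h = 0.95, the origin receives 0.05 × 10,000 = 500 requests per second. With t_edge = 20 ms and t_origin = 150 ms, the average comes to 20 + 0.05 × 150 = 27.5 ms. Raising h to 0.99 would cut origin traffic to 100 requests per second.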

Classically, CDNs were limited to static files. Modern CDNs can be configured to cache dynamic responses too, or to run logic at the edge (edge computing). These capabilities depend on the provider and the plan; they are not an automatic feature of every CDN. For read-heavy content, for example in headless e-commerce, the CDN layer is often the single largest lever on perceived speed.

The hard problem: invalidation

Writing a value into a cache is trivial. The difficulty lies in removing it at the right moment: before it becomes stale, but not so early that the cache becomes useless. Three basic strategies are available, and they are not mutually exclusive:

  • Time-based expiry (TTL). An entry counts as fresh for a fixed duration and is then discarded or revalidated. Simple but imprecise: an entry can be stale right up to its expiry if the source changes early, and an unchanged entry is discarded needlessly once it expires.
  • Validation. Instead of expiring blindly, the cache asks the origin via If-None-Match (against the ETag) or If-Modified-Since whether anything has changed. If it has not, the origin answers 304 Not Modified and the content transfer is avoided.
  • Active eviction (purge). When the data is written, the affected cache entry is removed on purpose. CDNs support this via surrogate keys or cache tags, which allow invalidating related content in one step without flushing the entire cache.
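Expiry and validation work together in practice: an entry is served while its TTL lasts, and afterwards the cache revalidates it instead of refetching blindly. The Python sketch below simulates this with a hypothetical in-process Origin class that derives a strong ETag from a hash of the body (one common choice, assumed here) and answers a matching If-None-Match with 304 and an empty body. Time is passed in explicitly to keep the behavior easy to follow; a real HTTP cache also has to honor the Cache-Control directives of each response.

```python
from __future__ import annotations

import hashlib
from dataclasses import dataclass


def make_etag(body: bytes) -> str:
    # Assumption: the ETag is a truncated content hash, quoted as HTTP requires.
    return '"' + hashlib.sha256(body).hexdigest()[:16] + '"'


class Origin:
    """Hypothetical origin server holding a single resource."""

    def __init__(self, body: bytes) -> None:
        self.body = body
        self.full_transfers = 0            # counts 200 responses with a body

    def get(self, if_none_match: str | None = None) -> tuple[int, str, bytes]:
        etag = make_etag(self.body)
        if if_none_match == etag:
            return 304, etag, b""          # unchanged: no body is sent
        self.full_transfers += 1
        return 200, etag, self.body


@dataclass
class Entry:
    body: bytes
    etag: str
    expires_at: float


class ValidatingCache:
    """TTL cache that revalidates expired entries with If-None-Match."""

    def __init__(self, origin: Origin, ttl: float) -> None:
        self.origin = origin
        self.ttl = ttl
        self.entry: Entry | None = None

    def get(self, now: float) -> bytes:
        e = self.entry
        if e is not None and now < e.expires_at:
            return e.body                  # fresh: origin is not contacted
        status, etag, body = self.origin.get(e.etag if e is not None else None)
        if status == 304 and e is not None:
            e.expires_at = now + self.ttl  # still valid: extend freshness
            return e.body
        self.entry = Entry(body, etag, now + self.ttl)
        return body
```

With a 60-second TTL, a read at t = 30 is answered from the cache, a read at t = 90 triggers a conditional request that comes back as 304 without a new body, and only after the origin content changes does a full 200 response reach the cache again.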

It gets hard as soon as a cached value is composed from several sources: every change to one source then has to know which derived cache entries it touches. A pattern like stale-while-revalidate softens the trade-off: the cache may keep serving an expired value for a bounded window while it fetches the fresh version in the background, so no user waits on the recomputation. Correctness remains a matter of discipline, not of technology alone.
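For composed entries, the cache needs an index from each source to the entries derived from it. The sketch below models the idea behind surrogate keys and cache tags in plain Python: every entry is stored with the tags of the sources it was built from, and purging one tag removes all dependent entries while the rest of the cache stays intact. The class and tag names are hypothetical; on a real CDN the tags are attached to responses via a provider-specific header and purged through the provider's API.

```python
from __future__ import annotations

from collections import defaultdict


class TaggedCache:
    """In-memory sketch of tag-based (surrogate-key-style) invalidation."""

    def __init__(self) -> None:
        self.values: dict[str, object] = {}
        self.keys_by_tag: dict[str, set[str]] = defaultdict(set)

    def set(self, key: str, value: object, tags: set[str]) -> None:
        self.values[key] = value
        for tag in tags:                   # remember which sources fed this entry
            self.keys_by_tag[tag].add(key)

    def get(self, key: str) -> object | None:
        return self.values.get(key)

    def purge_tag(self, tag: str) -> set[str]:
        """Drop every entry derived from the source named by `tag`."""
        keys = self.keys_by_tag.pop(tag, set())
        for key in keys:
            # Leftover index entries under other tags only cause
            # harmless extra invalidation later, never stale reads.
            self.values.pop(key, None)
        return keys


cache = TaggedCache()
cache.set("page:/p/42", "product page 42", {"product:42", "prices:CHF"})
cache.set("page:/c/shoes", "shoe category", {"product:42", "product:7"})
cache.set("page:/p/7", "product page 7", {"product:7"})
```

Changing product 42 then calls purge_tag("product:42"), which removes its own page and the category page that lists it, while the page for product 7 stays cached.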

Latency versus consistency

Every cache decision is a deliberate trade. A long TTL buys speed and load relief but risks users seeing stale data. Caching briefly or purging on every change keeps the data fresh but gives back part of the gain. The right choice hinges on the content: a product image tolerates hours, an account balance not one second. The following path shows how a request traverses the layers and where it ideally ends:

```mermaid
flowchart TD
    A["User request"] --> B{"Client cache fresh?"}
    B -->|yes| Z["Answer with no network"]
    B -->|no| C{"Edge or proxy cache hits?"}
    C -->|yes| Y["Answer from the edge"]
    C -->|no| D{"Application cache hits?"}
    D -->|yes| X["Answer from in-memory store"]
    D -->|no| E["Database, then fill the cache"]
    E --> F["Invalidate on the next write"]
```

The diagram path makes visible why a hit at an early layer is worth more: it short-circuits all the steps that follow. The last node is the decisive one, because without clean invalidation on write, every upstream layer with a high hit rate reliably serves stale data.
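The last step in the diagram corresponds to the common cache-aside pattern: read from the cache, fill it from the database on a miss, and invalidate the entry when the data is written. The following Python sketch uses a hypothetical Database class backed by a dict and counts reads so the effect is visible. It is single-threaded; with concurrent readers and writers, a reader that fetched the old value before a write can still store it after the invalidation, which is why a TTL is usually kept as a safety net.

```python
from __future__ import annotations


class Database:
    """Hypothetical slow data source; a dict stands in for real storage."""

    def __init__(self) -> None:
        self.rows: dict[str, object] = {}
        self.reads = 0

    def read(self, key: str) -> object | None:
        self.reads += 1                  # count trips to the database
        return self.rows.get(key)

    def write(self, key: str, value: object) -> None:
        self.rows[key] = value


class CacheAside:
    """Application cache in front of the database (cache-aside pattern)."""

    def __init__(self, db: Database) -> None:
        self.db = db
        self.cache: dict[str, object | None] = {}

    def read(self, key: str) -> object | None:
        if key in self.cache:
            return self.cache[key]       # hit: the database is not touched
        value = self.db.read(key)        # miss: fetch from the source
        self.cache[key] = value          # fill the cache for the next reader
        return value

    def write(self, key: str, value: object) -> None:
        self.db.write(key, value)        # update the source of truth first
        self.cache.pop(key, None)        # then invalidate the derived entry
```

Dropping the entry on write, rather than updating it in place, keeps the database as the only place where the new value is computed.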

Where caching tips over

  • Silent staleness. A TTL set too generously serves old content for weeks without anyone noticing a fault. The cache works technically; the result is wrong anyway.
  • Cache stampede. When a heavily requested entry expires, all requests punch through to the origin at once and overload it. The remedies are staggered expiry times or a lock that allows only one recomputation.
  • Wrong cache key. Cache user-specific content under a shared key and users see each other's data. Separating private from shared cache is not a detail here, it is a security question.
  • Caching as a band-aid. A cache that papers over a fundamentally too-slow query only defers the problem: when the cache fails, the system fails. Only end-to-end observability of hit rate, latency and origin load shows whether the cache is actually holding.
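Both stampede remedies can be combined in a few lines. The Python sketch below keeps a per-key lock so that only the first caller recomputes an expired entry while the others wait and then reuse its result, and it spreads each TTL randomly by ±10 % so entries filled at the same moment do not all expire together. The class name and the jitter value are assumptions for illustration. The lock only works within one process; several application servers would need a shared lock instead.

```python
from __future__ import annotations

import random
import threading
import time
from typing import Callable


class StampedeGuard:
    """TTL cache where only one caller per key recomputes an expired entry."""

    def __init__(self, ttl: float, jitter: float = 0.1) -> None:
        self.ttl = ttl
        self.jitter = jitter                       # ±10 % by default
        self.values: dict[str, tuple[object, float]] = {}
        self.locks: dict[str, threading.Lock] = {}
        self.locks_guard = threading.Lock()

    def _fresh(self, key: str) -> tuple[object, float] | None:
        hit = self.values.get(key)
        return hit if hit is not None and time.monotonic() < hit[1] else None

    def _lock_for(self, key: str) -> threading.Lock:
        with self.locks_guard:                     # create each key lock exactly once
            return self.locks.setdefault(key, threading.Lock())

    def get(self, key: str, compute: Callable[[], object]) -> object:
        hit = self._fresh(key)
        if hit is not None:
            return hit[0]
        with self._lock_for(key):                  # one recomputation per key at a time
            hit = self._fresh(key)                 # another caller may have refilled it
            if hit is not None:
                return hit[0]
            value = compute()
            # Staggered expiry: each entry gets a slightly different lifetime.
            ttl = self.ttl * (1 + random.uniform(-self.jitter, self.jitter))
            self.values[key] = (value, time.monotonic() + ttl)
            return value
```

When twenty threads request the same expired key at once, compute runs a single time and all twenty receive its result.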

Control over where content travels

Running a CDN means giving up part of the control over where copies of content sit: a CDN distributes them across nodes in many countries. As soon as that content contains personal data, for example in a cached API response, the data may leave Switzerland and the EU. The revised Data Protection Act (revFADP) then applies to those cache copies as well and, with US providers, so does the US CLOUD Act. Static, non-personal content is uncritical; personal or particularly sensitive content either does not belong in a globally distributed cache at all or belongs with a provider whose node locations can be controlled.

Measuring whether the cache layers actually take effect is part of the observability and telemetry service; the operational groundwork on which shared caches and CDN integration run cleanly belongs to platform engineering and IDP.
