
API Gateway and Service Mesh

Traffic from outside and traffic between services are separate concerns

An API gateway governs north-south traffic, the requests that enter the system from outside. A service mesh governs east-west traffic, the communication between services. The two solve different problems in different places, and neither replaces the other.

As soon as an application consists of several services, two distinct kinds of traffic appear that are easily lumped together. One leads in from outside: a client calls the system. The other stays inside: one service calls another service. Gateway and mesh each address one of these directions. Equate them, and the result is either a mesh that is not needed or a gateway expected to do a job it does not do. This page separates the two patterns cleanly, sets out their benefit, and names the price a mesh exacts.

Two traffic directions

The terms north-south and east-west come from the network diagram, where external traffic is drawn top and bottom and internal traffic left and right.

  • North-south. Traffic that crosses the system boundary: a browser, mobile app or third-party system calls an API. This is about entry, authentication, throttling and bundling many internal interfaces into one outward face.
  • East-west. Traffic between internal services that never leaves the system boundary. This is about retries, timeouts, encryption between services and the visibility of which service talks to which.
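Whether a flow is north-south or east-west depends only on where it starts and ends relative to the system boundary. A minimal Python sketch, assuming the boundary can be approximated by one private address range; the `10.0.0.0/8` range and the function names are illustrative, and real boundaries are usually drawn by network policy rather than a single address block:

```python
import ipaddress

# Hypothetical system boundary: everything in this range counts as "inside".
INTERNAL_NETWORKS = [ipaddress.ip_network("10.0.0.0/8")]


def is_internal(addr: str) -> bool:
    """True if the address lies inside one of the internal networks."""
    ip = ipaddress.ip_address(addr)
    return any(ip in net for net in INTERNAL_NETWORKS)


def traffic_direction(src: str, dst: str) -> str:
    """East-west if both ends are inside the boundary, otherwise north-south."""
    if is_internal(src) and is_internal(dst):
        return "east-west"
    return "north-south"


print(traffic_direction("203.0.113.7", "10.1.2.3"))  # north-south
print(traffic_direction("10.1.2.3", "10.4.5.6"))     # east-west
```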

A gateway sits at the north-south crossing point; a mesh spans the east-west plane. The two overlap at exactly one spot, the entry into the mesh, but they are not interchangeable.

The API gateway

An API gateway is the single entry point for external requests. It receives the call, establishes who the caller is and routes the request to the right internal service. Instead of every service bringing its own authentication, throttling and logging, these cross-cutting tasks move to one place. Typical jobs are:

  • Routing. An external request is dispatched to the right internal service, or fanned out to several services whose results the gateway merges into one response.
  • Entry control. Authentication and authorisation before a request reaches any internal service at all.
  • Throttling and quotas. Protection against overload and a fair share of capacity across callers.
  • Translation. One uniform protocol faces outward, while internal services remain free to use different protocols, for example REST externally and gRPC internally.
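The entry-control, throttling and routing jobs can be sketched as a small request pipeline. This is a toy model in Python, not any gateway product's API: the route table, the API keys, the `Gateway` class and the token-bucket parameters are all hypothetical, the clock is passed in explicitly to keep the example deterministic, and fan-out and protocol translation are left out.

```python
from typing import Dict, Tuple

# Hypothetical routing table: external path prefix -> internal service.
ROUTES: Dict[str, str] = {
    "/orders": "order-service",
    "/customers": "customer-service",
}

# Hypothetical API keys -> caller identity. A real gateway would usually
# validate signed tokens (for example OAuth 2.0 access tokens) instead.
API_KEYS: Dict[str, str] = {"key-abc": "shop-frontend"}


class TokenBucket:
    """Allows bursts of up to `capacity` requests, refilled at `rate` tokens per second."""

    def __init__(self, capacity: float, rate: float, now: float) -> None:
        self.capacity = capacity
        self.rate = rate
        self.tokens = capacity
        self.updated = now

    def allow(self, now: float) -> bool:
        # Refill in proportion to elapsed time, never beyond capacity.
        self.tokens = min(self.capacity, self.tokens + (now - self.updated) * self.rate)
        self.updated = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False


class Gateway:
    def __init__(self, capacity: float = 5, rate: float = 1.0) -> None:
        self.capacity = capacity
        self.rate = rate
        self.buckets: Dict[str, TokenBucket] = {}  # one bucket per caller

    def handle(self, path: str, api_key: str, now: float) -> Tuple[int, str]:
        # 1. Entry control: unknown callers never reach an internal service.
        caller = API_KEYS.get(api_key)
        if caller is None:
            return 401, "unauthenticated"
        # 2. Throttling and quotas: each caller draws from its own bucket.
        if caller not in self.buckets:
            self.buckets[caller] = TokenBucket(self.capacity, self.rate, now)
        if not self.buckets[caller].allow(now):
            return 429, "rate limit exceeded"
        # 3. Routing: the longest matching path prefix wins.
        matches = [p for p in ROUTES if path == p or path.startswith(p + "/")]
        if not matches:
            return 404, "no route"
        return 200, ROUTES[max(matches, key=len)]


gw = Gateway(capacity=2, rate=1.0)
print(gw.handle("/orders/42", "key-abc", now=0.0))  # (200, 'order-service')
print(gw.handle("/orders/42", "bad-key", now=0.0))  # (401, 'unauthenticated')
print(gw.handle("/customers", "key-abc", now=0.0))  # (200, 'customer-service')
print(gw.handle("/orders", "key-abc", now=0.0))     # (429, 'rate limit exceeded')
```

Returning the service name stands in for forwarding the request; a real gateway would proxy the call and pass the established caller identity on to the internal service.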

The gateway is thus the natural home for everything that happens at the edge of an API-First architecture. One variant, the Backends for Frontends pattern, provides a separate gateway per client type, so a mobile app can receive a different, typically leaner, response shape than a web shop.
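In a Backends for Frontends split, the same internal data is shaped differently per client type. A minimal Python sketch, assuming a hypothetical product record; in practice each backend would run as its own gateway or service, and the two functions here only show the shaping:

```python
from typing import Any, Dict

# Hypothetical product record as an internal catalogue service might return it.
PRODUCT: Dict[str, Any] = {
    "id": "p-1",
    "name": "Desk lamp",
    "price_cents": 4999,
    "description": "Adjustable LED desk lamp with three brightness levels.",
    "images": ["lamp-front.jpg", "lamp-side.jpg", "lamp-top.jpg"],
    "stock": 12,
}


def mobile_bff(product: Dict[str, Any]) -> Dict[str, Any]:
    """Backend for the mobile app: small payload, a single thumbnail, no long text."""
    return {
        "id": product["id"],
        "name": product["name"],
        "price_cents": product["price_cents"],
        "thumbnail": product["images"][0],
    }


def web_bff(product: Dict[str, Any]) -> Dict[str, Any]:
    """Backend for the web shop: full detail page with gallery and availability."""
    return {
        "id": product["id"],
        "name": product["name"],
        "price_cents": product["price_cents"],
        "description": product["description"],
        "images": list(product["images"]),
        "in_stock": product["stock"] > 0,
    }


print(mobile_bff(PRODUCT))
print(web_bff(PRODUCT))
```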

The service mesh

A service mesh moves the logic of service-to-service communication out of application code into a dedicated infrastructure layer. In the classic build, a small proxy, the sidecar proxy, runs alongside each service, and all of that service's traffic flows through it. These proxies form the data plane; a central control plane configures them uniformly. This makes it possible to add, across all services and without code changes:

  • Encryption between services. Mutual TLS (mTLS), so that internal traffic is encrypted too and both sides of every connection prove their identity. This fits the logic of Zero Trust, which does not trust the internal network either.
  • Resilience. Retries, timeouts and the isolation of failing services, so a single fault does not drag the whole call chain down.
  • Traffic steering. Fine-grained control, for example gradual rollouts or mirroring traffic onto a new version.
  • Visibility. Uniform metrics and tracing for every service-to-service call, which feed Observability without each service having to produce them itself. Distributed traces still require each service to forward the trace headers of incoming requests.
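What the data plane adds to each outbound call can be modelled in a few lines. The Python sketch below is a toy, not how Envoy or any mesh is implemented or configured: the `SidecarProxy` class, the hypothetical `send(request, timeout)` transport function and all thresholds are assumptions for illustration, and mTLS and traffic steering are left out. In a real mesh these policies live in proxy configuration pushed by the control plane, so the application code never contains them.

```python
import time
from collections import Counter
from typing import Callable, Dict, Optional


class CircuitOpenError(Exception):
    """Raised when the breaker rejects calls to an upstream that keeps failing."""


class SidecarProxy:
    """Toy model of a sidecar's outbound policy: timeout, retries, circuit breaker, metrics.

    `send(request, timeout)` stands in for the real network call; its signature is hypothetical.
    """

    def __init__(self, source: str, retries: int = 2, timeout: float = 0.5,
                 failure_threshold: int = 3, cooldown: float = 10.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.source = source
        self.retries = retries                      # extra attempts after the first one
        self.timeout = timeout                      # per-attempt deadline handed to the transport
        self.failure_threshold = failure_threshold  # failed calls before the breaker opens
        self.cooldown = cooldown                    # seconds the breaker stays open
        self.clock = clock
        self.failures: Counter = Counter()          # consecutive failed calls per destination
        self.opened_at: Dict[str, float] = {}       # when each destination's breaker opened
        self.metrics: Counter = Counter()           # (source, destination, outcome) -> count

    def call(self, dest: str, send: Callable[[str, float], str], request: str) -> str:
        opened = self.opened_at.get(dest)
        if opened is not None:
            if self.clock() - opened < self.cooldown:
                self.metrics[(self.source, dest, "rejected")] += 1
                raise CircuitOpenError(dest)
            del self.opened_at[dest]  # cooldown over: let a trial call through (half-open)
        last_error: Optional[Exception] = None
        for _ in range(self.retries + 1):
            try:
                response = send(request, self.timeout)
            except (TimeoutError, ConnectionError) as exc:
                last_error = exc
                self.metrics[(self.source, dest, "error")] += 1
                continue
            self.failures[dest] = 0
            self.metrics[(self.source, dest, "success")] += 1
            return response
        # Every attempt failed: count one failed call towards the breaker.
        self.failures[dest] += 1
        if self.failures[dest] >= self.failure_threshold:
            self.opened_at[dest] = self.clock()
        raise last_error


# Usage: an upstream that times out twice, then answers.
attempts = []

def flaky_send(request: str, timeout: float) -> str:
    attempts.append(timeout)
    if len(attempts) < 3:
        raise TimeoutError("upstream too slow")
    return "200 OK"

proxy = SidecarProxy("service-a")
print(proxy.call("service-b", flaky_send, "GET /stock"))  # 200 OK
```

Retries are only safe for idempotent requests, which is why retry policies are usually scoped per route rather than applied to every call.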

These capabilities are not an end in themselves. They pay off mainly when many services exist and the same cross-cutting concerns would otherwise have to be solved in each service separately.

Where they meet

```mermaid
architecture-beta
    group clients(cloud)["Clients"]
    group edge(cloud)["North-south"]
    group mesh(server)["East-west service mesh"]
    service browser(cloud)["Browser and mobile"] in clients
    service gateway(server)["API gateway"] in edge
    service svcA(server)["Service A"] in mesh
    service svcB(server)["Service B"] in mesh
    service svcC(server)["Service C"] in mesh
    browser:R -- L:gateway
    gateway:R -- L:svcA
    svcA:R -- L:svcB
    svcA:B -- T:svcC
    svcB:R -- L:svcC
```

The diagram shows the division of labour. The external call first hits the gateway, which hands it to the first service in the chain. From there the mesh takes over: every further call between services runs through the proxies, which contribute encryption, retries and telemetry. Some mesh implementations can also handle the entry themselves with their own ingress gateway; that does not replace the management functions of a full API gateway, such as quotas or a developer portal.

The price of a mesh

A mesh is not free, and the effort is often underestimated. An honest accounting:

  • Runtime overhead. In the sidecar model every call traverses two extra proxies, the caller's and the callee's, which adds a small latency per hop and a resource demand per service instance. With many services this adds up.
  • Operational complexity. Control plane, proxy configuration and certificate management form an extra layer that has to be operated, monitored and upgraded. It demands a certain Cloud Native maturity in operations.
  • Troubleshooting. An extra proxy in the path means an extra place where a call can fail, and one more component to inspect when it does.

Newer architectures answer the overhead problem with a sidecarless approach: instead of one proxy per service, a shared proxy per node handles transport-level concerns such as mTLS, and a Layer 7 proxy is added only where its functions, such as HTTP routing or retries, are needed. This noticeably lowers the resource demand, but it shifts the operational complexity rather than eliminating it.
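Both overhead points reduce to simple arithmetic. The Python sketch below makes it explicit under loudly stated assumptions: the per-proxy latency, the cluster size and the "one Layer 7 proxy per service that needs it" rule are hypothetical simplifications, not measurements. Istio's ambient mode, for example, splits the work between a per-node proxy (ztunnel) and optional waypoint proxies, whose placement is configurable.

```python
from dataclasses import dataclass


@dataclass
class Deployment:
    nodes: int        # machines in the cluster
    pods: int         # running service instances
    l7_services: int  # services that actually need Layer 7 features


def sidecar_hop_latency_ms(hops: int, per_proxy_ms: float) -> float:
    """Added latency of a call chain in the sidecar model.

    Each service-to-service hop crosses two proxies: the caller's outbound
    sidecar and the callee's inbound sidecar.
    """
    return hops * 2 * per_proxy_ms


def proxy_count_sidecar(d: Deployment) -> int:
    """Sidecar model: one proxy per running instance."""
    return d.pods


def proxy_count_sidecarless(d: Deployment) -> int:
    """Sidecarless model: one shared proxy per node, plus one Layer 7 proxy per service that needs it."""
    return d.nodes + d.l7_services


# Hypothetical figures for illustration only, not measurements.
d = Deployment(nodes=10, pods=120, l7_services=3)
print(sidecar_hop_latency_ms(hops=4, per_proxy_ms=0.5))  # 4.0
print(proxy_count_sidecar(d))                            # 120
print(proxy_count_sidecarless(d))                        # 13
```

Fewer proxies does not mean proportionally less work: each node-level proxy carries the traffic of every pod on its node, and the Layer 7 proxies still have to be placed, scaled and upgraded, which is where the shifted operational complexity shows up.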

Gateway first, mesh only at scale

Separating the two patterns leads to a clear order:

  • Almost every system with external interfaces needs a gateway. As soon as more than one service is reachable from outside, the single entry point pays off for authentication, throttling and bundling. This already holds for a modular monolith (modulith) with a few Microservices.
  • A mesh is worth it only beyond a certain number of services. With a handful of services, a few libraries solve the same problems with less effort. A mesh becomes worth the effort once mTLS, uniform resilience and end-to-end visibility across many services would otherwise turn into repeated manual work in each service.
  • Operational maturity first, then the mesh. A mesh assumes working automation, observability and delivery. A team still building these foundations should defer a mesh rather than double the complexity.
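The three rules can be written down as a small decision function, which mainly makes their order explicit. A sketch in Python under stated assumptions: `MESH_SERVICE_THRESHOLD` is a hypothetical number (the text deliberately gives none), and the profile fields are illustrative simplifications of what a real assessment would look at.

```python
from dataclasses import dataclass
from typing import List

# Hypothetical threshold: the text names no number, and the right one depends on team and platform.
MESH_SERVICE_THRESHOLD = 20


@dataclass
class SystemProfile:
    externally_reachable_services: int
    internal_services: int
    has_automation: bool
    has_observability: bool
    has_delivery_pipeline: bool


def recommend(p: SystemProfile) -> List[str]:
    """Return the layers to build, in order, following the three rules above."""
    plan: List[str] = []
    # Rule 1: more than one externally reachable service -> single entry point.
    if p.externally_reachable_services > 1:
        plan.append("API gateway")
    # Rule 2: a mesh only once internal communication has grown large enough.
    if p.internal_services >= MESH_SERVICE_THRESHOLD:
        # Rule 3: operational maturity before the mesh.
        if p.has_automation and p.has_observability and p.has_delivery_pipeline:
            plan.append("service mesh")
        else:
            plan.append("operational foundations first, mesh later")
    return plan


print(recommend(SystemProfile(3, 6, True, True, True)))    # ['API gateway']
print(recommend(SystemProfile(5, 40, True, False, True)))  # ['API gateway', 'operational foundations first, mesh later']
```

The value lies less in the code than in forcing the question per system: a profile that never crosses the threshold never gets a mesh recommendation, however mature its operations.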

The rule of thumb: the gateway settles the entry, the mesh settles the scaling of internal communication. Confuse the two questions, and the wrong layer gets built first.

Related services

Which layer a team needs first, and at what point a mesh pays off, is a question of platform maturity. Providing that internal platform with self-service and golden paths is covered by Platform Engineering (IDP); the measurement and telemetry layer a mesh feeds into is covered by Observability and Telemetry.
