gRPC Across Languages: Production Lessons

gRPC across languages promises language-neutral RPC. In production the gaps are real: load balancing, deadlines, status codes, and schema evolution. Here are the fixes.


By Colson · Distinguished Software Engineer, Founder

June 25, 2026 · 7 min read


Running gRPC across languages delivers on its core promise: define a service once in Protobuf, generate clients and servers in every language, and they talk. What surprises teams is everything around the call (load balancing, deadlines, status codes, and schema evolution), because those behaviors are set by each language’s implementation, not by the proto.

The proto gives you a shared vocabulary. It does not give you shared behavior. This post covers the things you have to configure deliberately to make gRPC dependable across a polyglot fleet, building on the failure modes in “Why Language Boundaries Break Polyglot Microservices”. It is part of the “Language choices in polyglot microservices” series.

Why gRPC behaves differently across languages

gRPC’s generated stubs are so good that they hide how much behavior is left to defaults. A Go client, a Java client, and a Python client generated from the same proto will call the service identically, yet behave differently under load, during failures, and when idle.

Teams discover this the hard way: a service works perfectly in integration tests, then under real traffic it load-balances to one pod, holds dead connections after a deploy, or burns work on requests that already timed out. None of that is in the proto, so none of it shows up when you only review the schema.

Why does gRPC load-balance poorly on Kubernetes?

gRPC multiplexes many requests over one long-lived HTTP/2 connection, and a standard Kubernetes Service balances at the connection level, so each client ends up pinned to a single backend pod. You scale to ten replicas and one pod still takes nearly all the traffic. Fix it with request-level balancing: client-side load balancing, or an HTTP/2-aware proxy or mesh.

This is the single most common production surprise. Connection-level balancing cannot help, because there is only one connection carrying everything.
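To see why the pinning happens, here is a toy simulation rather than real gRPC code. It assumes ten hypothetical pods named pod-0 to pod-9 and a balancer that simply rotates through them; the only difference between the two functions is whether the pick happens once per connection or once per request.

```python
from collections import Counter
from itertools import cycle

PODS = [f"pod-{i}" for i in range(10)]  # hypothetical replica names


def connection_level(requests: int, clients: int = 1) -> Counter:
    """The balancer picks a pod once per connection; every request reuses that pick."""
    pick = cycle(PODS)
    pinned = [next(pick) for _ in range(clients)]  # one long-lived connection per client
    hits = Counter()
    for i in range(requests):
        hits[pinned[i % clients]] += 1
    return hits


def request_level(requests: int) -> Counter:
    """The balancer (client-side LB, proxy, or mesh) picks a pod for every request."""
    pick = cycle(PODS)
    hits = Counter()
    for _ in range(requests):
        hits[next(pick)] += 1
    return hits


if __name__ == "__main__":
    print(connection_level(1000))  # Counter({'pod-0': 1000})
    print(request_level(1000))     # each of the ten pods receives 100 requests
```

With a single client, adding replicas changes nothing in the first function: the other nine pods never see a request.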

How do gRPC deadlines work across services?

You set a deadline on a call, and the remaining budget should propagate to each downstream hop so every service knows how much time is left. A call that has spent 150 ms of a 200 ms budget should pass 50 ms downstream, not a fresh 200. Not every language propagates this identically by default, so verify it under test.

A gRPC call without a deadline can wait forever. In a chain of services, one missing deadline means a slow dependency can pin resources all the way up the call stack. Propagating the remaining budget stops the system from doing work for requests the original caller has already abandoned.

A request that times out at the edge while three downstream services keep working is the canonical wasted-work failure, and it gets worse exactly when you can least afford it: under load.
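The budget arithmetic is small enough to sketch without any gRPC dependency. The `Deadline` class and `downstream_timeout` helper below are hypothetical names, not part of any gRPC library; they only show the rule that each hop passes on what is left, never a fresh budget.

```python
import time
from typing import Callable, Optional


class Deadline:
    """Absolute point in time by which the whole request must finish."""

    def __init__(self, budget_s: float, clock: Callable[[], float] = time.monotonic):
        self._clock = clock
        self._expires_at = clock() + budget_s

    def remaining(self) -> float:
        """Seconds left in the budget, never negative."""
        return max(0.0, self._expires_at - self._clock())

    def expired(self) -> bool:
        return self.remaining() <= 0.0


def downstream_timeout(deadline: Deadline, per_hop_cap_s: Optional[float] = None) -> float:
    """Timeout for the next outbound call: the remaining budget, optionally capped."""
    if deadline.expired():
        # The caller has already given up; starting more work would be wasted.
        raise TimeoutError("deadline already exceeded; skip downstream work")
    left = deadline.remaining()
    return left if per_hop_cap_s is None else min(left, per_hop_cap_s)
```

In grpcio, for instance, the servicer context exposes `time_remaining()` and stub calls accept a `timeout` argument, so the value could be carried from one to the other by hand. Check what the equivalent is, and whether it happens automatically, in each language you run.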

How should I handle errors in gRPC across languages?

Map your native errors to gRPC status codes with structured error details, and branch on the status code on the client. The status code is the only error signal that means the same thing in every language. Never parse the human-readable status message for control flow.

gRPC defines a fixed set of status codes (OK, INVALID_ARGUMENT, DEADLINE_EXCEEDED, UNAVAILABLE, and so on). Those codes are the one part of your error handling that survives the wire intact. Everything else (your Go error, your Java exception, your Python exception, your Rust Result) is local to the process that raised it.
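A minimal sketch of the server-side half of that mapping, using only the standard library. The `Code` enum mirrors a subset of the gRPC status codes with their numeric values; which Python exception maps to which code is my assumption for illustration, and `to_status` is a hypothetical helper, not a gRPC API.

```python
from enum import Enum
from typing import Dict, Tuple


class Code(Enum):
    """Subset of the gRPC status codes, with their numeric wire values."""
    OK = 0
    INVALID_ARGUMENT = 3
    DEADLINE_EXCEEDED = 4
    NOT_FOUND = 5
    ALREADY_EXISTS = 6
    PERMISSION_DENIED = 7
    RESOURCE_EXHAUSTED = 8
    INTERNAL = 13
    UNAVAILABLE = 14


# Assumed mapping for illustration; the first matching exception type wins.
_EXCEPTION_TO_CODE = [
    (FileExistsError, Code.ALREADY_EXISTS),
    (PermissionError, Code.PERMISSION_DENIED),
    (TimeoutError, Code.DEADLINE_EXCEEDED),
    (ConnectionError, Code.UNAVAILABLE),
    (KeyError, Code.NOT_FOUND),
    (ValueError, Code.INVALID_ARGUMENT),
]


def to_status(exc: BaseException) -> Tuple[Code, Dict[str, str]]:
    """Translate a local exception into (status code, structured details)."""
    details = {"exception": type(exc).__name__}
    for exc_type, code in _EXCEPTION_TO_CODE:
        if isinstance(exc, exc_type):
            return code, details
    # Anything unmapped is treated as a server bug rather than leaked as-is.
    return Code.INTERNAL, details
```

The client then branches on the returned code alone; the details dictionary is for diagnostics, not control flow.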

Get the codes right and retries become safe. UNAVAILABLE is generally retryable; INVALID_ARGUMENT never is. If a server returns the wrong code, clients across every language will retry things they shouldn’t or give up on things they should retry. The code is the contract; honor it precisely.

gRPC status code	Retryable?	Typical meaning
`UNAVAILABLE`	Yes (with backoff)	Transient: server down, connection dropped
`DEADLINE_EXCEEDED`	Sometimes	Deadline expired before the call finished; retry only if the operation is idempotent
`RESOURCE_EXHAUSTED`	Yes (with backoff)	Rate-limited or quota hit; back off
`INVALID_ARGUMENT`	No	Bad request; retrying repeats the error
`NOT_FOUND`	No	The thing isn’t there; a retry won’t change that
`ALREADY_EXISTS`	No	Duplicate; treat as success or a real conflict
`PERMISSION_DENIED`	No	Auth problem; fix the caller, don’t retry
`INTERNAL`	No (usually)	Server bug; retrying rarely helps
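The table translates directly into a client-side decision function. This is a sketch under the table's own assumptions: `should_retry` is a hypothetical helper, codes are passed by name, and any code not listed is treated as non-retryable.

```python
# Codes the table marks as retryable with backoff.
RETRYABLE_ALWAYS = frozenset({"UNAVAILABLE", "RESOURCE_EXHAUSTED"})
# Codes that are retryable only when repeating the call is harmless.
RETRYABLE_IF_IDEMPOTENT = frozenset({"DEADLINE_EXCEEDED"})


def should_retry(code: str, idempotent: bool) -> bool:
    """Decide from the status code name alone, never from the message text."""
    if code in RETRYABLE_ALWAYS:
        return True
    if code in RETRYABLE_IF_IDEMPOTENT:
        return idempotent
    # INVALID_ARGUMENT, NOT_FOUND, ALREADY_EXISTS, PERMISSION_DENIED, INTERNAL,
    # and anything unrecognized: do not retry.
    return False
```

Defaulting to "no" for unrecognized codes keeps a client built against an older table from retrying something it does not understand.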

How do I evolve a Protobuf schema without breaking clients?

Never reuse or change a field number, never change a field’s type, and reserve the numbers and names of fields you remove. Add new fields with new numbers, and reserve the zero value as an UNKNOWN enum case handled on every client. Follow those rules and old and new clients interoperate safely.

Protobuf is designed for backward and forward compatibility, but only if you follow its rules. Adding new fields with new numbers is safe: old clients ignore what they don’t know, and new clients see defaults for what old servers don’t send.

The cross-language wrinkle is enums and unknown values. When a new server sends an enum value an old client has never seen, different languages handle that “unknown” case differently. Reserve the zero value as an explicit UNKNOWN, handle it on every client, and you avoid the silent misroute where one language defaults an unknown enum to the wrong branch.
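Here is one way the client-side handling could look, sketched with a plain Python enum instead of generated Protobuf code. `OrderState` and `route` are hypothetical; the point is that a number this client was not built with lands on the explicit UNKNOWN case and takes a deliberate branch.

```python
from enum import IntEnum


class OrderState(IntEnum):  # hypothetical enum, standing in for a generated one
    UNKNOWN = 0  # reserved zero value
    PENDING = 1
    SHIPPED = 2

    @classmethod
    def _missing_(cls, value):
        # Called for values that match no member, such as one added by a newer server.
        return cls.UNKNOWN


def route(raw_state: int) -> str:
    """Pick a handler; unknown states get their own branch, never a silent default."""
    state = OrderState(raw_state)
    if state is OrderState.PENDING:
        return "queue"
    if state is OrderState.SHIPPED:
        return "track"
    return "hold-for-review"
```

A newer server sending state 3 would be routed to "hold-for-review" here, which is visible and safe, rather than falling into whichever branch happens to be last.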

Lesson: keepalive and connection age need one shared policy

Because gRPC connections are long-lived, how you keep them healthy matters, and the defaults differ by implementation. Keepalive ping intervals, max connection age, and idle timeouts are all configurable, and if each language inherits its own defaults you get intermittent, timing-dependent failures after idle periods or deploys.

Set keepalive and connection-age parameters explicitly, derived from one shared config, identical across every service. A bounded max connection age is also how you get clients to periodically re-resolve and rebalance, which works hand in hand with the load-balancing fix above: connections that live forever never rebalance to pods added after they were established.
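As a sketch of "one shared config", the policy can live in a single object that every service renders into its own runtime's settings. The option names below are the channel-argument strings used by the C-core-based implementations such as grpcio; other stacks expose the same knobs through their own APIs. All values are placeholders I chose for illustration, not recommendations.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class ConnectionPolicy:
    """Single source of truth; the numbers are illustrative assumptions."""
    keepalive_time_ms: int = 30_000
    keepalive_timeout_ms: int = 10_000
    max_connection_age_ms: int = 300_000
    max_connection_age_grace_ms: int = 30_000
    max_connection_idle_ms: int = 600_000


def client_options(p: ConnectionPolicy) -> List[Tuple[str, int]]:
    """Options a client channel would be created with."""
    return [
        ("grpc.keepalive_time_ms", p.keepalive_time_ms),
        ("grpc.keepalive_timeout_ms", p.keepalive_timeout_ms),
    ]


def server_options(p: ConnectionPolicy) -> List[Tuple[str, int]]:
    """Options a server would be created with, including the bounded connection age."""
    return [
        ("grpc.keepalive_time_ms", p.keepalive_time_ms),
        ("grpc.keepalive_timeout_ms", p.keepalive_timeout_ms),
        ("grpc.max_connection_age_ms", p.max_connection_age_ms),
        ("grpc.max_connection_age_grace_ms", p.max_connection_age_grace_ms),
        ("grpc.max_connection_idle_ms", p.max_connection_idle_ms),
    ]
```

In grpcio these lists would be passed as the `options` argument when creating a channel or server. A Go or Java service would read the same `ConnectionPolicy` values and apply them through its own keepalive parameters.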

Should you retry failed gRPC calls?

Retry only the status codes that are safe to retry, and only with backoff and a budget. UNAVAILABLE and DEADLINE_EXCEEDED on an idempotent operation are reasonable to retry; INVALID_ARGUMENT, NOT_FOUND, and ALREADY_EXISTS never are. Retrying the wrong code turns a small blip into a self-inflicted traffic storm.

The cross-language catch is that retry behavior is configurable and the defaults differ. Some stacks ship a service-config-driven retry policy; others leave it entirely to you. If each language picks its own policy, one client hammers a struggling backend while another gives up immediately on the same failure. Define one retry policy (which codes, how many attempts, what backoff, what overall budget) and apply it identically everywhere.
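One place such a policy can be written down once is the gRPC service config, which some implementations consume as JSON. The sketch below builds that JSON; the field names follow the service config retry format as I understand it, while the service name `orders.OrderService` and every number are placeholders. Confirm that each of your language stacks actually honors it before relying on it.

```python
import json
from typing import Sequence


def retry_service_config(
    service: str,
    max_attempts: int = 4,
    initial_backoff_s: float = 0.1,
    max_backoff_s: float = 2.0,
    backoff_multiplier: float = 2.0,
    retryable_codes: Sequence[str] = ("UNAVAILABLE",),
) -> str:
    """Render one retry policy as service-config JSON for every client to load."""
    if max_attempts < 2:
        raise ValueError("max_attempts counts the original call, so it must be at least 2")
    config = {
        "methodConfig": [
            {
                "name": [{"service": service}],  # applies to all methods of the service
                "retryPolicy": {
                    "maxAttempts": max_attempts,
                    "initialBackoff": f"{initial_backoff_s}s",
                    "maxBackoff": f"{max_backoff_s}s",
                    "backoffMultiplier": backoff_multiplier,
                    "retryableStatusCodes": list(retryable_codes),
                },
            }
        ],
        # Channel-wide throttle so retries stop when too many calls are failing.
        "retryThrottling": {"maxTokens": 10, "tokenRatio": 0.1},
    }
    return json.dumps(config, sort_keys=True)


if __name__ == "__main__":
    print(retry_service_config("orders.OrderService"))
```

Generating the document from one function, rather than hand-editing it per service, is what keeps the Go, Java, and Python clients on the same codes, attempts, and backoff.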

Two rules keep retries from amplifying an incident. First, use exponential backoff with jitter, so a thousand clients do not retry in lockstep and synchronize into a thundering herd. Second, cap retries with a budget (a ceiling on the fraction of requests that may be retries), so a widespread failure cannot multiply your real traffic by your retry count at the worst possible moment. Retries are a safety mechanism only when they are bounded; unbounded, they are an outage accelerator.
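Both rules fit in a few lines. This is a minimal sketch, not a production limiter: `backoff_delay` uses the "full jitter" variant (a uniform draw below an exponentially growing ceiling), and `RetryBudget` counts over the object's whole lifetime where a real one would likely use a sliding window. The names and default numbers are assumptions.

```python
import random
from typing import Callable


def backoff_delay(
    attempt: int,
    base_s: float = 0.1,
    cap_s: float = 5.0,
    rng: Callable[[], float] = random.random,
) -> float:
    """Seconds to wait before retry number `attempt` (0-based), with full jitter."""
    ceiling = min(cap_s, base_s * (2 ** attempt))  # exponential growth, capped
    return rng() * ceiling  # randomize so clients do not retry in lockstep


class RetryBudget:
    """Permit a retry only while retries stay within a percentage of requests seen."""

    def __init__(self, max_retry_percent: int = 10):
        self.max_retry_percent = max_retry_percent
        self.requests = 0
        self.retries = 0

    def record_request(self) -> None:
        """Call once for every original (non-retry) request."""
        self.requests += 1

    def try_spend(self) -> bool:
        """Return True and count the retry if the budget allows it, else False."""
        # Integer comparison avoids floating-point edge cases at the boundary.
        if (self.retries + 1) * 100 > self.max_retry_percent * self.requests:
            return False
        self.retries += 1
        return True
```

A caller would check `try_spend()` before each retry and sleep for `backoff_delay(attempt)` only if it returns True. During a widespread failure the budget runs out quickly, and the remaining errors surface to callers instead of multiplying load on the struggling backend.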

A cross-language gRPC checklist

Before a gRPC service goes to production in a polyglot fleet:

Load balancing is request-level (client-side LB or an HTTP/2-aware proxy or mesh), not a plain connection-level Service.
Every call has a deadline, and remaining-budget propagation is verified per language.
Errors map to gRPC status codes with structured details; no client parses status messages for logic.
Status codes are correct for retry semantics (UNAVAILABLE retryable, INVALID_ARGUMENT not).
Schema changes follow Protobuf rules: stable field numbers, reserved on removals, UNKNOWN zero-value enums handled everywhere.
Keepalive, idle, and max-connection-age come from one shared config, identical across services.
A cross-language integration test exercises a real client/server pair, not just one language against itself.
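Part of this checklist can be enforced mechanically before deploy. The sketch below assumes each service publishes a small declaration of its gRPC settings; that declaration format, its keys, and the `checklist_failures` function are all hypothetical, invented here to show the shape of such a gate.

```python
from typing import Dict, List

# Balancing modes that operate per request rather than per connection.
REQUEST_LEVEL_LB = {"client_side", "http2_proxy", "mesh"}


def checklist_failures(service: Dict, shared_connection_policy: Dict) -> List[str]:
    """Return one message per checklist item that the declared settings violate."""
    failures = []
    if service.get("load_balancing") not in REQUEST_LEVEL_LB:
        failures.append("load balancing is not request-level")
    missing = [
        name
        for name, cfg in service.get("methods", {}).items()
        if not cfg.get("deadline_ms")
    ]
    if missing:
        failures.append("methods without a deadline: " + ", ".join(sorted(missing)))
    if service.get("connection_policy") != shared_connection_policy:
        failures.append("connection policy differs from the shared config")
    if not service.get("cross_language_test"):
        failures.append("no cross-language integration test declared")
    return failures
```

A static check like this only confirms what a service claims. Deadline propagation and status-code mapping still need a real client and server in different languages talking to each other.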

What I’d do differently

The lesson that took longest to internalize is that the proto file feels like the whole contract and is only half of it. The schema specifies what you send. Production behavior is decided by how each language’s runtime handles deadlines, balancing, errors, and connections, and those live in config, not in the .proto.

If I were setting up gRPC across languages again, I would establish the behavioral defaults (load balancing, deadline propagation, status-code mapping, keepalive) as a shared, reviewed config before the second language joined the mesh, and I would write a conformance test that every new service must pass. The generated stubs are excellent. They just don’t make these decisions for you, and the defaults are not the decisions you want.



Frequently asked questions

Why does gRPC load-balance poorly on Kubernetes?

Because gRPC multiplexes requests over one long-lived HTTP/2 connection, and a standard Kubernetes Service balances at the connection level, pinning a client to one pod. Use client-side load balancing or an HTTP/2-aware proxy or mesh to balance at the request level.

How do gRPC deadlines work across services?

You set a deadline on a call, and the remaining budget should propagate to downstream calls so each hop knows how much time is left. Not all language implementations propagate it identically by default, so verify it under test. A missing deadline lets a slow dependency pin resources up the chain.

How should I handle errors in gRPC across languages?

Map native errors to gRPC status codes with structured error details, and interpret the status code on the client. The status code is the only error signal that means the same thing in every language. Never branch on the status message text.

How do I evolve a Protobuf schema without breaking clients?

Never reuse or change field numbers, never change a field's type, and reserve removed field numbers and names. Add new fields with new numbers. Reserve the zero value as UNKNOWN for enums and handle that case on every client.
