gRPC Deep Dive 2025: Protocol Buffers, Four Communication Models, Interceptors, Load Balancing

TL;DR

  • gRPC = HTTP/2 + Protobuf: 5-10x faster, smaller RPCs
  • Four models: Unary, Server Streaming, Client Streaming, Bidirectional
  • Protobuf optimization: field numbers, wire types, varint encoding
  • Interceptors: handle cross-cutting concerns (auth, logging, metrics)
  • Deadline propagation: prevent timeout accumulation in distributed systems
  • gRPC-Web: use gRPC from browsers

1. Why gRPC Emerged

1.1 History of RPC

RPC (Remote Procedure Call) = calling a function on another machine as if it were local.

Evolution:

  • 1980s: Sun RPC (foundation of NFS)
  • 1990s: CORBA (complex, failed)
  • 2000s: SOAP/WSDL (XML, slow)
  • 2010s: REST (JSON over HTTP)
  • 2015+: gRPC (Google open-sourced its internal Stubby)

1.2 REST Limitations

GET /api/users/123 HTTP/1.1
Host: api.example.com

HTTP/1.1 200 OK
Content-Type: application/json
{"id": 123, "name": "Alice", "email": "alice@example.com"}

Problems:

  1. JSON inefficiency: textual, keys repeated in every message
  2. HTTP/1.1: head-of-line blocking
  3. Loose contracts: no schema enforcement
  4. Unidirectional: bidirectional streaming is hard
  5. Burden of hand-writing client code

1.3 gRPC's Promise

service UserService {
  rpc GetUser(GetUserRequest) returns (User);
  rpc StreamUsers(Empty) returns (stream User);
}

message User {
  int64 id = 1;
  string name = 2;
  string email = 3;
}

Automatic code generation (every language), schema enforcement, 5-10x smaller payloads, HTTP/2 multiplexing, bidirectional streaming.


2. Protocol Buffers Deep Dive

2.1 Basic Structure

syntax = "proto3";

package myapp;

message User {
  int64 id = 1;
  string name = 2;
  string email = 3;
  repeated string tags = 4;
  Address address = 5;
}

message Address {
  string street = 1;
  string city = 2;
}

2.2 Meaning of Field Numbers

message User {
  int64 id = 1;      // ← 1 is the field number
  string name = 2;
}

Why field numbers?:

  • They are the identifiers in the binary wire format
  • Renames stay compatible (as long as the number stays the same)
  • Never reuse a deleted field's number
// v1
message User {
  string user_name = 1;
}

// v2 (compatible)
message User {
  string display_name = 1;  // renamed, same number
}

2.3 Wire Format

Protobuf's binary format:

┌──────────────────────┬─────────────┐
│ field_number + type  │ field_value │
└──────────────────────┴─────────────┘

Each field's tag: (field_number << 3) | wire_type.

Wire Types:

  • 0: VARINT (int32, int64, bool)
  • 1: FIXED64 (double, fixed64)
  • 2: LENGTH_DELIMITED (string, bytes, message)
  • 5: FIXED32 (float, fixed32)
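The tag formula above can be checked with a few lines of Python (a toy sketch, not the protobuf library):

```python
def tag(field_number: int, wire_type: int) -> int:
    """Compute the tag byte that prefixes every Protobuf field."""
    return (field_number << 3) | wire_type

print(hex(tag(1, 0)))  # field 1 as VARINT            -> 0x8
print(hex(tag(2, 2)))  # field 2 as LENGTH_DELIMITED  -> 0x12
```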

2.4 Varint Encoding

Small numbers encode to fewer bytes:

0     → 1 byte
127   → 1 byte
128   → 2 bytes
16383 → 2 bytes
16384 → 3 bytes

Effect: small IDs, counters, etc. are very efficient.
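The table above follows from the varint scheme: 7 payload bits per byte plus a continuation bit. A minimal encoder sketch:

```python
def encode_varint(n: int) -> bytes:
    """Encode a non-negative int as a Protobuf varint (7 bits per byte)."""
    out = bytearray()
    while True:
        byte = n & 0x7F
        n >>= 7
        if n:
            out.append(byte | 0x80)  # set the continuation bit
        else:
            out.append(byte)
            return bytes(out)

print(len(encode_varint(127)), len(encode_varint(128)), len(encode_varint(16384)))  # 1 2 3
```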

2.5 Size Comparison

// JSON: 53 bytes (compact)
{"id":123,"name":"Alice","email":"alice@example.com"}
// Protobuf: 28 bytes (~47% smaller)
08 7b                  # field 1 (id), varint 123
12 05 41 6c 69 63 65   # field 2 (name), len 5, "Alice"
1a 11 ...              # field 3 (email), len 17

The bigger the message, the bigger the gap (keys are never repeated).

2.6 Schema Evolution

Rules:

  • ✅ Add new fields (with new numbers)
  • ✅ Rename fields
  • ✅ No optional → required concern (proto3 fields are all effectively optional)
  • ❌ Change field numbers
  • ❌ Change field types (not wire-compatible)
  • ❌ Reuse a deleted field's number (use reserved)
message User {
  reserved 4, 5;           // 4 and 5 will never be used again
  reserved "old_field";    // names can be reserved too
  
  int64 id = 1;
  string name = 2;
  string email = 3;
}

3. Four Communication Models

3.1 Unary RPC (the most common)

rpc GetUser(GetUserRequest) returns (User);

Behavior: 1 request → 1 response. Similar to REST.

# Client
response = stub.GetUser(GetUserRequest(user_id=123))
print(response.name)
# Server
def GetUser(self, request, context):
    user = db.get(request.user_id)
    return User(id=user.id, name=user.name, email=user.email)

Use for: CRUD, general APIs.

3.2 Server Streaming

rpc StreamUsers(Empty) returns (stream User);

Behavior: 1 request → N responses.

# Client
for user in stub.StreamUsers(Empty()):
    print(user.name)
# Server
def StreamUsers(self, request, context):
    for user in db.iter_all():
        yield User(id=user.id, name=user.name)

Use for: large result sets (thousands of rows), real-time updates, log streaming.

3.3 Client Streaming

rpc UploadEvents(stream Event) returns (UploadResponse);

Behavior: N requests → 1 response.

# Client
def event_generator():
    for event in events:
        yield event

response = stub.UploadEvents(event_generator())
print(response.processed)
# Server
def UploadEvents(self, request_iterator, context):
    count = 0
    for event in request_iterator:
        process(event)
        count += 1
    return UploadResponse(processed=count)

Use for: bulk uploads, batch processing.

3.4 Bidirectional Streaming

rpc Chat(stream ChatMessage) returns (stream ChatMessage);

Behavior: N requests ↔ N responses. Both sides send messages independently.

# Client
def send_messages():
    yield ChatMessage(text="Hello")
    yield ChatMessage(text="How are you?")

for response in stub.Chat(send_messages()):
    print(f"Received: {response.text}")
# Server
def Chat(self, request_iterator, context):
    for msg in request_iterator:
        yield ChatMessage(text=f"Echo: {msg.text}")

Use for: chat, real-time games, two-way IoT communication.

3.5 Comparison

Model             Requests  Responses  Use Case
Unary             1         1          CRUD
Server Streaming  1         N          Large results, real-time updates
Client Streaming  N         1          Uploads, batch
Bidirectional     N         N          Chat, games

4. The Role of HTTP/2

4.1 HTTP/1.1 Limits

[Client] ─────────→ [Server]
[Connection 1: Request 1]
[Connection 2: Request 2]
[Connection 3: Request 3]
   ...

Problems:

  • Each request = a new connection (or keep-alive, but sequential)
  • Head-of-line blocking
  • Headers repeated on every request

4.2 HTTP/2 Multiplexing

[Client] ─────────→ [Server]
[Single Connection]
   ├─ Stream 1: Request 1 / Response 1
   ├─ Stream 2: Request 2 / Response 2
   └─ Stream 3: Request 3 / Response 3

Benefits:

  • Many concurrent requests over a single TCP connection
  • Header compression (HPACK)
  • Server push
  • Binary framing

4.3 How gRPC Uses HTTP/2

  • Each RPC = 1 stream
  • Thousands of concurrent RPCs over the same connection
  • Minimal TCP connection overhead
  • Bidirectional streaming works naturally

4.4 Header Compression

HTTP/1.1:

GET /api/users/123 HTTP/1.1
Host: api.example.com
User-Agent: MyApp/1.0
Accept: application/json
Authorization: Bearer xyz...

Sent on every request. HTTP/2 (HPACK) instead uses:

  • A static table (common headers)
  • A dynamic table (caches headers from earlier requests)
  • Huffman encoding

Header size shrinks by 80%+.
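A back-of-the-envelope illustration (not real HPACK encoding) of why indexing repeated headers saves so much; the header set and byte counts are made up for the sketch:

```python
# Hypothetical header set; byte counts are approximate (name + value + separators).
headers = {
    'authorization': 'Bearer xyz...',
    'user-agent': 'MyApp/1.0',
    'accept': 'application/grpc',
}

plaintext = sum(len(k) + len(v) + 4 for k, v in headers.items())  # resent on every HTTP/1.1 request
indexed = len(headers) * 2  # HPACK: ~1-2 bytes per header once it sits in the dynamic table

print(plaintext, indexed)  # repeat requests shrink from ~80 bytes to ~6
```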


5. Interceptors

5.1 What Are They?

Interceptor = middleware that runs before/after an RPC call. Handles cross-cutting concerns.

Uses:

  • Authentication (JWT verification)
  • Logging
  • Metrics (Prometheus)
  • Distributed tracing
  • Error handling
  • Retries

5.2 Server Interceptor (Python)

import grpc
from concurrent import futures

class AuthInterceptor(grpc.ServerInterceptor):
    def intercept_service(self, continuation, handler_call_details):
        metadata = dict(handler_call_details.invocation_metadata)
        token = metadata.get('authorization')
        
        if not verify_token(token):
            return grpc.unary_unary_rpc_method_handler(
                lambda req, ctx: ctx.abort(grpc.StatusCode.UNAUTHENTICATED, 'Invalid token')
            )
        
        return continuation(handler_call_details)

server = grpc.server(
    futures.ThreadPoolExecutor(max_workers=10),
    interceptors=[AuthInterceptor()]
)

5.3 Client Interceptor

import time

class RetryInterceptor(grpc.UnaryUnaryClientInterceptor):
    def intercept_unary_unary(self, continuation, client_call_details, request):
        for attempt in range(3):
            try:
                return continuation(client_call_details, request)
            except grpc.RpcError as e:
                # only retry UNAVAILABLE; re-raise on the last attempt
                if e.code() != grpc.StatusCode.UNAVAILABLE or attempt == 2:
                    raise
                time.sleep(2 ** attempt)  # exponential backoff

channel = grpc.intercept_channel(
    grpc.insecure_channel('localhost:50051'),
    RetryInterceptor()
)

5.4 Go Interceptor

func loggingInterceptor(
    ctx context.Context,
    req interface{},
    info *grpc.UnaryServerInfo,
    handler grpc.UnaryHandler,
) (interface{}, error) {
    start := time.Now()
    resp, err := handler(ctx, req)
    log.Printf("%s took %v", info.FullMethod, time.Since(start))
    return resp, err
}

server := grpc.NewServer(
    grpc.UnaryInterceptor(loggingInterceptor),
)

5.5 Chained Interceptors

server = grpc.server(
    futures.ThreadPoolExecutor(max_workers=10),
    interceptors=[
        TracingInterceptor(),
        AuthInterceptor(),
        LoggingInterceptor(),
        MetricsInterceptor(),
    ]
)

Executed in listed order: tracing → auth → logging → metrics.


6. Deadline and Cancellation

6.1 Why Deadlines Matter

# Wrong - waits forever
response = stub.GetUser(GetUserRequest(user_id=123))

# Right - 5-second deadline
response = stub.GetUser(GetUserRequest(user_id=123), timeout=5.0)

Why?:

  • A network problem means waiting forever
  • Thread exhaustion
  • Poor user experience

6.2 Deadline Propagation

[Client] (timeout=5s)
   ↓ deadline=now+5s
[Service A] (takes 4s)
   ↓ remaining=1s
[Service B] (only 1s left, so keep it short)

Effect: the deadline propagates through the entire call chain, preventing timeout accumulation.

def call_chain(context):
    # pass along the time remaining on the incoming context
    response = downstream_stub.SomeRPC(request, timeout=context.time_remaining())

6.3 Cancellation

import grpc
import time

# client-side cancel
future = stub.GetUser.future(request)
time.sleep(2)
future.cancel()  # cancellation signal propagates to the server

Server side:

def GetUser(self, request, context):
    if not context.is_active():  # did the client cancel?
        return None
    
    # or equivalently
    if context.cancelled():
        return None

6.4 The Role of Context

In Go:

ctx, cancel := context.WithTimeout(context.Background(), 5*time.Second)
defer cancel()

response, err := client.GetUser(ctx, &GetUserRequest{UserId: 123})

Context carries:

  • The deadline
  • The cancellation signal
  • Metadata (auth tokens, etc.)

7. Load Balancing

7.1 Client-Side LB

The gRPC client knows several server IPs and picks one itself.

# DNS-based
channel = grpc.insecure_channel(
    'dns:///my-service:50051',
    options=[
        ('grpc.lb_policy_name', 'round_robin')
    ]
)

Policies:

  • pick_first: use the first address that connects (default)
  • round_robin: rotate through addresses
  • custom policies
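Conceptually, round_robin just cycles through the resolved addresses; the real selection happens inside the channel. A sketch of the idea (addresses are made up):

```python
import itertools

backends = ['10.0.0.1:50051', '10.0.0.2:50051', '10.0.0.3:50051']
rr = itertools.cycle(backends)

picks = [next(rr) for _ in range(6)]
print(picks)  # each backend picked twice, in order
```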

7.2 Look-Aside LB

[Client] → [LB Service]: "available servers: A, B, C"
[Client] → [Server A]

Example: gRPC + xDS (Envoy)

7.3 Proxy LB

[Client] → [Envoy Proxy] → [Servers]

Examples: Envoy, Linkerd, Istio (service mesh)

Benefit: the client sees a single endpoint; all LB logic lives in the proxy.

7.4 Health Checks

syntax = "proto3";

package grpc.health.v1;

service Health {
  rpc Check(HealthCheckRequest) returns (HealthCheckResponse);
  rpc Watch(HealthCheckRequest) returns (stream HealthCheckResponse);
}

message HealthCheckResponse {
  enum ServingStatus {
    UNKNOWN = 0;
    SERVING = 1;
    NOT_SERVING = 2;
  }
  ServingStatus status = 1;
}

The standard gRPC Health Check Protocol; any gRPC server can implement it.


8. Performance Tuning

8.1 Message Size Limits

Default limit is 4MB; larger messages are rejected.

options = [
    ('grpc.max_send_message_length', 100 * 1024 * 1024),  # 100MB
    ('grpc.max_receive_message_length', 100 * 1024 * 1024),
]

channel = grpc.insecure_channel('localhost:50051', options=options)

For large data, prefer streaming over one huge message.
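One common pattern is to slice a large payload into chunks and send them over a client-streaming RPC; `Chunk` and `UploadFile` here are hypothetical names:

```python
CHUNK_SIZE = 64 * 1024  # 64 KiB per message stays far below the 4MB limit

def file_chunks(path):
    """Yield the file's bytes in CHUNK_SIZE pieces for a client-streaming upload."""
    with open(path, 'rb') as f:
        while True:
            data = f.read(CHUNK_SIZE)
            if not data:
                break
            yield data  # real code would wrap this: yield Chunk(data=data)

# usage sketch: response = stub.UploadFile(file_chunks('big.bin'))
```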

8.2 Connection Pooling

# Wrong - new connection per request
def get_user(user_id):
    channel = grpc.insecure_channel('localhost:50051')
    stub = UserServiceStub(channel)
    return stub.GetUser(GetUserRequest(user_id=user_id))

# Right - reuse the channel
channel = grpc.insecure_channel('localhost:50051')
stub = UserServiceStub(channel)

def get_user(user_id):
    return stub.GetUser(GetUserRequest(user_id=user_id))

Why?: TCP connection setup is expensive, and a single channel exploits HTTP/2 multiplexing.

8.3 KeepAlive

options = [
    ('grpc.keepalive_time_ms', 10000),       # ping every 10 seconds
    ('grpc.keepalive_timeout_ms', 5000),     # wait 5 seconds for the ping ack
    ('grpc.keepalive_permit_without_calls', True),
    ('grpc.http2.max_pings_without_data', 0),
]

Effect: detects dead connections and prevents NAT timeouts.

8.4 압축

# Server
server = grpc.server(..., compression=grpc.Compression.Gzip)

# Client
channel = grpc.insecure_channel(
    'localhost:50051',
    compression=grpc.Compression.Gzip
)

Compression algorithms: gzip, deflate, snappy (Python's grpc.Compression exposes Gzip and Deflate).

Effective for large messages; pure overhead for small ones.


9. gRPC-Web — Browser Support

9.1 The Problem

Browsers don't expose HTTP/2 trailers or raw HTTP/2 frames, so they can't speak gRPC directly.

9.2 gRPC-Web

A browser-friendly variant:

  • HTTP/1.1 or HTTP/2
  • Trailers encoded into the body
  • CORS support
[Browser] ─ gRPC-Web ─→ [Envoy Proxy] ─ gRPC ─→ [Server]

9.3 Usage

import { UserServiceClient } from './generated/user_pb_service'

const client = new UserServiceClient('https://api.example.com')

client.getUser(new GetUserRequest().setUserId(123), (err, response) => {
    if (err) console.error(err)
    else console.log(response.getName())
})

9.4 Limitations

  • No client streaming (in most implementations)
  • No bidirectional streaming
  • Slightly larger payloads (base64 encoding)

9.5 Connect

Connect (from the company Buf): the successor to gRPC-Web.

  • Supports HTTP/1.1, HTTP/2, gRPC, and gRPC-Web
  • Simpler API
  • TypeScript first
import { createPromiseClient } from "@bufbuild/connect"
import { UserService } from "./gen/user_connect"

const client = createPromiseClient(UserService, transport)
const response = await client.getUser({ userId: 123 })

10. Debugging and Tooling

10.1 grpcurl

Call gRPC services the way curl calls REST:

# list services
grpcurl -plaintext localhost:50051 list

# call a method
grpcurl -plaintext -d '{"user_id": 123}' \
  localhost:50051 \
  UserService/GetUser

Requires server reflection:

from grpc_reflection.v1alpha import reflection

SERVICE_NAMES = (
    UserService_pb2.DESCRIPTOR.services_by_name['UserService'].full_name,
    reflection.SERVICE_NAME,
)
reflection.enable_server_reflection(SERVICE_NAMES, server)

10.2 BloomRPC / Postman

GUI clients for gRPC.

10.3 Logging

import logging
import os

logging.basicConfig(level=logging.DEBUG)
os.environ['GRPC_VERBOSITY'] = 'DEBUG'
os.environ['GRPC_TRACE'] = 'all'

10.4 Distributed Tracing

OpenTelemetry integration:

from opentelemetry.instrumentation.grpc import GrpcInstrumentorServer

GrpcInstrumentorServer().instrument()

All gRPC calls are automatically included in traces.


Quiz

1. Why do Protobuf field numbers matter?

A: Field numbers are the identifiers in the binary format. Unlike JSON, key names are not sent every time — only a small integer, which shrinks the payload. They are also the core of schema evolution: field names can change, but numbers cannot; as long as the number matches, old and new versions stay compatible. Never reuse a deleted field's number (mark it reserved). Numbers 1-15 encode in 1 byte and 16-2047 in 2 bytes, so assign frequently used fields to 1-15.

2. Why is gRPC faster than REST?

A: Four factors: (1) Protobuf — 5-10x smaller than JSON and faster to parse, (2) HTTP/2 — multiplexing many requests over a single connection, (3) HPACK header compression — 80%+ savings on repeated headers, (4) binary protocol — no text-parsing overhead. Result: 5-10x throughput, roughly half the latency. Downsides: harder to debug (binary) and no direct browser support (gRPC-Web needed).

3. When do you use bidirectional streaming?

A: When both sides need to send messages independently. Use cases: (1) chat — users send messages while the server pushes other users' messages, (2) real-time games — client input and server game state flow both ways, (3) IoT — device sensor data up, server commands down, (4) speech recognition — audio stream in, transcript out. Similar to WebSocket, but with gRPC's strong typing and Protobuf efficiency.

4. Why is deadline propagation important?

A: It prevents cumulative timeouts across a distributed call chain. Example: Client (5s) → A (takes 4s) → if A then calls B with a fresh 5s timeout, the total can reach 9s. Correct behavior: A calls B with the remaining time (1s). gRPC propagates the deadline automatically via Context, so every downstream call inherits the parent's deadline (Go's context.Context, Python's context.time_remaining()). Essential for distributed-system stability.

5. How does gRPC-Web differ from plain gRPC?

A: Browsers can't use HTTP/2 trailers or raw streams, so they can't speak native gRPC. gRPC-Web is a browser-friendly variant that (1) runs over HTTP/1.1 or HTTP/2, (2) encodes trailers into the body, (3) supports CORS. Limitations: no client streaming or bidirectional streaming (server streaming works). A proxy (typically Envoy) translates gRPC-Web ↔ gRPC. Connect (by Buf) is a simpler successor standard.

