Skip to content
🎯 New workshop: Govern AI Costs in Real Time — Hands-On with agentgateway agentgateway has joined the Agentic AI FoundationLearn more

Benchmarking Agentgateway vs LiteLLM Part 2: Fixed Throughput

A fixed-throughput follow-up benchmark comparing agentgateway and LiteLLM at 3,000 QPS on latency, CPU, and memory using Fortio and a mock LLM backend.

Lin Sun 3 min read
Back to blog

In Part 1, I pushed both agentgateway and LiteLLM to their maximum throughput and compared latency, CPU, and memory usage. The downside of that benchmark was that each gateway was processing a very different workload. agentgateway handled roughly 10× the requests of LiteLLM so it wasn’t exactly an apples-to-apples comparison.

In this post, I use a fixed target throughput of 3,000 QPS for both gateways. This allows me to compare how efficiently each proxy handles the same traffic level.


Test setup

The benchmark uses a very simple architecture. A mock LLM server immediately returns a fixed response so the benchmark measures proxy overhead rather than model inference time.

Fortio generates traffic against each gateway at the configured rate.

fortio (bt) ──► litellm :4000 ───────┐
                                     ├──► mock-server (hyper-server) :8081
fortio (bt) ──► agentgateway :4001 ──┘

I ran the benchmark with:

./scripts/run-benchmark.sh -q 3000 -d 30

The benchmark uses:

  • 32 concurrent connections
  • Target throughput: 3,000 QPS
  • 1 KB request payloads
  • 30-second benchmark duration

Results

Throughput & Latency

GatewayActual ThroughputP50P90P99
agentgateway2998.94 QPS0.227 ms0.249 ms0.436 ms
LiteLLM2465.89 QPS12.318 ms19.739 ms30.626 ms

A few observations immediately stand out.

First, agentgateway sustained almost exactly the requested throughput of 3,000 QPS, processing 89,984 successful requests over the 30-second benchmark.

LiteLLM never reached the target rate. It averaged 2,466 QPS, about 18% below the requested throughput, completing 74,008 requests during the same period.

Latency was also dramatically different. agentgateway maintained a P99 latency below half a millisecond, while LiteLLM’s P99 exceeded 30 ms. Even median latency (P50) was over 50× lower with agentgateway.

CPU & Memory

GatewayAvg CPUPeak CPUAvg MemoryPeak Memory
agentgateway13.4%29.5%13 MiB17 MiB
LiteLLM345.5%1158.5%11.67 GiB11.69 GiB

The resource utilization is arguably even more interesting than the latency numbers.

At essentially the same request rate, agentgateway used only 13% average CPU, while LiteLLM averaged 345% CPU, roughly 26× higher.

Memory usage showed an even larger gap. agentgateway stayed around 13 MB throughout the test, whereas LiteLLM consumed nearly 12 GB of RAM.

This means agentgateway handled a higher request rate while using only a tiny fraction of the system resources.


Raw benchmark output

./scripts/run-benchmark.sh -q 3000 -d 30
==> Run ID: 20260626-165414
==> LiteLLM workers: 18

Running fortio to litellm at 3000 QPS for 30s and 32 connections...
qps: 2465.89qps    p50: 12.318ms    p90: 19.739ms    p99: 30.626ms

Running fortio to agentgateway at 3000 QPS for 30s and 32 connections...
qps: 2998.94qps    p50: 0.227ms    p90: 0.249ms    p99: 0.436ms

DEST,CLIENT,QPS,CONS,DUR,PAYLOAD,SUCCESS,THROUGHPUT,P50,P90,P99
litellm,fortio,3000,32,30,1104,74008,2465.89qps,12.318ms,19.739ms,30.626ms
agentgateway,fortio,3000,32,30,1104,89984,2998.94qps,0.227ms,0.249ms,0.436ms

CONTAINER,SAMPLES,AVG_CPU%,PEAK_CPU%,AVG_MEM,PEAK_MEM
perf-agentgateway,21,13.42%,29.47%,13.15MiB,17.07MiB
perf-litellm,21,345.51%,1158.47%,11.67GiB,11.69GiB
perf-mock-server,21,6.23%,8.59%,3.06MiB,3.17MiB

Visualized results

I asked Cursor to turn the raw benchmark data into a few charts:


Takeaways

Compared with the “maximum throughput” benchmark in Part 1, this test removes one important variable by targeting the same request rate for both gateways.

Even under this controlled workload:

  • agentgateway sustained the full 3,000 QPS target, while LiteLLM averaged 2,466 QPS.
  • P99 latency was under 0.5 ms for agentgateway versus over 30 ms for LiteLLM.
  • agentgateway used approximately 26× less CPU on average.
  • Memory usage remained around 13 MB, compared to nearly 12 GB for LiteLLM.

Like the first benchmark, this test intentionally isolates proxy overhead by using a mock backend. It does not evaluate model inference latency or compare gateway features. If your application spends hundreds of milliseconds waiting for an LLM response, proxy latency becomes less significant.

However, if you’re building high-throughput AI infrastructure, serving many concurrent requests, or simply want a lightweight local gateway to manage all of your LLM providers, proxy efficiency matters. In this benchmark, agentgateway consistently delivered lower latency while using substantially fewer CPU and memory resources.

The complete benchmark scripts and raw results are available in the GitHub repository if you’d like to reproduce the numbers yourself.