
Throughput is the metric that tells you how much work a system actually does per unit of time. It’s the rate measurement that complements response time and latency (which describe how long individual requests take). Together they tell you whether a system is fast (low response time) AND productive (high throughput).

This post covers what throughput means in performance testing, the difference between throughput and adjacent metrics (response time, RPS, TPS, VUs), how to measure it, what bottlenecks limit it, and how to read throughput numbers from your test reports.


The quick answer

Throughput = work completed per unit time. Usually expressed as requests per second (RPS), transactions per second (TPS), or bytes per second for data-heavy workloads.

| Metric | What it measures | Typical unit |
|---|---|---|
| Throughput | How much work the system completes | requests/second, transactions/second, MB/second |
| Response time | How long an individual request takes | milliseconds |
| Latency | Network round-trip portion of response time | milliseconds |
| Concurrent users (VUs) | How many simulated users are active simultaneously | count |
| Error rate | Percentage of failed requests | % |

A system can have high throughput AND high response time (lots of requests completing, but each one slow). It can also have low throughput AND low response time (fast individual requests, but the system isn’t busy). Throughput and response time are separate dimensions: knowing one doesn’t tell you the other unless you also know the concurrency.

What is throughput in performance testing?

Throughput is a count divided by a time window. The most common form in web performance testing is requests per second (RPS): the number of HTTP requests the system completes in a given second, averaged over the measurement window.


A typical performance test reports throughput in several ways:

  • Total throughput. Total requests / total test duration.
  • Sustained throughput. Average RPS during the steady-state portion of the test (after ramp-up, before ramp-down).
  • Peak throughput. Highest 1-second window of the test.
  • Throughput per endpoint. Same metric, broken down by which API endpoint or page handled the request.

Most load testing tools, JMeter and k6 included, either report these directly or give you the raw data to derive them. The most useful for capacity decisions is sustained throughput, because peak throughput can be misleading (a 5-second spike of 5,000 RPS doesn’t prove the system can hold 5,000 RPS for an hour).
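The four variants differ only in which requests you count and over which time window. A minimal sketch, assuming a hypothetical input format of (completion_time_s, endpoint) tuples with times in seconds from the start of the test, and ramp durations that you supply:

```python
from collections import Counter


def throughput_summary(samples, test_duration_s, ramp_up_s, ramp_down_s):
    """Compute total, sustained, peak, and per-endpoint throughput (RPS).

    samples: hypothetical (completion_time_s, endpoint) tuples, with times
    measured in seconds from the start of the test.
    """
    # Total throughput: every completed request over the full test duration.
    total_rps = len(samples) / test_duration_s

    # Sustained throughput: only the steady-state window between the ramps.
    steady_start = ramp_up_s
    steady_end = test_duration_s - ramp_down_s
    steady_count = sum(1 for t, _ in samples if steady_start <= t < steady_end)
    sustained_rps = steady_count / (steady_end - steady_start)

    # Peak throughput: the busiest 1-second bucket of the test.
    per_second = Counter(int(t) for t, _ in samples)
    peak_rps = max(per_second.values(), default=0)

    # Per-endpoint throughput, averaged over the full test duration.
    per_endpoint = {
        endpoint: count / test_duration_s
        for endpoint, count in Counter(e for _, e in samples).items()
    }
    return total_rps, sustained_rps, peak_rps, per_endpoint
```

For a 60-second run with 10-second ramps at 1 RPS and a 40-second plateau at 10 RPS, this reports 7.0 RPS total but 10.0 RPS sustained, which is why the steady-state figure is the one to use for capacity planning.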

Throughput vs response time vs latency

These three are easy to conflate. They measure different things:

  • Throughput = how much the system completes in aggregate (per second).
  • Response time = how long each individual request takes from send to receive (per request).
  • Latency = the network portion of response time (TCP round-trip + TLS handshake + propagation delay).

The relationships:

  • Throughput is bounded by concurrency ÷ response time (Little’s Law). If each request takes 100 ms and you have 10 concurrent VUs with no think time, max throughput is 10 / 0.1 = 100 RPS.
  • As you add load (more VUs), throughput increases linearly until a bottleneck appears. After that, additional VUs increase response time without increasing throughput.
  • Latency (network-only) usually doesn’t change with load unless the network itself saturates. Response time is what climbs under load, as server processing time, queue depths, and GC pauses all grow.
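The 10 / 0.1 example is Little’s Law rearranged: requests in flight N = throughput X × response time R, so X = N / R. A minimal sketch of both directions, assuming closed-loop VUs with no think time and a response time that stays constant (an assumption that stops holding past the elbow):

```python
def max_throughput_rps(concurrency, response_time_s):
    """Little's Law upper bound on throughput: X = N / R."""
    return concurrency / response_time_s


def concurrency_needed(target_rps, response_time_s):
    """Requests that must be in flight to sustain target_rps: N = X * R."""
    return target_rps * response_time_s


# The example from the text: 10 VUs, 100 ms per request.
print(max_throughput_rps(10, 0.1))  # 100.0
```

Read in reverse, holding 500 RPS at 200 ms needs about 100 requests in flight. If response time doubles under load, the same concurrency delivers only half the throughput, which is the mechanism behind the flattening described in the next bullet.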

The shape of throughput vs concurrency is the most important graph in performance testing. A linear region followed by a sharp elbow tells you exactly where the system hit its breakpoint.

RPS vs TPS vs VUs

These get used interchangeably and shouldn’t be:

  • RPS (Requests Per Second). Number of HTTP requests the system completes per second. Lower-level metric. Counts every GET, POST, asset load.
  • TPS (Transactions Per Second). Number of business-level operations the system completes per second. A “checkout” transaction might involve 5 HTTP requests (auth, cart, payment, confirm, email-trigger), so 100 checkouts per second means roughly 500 RPS.
  • VUs (Virtual Users / Concurrent Users). Number of simulated users active simultaneously, not a throughput measurement at all. VUs is the LOAD; RPS/TPS is the WORK that load produces.

When someone says “we need to handle 1,000 users”, clarify: 1,000 concurrent VUs? Or 1,000 transactions per second? They’re different problems with different capacity implications.

Rule of thumb: the RPS a single VU produces depends mostly on think time. A VU with realistic think time (3 seconds between requests) produces ~0.3 RPS. A VU without think time can produce 10-100+ RPS until it saturates a client-side resource.
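Both conversions in this section are simple arithmetic. A rough sketch, assuming each VU is a closed loop that sends one request, waits for the response, then pauses for its think time; the function names and the fixed requests-per-transaction count are illustrative, not taken from any tool:

```python
import math


def rps_per_vu(response_time_s, think_time_s):
    """One closed-loop VU issues one request every (response time + think time)."""
    return 1.0 / (response_time_s + think_time_s)


def vus_for_target_rps(target_rps, response_time_s, think_time_s):
    """VUs needed to offer target_rps, assuming response time stays constant."""
    return math.ceil(target_rps * (response_time_s + think_time_s))


def rps_from_tps(tps, requests_per_transaction):
    """Convert business transactions/second into HTTP requests/second."""
    return tps * requests_per_transaction
```

With 3 seconds of think time and a fast response, one VU offers about 0.33 RPS, so offering 100 RPS takes a little over 300 VUs. In the other direction, 100 checkouts per second at 5 requests each is 500 RPS, which is why “1,000 users” and “1,000 TPS” lead to very different test configurations.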

What limits throughput

Throughput hits a ceiling when one of these resources saturates first:

  1. CPU on the application server. Most common bottleneck for compute-heavy workloads. Solution: scale horizontally (more nodes) or vertically (bigger instances).
  2. Database connection pool. Common when the pool is too small for the request concurrency, or when each request opens its own connection instead of reusing pooled ones. Solution: tune the pool size, add read replicas, use connection pooling middleware like PgBouncer.
  3. Database CPU or I/O. When the database itself is the bottleneck. Solution: query optimisation, indexes, read replicas, sharding.
  4. Network bandwidth. Rare on cloud infrastructure but real for very high RPS or large payloads. Solution: bigger network instances, CDN offloading, response compression.
  5. Thread pool / worker pool. Common in synchronous frameworks. Solution: increase pool size or switch to async I/O.
  6. External dependencies. Third-party APIs you call per request. Solution: caching, circuit breakers, async patterns.
  7. Memory pressure leading to GC pauses. Solution: heap tuning, leak fixes, more memory.
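One way to reason about which resource “saturates first” is to give each one a ceiling of parallel capacity ÷ busy time per request (Little’s Law again) and take the lowest. A back-of-the-envelope sketch; the resource names and numbers are hypothetical, and real systems usually flatten somewhat before the computed ceiling because of queueing and contention:

```python
def throughput_ceiling(resources):
    """Return (limiting_resource, ceiling_rps, all_ceilings).

    resources maps a name to (parallel_capacity, busy_seconds_per_request),
    e.g. CPU cores and CPU-seconds per request, or pool size and connection
    hold time per request.
    """
    ceilings = {
        name: capacity / busy_s
        for name, (capacity, busy_s) in resources.items()
    }
    limiting = min(ceilings, key=ceilings.get)
    return limiting, ceilings[limiting], ceilings


# Hypothetical node: 8 cores at 10 ms CPU per request, a 20-connection
# DB pool held 50 ms per request, and 200 worker threads busy 120 ms each.
name, ceiling, _ = throughput_ceiling({
    "app CPU": (8, 0.010),
    "DB connection pool": (20, 0.050),
    "worker pool": (200, 0.120),
})
print(name, round(ceiling))  # DB connection pool 400
```

Here the pool caps the node at roughly 400 RPS even though the CPU could handle about 800, so adding cores would not help; resizing the pool or shortening how long each request holds a connection would.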

The point of a load test is to make the throughput-vs-concurrency curve, and the resource that flattens it, visible. The point of a capacity test is to find the maximum sustainable throughput at acceptable SLOs.

How to measure throughput

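JMeter’s Aggregate Report and Summary Report listeners show throughput with an automatically chosen time unit (/sec, /min, or /hour, whichever keeps the displayed value at 1.0 or above), so check the unit before comparing numbers and divide a /min value by 60 to get RPS. When these reports are saved to CSV, throughput is written in requests/second. The metric is calculated as total samples / elapsed time, measured from the start of the first sample to the end of the last one. The Backend Listener streams the same sample data to an external store such as InfluxDB or Graphite.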
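The same calculation can be reproduced from a raw results file, which is handy for sanity-checking a listener’s numbers. A minimal sketch, assuming a CSV-format JTL with a header row and JMeter’s standard timeStamp (by default the sample start time, in epoch milliseconds) and elapsed (milliseconds) columns; the sample rows are hypothetical:

```python
import csv
import io


def jtl_throughput_rps(jtl_file):
    """Samples / (end of last sample - start of first sample), in requests/second."""
    first_start_ms = None
    last_end_ms = None
    count = 0
    for row in csv.DictReader(jtl_file):
        start_ms = int(row["timeStamp"])
        end_ms = start_ms + int(row["elapsed"])
        first_start_ms = start_ms if first_start_ms is None else min(first_start_ms, start_ms)
        last_end_ms = end_ms if last_end_ms is None else max(last_end_ms, end_ms)
        count += 1
    return count / ((last_end_ms - first_start_ms) / 1000)


# Three hypothetical samples spanning 2 seconds from first start to last end.
jtl = io.StringIO(
    "timeStamp,elapsed,label\n"
    "1000,200,login\n"
    "1500,300,cart\n"
    "2800,200,checkout\n"
)
rps = jtl_throughput_rps(jtl)
print(rps, rps * 60)  # 1.5 90.0
```

The 90.0 figure is what a listener would display as 90.0/min for the same samples, which is the unit mix-up the paragraph above warns about.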

k6’s built-in http_reqs metric is a counter of total requests; the end-of-test summary shows both its count and its per-second rate, which is the average RPS over the run. In thresholds you can reference that rate directly, for example http_reqs: ['rate>100'].

Server-side, your load balancer or APM tool also reports throughput. This is useful for cross-checking the test harness against what the system actually saw.

For more on the tools, see JMeter load testing and k6 load testing.

Reading throughput in a test report

Three patterns to look for:

1. Linear scaling region

VUs vs throughput plotted as a line that grows proportionally. Adding 10% more VUs gives ~10% more RPS. This is healthy. The system has headroom.

2. The elbow

A point where the line flattens. Adding more VUs no longer increases throughput. This is the breakpoint. The system has hit its capacity ceiling. The next thing to climb is response time (queueing).

3. The cliff

A point where throughput drops as you add load. This means the system is in a degraded state. Usually because failures are consuming resources (timeouts holding connections, retry storms, GC pauses cascading). Recovery is unlikely without removing the load.
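With results from a stepped ramp, the three patterns can be told apart by comparing the relative gain in RPS with the relative gain in VUs at each step. A rough sketch; the 0.8 efficiency cutoff and 5% drop tolerance are arbitrary assumptions, and the step data is hypothetical:

```python
def classify_steps(steps, linear_ratio=0.8, drop_tolerance=0.05):
    """Label each step of a stepped ramp as 'linear', 'elbow', or 'cliff'.

    steps: (vus, rps) pairs sorted by VUs. linear_ratio and drop_tolerance
    are illustrative thresholds, not standard values.
    """
    labels = []
    for (v0, r0), (v1, r1) in zip(steps, steps[1:]):
        if r1 < r0 * (1 - drop_tolerance):
            label = "cliff"  # throughput fell noticeably while load rose
        else:
            # Relative RPS gain per relative VU gain: 1.0 means proportional scaling.
            efficiency = ((r1 - r0) / r0) / ((v1 - v0) / v0)
            label = "linear" if efficiency >= linear_ratio else "elbow"
        labels.append((v1, label))
    return labels


# Hypothetical stepped-ramp results: (VUs, sustained RPS).
steps = [(50, 500), (100, 990), (150, 1460), (200, 1550), (250, 1560), (300, 1100)]
print(classify_steps(steps))
# [(100, 'linear'), (150, 'linear'), (200, 'elbow'), (250, 'elbow'), (300, 'cliff')]
```

Using sustained RPS per step, rather than peak, keeps a single noisy second from being mistaken for an elbow or a cliff.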

What “good” throughput looks like

There’s no universal number. “Good” depends on the system and the workload:

  • A simple static API endpoint on a cloud instance might handle 5,000-20,000 RPS.
  • A complex authenticated API with database writes might handle 100-1,000 RPS per node.
  • A page-render endpoint that loads templates + queries DB + renders HTML might handle 50-500 RPS per node.

The relevant question isn’t “is 200 RPS good?” but “does my system meet its SLO at the load I forecast?” If you forecast 100 RPS and you hit p95 < 1s with margin to spare at 200 RPS, you're fine. If you forecast 500 RPS and hit p95 = 3s at 400 RPS, you have a problem.
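That question can be checked mechanically against test results. A minimal sketch, assuming hypothetical per-step data of (achieved RPS, response times in seconds), a nearest-rank p95 (one of several common percentile definitions), and a 1.5× headroom factor standing in for “margin to spare”:

```python
import math


def percentile(values, pct):
    """Nearest-rank percentile: smallest value with at least pct% of samples at or below it."""
    ordered = sorted(values)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


def capacity_within_slo(steps, p95_slo_s):
    """Highest achieved RPS among steps whose p95 response time met the SLO.

    steps: hypothetical (achieved_rps, response_times_s) pairs, one per load step.
    """
    passing = [rps for rps, times in steps if percentile(times, 95) <= p95_slo_s]
    return max(passing, default=0.0)


def meets_forecast(steps, forecast_rps, p95_slo_s, headroom=1.5):
    """True if SLO-compliant capacity covers the forecast times the headroom factor."""
    return capacity_within_slo(steps, p95_slo_s) >= forecast_rps * headroom
```

With one step at 200 RPS and p95 well under 1 s, and another at 400 RPS with p95 = 3 s, a 100 RPS forecast passes and a 500 RPS forecast fails, matching the two scenarios above. The headroom factor is a team decision, not a standard value.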

Throughput across multiple regions

When you run load tests from a single region against a service with users in multiple regions, the throughput numbers you measure are local to that region’s network path. They don’t tell you what throughput looks like for users in other geos.

For honest throughput numbers, run distributed load tests from multiple regions simultaneously. LoadFocus handles distributed test execution across 25+ AWS regions, with throughput reported per region and aggregated. Useful for surfacing per-region capacity differences (CDN edge issues, per-region rate limits, geo-specific origin overloads).

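Distributed throughput testing also validates scalability claims. If the architecture scales horizontally, doubling the offered load (for example, by doubling the number of load-generating regions at the same per-region load) should come close to doubling total throughput.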
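“Close to 2×” can be expressed as a scaling efficiency: throughput growth divided by offered-load growth. A small sketch, assuming each region generates the same load so that doubling regions doubles offered load; the region names are AWS region IDs used for illustration and the RPS figures are hypothetical:

```python
def total_rps(per_region_rps):
    """Aggregate throughput across load-generating regions."""
    return sum(per_region_rps.values())


def scaling_efficiency(baseline_rps, scaled_rps, load_factor):
    """Throughput growth divided by offered-load growth; 1.0 means perfectly proportional."""
    return (scaled_rps / baseline_rps) / load_factor


baseline = {"us-east-1": 1000, "eu-west-1": 1000}
doubled = {"us-east-1": 980, "eu-west-1": 950, "ap-southeast-1": 940, "sa-east-1": 930}
print(round(scaling_efficiency(total_rps(baseline), total_rps(doubled), 2), 2))  # 0.95
```

Keeping the per-region breakdown alongside the total also shows where efficiency is lost, for example a single region whose throughput drops when others are added.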

Frequently asked questions

What does “throughput” mean in software testing?

Throughput is the rate at which a system completes work, usually measured in requests per second (RPS) or transactions per second (TPS). It’s the productivity metric, complementary to response time which measures how long each individual unit of work takes.

Is throughput the same as RPS?

In web performance testing, throughput is most commonly expressed as RPS (requests per second). But “throughput” is the broader concept. It can be measured as transactions/second, bytes/second, or any work-per-time unit depending on the workload.

What’s the difference between throughput and latency?

Throughput is how much work the system completes per unit time. Latency is the network round-trip portion of response time for an individual request. Both matter for user experience. High throughput with high latency means users wait but the system serves many of them.

What’s the difference between throughput and response time?

Throughput is requests-per-second aggregate. Response time is per-request duration. A system can have high throughput AND high response time if it’s processing many slow requests in parallel. They’re separate dimensions, linked through concurrency.

How do I increase throughput?

The lever depends on what’s saturating: scale horizontally for CPU bottlenecks, tune connection pools for DB connection exhaustion, add caching for read-heavy workloads, switch to async I/O for thread-pool saturation, increase instance size for memory pressure. A stress test tells you which one.

What’s a good throughput number?

There’s no universal answer. The right framing is: does throughput meet your SLO at the load you forecast? A simple static endpoint might hit 20,000 RPS per node; a complex authenticated API might hit 200 RPS. Both can be “good” relative to their respective workloads.

How is throughput measured in JMeter?

JMeter’s Aggregate Report and Summary Report show throughput with an automatically chosen time unit (/sec, /min, or /hour), so check the unit and divide a /min value by 60 for RPS. The Backend Listener can stream results to InfluxDB + Grafana for real-time monitoring during the run.

How is throughput measured in k6?

k6 reports throughput via the built-in http_reqs counter: its count is total requests and its rate is average RPS. Both appear in the end-of-test summary, and the underlying metric streams to any configured output (for example Prometheus, InfluxDB, or Datadog).

Can throughput decrease as I add load?

Yes. And it’s a bad sign. A decreasing throughput curve under load means the system is in cascading failure: timeouts holding connections, retry storms, GC pauses growing. This is the “cliff” pattern, distinct from the more common “elbow” (throughput plateaus) or “linear” (throughput scales) patterns.

Bottom line

Throughput is the productivity metric in performance testing: how much work the system actually does per second. Pair it with response time percentiles (p50, p95, p99) to know both how busy the system is and how fast individual users experience it.

For load tests that measure throughput at expected peak, LoadFocus handles JMeter and k6 from 25+ cloud regions with throughput reporting per region and per endpoint. For stress tests that map throughput-vs-load curves, the same infrastructure runs the stepped ramps that surface the elbow and the cliff.

If you’d rather hand off the throughput analysis (especially when capacity decisions depend on it), LoadFocus offers load testing services where engineers run the tests and write up the throughput curve with bottleneck attribution.
