What is Latency?

Latency is the time between sending a request and receiving the first response byte. It is distinct from throughput and is usually reported as p50, p95, and p99 percentiles.

What is latency?

Latency is the time between when a client sends a request and when it receives the first response byte. In a web context that means: user clicks, browser sends an HTTP request, server processes it, response starts arriving. The clock-time of that round-trip is latency, usually expressed in milliseconds.

The end-to-end number hides a stack of contributing pieces: DNS lookup, TCP handshake, TLS handshake, server-side processing (application code, database query, downstream API call), response transmission. A 600 ms latency might be 50 ms of DNS, 80 ms of TLS, 400 ms of server processing, and 70 ms of network return. Finding the slow piece is the work of performance engineering.
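
As a rough illustration of that breakdown, the sketch below times the DNS, TCP, TLS, and first-byte phases of a single HTTP/1.1 GET using only the Python standard library, then picks out the largest contributor. The names time_request_phases and slowest_phase are hypothetical, and the approach is a simplification: it opens one fresh connection, so it ignores connection reuse, HTTP/2, and redirects, and it cannot separate server processing from network return.

```python
import socket
import ssl
import time


def time_request_phases(host, port=443, path="/", use_tls=True, timeout=10.0):
    """Time one HTTP/1.1 GET and return per-phase durations in milliseconds.

    Hypothetical helper for illustration; not a replacement for a real profiler.
    """
    timings = {}

    start = time.perf_counter()
    # DNS: resolve the host name to a socket address.
    family, socktype, proto, _, sockaddr = socket.getaddrinfo(
        host, port, type=socket.SOCK_STREAM
    )[0]
    resolved = time.perf_counter()
    timings["dns"] = (resolved - start) * 1000

    sock = socket.socket(family, socktype, proto)
    sock.settimeout(timeout)
    try:
        # TCP: the three-way handshake completes inside connect().
        sock.connect(sockaddr)
        connected = time.perf_counter()
        timings["tcp"] = (connected - resolved) * 1000

        if use_tls:
            # TLS: wrap_socket() performs the handshake on a connected socket.
            context = ssl.create_default_context()
            sock = context.wrap_socket(sock, server_hostname=host)
        secured = time.perf_counter()
        timings["tls"] = (secured - connected) * 1000

        request = f"GET {path} HTTP/1.1\r\nHost: {host}\r\nConnection: close\r\n\r\n"
        sock.sendall(request.encode("ascii"))
        # Waiting for the first byte covers server processing plus network return.
        sock.recv(1)
        first_byte = time.perf_counter()
        timings["wait_first_byte"] = (first_byte - secured) * 1000
    finally:
        sock.close()

    timings["total"] = (first_byte - start) * 1000
    return timings


def slowest_phase(timings):
    """Return (phase, share_of_total) for the largest phase, ignoring 'total'."""
    phases = {name: ms for name, ms in timings.items() if name != "total"}
    name = max(phases, key=phases.get)
    return name, phases[name] / sum(phases.values())


# The 600 ms example from the text: server processing is the largest share.
example = {"dns": 50, "tls": 80, "server_processing": 400, "network_return": 70}
print(slowest_phase(example))
```

Calling time_request_phases("example.com") would time a live request; example.com is only a placeholder host. For the 600 ms example, slowest_phase reports server processing at two thirds of the total.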

Latency vs throughput

Two metrics, often confused, that measure different things:

  • Latency: how long a single request takes. Measured in milliseconds. Improving latency makes individual requests faster.
  • Throughput: how many requests the system processes per unit time. Measured in RPS (requests per second). Improving throughput means the system serves more users without slowing down.

They are distinct, and one does not predict the other. A system can have low latency and low throughput (fast for one user, falls over at 100 users). It can have high latency and high throughput (a batch pipeline that processes millions of records per second but takes minutes per record). Production systems need both low latency and high throughput.

Latency and throughput trade off under load. Add more concurrent users to a system at fixed capacity and latency climbs as resources become contended. The shape of that climb is what load testing, capacity testing, and scalability testing measure.
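
One way to picture that climb is the textbook M/M/1 queue, in which the mean time a request spends in the system is 1 / (service rate - arrival rate). The model is an assumption brought in here for illustration, and real systems rarely match it exactly. The function name and the capacity of 100 requests per second are hypothetical.

```python
def mm1_latency_ms(service_rate_rps, arrival_rate_rps):
    """Mean time in system (queueing + service) for an M/M/1 queue, in ms."""
    if arrival_rate_rps >= service_rate_rps:
        raise ValueError("arrival rate must be below service rate for a stable queue")
    return 1000.0 / (service_rate_rps - arrival_rate_rps)


# Hypothetical server that can complete 100 requests per second.
CAPACITY_RPS = 100
for load_rps in (10, 50, 80, 90, 95, 99):
    latency = mm1_latency_ms(CAPACITY_RPS, load_rps)
    print(f"{load_rps:>3} rps -> {latency:7.1f} ms mean latency")
```

Under these assumptions, mean latency is 20 ms at 50% load, 100 ms at 90% load, and 1,000 ms at 99% load. The curve stays nearly flat and then bends sharply as the system nears capacity.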

Why averages are misleading

The average latency hides the tail. If 99 requests return in 100 ms and one takes 30 seconds, the average is about 400 ms. That sounds tolerable, yet one real user waited 30 seconds. Percentiles fix this:

  • p50 (median): half of requests are faster than this. Useful as a sanity check, useless for SLOs.
  • p95: 95% of requests are faster than this. The standard latency SLO threshold.
  • p99: 99% of requests are faster. Catches the tail of slow outliers that p95 hides.
  • p99.9 / max: p99.9 marks the worst 0.1% of requests; max is the single slowest request. Both are often dominated by GC pauses, lock contention, and cold-cache misses. They matter for high-traffic systems, where 0.1% can still mean thousands or millions of requests.

Every credible load-test report shows percentiles, never just averages.
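
To make the gap between averages and percentiles concrete, here is a minimal sketch that uses the nearest-rank definition of a percentile. The percentile function is a hypothetical helper (load testing tools may interpolate differently), and the 1,000-request mix is invented for illustration.

```python
import math
import statistics


def percentile(samples, p):
    """Nearest-rank percentile: smallest value with at least p% of samples at or below it."""
    ordered = sorted(samples)
    rank = max(1, math.ceil(p * len(ordered) / 100))
    return ordered[rank - 1]


# The example from the text: 99 fast requests and one 30-second outlier (ms).
text_example = [100] * 99 + [30_000]
print("mean:", statistics.mean(text_example), "max:", max(text_example))

# Hypothetical mix of 1,000 requests, invented here to show how percentiles spread out.
latencies_ms = [100] * 900 + [400] * 80 + [2_000] * 19 + [30_000]
print("mean:", statistics.mean(latencies_ms))
for p in (50, 95, 99, 99.9):
    print(f"p{p}:", percentile(latencies_ms, p))
print("max:", max(latencies_ms))
```

For the hypothetical mix, the mean is 190 ms, while p50 is 100 ms, p95 is 400 ms, p99 and p99.9 are 2,000 ms, and the maximum is 30,000 ms. In the 100-request example from the text, the mean is 399 ms and only the maximum exposes the 30-second request.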

Sources of latency

  1. Network. Geographic distance, DNS resolution time, TCP / TLS handshake. An RTT of roughly 100 ms between the US and Europe is largely physics: a signal in optical fiber travels at about two-thirds the speed of light in a vacuum, so no amount of server tuning removes it. CDN edges reduce this by serving from a closer location.
  2. Server-side processing. Application code, database queries, downstream API calls. Usually the largest fraction. Profile to find the slow piece.
  3. Garbage collection / runtime pauses. JVM GC pauses, Go STW pauses, Python GIL contention. Visible in p99+ tails as bursts of slow requests.
  4. Resource contention. Database connection-pool waits, lock contention, queue backlogs. Latency climbs sharply when contention hits saturation; this is the elbow on the load-test curve.
  5. Client-side rendering. Browser TTI (time to interactive) includes JS parse, hydration, render. Not latency in the API sense but matters for user-perceived speed.
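
The physical floor mentioned under the network item can be estimated with a few lines of arithmetic. This is a back-of-the-envelope sketch: the two-thirds fiber speed factor and the New York to London distance of roughly 5,570 km are approximations chosen for illustration, and min_rtt_ms is a hypothetical helper.

```python
SPEED_OF_LIGHT_KM_S = 299_792.458   # in a vacuum
FIBER_FRACTION = 2 / 3              # assumed: typical for glass fiber (refractive index ~1.5)


def min_rtt_ms(distance_km, fraction_of_c=FIBER_FRACTION):
    """Lower bound on round-trip time over a straight fiber path, in ms."""
    one_way_s = distance_km / (SPEED_OF_LIGHT_KM_S * fraction_of_c)
    return 2 * one_way_s * 1000


# Assumed great-circle distance, New York to London: roughly 5,570 km.
NEW_YORK_LONDON_KM = 5_570
print(f"{min_rtt_ms(NEW_YORK_LONDON_KM):.1f} ms")
```

Under these assumptions the floor is about 56 ms. Real cable routes are longer than the great circle and every hop adds delay, which is one plausible reason observed round-trips sit closer to 100 ms.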

How to measure latency

Latency is reported by every load testing tool. In k6, the end-of-test summary for the built-in http_req_duration metric shows avg, min, med, max, p(90), and p(95) by default; pass --summary-trend-stats to include p(99). Add thresholds: { http_req_duration: ['p(95)<500'] } to the script options to fail the test if p95 exceeds 500 ms. In JMeter, the Aggregate Report listener shows average, median, p90, p95, p99.

For production latency, instrument the server with APM (Datadog, New Relic, Honeycomb) and the browser with RUM (real-user monitoring). Real-user latency includes the long tail of slow networks, mobile hardware, and shared TLS termination that synthetic load tests don't see.

Run latency-focused tests from LoadFocus against 25+ cloud regions to see how latency differs by geography. A 50 ms p95 from us-east-1 is irrelevant if your users are in Sydney and the real p95 from there is 600 ms.

If your team needs latency analysis broken down by endpoint, region, and time-of-day against production-shape load, LoadFocus offers load testing services where engineers design the test matrix, run the scenarios, and write up the latency breakdown.
