API Performance Metrics: Latency, Throughput, Error Rate

API performance metrics track speed, capacity, and reliability: latency (p50/p95/p99), throughput (RPS), error rate, and saturation. Together they form the foundation for SLOs.

What are API performance metrics?

API performance metrics are quantitative measures of how an API behaves under real or simulated load. They answer questions like: How fast does it respond? How much traffic can it handle? How often does it fail? Together, these metrics form the basis of SLAs (Service Level Agreements), SLOs (Service Level Objectives), and capacity planning.

Modern API observability uses the "four golden signals" (Google SRE) plus business-specific KPIs. Without these metrics, you're guessing whether your API is healthy.

The four golden signals

Signal               | What it measures         | Example
Latency              | Time per request         | p95 = 250ms
Throughput (Traffic) | Requests per unit time   | 1,500 RPS
Errors               | Failed request rate      | 0.3% 5xx errors
Saturation           | How "full" the system is | CPU 80%, queue depth 200

Latency: percentiles, not averages

Average latency is misleading — a few slow requests skew it. Always report percentiles:

Percentile   | What it tells you
p50 (median) | Typical request
p95          | 5% of users see this or worse
p99          | 1% of users see this or worse
p99.9        | 0.1% of users: the worst experiences
Max          | Worst single request (often spurious)

Most SLOs target p95 or p99 — "99% of requests under 500ms."
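To see why averages mislead, compare the mean with percentiles on a skewed sample. A minimal Python sketch using synthetic latencies (the distribution parameters are illustrative):

```python
import random
import statistics

# Synthetic latencies (ms): mostly fast, plus a slow tail on 1% of requests.
random.seed(7)
latencies = [random.gauss(120, 30) for _ in range(990)] \
          + [random.uniform(800, 2000) for _ in range(10)]

cuts = statistics.quantiles(latencies, n=100)  # 99 percentile cut points
print(f"mean = {statistics.mean(latencies):6.1f} ms")  # pulled up by the tail
print(f"p50  = {cuts[49]:6.1f} ms")                    # typical request
print(f"p95  = {cuts[94]:6.1f} ms")
print(f"p99  = {cuts[98]:6.1f} ms")                    # exposes the slow tail
```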

Throughput: requests per second (RPS)

Throughput is the number of requests processed per unit time. Related metrics:

  • RPS — requests per second
  • QPS — queries per second (similar)
  • Concurrent users / VUs — simultaneous active users
  • Bandwidth — data transferred per second

Throughput peaks reveal capacity. Find the "knee" where latency starts climbing — that's your max sustainable RPS.
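One way to locate the knee is to step the offered load while watching p95 latency and errors. A single-threaded Python sketch; the endpoint URL and thresholds are placeholders, and a real test would use many concurrent VUs (k6, JMeter, Locust):

```python
import statistics
import time

import requests  # pip install requests

TARGET = "https://api.example.com/health"  # placeholder endpoint

def measure_at_rate(rps, duration_s=10):
    """Send roughly `rps` requests/sec for duration_s; return (p95 ms, error rate)."""
    latencies, errors, total = [], 0, 0
    interval = 1.0 / rps
    deadline = time.monotonic() + duration_s
    while time.monotonic() < deadline:
        start = time.monotonic()
        try:
            resp = requests.get(TARGET, timeout=5)
            if resp.status_code >= 500:
                errors += 1
        except requests.RequestException:  # timeouts, connection failures
            errors += 1
        latencies.append((time.monotonic() - start) * 1000)
        total += 1
        # Pace requests; one thread caps real throughput, so treat as a sketch.
        time.sleep(max(0.0, interval - (time.monotonic() - start)))
    p95 = statistics.quantiles(latencies, n=20)[18]  # 95th percentile
    return p95, errors / total

baseline = None
for rps in (10, 25, 50, 100, 200):
    p95, err = measure_at_rate(rps)
    baseline = baseline or p95
    print(f"{rps:4d} RPS -> p95 {p95:6.0f} ms, errors {err:.1%}")
    if p95 > 2 * baseline or err > 0.01:  # crude knee heuristic
        print(f"knee reached near {rps} RPS")
        break
```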

Error rate

The percentage of requests that fail. Categorize failures (see the sketch below):

  • 5xx errors — server faults (your problem)
  • 4xx errors — client errors (often valid; e.g., 404, 401)
  • Timeouts — request never completed
  • Connection errors — couldn't connect at all

Common SLO: "< 0.1% 5xx errors over 30 days."
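The buckets above map directly onto exception and status-code handling. A sketch using the requests library (the URL is a placeholder):

```python
from collections import Counter

import requests  # pip install requests

def classify(url, timeout=5.0):
    """Bucket one API call's outcome the way error-rate dashboards do."""
    try:
        resp = requests.get(url, timeout=timeout)
    except requests.Timeout:
        return "timeout"            # request never completed
    except requests.ConnectionError:
        return "connection_error"   # couldn't connect at all
    if resp.status_code >= 500:
        return "5xx"                # server fault: counts against your SLO
    if resp.status_code >= 400:
        return "4xx"                # client error: often legitimate traffic
    return "ok"

# The 5xx + timeout + connection share is what most SLOs are written against.
outcomes = Counter(classify("https://api.example.com/items") for _ in range(100))
bad = outcomes["5xx"] + outcomes["timeout"] + outcomes["connection_error"]
print(outcomes, f"error rate: {bad / 100:.1%}")
```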

Saturation

How close the system is to capacity. High saturation signals imminent failure. Track the following (a sampling sketch follows the list):

  • CPU utilization
  • Memory usage
  • Disk I/O
  • Network bandwidth
  • Queue depth (e.g., DB connection pool, request queue)
  • Open file descriptors
  • Thread/connection counts
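Most of these host-level signals can be sampled with the psutil library; queue depth is application-specific (for example, your DB connection pool exposes its own stats). A minimal sketch:

```python
import psutil  # pip install psutil

def saturation_snapshot():
    """Sample the host-level saturation signals listed above."""
    proc = psutil.Process()
    return {
        "cpu_percent": psutil.cpu_percent(interval=1),   # utilization over 1s
        "memory_percent": psutil.virtual_memory().percent,
        "disk_read_bytes": psutil.disk_io_counters().read_bytes,
        "net_sent_bytes": psutil.net_io_counters().bytes_sent,
        "open_fds": proc.num_fds(),                      # Unix only
        "threads": proc.num_threads(),
    }

print(saturation_snapshot())
```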

Application-specific metrics

Metric                    | What it tells you
TTFB (Time to First Byte) | Server response time before the payload
Total response time       | End-to-end latency including transfer
DNS lookup time           | Network resolution overhead
Connection time           | TCP/TLS handshake time
Database query time       | How much latency the database contributes
Apdex score               | 0-1 score weighted by user "satisfaction"
Conversion rate           | Business outcome correlated with performance
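Apdex, for instance, buckets requests against a target threshold T: satisfied (at most T), tolerating (at most 4T), frustrated (above 4T). A quick sketch of the standard formula:

```python
def apdex(latencies_ms, threshold_ms=500):
    """Apdex = (satisfied + tolerating/2) / total."""
    satisfied = sum(1 for t in latencies_ms if t <= threshold_ms)
    tolerating = sum(1 for t in latencies_ms if threshold_ms < t <= 4 * threshold_ms)
    return (satisfied + tolerating / 2) / len(latencies_ms)

# Two satisfied, one tolerating, one frustrated: (2 + 0.5) / 4 = 0.625
print(apdex([120, 300, 700, 2500], threshold_ms=500))
```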

SLI / SLO / SLA

Term            | Meaning                                     | Example
SLI (Indicator) | The metric itself                           | p95 latency
SLO (Objective) | Internal target for the SLI                 | p95 < 500ms over 30d
SLA (Agreement) | Customer-facing contract                    | 99.9% uptime; refund if violated
Error budget    | How much you can fail and still hit the SLO | 43m/month at 99.9%
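The error-budget arithmetic is worth internalizing. A quick calculation, assuming a 30-day window:

```python
def error_budget_minutes(slo_pct, window_days=30):
    """Allowed downtime (minutes) in the window for an availability SLO."""
    total_minutes = window_days * 24 * 60
    return total_minutes * (1 - slo_pct / 100)

for slo in (99.0, 99.9, 99.99):
    print(f"{slo}% over 30d -> {error_budget_minutes(slo):.1f} min of error budget")
# 99.9% over 30 days = 43.2 minutes: the "43m/month" in the table above
```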

How to measure API performance

Synthetic / load testing

Tools: JMeter, k6, Locust, Gatling. Run before deploys and on a schedule to maintain a reproducible baseline.
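For illustration, a minimal Locust script; the host and endpoints are placeholders. Locust reports RPS, latency percentiles, and failure counts out of the box:

```python
# locustfile.py -- run with: locust -f locustfile.py --host https://api.example.com
from locust import HttpUser, task, between

class ApiUser(HttpUser):
    # Each simulated user waits 1-3 seconds between requests.
    wait_time = between(1, 3)

    @task(3)  # weight: called three times as often as checkout
    def list_items(self):
        self.client.get("/items")

    @task(1)
    def checkout(self):
        self.client.post("/checkout", json={"item_id": 42})
```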

Real User Monitoring (RUM)

Capture metrics from actual user sessions in production. Tools: Datadog, New Relic, Sentry.

APM (Application Performance Monitoring)

Server-side instrumentation. Tools: Datadog APM, New Relic APM, Dynatrace, OpenTelemetry.

Logs + metrics + traces

The OpenTelemetry standard unifies all three signals: logs (discrete events), metrics (numeric time series), and traces (per-request flow across services).
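A sketch of what that looks like with the OpenTelemetry Python API; the service name and handler are hypothetical, and without an SDK/exporter configured at startup these calls are no-ops:

```python
import time

from opentelemetry import metrics, trace

tracer = trace.get_tracer("checkout-api")
meter = metrics.get_meter("checkout-api")
latency_ms = meter.create_histogram(
    "http.server.duration", unit="ms",
    description="Request latency by route and status",
)

def do_work(route):
    """Hypothetical request handler standing in for real logic."""
    time.sleep(0.05)
    return 200

def handle_request(route):
    with tracer.start_as_current_span(f"GET {route}"):  # trace: request flow
        start = time.monotonic()
        status = do_work(route)
        latency_ms.record((time.monotonic() - start) * 1000,
                          attributes={"route": route, "status": status})  # metric

handle_request("/items")
```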

API performance best practices

  • Measure, don't guess. Instrumentation first.
  • Track percentiles. p95, p99 — not averages.
  • Define SLOs. Concrete targets keep teams honest.
  • Alert on burn rate. Early warning before the SLO is blown (see the sketch after this list).
  • Test at higher load than expected. 2-3× peak.
  • Monitor saturation, not just latency. Saturation predicts; latency reports.
  • Tag by endpoint + version. Aggregates hide per-endpoint issues.
  • Slice by region, browser, device. Issues often concentrated.
  • Continuous load testing in CI. Catch regressions early.
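Burn rate, mentioned above, is the ratio of the observed error rate to the rate the budget allows. A sketch; the 14.4x threshold follows Google's SRE Workbook fast-burn alert:

```python
def burn_rate(observed_error_rate, slo_pct):
    """How fast the error budget is being consumed (1.0 = exactly on budget)."""
    budget_rate = 1 - slo_pct / 100       # e.g. 99.9% SLO -> 0.001
    return observed_error_rate / budget_rate

rate = burn_rate(observed_error_rate=0.005, slo_pct=99.9)
print(f"burn rate: {rate:.1f}x")          # 5.0x: a 30-day budget gone in ~6 days
if rate > 14.4:                           # ~2% of a 30-day budget in one hour
    print("page: SLO at risk")
```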

Common pitfalls

  • Reporting averages. Hides outliers; p99 is what kills users.
  • Only measuring in staging. Production load differs.
  • No SLO discipline. Metrics ignored without targets.
  • Alerting on all 5xx. Noise; alert on rate change instead.
  • Single-tool reliance. APM + load testing + RUM each see different angles.
  • Performance "tested" once. Re-test after every major change.
  • Ignoring tail latency. p99.9 matters for users with bad luck.

FAQ: API performance metrics

What's a good API latency?

It depends on the use case. Web APIs: p95 < 500ms is healthy. Real-time: < 100ms. Internal APIs can be tighter. Compare against user expectations and competitors.

How do I find my max throughput?

Load test increasing RPS until latency degrades or errors spike. The "knee" is your max.

What's an acceptable error rate?

Most SLOs: < 0.1% 5xx. Higher acceptable for some non-critical endpoints. Critical paths (checkout, login) should be < 0.01%.

p95 vs p99: which to track?

Both. p95 reflects quality for typical users; p99 reflects quality for unlucky users. Many teams set SLOs on both.

How is throughput related to capacity?

Capacity is max sustainable throughput before degradation. Find via load testing.

What's an error budget?

The amount of unreliability an SLO allows. A 99.9% SLO leaves a 43-minute monthly error budget. Burning through it quickly is a signal to freeze releases.

How often should I load test?

Continuously in CI for critical changes. Periodic full load tests (monthly/quarterly).

Measure API performance with LoadFocus

LoadFocus runs JMeter and k6 scripts from 25+ regions, capturing latency percentiles, throughput, and error rates. Sign up free at loadfocus.com/signup.
