API Performance Metrics: Latency, Throughput, Error Rate
API performance metrics track speed, capacity, and reliability — latency p50/p95/p99, throughput (RPS), error rate, and saturation. The foundation for SLOs.
What are API performance metrics?
API performance metrics are quantitative measures of how an API behaves under real or simulated load. They answer questions like: How fast does it respond? How much traffic can it handle? How often does it fail? Together, these metrics form the basis of SLAs (Service Level Agreements), SLOs (Service Level Objectives), and capacity planning.
Modern API observability uses the "four golden signals" (Google SRE) plus business-specific KPIs. Without these metrics, you're guessing whether your API is healthy.
The four golden signals
| Signal | What it measures | Example |
|---|---|---|
| Latency | Time per request | p95 = 250ms |
| Throughput (Traffic) | Requests per unit time | 1,500 RPS |
| Errors | Failed request rate | 0.3% 5xx errors |
| Saturation | How "full" the system is | CPU 80%, queue depth 200 |
Latency: percentiles, not averages
Average latency is misleading — a few slow requests skew it. Always report percentiles:
| Percentile | What it tells you |
|---|---|
| p50 (median) | Typical request |
| p95 | 5% of users see this or worse |
| p99 | 1% see this or worse |
| p99.9 | 0.1% — the worst experiences |
| Max | Worst single request (often spurious) |
Most SLOs target p95 or p99 — "99% of requests under 500ms."
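As a rough illustration (all latency values below are simulated, not real measurements), this sketch computes percentiles with the nearest-rank method and shows how a small slow tail pulls the mean well above the median:

```python
import random
import statistics

def percentile(samples, p):
    """Nearest-rank percentile: value at or below which p% of samples fall."""
    ordered = sorted(samples)
    k = max(0, min(len(ordered) - 1, round(p / 100 * len(ordered)) - 1))
    return ordered[k]

# Simulated latencies (ms): 95% fast requests, 5% slow tail.
random.seed(42)
latencies = [random.gauss(120, 30) for _ in range(950)] + \
            [random.gauss(900, 200) for _ in range(50)]

p50 = percentile(latencies, 50)
p95 = percentile(latencies, 95)
p99 = percentile(latencies, 99)
mean = statistics.mean(latencies)
print(f"mean={mean:.0f}ms p50={p50:.0f}ms p95={p95:.0f}ms p99={p99:.0f}ms")
```

The mean lands far above the median here, which is exactly why averages hide the tail.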
Throughput: requests per second (RPS)
Number of requests processed per unit time. Related metrics:
- RPS — requests per second
- QPS — queries per second (effectively interchangeable with RPS)
- Concurrent users / VUs — simultaneous active users
- Bandwidth — data transferred per second
Throughput peaks reveal capacity. Find the "knee" where latency starts climbing — that's your max sustainable RPS.
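One simple way to locate the knee from step-load results (the `(rps, p95_ms)` pairs below are hypothetical) is to take the last load level before p95 latency exceeds some multiple of the baseline:

```python
# Hypothetical (rps, p95_ms) pairs from a step load test.
steps = [(100, 110), (200, 115), (400, 120), (800, 140), (1200, 210), (1600, 650)]

def find_knee(steps, factor=1.5):
    """Return the last RPS level before p95 exceeds factor x the baseline p95."""
    baseline = steps[0][1]
    knee = steps[0][0]
    for rps, p95 in steps:
        if p95 > factor * baseline:
            break
        knee = rps
    return knee

print(find_knee(steps))  # → 800
```

The 1.5× factor is an assumption; pick a degradation threshold that matches your own SLO.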
Error rate
Percentage of requests that fail. Categorize:
- 5xx errors — server faults (your problem)
- 4xx errors — client errors (often valid; e.g., 404, 401)
- Timeouts — request never completed
- Connection errors — couldn't connect at all
Common SLO: "< 0.1% 5xx errors over 30 days."
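A minimal sketch of that SLO check, counting only 5xx responses as failures (the status tallies are made up):

```python
def error_rate(status_counts):
    """Fraction of requests that returned a 5xx status."""
    total = sum(status_counts.values())
    server_errors = sum(n for code, n in status_counts.items() if 500 <= code < 600)
    return server_errors / total

# Hypothetical tally over a 30-day window; 4xx responses count as valid traffic.
counts = {200: 995_000, 404: 3_000, 401: 1_500, 500: 400, 503: 100}
rate = error_rate(counts)
slo_met = rate < 0.001  # "< 0.1% 5xx errors over 30 days"
print(f"5xx rate: {rate:.4%}, SLO met: {slo_met}")
```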
Saturation
How close to capacity the system is. High saturation = imminent failure. Track:
- CPU utilization
- Memory usage
- Disk I/O
- Network bandwidth
- Queue depth (e.g., DB connection pool, request queue)
- Open file descriptors
- Thread/connection counts
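A saturation check can be as simple as comparing a snapshot of these signals against per-resource thresholds. The values and thresholds below are illustrative, not universal:

```python
# Hypothetical saturation snapshot; thresholds are illustrative only.
snapshot = {"cpu_pct": 82.0, "memory_pct": 64.0, "queue_depth": 180}
thresholds = {"cpu_pct": 80.0, "memory_pct": 90.0, "queue_depth": 200}

# Flag every resource at or above its threshold.
saturated = [name for name, value in snapshot.items() if value >= thresholds[name]]
print(saturated)  # → ['cpu_pct']
```

In practice these numbers come from your metrics pipeline (Prometheus, Datadog, etc.) rather than a static dict.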
Application-specific metrics
| Metric | What it tells you |
|---|---|
| TTFB (Time to First Byte) | Server response time before payload |
| Total response time | End-to-end latency including transfer |
| DNS lookup time | Network resolution overhead |
| Connection time | TCP/TLS handshake time |
| Database query time | How much latency comes from the database |
| Apdex score | 0-1 score weighted by "satisfaction" |
| Conversion rate | Business outcome correlated with perf |
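The Apdex score in the table above follows the standard formula: satisfied requests (latency ≤ T) count fully, tolerating requests (T < latency ≤ 4T) count half, frustrated requests count zero. A sketch with made-up samples and T = 500ms:

```python
def apdex(latencies_ms, t=500):
    """Apdex = (satisfied + tolerating / 2) / total, with threshold T."""
    satisfied = sum(1 for l in latencies_ms if l <= t)
    tolerating = sum(1 for l in latencies_ms if t < l <= 4 * t)
    return (satisfied + tolerating / 2) / len(latencies_ms)

# 85 satisfied, 10 tolerating (<= 4T = 2000ms), 5 frustrated.
samples = [120] * 85 + [900] * 10 + [2500] * 5
print(apdex(samples))  # → 0.9
```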
SLI / SLO / SLA
| Term | Meaning | Example |
|---|---|---|
| SLI (Indicator) | The metric itself | p95 latency |
| SLO (Objective) | Internal target for SLI | p95 < 500ms over 30d |
| SLA (Agreement) | Customer-facing contract | 99.9% uptime; refund if violated |
| Error budget | How much you can fail and still hit SLO | 43m/month at 99.9% |
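The error-budget figure in the table falls straight out of the SLO arithmetic: the allowed unreliability fraction times the window length. A one-liner sketch:

```python
def error_budget_minutes(slo_pct, window_days=30):
    """Minutes of allowed unavailability in the window for a given SLO."""
    return (1 - slo_pct / 100) * window_days * 24 * 60

print(round(error_budget_minutes(99.9), 1))   # → 43.2
print(round(error_budget_minutes(99.99), 1))  # → 4.3
```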
How to measure API performance
Synthetic / load testing
Tools: JMeter, k6, Locust, Gatling. Run before deploys + scheduled. Reproducible baseline.
Real User Monitoring (RUM)
Capture metrics from actual user sessions in production. Tools: Datadog, New Relic, Sentry.
APM (Application Performance Monitoring)
Server-side instrumentation. Tools: Datadog APM, New Relic APM, Dynatrace, OpenTelemetry.
Logs + metrics + traces
OpenTelemetry standard. Logs (events), metrics (numbers), traces (request flow).
API performance best practices
- Measure, don't guess. Instrumentation first.
- Track percentiles. p95, p99 — not averages.
- Define SLOs. Concrete targets keep teams honest.
- Alert on burn rate. Early warning before SLO blown.
- Test at higher load than expected. 2-3× peak.
- Monitor saturation, not just latency. Saturation predicts; latency reports.
- Tag by endpoint + version. Aggregate hides issues.
- Slice by region, browser, device. Issues often concentrated.
- Continuous load testing in CI. Catch regressions early.
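The "alert on burn rate" practice above compares the observed error rate to the rate your budget allows. A burn rate of 1.0 means you will spend exactly your budget over the window; 14.4× over one hour is a common fast-burn paging threshold (from the Google SRE workbook). A sketch with hypothetical numbers:

```python
def burn_rate(observed_error_rate, slo_pct):
    """How fast the error budget is being consumed (1.0 = exactly on budget)."""
    budget = 1 - slo_pct / 100
    return observed_error_rate / budget

# 1.44% errors in the last hour against a 99.9% SLO burns the budget 14.4x too fast.
rate = burn_rate(0.0144, 99.9)
print(round(rate, 1))  # → 14.4
```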
Common pitfalls
- Reporting averages. Hides outliers; p99 is what kills users.
- Only measuring in staging. Production load differs.
- No SLO discipline. Metrics ignored without targets.
- Alerting on all 5xx. Noise; alert on rate change instead.
- Single-tool reliance. APM + load testing + RUM each see different angles.
- Performance "tested" once. Re-test after every major change.
- Ignoring tail latency. p99.9 matters for users with bad luck.
FAQ: API performance metrics
What's a good API latency?
Depends. Web APIs: p95 < 500ms is healthy. Real-time: < 100ms. Internal APIs can be tighter. Compare to user expectation + competitors.
How do I find my max throughput?
Load test increasing RPS until latency degrades or errors spike. The "knee" is your max.
What's an acceptable error rate?
Most SLOs: < 0.1% 5xx. Higher acceptable for some non-critical endpoints. Critical paths (checkout, login) should be < 0.01%.
p95 vs p99: which to track?
Both. p95 = quality for typical users; p99 = quality for unlucky users. Many teams set SLOs on both.
How is throughput related to capacity?
Capacity is max sustainable throughput before degradation. Find via load testing.
What's an error budget?
The amount of unreliability an SLO allows. A 99.9% SLO leaves roughly 43 minutes of downtime per month. Burning through it quickly is a common trigger to freeze releases.
How often should I load test?
Continuously in CI for critical changes. Periodic full load tests (monthly/quarterly).
Measure API performance with LoadFocus
LoadFocus runs JMeter and k6 scripts from 25+ regions, capturing latency percentiles, throughput, and error rates. Sign up free at loadfocus.com/signup.
Related LoadFocus Tools
Put this concept into practice with LoadFocus — the same platform that powers everything you just read about.