What is API Caching?

Storing API responses temporarily so that subsequent identical requests are served from cache without re-running the underlying work. Reduces latency and backend load.

API caching is the practice of storing responses from an API so that subsequent identical requests can be served from the cache instead of re-running the underlying work. This works because, for many endpoints, data changes far less often than it is read, so a cached response is usually fresh enough, and serving it is dramatically faster than recomputing it from databases or downstream services. Caching is one of the highest-leverage optimizations in API engineering: on read-heavy endpoints with high hit rates, a well-placed cache can cut p95 latency by 10-100x and reduce backend load by 90% or more.

Cache locations stack: browser → CDN → API gateway → application memory → distributed cache (Redis/Memcached) → database query cache. Each layer absorbs some of the requests that reach it, and a hit closer to the user is cheaper and faster because it skips every layer behind it. An effective API caching strategy chooses the right layer (or layers) for each endpoint based on its access patterns and freshness requirements.

The five major API caching layers

1. Client / browser cache

The browser stores responses based on HTTP cache-control headers. Zero round trip when cached. Best for content that's safe to cache locally per-user (static assets, public reference data). Controlled by Cache-Control: max-age=3600 and ETags.

2. CDN edge cache (Cloudflare, CloudFront, Fastly)

Cached at edge POPs near users, typically with ~10-50 ms response times globally. Best for content shared by many users (public APIs, anonymized data, marketing endpoints). Controlled by Cache-Control headers plus CDN-specific cache rules.

3. API gateway cache (AWS API Gateway, Kong, Apigee)

Centralized cache in front of your APIs. Reduces backend hits without browser/CDN cooperation. Useful for applying shared caching logic in one place before requests are routed to individual services.

4. Application-level cache (in-process or distributed)

The most flexible: cache anything in memory (in-process) or in Redis/Memcached (shared across instances). Best for computed responses that aren't worth pushing to CDN. Common pattern: cache the expensive read query, not the full HTTP response.

5. Database query cache

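Many databases cache data pages, and sometimes query plans, internally. Less control, but mostly automatic. PostgreSQL's shared_buffers caches table and index pages rather than query results; MySQL's query cache was deprecated in 5.7 and removed in 8.0. Redis placed in front of the database is really the application-level cache described above, not a database-internal cache.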

HTTP caching headers (the standard)

RFC 9111 defines the HTTP caching semantics that browsers, CDNs, and reverse proxies implement:

  • Cache-Control: public, max-age=3600 — cacheable by anyone for 1 hour.
  • Cache-Control: private, max-age=300 — only the user's browser can cache (not CDN), 5 minutes.
  • Cache-Control: no-store — never cache (sensitive data).
  • Cache-Control: no-cache — must revalidate before reusing (cached but conditional).
  • Cache-Control: stale-while-revalidate=60 — serve stale up to 60s extra while fetching fresh in background. Great UX win.
  • ETag: "abc123" — content fingerprint. Client sends If-None-Match: "abc123"; server responds 304 if unchanged.
  • Vary: Accept-Encoding, Authorization — cache key includes these headers (different content per encoding/user).
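The ETag / If-None-Match exchange above can be sketched as a small server-side function. This is a minimal sketch, assuming a truncated SHA-256 of the body is an acceptable fingerprint for the resource; make_etag and conditional_get are hypothetical names, not part of any framework.

```python
import hashlib
from typing import Dict, Optional, Tuple


def make_etag(body: bytes) -> str:
    # Assumption: a truncated SHA-256 of the body is a good enough fingerprint.
    return '"' + hashlib.sha256(body).hexdigest()[:16] + '"'


def conditional_get(
    body: bytes, if_none_match: Optional[str]
) -> Tuple[int, Dict[str, str], bytes]:
    """Return (status, headers, body) for a GET, honoring If-None-Match."""
    etag = make_etag(body)
    headers = {
        "ETag": etag,
        # Caches may reuse the response for 60 s before revalidating.
        "Cache-Control": "public, max-age=60",
    }
    if if_none_match is not None:
        # The header may list several tags (or "*"); If-None-Match uses weak
        # comparison, so a W/ prefix is ignored. Our hex tags contain no commas.
        candidates = [t.strip().removeprefix("W/") for t in if_none_match.split(",")]
        if "*" in candidates or etag in candidates:
            return 304, headers, b""  # Not Modified: headers only, no body
    return 200, headers, body


# First request has no validator; the second sends back the ETag it received.
status1, headers1, body1 = conditional_get(b'{"id": 42}', None)
status2, headers2, body2 = conditional_get(b'{"id": 42}', headers1["ETag"])
```

The second call returns 304 with an empty body, so the client reuses its cached copy and only headers cross the network. If the body changes, the fingerprint changes and the client gets a fresh 200.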

Common caching strategies

  • Cache-aside (lazy loading). App checks cache; on miss, fetches from source and populates cache. Simple, common.
  • Read-through. Cache layer transparently fetches on miss. Less app code; requires cache to know the source.
  • Write-through. Writes go to the cache and the source together. The cache stays consistent with the source for every write that goes through it, at the cost of slower writes.
  • Write-back. Writes go to cache; source updated async. Fast writes but risk of loss on cache failure.
  • Refresh-ahead. Cache proactively refreshes before TTL expires. Smooth latency, requires good prediction.
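Cache-aside, the most common of these strategies, is easy to sketch in-process. This is a minimal sketch, assuming a single process and a TTL-based expiry; TTLCache, get_product, and load_from_db are hypothetical names, and the injectable clock exists only so expiry can be demonstrated.

```python
import time
from typing import Any, Callable, Dict, Optional, Tuple


class TTLCache:
    """In-process cache whose entries expire ttl_seconds after they are stored."""

    def __init__(self, ttl_seconds: float, clock: Callable[[], float] = time.monotonic):
        self.ttl = ttl_seconds
        self.clock = clock  # injectable so expiry can be simulated
        self._data: Dict[str, Tuple[float, Any]] = {}

    def get(self, key: str) -> Optional[Any]:
        entry = self._data.get(key)
        if entry is None:
            return None
        expires_at, value = entry
        if self.clock() >= expires_at:
            del self._data[key]  # expired entries count as misses
            return None
        return value

    def set(self, key: str, value: Any) -> None:
        self._data[key] = (self.clock() + self.ttl, value)


def get_product(cache: TTLCache, product_id: int, load_from_db: Callable[[int], dict]) -> dict:
    """Cache-aside read: try the cache, fall back to the source, then populate."""
    key = f"product:{product_id}"
    cached = cache.get(key)
    if cached is not None:
        return cached
    value = load_from_db(product_id)  # miss: query the source of truth
    cache.set(key, value)
    return value


# Usage with a fake clock and a fake database that records each call.
db_calls = []


def fake_db(product_id: int) -> dict:
    db_calls.append(product_id)
    return {"id": product_id}


now = [0.0]
cache = TTLCache(ttl_seconds=30, clock=lambda: now[0])
get_product(cache, 42, fake_db)  # miss: loads from the "database"
get_product(cache, 42, fake_db)  # hit: served from cache
now[0] = 31.0                    # advance past the TTL
get_product(cache, 42, fake_db)  # expired: loads again
```

The application owns both the lookup and the population step, which is what distinguishes cache-aside from read-through. The sketch uses None as its miss signal, so a source that legitimately returns None would never be cached.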

Cache invalidation: the hard problem

As Phil Karlton famously put it: "There are only two hard things in computer science: cache invalidation and naming things." Common strategies:

  • TTL-based. Set Cache-Control max-age. Simple. Tradeoff: stale data for up to TTL after change.
  • Manual purge. Call CDN/cache API to invalidate specific keys when data changes. Precise; requires writes to know which keys.
  • Versioned URLs. Append version (/api/products?v=42) so changes get fresh URLs. Old cache entries die naturally.
  • Event-driven invalidation. Pub/sub broadcasts "product 42 changed"; cache layers listen and purge. Most accurate; most complex.
  • Surrogate keys (Cache tags). Tag responses with logical keys ("user:42", "products") and purge by tag. Fastly, Cloudflare Enterprise support this.
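The surrogate-key idea can be sketched as a cache that keeps a reverse index from each tag to the keys it labels. This is a minimal in-memory sketch, assuming one process; TaggedCache and purge_tag are hypothetical names and do not reproduce Fastly's or Cloudflare's purge APIs.

```python
from collections import defaultdict
from typing import DefaultDict, Dict, Iterable, Optional, Set


class TaggedCache:
    """Cache whose entries carry logical tags that can be purged together."""

    def __init__(self) -> None:
        self._values: Dict[str, bytes] = {}
        self._keys_by_tag: DefaultDict[str, Set[str]] = defaultdict(set)

    def put(self, key: str, value: bytes, tags: Iterable[str]) -> None:
        self._values[key] = value
        for tag in tags:
            self._keys_by_tag[tag].add(key)

    def get(self, key: str) -> Optional[bytes]:
        return self._values.get(key)

    def purge_tag(self, tag: str) -> int:
        """Drop every entry tagged with `tag` and return how many were removed."""
        removed = 0
        for key in self._keys_by_tag.pop(tag, set()):
            # Other tags may still list this key; that is harmless because
            # purging them later just finds nothing to remove.
            if self._values.pop(key, None) is not None:
                removed += 1
        return removed


cache = TaggedCache()
cache.put("/api/products/42", b'{"id": 42}', tags=["product:42", "products"])
cache.put("/api/products", b'[{"id": 42}]', tags=["products"])
cache.put("/api/users/7", b'{"id": 7}', tags=["user:7"])
removed = cache.purge_tag("products")  # both product responses are dropped
```

The write path only needs to know the logical tag ("products"), not every URL that embeds product data, which is what makes tag-based purging more precise than TTLs and easier than key-by-key manual purges.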

Common API caching mistakes

  • Caching authenticated responses publicly. A logged-in user's data ends up in a shared cache; another user sees it. Always set Cache-Control: private for personalized responses.
  • Forgetting Vary headers. Cache returns gzipped response to a client requesting identity encoding. Always include Vary: Accept-Encoding.
  • TTL too long. Hours of stale data after every update. Tune TTL to your actual change frequency.
  • TTL too short. Cache hit rate <50% means you're not really caching. Aim for 90%+ hit rate on cached endpoints.
  • Cache stampede. When TTL expires, 1,000 concurrent requests all miss and hit the database. Mitigate with locks ("only one fetch on miss"), refresh-ahead, or stale-while-revalidate.
  • Caching errors. A 500 error gets cached for 5 minutes; users keep seeing the broken response. Always set Cache-Control: no-store on error responses.
  • No observability. Without metrics on hit rate, miss rate, and origin load, you can't tune. Track these per endpoint.
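The lock-based stampede mitigation ("only one fetch on miss") can be sketched with one lock per key. This is a minimal sketch, assuming a single process and no TTL, to keep the focus on the locking; SingleFlightCache is a hypothetical name. Across multiple app instances you would need a distributed lock instead (for example, Redis SET with the NX option).

```python
import threading
import time
from typing import Any, Callable, Dict


class SingleFlightCache:
    """Cache-aside where concurrent misses on the same key trigger one load."""

    def __init__(self, loader: Callable[[str], Any]):
        self._loader = loader
        self._values: Dict[str, Any] = {}
        self._locks: Dict[str, threading.Lock] = {}
        self._locks_guard = threading.Lock()  # protects the per-key lock table

    def _lock_for(self, key: str) -> threading.Lock:
        with self._locks_guard:
            return self._locks.setdefault(key, threading.Lock())

    def get(self, key: str) -> Any:
        if key in self._values:        # fast path: cache hit, no locking
            return self._values[key]
        with self._lock_for(key):      # one thread per key gets past this point
            if key in self._values:    # re-check: a waiting thread may find it filled
                return self._values[key]
            value = self._loader(key)  # the single origin fetch for this key
            self._values[key] = value
            return value


# Usage: 50 threads miss on the same key at once.
origin_calls = []


def slow_loader(key: str) -> str:
    origin_calls.append(key)
    time.sleep(0.05)  # simulate a slow database query
    return f"value-for-{key}"


cache = SingleFlightCache(slow_loader)
threads = [threading.Thread(target=cache.get, args=("hot-key",)) for _ in range(50)]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

Without the per-key lock, all 50 threads could call slow_loader; with it, one thread fetches and the rest wait briefly, then read the freshly cached value. The double check inside the lock is what prevents the waiting threads from fetching again.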

FAQ: API Caching

How do I know what to cache?

Profile your API: which endpoints are slowest? Most-requested? Have stable data? Those are caching candidates. Tools like Datadog APM show endpoint latency and request rate side by side.

What's a good cache hit rate?

For well-cached endpoints, aim for 90%+ hit rate at the CDN. Lower means TTLs are too short or you're caching things that shouldn't be (per-user data). Below 50% means caching is barely helping.
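Checking an endpoint against these targets only takes per-endpoint hit and miss counts. This is a minimal in-process sketch; HitRateTracker is a hypothetical helper, and in production these numbers would usually come from your CDN logs or metrics system. The 90% and 50% thresholds are the ones given above.

```python
from collections import Counter


class HitRateTracker:
    """Per-endpoint cache hit/miss counters."""

    def __init__(self) -> None:
        self.hits = Counter()
        self.misses = Counter()

    def record(self, endpoint: str, hit: bool) -> None:
        (self.hits if hit else self.misses)[endpoint] += 1

    def hit_rate(self, endpoint: str) -> float:
        total = self.hits[endpoint] + self.misses[endpoint]
        return self.hits[endpoint] / total if total else 0.0

    def verdict(self, endpoint: str) -> str:
        # Thresholds from the guidance above: aim for 90%+, below 50% barely helps.
        rate = self.hit_rate(endpoint)
        if rate >= 0.90:
            return "healthy"
        if rate < 0.50:
            return "barely helping"
        return "needs tuning"


tracker = HitRateTracker()
for _ in range(95):
    tracker.record("/api/products", hit=True)
for _ in range(5):
    tracker.record("/api/products", hit=False)
for _ in range(3):
    tracker.record("/api/me", hit=True)
for _ in range(7):
    tracker.record("/api/me", hit=False)
```

Here /api/products hits 95 of 100 requests, while /api/me hits only 3 of 10, the pattern you would expect if per-user data were being pushed through a shared cache.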

Can I cache POST requests?

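Generally no. POST is not idempotent and is usually state-changing; HTTP only allows caching a POST response that carries explicit freshness information, and browsers and CDNs rarely do so in practice. Some APIs cache read-only POSTs (e.g., search) at the application layer with explicit cache keys, but the CDN won't do this for you.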
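The "explicit cache key" for a read-only POST can be derived from the path plus a canonical form of the request body. This is a minimal sketch, assuming JSON bodies; post_cache_key is a hypothetical helper, not part of any framework.

```python
import hashlib
import json


def post_cache_key(path: str, body: dict) -> str:
    """Build a cache key for a read-only POST (e.g. a search) from path + body.

    json.dumps with sort_keys=True and compact separators canonicalizes the
    body, so {"q": "shoes", "page": 1} and {"page": 1, "q": "shoes"} map to
    the same key.
    """
    canonical = json.dumps(body, sort_keys=True, separators=(",", ":"))
    digest = hashlib.sha256(canonical.encode("utf-8")).hexdigest()
    return f"POST:{path}:{digest}"


k1 = post_cache_key("/api/search", {"q": "shoes", "page": 1})
k2 = post_cache_key("/api/search", {"page": 1, "q": "shoes"})
k3 = post_cache_key("/api/search", {"q": "boots", "page": 1})
```

Canonicalizing before hashing keeps semantically identical requests from fragmenting the cache. If results differ per user, the user's identity must also go into the key, or the same leak described under common mistakes applies.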

How does API caching interact with rate limiting?

Cache hits typically bypass rate limits (they don't reach the rate-limited endpoint). This protects backends but means abusive users can hammer a cached endpoint freely. For sensitive endpoints, rate-limit at the CDN edge.

What about GraphQL caching?

Trickier than REST. The query is in the POST body, so URL-based caching fails. Solutions: persisted queries (turn complex queries into IDs, cacheable as GET), Apollo's automatic persisted queries, query-level cache hints.
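Apollo's automatic persisted queries identify a query by the SHA-256 hash of its text, sent in an extensions parameter, which lets the request travel as a cacheable GET. The sketch below only builds such a URL; the endpoint is hypothetical, and you should check your server's documentation for exactly which parameters it accepts. If the server has not seen the hash yet, the protocol has the client retry once with the full query text so the server can register it.

```python
import hashlib
import json
from urllib.parse import urlencode


def apq_get_url(endpoint: str, query: str, variables: dict) -> str:
    """Build a GET URL that identifies a GraphQL query by its SHA-256 hash."""
    query_hash = hashlib.sha256(query.encode("utf-8")).hexdigest()
    params = {
        # sort_keys makes equal variables serialize identically, so equal
        # requests map to the same URL (and the same CDN cache entry).
        "variables": json.dumps(variables, sort_keys=True, separators=(",", ":")),
        "extensions": json.dumps(
            {"persistedQuery": {"version": 1, "sha256Hash": query_hash}},
            separators=(",", ":"),
        ),
    }
    return f"{endpoint}?{urlencode(params)}"


QUERY = "query Product($id: ID!) { product(id: $id) { name price } }"
url = apq_get_url("https://api.example.com/graphql", QUERY, {"id": "42"})
```

Because the URL is deterministic for a given query and variables, a CDN can cache it like any REST GET, which is exactly what a POST body prevents.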

How do I test API caching?

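Functionally: send identical requests and verify that the second is served from cache, for example via a cache-status header such as X-Cache: HIT (the exact header name varies by CDN) and lower latency. Under load: run the same load test with the cache enabled and disabled to measure how much it protects the backend.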

What's the difference between caching and CDN?

A CDN is a delivery network with caches at edge POPs. "Caching" is the broader practice of storing responses at any layer. CDN caching is one specific case.

How LoadFocus relates to API caching strategy

API caching's value only shows under realistic concurrent traffic. LoadFocus API load testing measures how much traffic reaches your origin (every cache miss does) under realistic concurrency, surfacing cache stampedes and ineffective TTLs. API monitoring tracks cache hit rate over time, so you catch regressions when a deploy accidentally bypasses the cache.
