Monitoring
PgCache exposes Prometheus-compatible metrics via an HTTP endpoint, giving you visibility into query performance, cache behavior, CDC replication health, and connection state.
Enabling Metrics
The PgCache Docker image enables metrics on port 9090 by default. The listen address is set via the [metrics] section (or --metrics_socket) — see the Configuration reference. Publish the port to access them:
docker run -d -p 5432:5432 -p 9090:9090 pgcache/pgcache \
  --upstream postgres://user:password@db:5432/myapp
Security: the metrics endpoint is unauthenticated and exposes operational detail about your traffic. Restrict the metrics port to trusted or internal networks (a private subnet, security group, or firewall rule). Do not publish it to the public internet.
To use a different port, set the METRICS_PORT environment variable:
docker run -d -p 5432:5432 -p 8080:8080 \
  -e METRICS_PORT=8080 \
  -e UPSTREAM_URL=postgres://user:password@db:5432/myapp \
  pgcache/pgcache
When running pgcache outside Docker, add a [metrics] section to your TOML configuration:
[metrics]
socket = "0.0.0.0:9090"
Or use the CLI argument:
pgcache --config pgcache.toml --metrics_socket 0.0.0.0:9090
The HTTP server exposes several endpoints:
| Endpoint | Description |
|---|---|
GET /metrics | Prometheus metrics in text exposition format |
GET /healthz | Liveness check — always returns 200 OK if the process is running |
GET /readyz | Readiness check — returns 200 OK when the cache is running, 503 otherwise |
GET /status | JSON object with full cache, CDC, and per-query status (see Status Endpoint below) |
GET /config | Current effective configuration, with a restart_required flag set when static fields on disk differ from the running values |
PUT /config | Partial configuration update — writes changes to the TOML file (preserving comments and formatting) and reloads dynamic fields in place |
POST /config/reload | Re-read the TOML file and apply any dynamic field changes without restart |
Histograms report p50, p95, and p99 quantiles.
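As a quick reachability check, a script can fetch /metrics and read individual samples. The following is a minimal sketch, not a full exposition-format parser: it ignores timestamps and does not handle label values containing a closing brace. The base URL `http://localhost:9090` is an assumption about a local deployment, and the sample names follow the underscore forms used in the PromQL examples later on this page.

```python
import re
import urllib.request

# Matches sample lines such as:
#   pgcache_connections_active 3
#   pgcache_query_latency_seconds{quantile="0.95"} 0.004
_SAMPLE = re.compile(r'^([A-Za-z_:][A-Za-z0-9_:]*)(\{[^}]*\})?\s+(\S+)')


def parse_metrics(text):
    """Return {(name, labels): value} for each sample line in the text."""
    samples = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):  # skip blanks and HELP/TYPE comments
            continue
        match = _SAMPLE.match(line)
        if match is None:
            continue
        name, labels, value = match.groups()
        samples[(name, labels or "")] = float(value)
    return samples


def fetch_metrics(base_url="http://localhost:9090", timeout=5.0):
    """Fetch and parse /metrics. base_url is an assumed local deployment."""
    with urllib.request.urlopen(base_url + "/metrics", timeout=timeout) as resp:
        return parse_metrics(resp.read().decode("utf-8"))
```

Keeping `parse_metrics` separate from the HTTP call makes it easy to run against a saved scrape when debugging.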
Prometheus Scrape Configuration
scrape_configs:
  - job_name: pgcache
    static_configs:
      - targets: ['pgcache-host:9090']
Available Metrics
Query Counters
Track how queries flow through PgCache.
| Metric | Type | Description |
|---|---|---|
pgcache.queries.total | counter | Total queries received |
pgcache.queries.cacheable | counter | Queries identified as cacheable |
pgcache.queries.uncacheable | counter | Queries forwarded to origin (not cacheable) |
pgcache.queries.unsupported | counter | Unsupported statement types |
pgcache.queries.invalid | counter | Queries that failed to parse |
pgcache.queries.cache_hit | counter | Queries served from cache |
pgcache.queries.cache_miss | counter | Cacheable queries that missed the cache |
pgcache.queries.cache_error | counter | Cache lookup errors (query forwarded to origin) |
pgcache.queries.allowlist_skipped | counter | Queries skipped because their tables are not in the allowlist |
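Because these are cumulative counters, ratios are most useful when computed over an interval rather than over the process lifetime. The sketch below derives a cache hit ratio from two snapshots of the cache_hit and cache_miss counters. The dictionary keys and the reset handling are illustrative choices, not part of PgCache.

```python
def counter_delta(previous, current):
    """Increase of a cumulative counter between two samples.

    If the value went backwards, the process restarted and the counter
    began again from zero, so the current value is the whole increase.
    """
    return current if current < previous else current - previous


def hit_ratio(prev, curr):
    """Cache hit ratio over the interval between two snapshots.

    prev and curr are dicts holding 'cache_hit' and 'cache_miss' counter
    values (hypothetical key names). Returns None when no cacheable query
    was looked up during the interval.
    """
    hits = counter_delta(prev["cache_hit"], curr["cache_hit"])
    misses = counter_delta(prev["cache_miss"], curr["cache_miss"])
    lookups = hits + misses
    return hits / lookups if lookups else None
```

Returning None for an idle interval avoids reporting a misleading ratio of zero when there was simply no cacheable traffic.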
Latency Histograms
All latency metrics are in seconds and report p50, p95, and p99 quantiles.
| Metric | Description |
|---|---|
pgcache.query.latency_seconds | End-to-end query latency |
pgcache.cache.lookup_latency_seconds | Cache lookup time |
pgcache.origin.latency_seconds | Origin database query time |
pgcache.query.registration_latency_seconds | Time to register a new query in the cache |
Per-Stage Timing
Detailed breakdown of where time is spent within PgCache:
| Metric | Description |
|---|---|
pgcache.query.stage.parse_seconds | SQL parsing |
pgcache.query.stage.dispatch_seconds | Dispatching to cache channel |
pgcache.query.stage.lookup_seconds | Cache lookup |
pgcache.query.stage.queue_wait_seconds | Time waiting in worker channel queue |
pgcache.query.stage.conn_wait_seconds | Time waiting for a cache database connection |
pgcache.query.stage.spawn_wait_seconds | Time waiting for worker task spawn |
pgcache.query.stage.worker_exec_seconds | Cache worker execution |
pgcache.query.stage.response_write_seconds | Writing response to client |
pgcache.query.stage.forward_decision_seconds | Cache-miss path: dispatch to forward decision |
pgcache.query.stage.coalesce_intake_seconds | Coalesce path: enqueue a waiter |
pgcache.query.stage.coalesce_wait_seconds | Coalesce path: wait for in-flight population |
pgcache.query.stage.total_seconds | Total pipeline time |
Connection Metrics
| Metric | Type | Description |
|---|---|---|
pgcache.connections.total | counter | Total connections accepted |
pgcache.connections.active | gauge | Currently active connections |
pgcache.connections.errors | counter | Connection errors |
CDC / Replication Metrics
Monitor the health and throughput of the CDC replication stream.
| Metric | Type | Description |
|---|---|---|
pgcache.cdc.events_processed | counter | Total CDC events processed |
pgcache.cdc.inserts | counter | Insert events received |
pgcache.cdc.updates | counter | Update events received |
pgcache.cdc.deletes | counter | Delete events received |
pgcache.cdc.lag_bytes | gauge | WAL replication lag in bytes |
pgcache.cdc.lag_seconds | gauge | Replication lag in seconds |
pgcache.cdc.received_lsn | gauge | Last LSN received from origin via XLogData |
pgcache.cdc.flushed_lsn | gauge | Last LSN acknowledged to origin via standby status update |
pgcache.cdc.applied_lsn | gauge | Highest LSN whose effects have been fully applied by the writer (transaction-aligned) |
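PostgreSQL writes an LSN as two hexadecimal halves separated by a slash, and the difference between two LSNs is a byte count. Assuming the LSN gauges carry the LSN as a plain number (as the last_applied_lsn field in the /status example does), helpers like these convert between the two forms and estimate how much received WAL has not yet been applied. The function names are illustrative. Note that a gauge stored as a 64-bit float cannot represent integers above 2^53 exactly.

```python
def format_lsn(lsn):
    """Render a 64-bit LSN integer in PostgreSQL's text form, e.g. 0/BC6100."""
    return f"{lsn >> 32:X}/{lsn & 0xFFFFFFFF:X}"


def parse_lsn(text):
    """Inverse of format_lsn: turn text such as '0/BC6100' into an integer."""
    high, low = text.split("/")
    return (int(high, 16) << 32) | int(low, 16)


def apply_backlog_bytes(received_lsn, applied_lsn):
    """Bytes of WAL received from origin but not yet applied (never negative)."""
    return max(0, int(received_lsn) - int(applied_lsn))
```

For example, `format_lsn(12345600)` yields `0/BC6100`, which can be compared directly against LSN values reported by the origin database.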
CDC Connection Resilience
If the replication connection drops, pgcache automatically switches to forwarding all queries to the origin database while attempting to reconnect. The /readyz endpoint continues to return 200 during this period since the proxy is still serving queries (via origin). Once reconnected and the replication slot LSN is verified, cache dispatch resumes automatically.
Monitor pgcache.cdc.lag_bytes and pgcache.cdc.lag_seconds for replication health. A sudden spike in origin latency (pgcache.origin.latency_seconds) alongside cache hit ratio dropping to zero may indicate the CDC connection is temporarily down and queries are being forwarded.
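The pattern described above, a hit ratio of zero together with a jump in origin latency, can be encoded as a simple check. This is a heuristic sketch only: the baseline latency and the spike factor of 2.0 are assumptions to tune for your workload, and a positive result is a hint to inspect the replication connection, not proof that it is down.

```python
def cdc_fallback_suspected(hit_ratio, origin_latency, baseline_latency, spike_factor=2.0):
    """Return True when metrics match the 'queries are being forwarded' pattern.

    hit_ratio        -- cache hit ratio over a recent window, or None if idle
    origin_latency   -- recent pgcache.origin.latency_seconds value (seconds)
    baseline_latency -- typical origin latency for this workload (assumed known)
    spike_factor     -- how many times the baseline counts as a spike (assumed)
    """
    if hit_ratio is None:  # no cacheable traffic, nothing to conclude
        return False
    hits_gone = hit_ratio == 0.0
    latency_spiked = origin_latency >= spike_factor * baseline_latency
    return hits_gone and latency_spiked
```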
Cache State Metrics
| Metric | Type | Description |
|---|---|---|
pgcache.cache.invalidations | counter | Cache entries invalidated by CDC events |
pgcache.cache.evictions | counter | Cache entries evicted due to size limits |
pgcache.cache.queries_registered | gauge | Number of queries currently cached |
pgcache.cache.queries_loading | gauge | Queries currently being loaded into cache |
pgcache.cache.queries_pending | gauge | Queries seen but not yet admitted to cache |
pgcache.cache.queries_invalidated | gauge | Invalidated entries retained for fast readmission |
pgcache.cache.readmissions | counter | Queries fast-readmitted after CDC invalidation |
pgcache.cache.mv_fallthrough | counter | Requests that fell through from a materialized-view result to source-row evaluation |
pgcache.cache.subsumptions | counter | Queries served via predicate subsumption (data already covered by another cached query) |
pgcache.cache.subsumption_latency_seconds | histogram | Time spent detecting predicate subsumption |
pgcache.cache.size_bytes | gauge | Current cache size in bytes |
pgcache.cache.size_limit_bytes | gauge | Configured cache size limit |
pgcache.cache.tables_tracked | gauge | Number of tables tracked for cache invalidation |
pgcache.cache.restarts_total | counter | Successful cache-subsystem restarts performed by the supervisor after a backend failure |
pgcache.cache.pool_replenished | counter | Poisoned cache-database serve-pool connections discarded and replaced |
pgcache.cache.pool_recycled | counter | Serve-pool connections recycled to reclaim accumulated Postgres plan-cache memory |
Memory Pressure
PgCache bounds its total memory footprint. As whole-system used memory (pgcache plus the cache Postgres it manages) approaches the registration budget (80% of detected RAM by default, or memory_limit; see Configuration), registration of new distinct queries is throttled and those queries are forwarded to origin instead of being cached. Already-cached queries continue to be served.
| Metric | Type | Description |
|---|---|---|
pgcache.cache.memory_used_bytes | gauge | Whole-system used memory (pgcache + cache Postgres) — the figure compared against the budget |
pgcache.cache.rss_bytes | gauge | Resident set size of the pgcache process alone (its share of the above) |
pgcache.cache.memory_budget_bytes | gauge | Used-memory high-water mark above which registration is throttled |
pgcache.cache.query_count_cap | gauge | Max registered queries that fit the memory budget (0 = uncapped) |
pgcache.cache.marginal_bytes_per_query | gauge | Measured per-query memory footprint, used to derive the count cap |
pgcache.cache.registration_throttled | gauge | 1 while registration is throttled by memory pressure, else 0 |
pgcache.cache.registration_throttled_total | counter | Queries forwarded to origin (not registered) due to memory-pressure throttling |
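To see how close a deployment is to throttling, the gauges above can be combined into a headroom figure. The sketch below is a monitoring-side estimate under stated assumptions: it takes the gauge values as inputs, and the "additional queries" number is a rough division of headroom by the measured per-query footprint, not the formula PgCache uses internally for query_count_cap.

```python
def memory_pressure(used_bytes, budget_bytes, marginal_bytes_per_query):
    """Summarize memory headroom from the memory-pressure gauges.

    used_bytes               -- pgcache.cache.memory_used_bytes
    budget_bytes             -- pgcache.cache.memory_budget_bytes
    marginal_bytes_per_query -- pgcache.cache.marginal_bytes_per_query
    """
    used = int(used_bytes)
    budget = int(budget_bytes)
    marginal = int(marginal_bytes_per_query)
    headroom = max(0, budget - used)
    return {
        "headroom_bytes": headroom,
        "utilization": used / budget if budget else None,
        "over_budget": used >= budget,
        # Rough estimate only; None when no per-query measurement exists yet.
        "estimated_additional_queries": headroom // marginal if marginal > 0 else None,
    }
```

Alerting on utilization before it reaches 1.0 gives time to raise memory_limit or reduce the cached query set before new queries start being forwarded to origin.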
In-Process Result Memo
The in-process result memo is an in-memory tier that serves the hottest queries inline, skipping the cache-database round-trip. Its byte budget is set by memo_cache_size (default 64 MiB; 0 disables) — see the Configuration reference.
| Metric | Type | Description |
|---|---|---|
pgcache.cache.memo_hits | counter | Cache hits served inline from the in-process memo |
pgcache.cache.memo_captures | counter | Result snapshots stored into the memo |
pgcache.cache.memo_evictions | counter | Memo entries dropped (CDC-evicted or budget-reclaimed) |
pgcache.cache.memo_entries | gauge | Current number of live memo entries |
pgcache.cache.memo_bytes | gauge | Current total bytes held by the memo |
Writer Queue Depths
| Metric | Type | Description |
|---|---|---|
pgcache.cache.writer_query_queue | gauge | Pending query registration messages |
pgcache.cache.writer_cdc_queue | gauge | Pending CDC messages |
pgcache.cache.writer_internal_queue | gauge | Internal message queue depth |
CDC Handler Metrics
| Metric | Type | Description |
|---|---|---|
pgcache.cache.handle_inserts | counter | Insert operations processed by cache writer |
pgcache.cache.handle_updates | counter | Update operations processed by cache writer |
pgcache.cache.handle_deletes | counter | Delete operations processed by cache writer |
pgcache.cache.handle_insert_seconds | histogram | Insert handler duration |
pgcache.cache.handle_update_seconds | histogram | Update handler duration |
pgcache.cache.handle_delete_seconds | histogram | Delete handler duration |
pgcache.cache.cdc_prepared_hits | counter | Prepared-statement cache hits for per-query CDC evaluation |
pgcache.cache.cdc_prepared_misses | counter | Prepared-statement cache misses for CDC evaluation |
Writer Instrumentation
| Metric | Type | Description |
|---|---|---|
pgcache.cache.writer.command_handle_seconds | histogram | Per-command writer handler latency, labeled by cmd |
pgcache.cache.writer.register.resolve_seconds | histogram | query_register phase: resolve |
pgcache.cache.writer.register.subsumption_check_seconds | histogram | query_register phase: subsumption check |
pgcache.cache.writer.register.subsume_seconds | histogram | query_register phase: subsume |
pgcache.cache.writer.register.insert_seconds | histogram | query_register phase: insert |
pgcache.cache.writer.register.publication_update_seconds | histogram | query_register phase: publication update |
pgcache.cache.writer.register.populate_dispatch_seconds | histogram | query_register phase: dispatch population |
pgcache.cache.writer.resolve.update_queries_register_seconds | histogram | query_resolve phase: register update queries |
pgcache.cache.writer.resolve.deparse_seconds | histogram | query_resolve phase: deparse |
pgcache.cache.writer.update_queries_total | gauge | Total update queries across all relations |
pgcache.cache.writer.update_queries_max_per_relation | gauge | Largest update-query count on any single relation |
Population Pipeline
| Metric | Type | Description |
|---|---|---|
pgcache.cache.population.task_seconds | histogram | Per-task population duration |
pgcache.cache.population.stream_seconds | histogram | Time spent streaming rows from origin |
pgcache.cache.population.wait_seconds | histogram | Time waiting on the population channel |
pgcache.cache.population.worker_idle_seconds | histogram | Per-worker idle time between tasks |
Protocol Metrics
| Metric | Type | Description |
|---|---|---|
pgcache.protocol.simple_queries | counter | Queries using the simple query protocol |
pgcache.protocol.extended_queries | counter | Queries using the extended query protocol (Parse/Bind/Execute) |
pgcache.protocol.prepared_statements | counter | Prepared statements created |
pgcache.protocol.describe_cache.hits | counter | Synthesized Parse/Describe responses served from the describe cache |
pgcache.protocol.describe_cache.misses | counter | Describe-cache lookups that required building a new entry |
pgcache.protocol.describe_cache.evictions | counter | Describe-cache entries evicted |
pgcache.protocol.describe_cache.invalidations | counter | Describe-cache entries invalidated |
pgcache.protocol.lazy_parse_forwarded | counter | Parse messages sent to the origin lazily on the forward path |
pgcache.protocol.close_local | counter | Close(statement) messages handled locally (statement never prepared on origin) instead of being forwarded |
Key PromQL Queries
Cache Hit Ratio
rate(pgcache_queries_cache_hit_total[5m])
/
(rate(pgcache_queries_cache_hit_total[5m]) + rate(pgcache_queries_cache_miss_total[5m]))
Query Latency (p95)
pgcache_query_latency_seconds{quantile="0.95"}
CDC Replication Lag
pgcache_cdc_lag_seconds
Cache Size Utilization
pgcache_cache_size_bytes / pgcache_cache_size_limit_bytes
Status Endpoint
The GET /status endpoint returns a JSON object with real-time cache, CDC, and per-query information. Status data is gathered on demand from the cache writer via message passing (2-second timeout). If the cache thread is unresponsive, the endpoint returns 503.
Response Structure
{
  "cache": {
    "size_bytes": 10485760,
    "size_limit_bytes": 1073741824,
    "generation": 42,
    "tables_tracked": 5,
    "policy": "clock",
    "queries_registered": 12,
    "uptime_ms": 3600000,
    "cache_hits": 15230,
    "cache_misses": 487
  },
  "cdc": {
    "tables": ["public.users", "public.orders"],
    "last_applied_lsn": 12345600
  },
  "queries": [
    {
      "fingerprint": 9876543210,
      "sql_preview": "SELECT * FROM users WHERE ...",
      "tables": ["public.users"],
      "state": "cached",
      "cached_bytes": 2048,
      "max_limit": null,
      "pinned": false,
      "hit_count": 1523,
      "miss_count": 12,
      "idle_duration_ms": 245,
      "registered_duration_ms": 3580000,
      "cached_duration_ms": 3500000,
      "invalidation_count": 3,
      "readmission_count": 2,
      "eviction_count": 1,
      "subsumption_count": 45,
      "population_count": 4,
      "last_population_duration_ms": 120,
      "total_bytes_served": 4915200,
      "population_row_count": 500,
      "cache_hit_latency": {
        "count": 1523,
        "mean_us": 245.3,
        "p50_us": 210,
        "p95_us": 480,
        "p99_us": 920,
        "min_us": 85,
        "max_us": 3200
      }
    }
  ]
}
Cache Status Fields
| Field | Description |
|---|---|
queries_registered | Number of queries currently registered in the cache |
uptime_ms | Proxy uptime in milliseconds |
cache_hits | Total cache hits across all queries |
cache_misses | Total cache misses across all queries |
Per-Query Metrics
Each entry in the queries array includes operational metrics:
| Field | Description |
|---|---|
hit_count | Number of times this query was served from cache |
miss_count | Number of times this query missed the cache |
idle_duration_ms | Milliseconds since the last cache hit (null if no hits yet) |
registered_duration_ms | Milliseconds since the query was first seen |
cached_duration_ms | Milliseconds since the query was last populated (null if not currently cached) |
invalidation_count | Number of times invalidated by CDC events |
readmission_count | Number of times readmitted after invalidation |
eviction_count | Number of times evicted from cache |
subsumption_count | Number of times served via predicate subsumption |
population_count | Number of times the cache was populated for this query |
last_population_duration_ms | Duration of the last population in milliseconds |
total_bytes_served | Cumulative bytes served from cache for this query |
population_row_count | Number of rows inserted during the last population |
cache_hit_latency | Latency histogram for cache hits (null if no hits yet) — includes count, mean_us, p50_us, p95_us, p99_us, min_us, max_us |
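The per-query fields lend themselves to small diagnostic scripts. The sketch below fetches /status, treats a 503 as "status unavailable", and lists the queries that are invalidated most often together with their hit ratios. The base URL, the 3-second client timeout (chosen to sit above the 2-second server-side timeout), and the function names are assumptions for illustration.

```python
import json
import urllib.error
import urllib.request


def fetch_status(base_url="http://localhost:9090", timeout=3.0):
    """Return the parsed /status document, or None if the endpoint answers 503."""
    try:
        with urllib.request.urlopen(base_url + "/status", timeout=timeout) as resp:
            return json.load(resp)
    except urllib.error.HTTPError as err:
        if err.code == 503:  # cache did not answer the status request in time
            return None
        raise


def most_invalidated(status, limit=5):
    """Summarize the queries with the highest invalidation_count."""
    rows = []
    for query in status.get("queries", []):
        lookups = query["hit_count"] + query["miss_count"]
        rows.append({
            "fingerprint": query["fingerprint"],
            "state": query["state"],
            "hit_ratio": query["hit_count"] / lookups if lookups else None,
            "invalidation_count": query["invalidation_count"],
        })
    rows.sort(key=lambda row: row["invalidation_count"], reverse=True)
    return rows[:limit]
```

A query with a high invalidation_count and a low hit ratio is churning through populations without serving much traffic, which may make it a candidate for exclusion from caching.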
Health Checks
Use /healthz and /readyz as Kubernetes or load balancer health probes:
# Kubernetes example
livenessProbe:
  httpGet:
    path: /healthz
    port: 9090
readinessProbe:
  httpGet:
    path: /readyz
    port: 9090
Log Configuration
In addition to metrics, PgCache supports configurable logging via the log_level setting. The value uses the EnvFilter directive syntax from the tracing-subscriber crate:
# Show info and above for all modules
log_level = "info"
# Debug logging for the cache subsystem, info for everything else
log_level = "pgcache_lib::cache=debug,info"
# Trace-level logging (very verbose)
log_level = "trace"
You can also set the RUST_LOG environment variable for one-off debugging sessions.