Tailscale identity instead of DB passwords (and a per-user bandwidth cap, while we're here)

Most teams running a production Postgres cluster end up at roughly the same place: every engineer has their own database user, those users are created by hand or by Terraform, and rotation and offboarding are things humans remember to do. It works. Postgres is designed for this. The catch is operational — the database doesn’t know anything about your org chart, so every change at the people level has to be mirrored into the database out of band, and the credential itself is one more thing the engineer has to hold and the database has to verify.

The standard ways to skip the parallel identity step are all real: IAM database authentication on managed Postgres, a sidecar token broker, mTLS with a SPIFFE-ish identity document, a full service mesh. They all assume you’re willing to stand up an identity plane next to your apps. If the only thing between you and a better story is “we’d have to operate one more thing,” the per-engineer-user model with manual rotation tends to win on inertia.

The thing Redo had that most teams running on a managed database don’t is that every engineer was already on a Tailscale tailnet, and every database was already behind it. The identity question was already answered, by something that knew about people in a way Postgres never will. The only question was whether to keep maintaining a parallel set of PG users and a tailnet, or put something small on the path that turned one into the other.

Once that small thing exists, the next observation is free. The database has no idea who’s running SELECT * FROM events without a LIMIT and burning the egress budget, because to the database it’s just “the engineer user.” But the proxy does. The proxy can count.

That proxy is in production now. It’s called Waypoint, it sits on our tailnet in front of CockroachDB and MongoDB Atlas, and it does the two things that fall out of having identity in the path:

It refuses to take a password. It identifies the caller from their Tailscale identity, looks up an ACL capability grant, and mints a scoped per-session database role on demand.
It counts every byte. Per user, across every replica, in a sliding window backed by Redis. When a user crosses their bandwidth tier, the connection drops mid-query.

The auth half is what motivated building anything. The bandwidth half is what becomes interesting once you have a proxy that already knows who’s on the other end of every connection. This post covers both, along with the parts of the implementation I think are worth stealing.

Waypoint is MIT-licensed on GitHub and the docs are up. It’s been in production at Redo since early May 2026; this post is written against v0.6.7.

TL;DR

A Tailscale-aware database proxy that (a) turns Tailscale identity into a per-connection backend role and (b) coordinates per-user bandwidth across N replicas via a Lua sliding-window in Redis. Two replicas in front of one CockroachDB cluster and two MongoDB clusters. No passwords on the client. Every byte accounted for. Open source.

A note on Border0

If you don’t need per-user bandwidth limiting, take a look at Border0 before you read any further. Tailscale acquired them and ships Border0 as part of the platform now; it puts Tailscale identity in front of a longer list of databases than Waypoint touches, and the audit-log UI is the one we explicitly chose not to build. It’s good.

Waypoint mostly exists because we’d already built it by the time Border0 became a real option. The reason we haven’t migrated since is that Border0 doesn’t do per-user bandwidth limiting, and that’s the half we lean on hardest.

Why the auth and bandwidth halves belong in the same binary

The auth story and the bandwidth story are the same observation at two levels. Both want the same thing in the middle of every connection: an identifier you can trust, and a place to put state about that identifier that every proxy replica can see.

Once you’ve built that scaffolding for one, the other is mostly free. If you build it twice — once in pg_hba.conf and again in a sidecar rate limiter — you’ve built two different identity models that drift, and you get to debug both of them at 3am.

This is also approximately the argument for service meshes. We don’t run one. The mesh-y answer to the auth half is mTLS with a SPIFFE-ish identity document; the mesh-y answer to the bandwidth half is an Envoy filter pinned to that identity. We didn’t want to operate a mesh just for this, and Tailscale was already in the path.

The auth half: WhoIs instead of pg_hba

The Postgres protocol expects an authentication handshake. PG’s wire protocol opens with a startup message announcing a username and database, the server sends back an AuthenticationRequest, and the client answers with a SCRAM exchange or, in clients that haven’t been updated since the 2010s, an MD5 hash. None of this is interesting and all of it is something Waypoint has to handle.

In practice this means Waypoint links against github.com/jackc/pgx/v5/pgproto3 and re-implements the server side of the handshake, then turns around and performs the client side against the real backend with credentials it just minted. The reference SCRAM flow is RFC 5802 §3; the bits that matter for a proxy live in internal/pgwire/auth.go.

What’s actually useful is what happens before any of that. The client’s TCP connection comes in over a Tailscale subnet, which means Waypoint can ask tsnet’s local API who the caller is:

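// Resolve the caller's Tailscale identity from the connection's remote address.
who, err := lc.WhoIs(ctx, remoteAddr)
if err != nil { /* ... */ }

// Decode the grants attached to our capability key into typed rules.
rules, err := tailcfg.UnmarshalCapJSON[CapRule](
    who.CapMap, "redo.com/cap/waypoint",
)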

WhoIs returns the caller’s Tailscale login name, node name, tags, and crucially their CapMap — the capability grants the tailnet’s ACL policy hands them. We define a single capability key, redo.com/cap/waypoint, and hang everything off it.

Here’s the actual grant our engineer group gets in production, copied from our Pulumi config:

{
  "src": ["group:engineer"],
  "dst": ["tag:waypoint"],
  "app": {
    "redo.com/cap/waypoint": [{
      "backends": {
        "pg-production": {
          "pg": { "databases": { "redo": { "permissions": ["readonly"] } } },
          "limits": { "bandwidth": [{ "bytes": 1073741824, "period": "1h" }] }
        },
        "mongo-production": {
          "mongo": { "databases": { "*": { "permissions": ["readonly"] } } },
          "limits": { "bandwidth": [{ "bytes": 1073741824, "period": "1h" }] }
        }
      }
    }]
  }
}

1 GiB per hour, readonly, against the two backends we actually want engineers reaching. The engineeringProductionAccess group gets the same shape with readwrite and the same 1 GiB budget. Both grants live in a single TypeScript file in our infra repo, and the Tailscale ACL is the single source of truth.

The schema is the moral equivalent of an iam:Policy for databases. Tailscale capability grants are still under-documented compared to ACL rules; the format is whatever shape you want on the value side, as long as you can parse it back out with tailcfg.UnmarshalCapJSON. That flexibility is most of why this fits at all.
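To make the grant shape concrete, here is a minimal sketch of Go types that could decode the value side of the capability shown above. The struct and field names are guesses derived from that JSON, not Waypoint's actual types, and parseCapRules is a hypothetical helper that decodes the whole array with encoding/json rather than going through tailcfg.

```go
package main

import "encoding/json"

// CapRule mirrors one element of the redo.com/cap/waypoint grant value.
// Every name below is an assumption based on the JSON in the post.
type CapRule struct {
	Backends map[string]Backend `json:"backends"`
}

// Backend holds the per-backend permissions and limits.
type Backend struct {
	PG     *Engine `json:"pg,omitempty"`
	Mongo  *Engine `json:"mongo,omitempty"`
	Limits Limits  `json:"limits"`
}

// Engine maps database names (or "*") to their grants.
type Engine struct {
	Databases map[string]DatabaseGrant `json:"databases"`
}

// DatabaseGrant lists permission presets such as "readonly".
type DatabaseGrant struct {
	Permissions []string `json:"permissions"`
}

// Limits carries the bandwidth tiers for one backend.
type Limits struct {
	Bandwidth []BandwidthTier `json:"bandwidth"`
}

// BandwidthTier is a byte budget over a period such as "1h".
type BandwidthTier struct {
	Bytes  int64  `json:"bytes"`
	Period string `json:"period"`
}

// parseCapRules decodes the JSON array stored under the capability key.
func parseCapRules(raw []byte) ([]CapRule, error) {
	var rules []CapRule
	if err := json.Unmarshal(raw, &rules); err != nil {
		return nil, err
	}
	return rules, nil
}
```

Because the value side is free-form, the only contract is that whatever the ACL file emits decodes cleanly into whatever type the proxy declares.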

Minting a scoped role without minting a hot path

The naive version of this is: every connection gets a fresh PG role with the right GRANTs, then a DROP ROLE when the connection closes. This is catastrophic. CREATE ROLE on CockroachDB is a cluster-wide write that serializes with other DDL, and applying the full preset GRANTs (a couple of GRANT ON ALL TABLES IN SCHEMA public ... statements) on every connection means a fresh database connection becomes a ~second-long operation under load.

This is a Cockroach quirk; vanilla Postgres lets you CREATE ROLE more cheaply but still serializes object-level GRANT statements behind AccessShareLock. Either way, you don’t want it in your hot path.

What Waypoint does instead is separate the user role from the privilege role:

For every preset–schema–database tuple, Waypoint creates one shared group role named like wp_grp_readonly_public_redo, lazily, the first time someone wants it. The group holds the actual GRANTs.
Per-connection, Waypoint creates a thin per-user LOGIN role (wp_alice_macbook_redo) and GRANTs it membership in the relevant group. Group membership is cheap. Object-level GRANTs are not.

The group role’s bootstrap is idempotent and gated by a Redis key:

const groupReadyTTL = 24 * time.Hour
// ...
if ok, _ := p.store.IsGroupReady(ctx, name); ok {
    return name, nil // skip the GRANTs entirely
}

A cache miss is harmless — re-applying the GRANTs is a no-op on both PG and Cockroach — but the Redis check makes the steady state one GET instead of a handful of catalog writes.

There’s a second path for grants that include arbitrary SQL fragments (REVOKE, ALTER DEFAULT PRIVILEGES, etc., which only operators can write into the ACL). Those get a content-addressed composite group named wp_grp_perms_<sha256-prefix>_<database>. Same idea (every user wanting that exact permission set joins the same group), but the hash means a one-character edit to the SQL produces a brand new group, which is what you want. The code is in internal/provision/groups.go if you’re curious.
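The two naming schemes can be sketched in a few lines of Go. This is an illustration, not the code in groups.go: the function names are hypothetical, and the 12-character hash prefix and the newline-joined hashing input are assumptions, since the post does not say how long the prefix is or how fragments are combined.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
)

// presetGroupName builds the shared group role for a preset/schema/database
// tuple, for example wp_grp_readonly_public_redo.
func presetGroupName(preset, schema, database string) string {
	return fmt.Sprintf("wp_grp_%s_%s_%s", preset, schema, database)
}

// compositeGroupName builds a content-addressed group name for a set of SQL
// fragments. Assumption: fragments are hashed in order, newline-terminated,
// and the first 12 hex characters of the digest are kept.
func compositeGroupName(sqlFragments []string, database string) string {
	h := sha256.New()
	for _, frag := range sqlFragments {
		h.Write([]byte(frag))
		h.Write([]byte{'\n'})
	}
	prefix := hex.EncodeToString(h.Sum(nil))[:12]
	return fmt.Sprintf("wp_grp_perms_%s_%s", prefix, database)
}
```

Identical fragment lists always land on the same group, while any edit to the SQL changes the digest and therefore the group name.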

Locking, because CockroachDB

Two replicas of Waypoint can race to mint the same user role at the same instant. CockroachDB doesn’t ship a reliable CREATE ROLE IF NOT EXISTS (the syntax varies by version, the semantics by phase of moon), and the obvious “just retry on uniqueness violation” loop ends up wedged behind a contended DDL queue.

So Waypoint takes a 30-second SET NX EX lock in Redis keyed by the role name, with a unique token. The release script is the standard atomic check-and-delete:

if redis.call("GET", KEYS[1]) == ARGV[1] then
    return redis.call("DEL", KEYS[1])
end
return 0

If the lock can’t be acquired within 10 × 100ms of retries, the connection gets a clean FATAL 53300 rejection rather than queueing forever. We measure this; in production, contention is near zero because almost every connection hits an already-provisioned user and skips the lock path entirely.
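The acquire, retry, and release flow can be modelled without Redis. The sketch below uses an in-memory stand-in (memLock, a hypothetical type) whose setNX and release methods imitate SET NX EX and the check-and-delete script; the constants mirror the numbers in the text, and everything else is an assumption made so the example runs on its own.

```go
package main

import (
	"errors"
	"sync"
	"time"
)

const (
	lockTTL       = 30 * time.Second
	retryAttempts = 10
	retryDelay    = 100 * time.Millisecond
)

// ErrLockTimeout is what a caller would turn into the FATAL rejection.
var ErrLockTimeout = errors.New("role lock not acquired")

type lockEntry struct {
	token   string
	expires time.Time
}

// memLock is an in-memory stand-in for Redis, for illustration only.
type memLock struct {
	mu      sync.Mutex
	holders map[string]lockEntry
}

func newMemLock() *memLock { return &memLock{holders: map[string]lockEntry{}} }

// setNX imitates SET key token NX EX ttl.
func (m *memLock) setNX(key, token string, ttl time.Duration) bool {
	m.mu.Lock()
	defer m.mu.Unlock()
	if e, ok := m.holders[key]; ok && time.Now().Before(e.expires) {
		return false // an unexpired lock is held by someone
	}
	m.holders[key] = lockEntry{token: token, expires: time.Now().Add(ttl)}
	return true
}

// release imitates the check-and-delete script: only the token that took
// the lock may remove it.
func (m *memLock) release(key, token string) bool {
	m.mu.Lock()
	defer m.mu.Unlock()
	e, ok := m.holders[key]
	if !ok || e.token != token || !time.Now().Before(e.expires) {
		return false
	}
	delete(m.holders, key)
	return true
}

// acquire tries setNX up to attempts times, sleeping delay between tries.
func acquire(m *memLock, key, token string, ttl time.Duration, attempts int, delay time.Duration) error {
	for i := 0; i < attempts; i++ {
		if m.setNX(key, token, ttl) {
			return nil
		}
		if i < attempts-1 {
			time.Sleep(delay)
		}
	}
	return ErrLockTimeout
}
```

A caller would invoke acquire(m, key, token, lockTTL, retryAttempts, retryDelay) and defer m.release(key, token). The token check is what stops a slow holder whose lock already expired from deleting a lock that another replica has since taken.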

MongoDB on Atlas: static users, not dynamic ones

The PG path mints a fresh LOGIN role per connection. Atlas does not let you do that. There’s no client-facing db.createUser() in Atlas’s hosted mode, and the Atlas Admin API is its own thing that lives behind a public API key with its own rate limits. Trying to provision users on every connection would mean either storing an Atlas API key in the proxy (no) or pre-minting users out of band and having the proxy pick one (fine).

We picked one. There are exactly two static users on each MongoDB cluster: atlas_readonly and atlas_readwrite, created by our Atlas IaC. Waypoint authenticates the client by Tailscale identity, looks at their grant, and originates a SCRAM-SHA-256 handshake to the backend as the appropriate static user. The client never sees the static user’s password and never sends one of their own.

On paper this is a downgrade from the per-connection roles on PG, but in practice the thing we actually care about — which human did that thing — is recovered from the proxy’s logs, not the database’s. The database’s logs only ever see two usernames; Waypoint’s logs see the Tailscale identity. Bandwidth accounting, connection accounting, and the ACL gate all run on the Tailscale identity. The Atlas user is a backend implementation detail.

This is also why we don’t bother provisioning Atlas users at a finer grain than read/write. Atlas would happily let us create one user per role-set or one per database; every extra shape is another thing that can drift from the ACL, and the ACL is already the source of truth for the granularity we care about.

The bandwidth half: a sliding window inside Redis

This is the part I find more interesting. Per-replica rate limiting is easy and broken. Coordinating a budget across N replicas of a proxy without sending every byte’s worth of metadata to a coordinator is the actual problem.

The broken way is to take a 1 GiB/hour limit and divide by 2 because you have 2 replicas. Then a user who happens to land all their connections on replica A gets cut off at 512 MiB while replica B sits idle. You can also take a token bucket per replica and gossip, but you’ve now invented half of Envoy. The other broken way is to put Redis on the path of every byte — every relayed packet calls INCRBY before it goes upstream — and your throughput is bottlenecked on Redis round trips instead of the database.

What we do instead keeps Redis I/O bounded: every connection accumulates pending bytes in-process and, every 10 seconds (or on close), flushes the delta to Redis via a Lua script that both increments the user’s current window and returns the new total. If the new total exceeds the user’s tier, the next call into the relay loop sees ErrBandwidthLimitExceeded and tears the connection down.

The script

The script is a sliding window over a Redis Hash, broken into sub-buckets:

-- KEYS[1] = bw:3600:{alice}/pg-production
-- ARGV   = bytes, current_bucket, min_bucket, ttl
local bytes_to_add = tonumber(ARGV[1])
redis.call('HINCRBY', KEYS[1], ARGV[2], bytes_to_add)
redis.call('EXPIRE',  KEYS[1], tonumber(ARGV[4]))

local all = redis.call('HGETALL', KEYS[1])
local total, expired = 0, {}
for i = 1, #all, 2 do
    local bucket_id = tonumber(all[i])
    if bucket_id >= tonumber(ARGV[3]) then
        total = total + tonumber(all[i + 1])
    else
        expired[#expired + 1] = all[i]
    end
end
if #expired > 0 then redis.call('HDEL', KEYS[1], unpack(expired)) end
return total

Each tier’s window is split into about 360 sub-buckets. For the 1-hour tier that’s a 10-second bucket; for a daily tier it’d be a 4-minute one. The point is to keep the window smooth: at the edges of an hour you don’t suddenly forgive an hour of traffic the way a flat counter with EXPIRE 3600 would. Memory is cheap here: a few hundred small hash fields per (user, tier), and at 24 bytes per field that’s a few KB of Redis RAM per user regardless of throughput. We watch the total across all users in redis_memory_used_bytes and it sits well under a single ElastiCache node’s tier.

We picked 360 because it’s the smallest number where I stopped being able to perceive the staircase in monitoring graphs. There is no science here. If you want the math: with N sub-buckets, the worst-case error is one bucket’s worth, which is 1/N of the limit, which at N=360 is ~0.28%.
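The bucket arithmetic is easier to check in a small in-process model. The Go sketch below mirrors the single-key script with a map in place of the Redis hash. The window type is hypothetical, and the way minBucket is derived (current bucket minus the bucket count, plus one) is an assumption, since the post only shows min_bucket arriving as an argument.

```go
package main

// window models one user's hash in process: bucket id -> bytes.
type window struct {
	periodSec int64           // e.g. 3600 for the 1-hour tier
	buckets   int64           // sub-buckets per window; 360 in the text
	counts    map[int64]int64 // bytes recorded per bucket id
}

func newWindow(periodSec, buckets int64) *window {
	return &window{periodSec: periodSec, buckets: buckets, counts: map[int64]int64{}}
}

// bucketWidth is the span of one sub-bucket in seconds.
func (w *window) bucketWidth() int64 { return w.periodSec / w.buckets }

// add records n bytes at unix time now, evicts buckets that have left the
// window, and returns the bytes still inside it.
func (w *window) add(now, n int64) int64 {
	current := now / w.bucketWidth()
	minBucket := current - w.buckets + 1 // oldest bucket still counted (assumed)
	w.counts[current] += n

	var total int64
	for id, v := range w.counts {
		if id >= minBucket {
			total += v
		} else {
			delete(w.counts, id) // same job as HDEL in the script
		}
	}
	return total
}
```

With newWindow(3600, 360) the width comes out at 10 seconds, and traffic recorded at time t stops counting once the current bucket has advanced 360 buckets past it, which is where the one-bucket worst-case error comes from.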

Buckets inside the window sum to current usage. HDEL evicts the trailing one each flush.

That KEYS[1] is the leaf — per-user, per-listener — but the full operation walks two keys at once: bw:3600:{alice}/pg-production and bw:3600:{alice}. Every flush updates both, which lets a grant with a global bandwidth cap and a tighter per-backend cap enforce both levels in the same Redis round-trip. The leaf usually bites first, because per-backend tiers are how operators express “do not exfiltrate the events table.” The global cap is the backstop for the “summed across every backend I touched” case.

The Lua snippet above is the single-key version, for readability. The real one (hierarchicalSlidingWindowScript in internal/restrict/redis.go) takes KEYS[1..N] from leaf to root and does the same work at each level, returning only the leaf total because the leaf is always the most restrictive.

One flush touches every level. The leaf usually bites first; the root is the backstop across backends.

The {alice} curly braces in those keys aren’t decorative. Redis Cluster routes a key to a slot based on whatever’s inside the braces, so every key for Alice lands on the same slot and we can run a multi-key Lua script against them without a CROSSSLOT error. We don’t run Cluster yet, but I’d rather not revisit every key in the codebase when we do.
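A short Go sketch shows why the braces matter. bandwidthKeys and hashTag are hypothetical helpers: the first builds the leaf and root keys in the format used above, and the second applies the Redis Cluster hash-tag rule (hash only the text between the first '{' and the next '}', provided it is non-empty).

```go
package main

import (
	"fmt"
	"strings"
)

// bandwidthKeys returns the keys for one flush, ordered leaf to root.
func bandwidthKeys(periodSec int, user, listener string) []string {
	root := fmt.Sprintf("bw:%d:{%s}", periodSec, user)
	return []string{root + "/" + listener, root}
}

// hashTag returns the part of the key that Redis Cluster hashes when
// picking a slot: the text between the first '{' and the next '}' if that
// text is non-empty, otherwise the whole key.
func hashTag(key string) string {
	open := strings.IndexByte(key, '{')
	if open < 0 {
		return key
	}
	end := strings.IndexByte(key[open+1:], '}')
	if end <= 0 { // no closing brace, or an empty "{}"
		return key
	}
	return key[open+1 : open+1+end]
}
```

Both keys for a user reduce to the same tag, so they hash to the same slot and a multi-key script can touch them together.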

Nothing in any of this flushes per byte. Each connection holds a single atomic.Int64 counter in process; the hot path is one atomic add per packet, and a tickered goroutine drains the counter to Redis every ten seconds. That makes us maybe ten seconds late on enforcement at the tail, which is fine for a 1-hour budget. For a 1-second tier we’d flush faster, but we don’t have any of those.
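The accumulate-then-drain pattern is small enough to sketch in full. byteCounter and flushOnce are hypothetical names, and the flush callback stands in for the Redis script call; a real proxy would run flushOnce from a ticker and once more on close.

```go
package main

import "sync/atomic"

// byteCounter accumulates relayed bytes between flushes.
type byteCounter struct {
	pending atomic.Int64
}

// record is the hot-path call: one atomic add per relayed packet.
func (c *byteCounter) record(n int) { c.pending.Add(int64(n)) }

// drain takes the pending delta and resets it to zero in one atomic step,
// so bytes recorded while a flush is in flight are never lost or counted
// twice.
func (c *byteCounter) drain() int64 { return c.pending.Swap(0) }

// flushOnce drains the counter and hands a non-zero delta to flush, which
// returns the new window total. It reports whether the limit was exceeded.
func flushOnce(c *byteCounter, limit int64, flush func(delta int64) int64) bool {
	delta := c.drain()
	if delta == 0 {
		return false // nothing new to report from this connection
	}
	return flush(delta) > limit
}
```

Swap(0) is the important detail: reading the counter and then storing zero as two separate steps would drop any bytes recorded in between.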

Connection counts, same shape

The bandwidth scaffolding picks up per-user connection counts almost for free. The grant schema allows the same limits block to specify a maximum-concurrent number alongside the bandwidth tier, and every new accept runs a short Lua script against conn:{alice}/pg-production (and conn:{alice} for the global cap). The script increments at every level, checks against the configured limits, and returns a decision code the proxy can act on without a follow-up round-trip:

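-- KEYS are ordered leaf to root. endpoint_max and global_max are the
-- configured caps supplied by the caller (parsing elided here); a value
-- of 0 disables that cap.
local leaf = tonumber(redis.call('GET', KEYS[1]) or '0')
local root = tonumber(redis.call('GET', KEYS[#KEYS]) or '0')

-- Decision codes: 1 = endpoint cap reached, 2 = global cap reached, 0 = admitted.
if endpoint_max > 0 and leaf >= endpoint_max then return {1, leaf, root} end
if global_max   > 0 and root >= global_max   then return {2, leaf, root} end

for i = 1, #KEYS do redis.call('INCRBY', KEYS[i], 1) end
return {0, leaf + 1, root + 1}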

The decrement is a deferred call on connection close. The same hash-tag trick keeps everything in one Redis slot when we eventually run Cluster.
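The admit-and-release cycle can be modelled in process to see how the decision codes behave. This Go sketch replaces Redis with a mutex-guarded map. connCounts and the constant names are hypothetical, while the decision codes 0, 1, and 2 follow the script above.

```go
package main

import "sync"

const (
	admitOK            = 0 // connection admitted, counters incremented
	rejectEndpointFull = 1 // per-backend cap already reached
	rejectGlobalFull   = 2 // whole-user cap already reached
)

// connCounts is an in-memory stand-in for the Redis counters.
type connCounts struct {
	mu     sync.Mutex
	counts map[string]int
}

func newConnCounts() *connCounts { return &connCounts{counts: map[string]int{}} }

// admit mirrors the script: read leaf and root, reject if a configured cap
// is already reached, otherwise increment every level. keys are ordered
// leaf to root, and a max of 0 means no limit.
func (c *connCounts) admit(keys []string, endpointMax, globalMax int) (code, leaf, root int) {
	c.mu.Lock()
	defer c.mu.Unlock()
	leaf = c.counts[keys[0]]
	root = c.counts[keys[len(keys)-1]]
	if endpointMax > 0 && leaf >= endpointMax {
		return rejectEndpointFull, leaf, root
	}
	if globalMax > 0 && root >= globalMax {
		return rejectGlobalFull, leaf, root
	}
	for _, k := range keys {
		c.counts[k]++
	}
	return admitOK, leaf + 1, root + 1
}

// release is the deferred decrement on connection close.
func (c *connCounts) release(keys []string) {
	c.mu.Lock()
	defer c.mu.Unlock()
	for _, k := range keys {
		if c.counts[k] > 0 {
			c.counts[k]--
		}
	}
}
```

Checking both caps before incrementing anything is what keeps a rejected connection from leaving a stray count at either level.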

The connection-count path is more sensitive to Redis latency than the bandwidth path, because it lives on every accept, not on a 10s flush. We measured it — at our connection rate, the budget is a sub-millisecond Redis round-trip per accept, and we run the proxy in the same AZ as Redis so this stays cheap. If it ever stops being cheap, the same in-process accumulator pattern fits: keep a tentative count locally, flush at coarser intervals, eat a small amount of over-shoot at the boundary.

What we watch

The proxy exports Prometheus metrics scoped by listener, user, and tier: waypoint_bytes_total{user="alice", listener="pg-production"}, waypoint_bandwidth_window_bytes{...}, waypoint_connections_active{...}, and a counter for each decision code the limit scripts return. The dashboard we actually look at when something’s wrong has three panels:

Top users by bandwidth in the current window. When egress on CockroachDB Cloud spikes, this tells us within seconds who’s responsible. In the worst case it’s a coworker; in the better case it’s a misconfigured Looker connection holding open a result set.
Decision-code rate over time. A growing rate of endpoint limit rejections means a user is regularly hitting their cap and someone should either talk to them or bump their tier. A growing rate of global limit rejections means the whole-user cap is too tight. A spike of lock-acquisition rejections (the auth path) means PG role provisioning is contended and we should look at why.
Flush age. The 10-second flush is the only thing between in-process counters and the cross-replica truth. If the flush age starts climbing, Redis is slow and the limit is about to drift. We’ve alerted on this once, when our managed Redis vendor took maintenance during business hours.

We don’t graph “Redis ops/sec.” We graph the proxy’s view of Redis. A Redis hiccup that isn’t visible to the proxy isn’t a problem; one that is, is.

What it took to get MongoDB working

The Postgres path was the easy half. MongoDB took most of the calendar time, and the parts that took the longest had nothing to do with auth. The auth story we already covered above; everything else lives at the wire-protocol layer, and that’s what made this hard.

The wire protocol itself is fine. It’s a length-prefixed envelope (OP_MSG in the modern protocol; OP_QUERY and OP_REPLY in the legacy handshake) carrying BSON bodies. The official Go driver ships an encoder/decoder that’s usable as a library. The protocol is not what makes this hard.

What makes this hard is what the protocol carries. Two things specifically.

Topology rewriting in `hello` responses

A MongoDB client doesn’t trust the address you handed it. The first thing it does after TLS is send a hello command (or the legacy isMaster for older drivers), and the server responds with the entire replica-set topology: who’s primary, who are the secondaries, what their network addresses are, what the set is called.

The driver then opens direct connections to those advertised addresses. A transparent proxy that doesn’t rewrite the topology gets bypassed the moment the client receives its first hello response, because the client has just been handed a list of Atlas hostnames it can route to directly.

Waypoint speaks enough of the MongoDB wire protocol to find the hosts array in hello responses and rewrite each entry to point at the proxy. That sounds like one line of code. It is not, because the topology can come back through several different command shapes (hello, isMaster, replSetGetStatus), and because the client also caches the topology between connections and uses internal inconsistencies as a signal that the server is unhealthy. If you rewrite hosts but forget to also rewrite primary and me, the driver eventually decides this server is lying and marks it down.

The state machine for the handshake lives in internal/mongowire/handshake.go. The rewrite itself is in rewrite.go. Both are heavily tested against the official Go driver, because the driver is aggressive about validating responses and finds bugs nothing else does.
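To illustrate why rewriting hosts alone is not enough, here is a minimal Go sketch that rewrites every address-bearing field of a decoded hello reply in one pass. It is not the code in rewrite.go: the reply is a plain map standing in for the BSON document, and rewriteTopology and the caller-supplied mapAddr function are hypothetical.

```go
package main

// rewriteTopology rewrites, in place, every field of a decoded hello reply
// that carries a server address. mapAddr translates a backend address into
// the address the client should use instead. Fields that are absent or of
// an unexpected type are left untouched.
func rewriteTopology(reply map[string]any, mapAddr func(string) string) {
	// List-valued fields: each entry is a "host:port" string.
	for _, field := range []string{"hosts", "passives", "arbiters"} {
		list, ok := reply[field].([]any)
		if !ok {
			continue
		}
		for i, v := range list {
			if s, ok := v.(string); ok {
				list[i] = mapAddr(s)
			}
		}
	}
	// Scalar fields must agree with the rewritten lists, or the driver
	// sees an inconsistent topology.
	for _, field := range []string{"primary", "me"} {
		if s, ok := reply[field].(string); ok {
			reply[field] = mapAddr(s)
		}
	}
}
```

Running the same mapping over the lists and the scalar fields is what keeps primary and me consistent with hosts; setName and every other field pass through unchanged.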

If you ever have to write a partial wire-protocol implementation for a database, do it against the official driver as your conformance suite from day one. Driver behavior is the spec in the places where the actual spec is silent, and the silent places are where the bugs live.

Threading SNI through the rewrite

When Waypoint rewrites a hosts entry, the new value has to be a hostname the client can resolve and connect to, and that hostname has to round-trip back to the right backend on the next connection. We do that with per-backend subdomains: a backend named mongo-production is reached at mongo-production.waypoint.redo.run; the rewrite substitutes that string in; the next connection’s SNI tells Waypoint which backend the client thinks it’s talking to.

The SNI lookup happens in the same crypto/tls GetCertificate callback that handles *.ts.net cert provisioning for tailnet names. That callback is the single point where backend routing for both Postgres and MongoDB fans out from — having one routing point meant the MongoDB rewrite story was a configuration thing rather than a code thing, and it’s the single biggest reason both backends are in the same binary instead of two sibling proxies.
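The routing step itself reduces to a string operation on the server name. A minimal sketch, assuming backend names are single DNS labels directly under a shared suffix; backendForSNI is a hypothetical helper, and in a GetCertificate callback the name would come from the ServerName field of tls.ClientHelloInfo.

```go
package main

import "strings"

// backendForSNI maps a TLS server name to a backend name by stripping the
// shared suffix, for example
// "mongo-production.waypoint.redo.run" -> "mongo-production".
// It reports false for names outside the suffix or with extra labels.
func backendForSNI(serverName, suffix string) (string, bool) {
	// Normalize case and drop a trailing root dot.
	name := strings.ToLower(strings.TrimSuffix(serverName, "."))
	dotted := "." + strings.ToLower(suffix)
	if !strings.HasSuffix(name, dotted) {
		return "", false
	}
	label := strings.TrimSuffix(name, dotted)
	if label == "" || strings.Contains(label, ".") {
		return "", false
	}
	return label, true
}
```

Rejecting multi-label names keeps a crafted server name from selecting a backend the operator never configured.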

Wire-protocol surface area

For anyone considering this for another database: the surface area of the MongoDB wire protocol you have to implement to make a proxy work is bounded. We handle:

Envelope parsing: OP_MSG (sections 0 and 1), enough of OP_QUERY and OP_REPLY to get a legacy driver through the handshake.
Command introspection: hello, isMaster, saslStart, saslContinue, replSetGetStatus. Every other command is opaque BSON that we pass through.
SCRAM-SHA-256: server side toward the client, client side toward the backend. Both are short.
Topology rewriting: hosts, passives, arbiters, primary, me, setName. Everything else in the topology document is preserved.

That’s the whole list. We don’t parse query bodies, we don’t look at writes, we don’t implement aggregation. The proxy is a wire-protocol router with an auth boundary and a byte counter, not a Mongo-knowing database tool. Keeping it that small is the single most important constraint — the moment Waypoint starts understanding queries, it owns the database’s correctness story, and that’s a different kind of project.

What runs in production

Two replicas of waypoint-proxy on EKS, behind a Tailscale Service (svc:waypoint-db) so every replica advertises the same name and clients don’t care which one they land on. CPU request 100m, memory 128Mi; in practice the proxy idles. Everything Tailscale-y is a tsnet node tagged tag:waypoint, authed via Kubernetes Workload Identity Federation into Tailscale, so there are no static auth keys on disk.

This part is its own small post: Tailscale’s FederatedIdentity resource lets you trade a projected K8s service account token for a fresh tsnet auth key without ever materializing a long-lived secret. The token comes in via a projected volume scoped to a 1-hour audience.

The Postgres listener fronts our CockroachDB Cloud cluster:

[[listeners]]
name = "pg-production"
listen = ":5432"
mode = "postgres"
backend = "internal-redoproduction-cockroach-tmg.aws-us-east-1.cockroachlabs.cloud:26257"
tls = true
tls_mode = "optional"
use_tailscale_tls = true
service = "svc:waypoint-db"

[listeners.postgres]
admin_user = "${PG_ADMIN_USER}"
admin_password = "${PG_ADMIN_PASSWORD}"
admin_database = "redo"
user_prefix = "wp_"
user_ttl = "24h"

tls_mode = "optional" because we want psql and DataGrip to both work without surprises. When a client does send a PG SSLRequest, Waypoint terminates TLS with the right cert based on SNI — a *.ts.net cert from Tailscale’s HTTPS provisioning for tailnet names, or a file-based cert for the custom waypoint.redo.run name.

The custom-name story is in the Postgres TLS docs. The short version: the SNI lookup happens in the same crypto/tls GetCertificate callback that handles *.ts.net lookups, so you can route the same listener to multiple hostnames without code changes.
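Supporting tls_mode = "optional" starts with telling an SSLRequest apart from a plain StartupMessage. In the PostgreSQL wire protocol an SSLRequest is exactly eight bytes: a big-endian int32 length of 8 followed by the code 80877103. The helper below is a hypothetical sketch of that check, not Waypoint's code.

```go
package main

import "encoding/binary"

// sslRequestCode is the magic number PostgreSQL clients send in an
// SSLRequest: 1234 in the high 16 bits and 5679 in the low 16 bits.
const sslRequestCode = 80877103

// isSSLRequest reports whether the first eight bytes of a new connection
// form a PostgreSQL SSLRequest (length 8, then the request code).
func isSSLRequest(first8 []byte) bool {
	if len(first8) < 8 {
		return false
	}
	return binary.BigEndian.Uint32(first8[0:4]) == 8 &&
		binary.BigEndian.Uint32(first8[4:8]) == sslRequestCode
}
```

A proxy in optional mode would answer a positive match with a single 'S' byte and start the TLS handshake, and otherwise treat the same bytes as the beginning of a plaintext StartupMessage.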

The MongoDB listener has the same shape with mode = "mongo", a backend pointing at the Atlas SRV record, and a mongo admin block naming the two static users described above. The deployment story is identical: same pod, same Tailscale tag, same Redis. Adding the second backend was almost entirely a listeners block in config and a switch in the routing callback.

The trade-offs we made

The brand rules at Redo say every “weird trick” should explain what it costs. A handful of explicit ones:

Redis is now a tier-zero dependency. If Redis is down, Waypoint refuses new connections. We picked our managed Redis (Valkey, actually) precisely because we trust it more than the database we’re protecting. If you’re running Redis on the same nodes as your app, do not do this.

10-second flush latency means burst-over. A user with a 1 GiB/hour budget who opens a fat pipe can pull a sliver past their limit before the next flush lands and the connection dies. For a 1-hour tier we don’t care. For a 1-second tier you’d need a tighter flush, and at that point you’re paying Redis round-trips per byte and you’ve lost most of the win.

One extra hop. Each new connection pays roughly 5ms extra for the proxy plus the WhoIs lookup (which tsnet caches locally). Backend connections are reused, so it’s a startup cost, not a per-statement cost. Worth it.

The proxy sees everything. Bytes are bytes; the proxy does not parse SQL or inspect Mongo documents and we don’t intend to. That keeps Waypoint a byte counter and an auth boundary, not a DLP system, and it keeps the latency story honest.

This is also the reason the proxy gets to be ~~boring~~ small: under 10k lines of Go for the core, no SQL parser, no query rewriter, no policy engine beyond the JSON grant. If we ever want a query-shape rate limit, it goes in a separate process that reads from the same Redis.

Atlas auth is static, by necessity. The MongoDB backend can’t do per-connection users, and we’ve decided that’s fine because the identity that matters lives on the Tailscale side. If your audit story requires the database’s own logs to attribute every query to a human, this trade doesn’t work for you, and you’d want either a different backend or a log-shipping shim that joins proxy logs to backend logs.

What we explicitly didn’t build

A query log. Logs of who ran what go via CockroachDB and Atlas, not the proxy. We do export bytes.read / bytes.written by user, which is what we actually look at when something’s wrong.
A management UI. There’s a waypoint-monitor TUI you SSH into over Tailscale (ssh waypoint-monitor), which is good enough.
Anything that does NAT for non-Tailscale clients. Tailscale is the identity story; if you’re not on the tailnet you don’t exist.
HTTP-level routing or virtual hosts. Waypoint is L4/wire-protocol-shaped on purpose.

Where Waypoint goes next

The thing about a proxy you trust is that you start wanting it to do more things. A handful of things on the roadmap I think are worth flagging:

A third backend. Kafka is the obvious next step: SASL/SCRAM is already in the toolkit from the Mongo work, the broker addresses in metadata responses need the same kind of rewrite we already do for hello, and the per-connection cost is fixed. Snowflake or Redshift would be the more interesting test, because both speak protocols our backend abstraction doesn’t quite fit yet. We’ll do whichever one our team needs first.

Per-query-shape limits. The brand of rate limiting we have now is byte-level. It can’t tell the difference between a Looker dashboard fetching 1 GiB of summary rows and one engineer running SELECT * FROM events and canceling after 1 GiB. A side-channel process that subscribes to a query-fingerprint stream from the proxy and rate-limits by query family would catch the second case before the first byte goes out. Anything that touches the hot path is a hard no; anything we can do asynchronously is on the table.

Audit-log integration. Bandwidth metrics tell us who used how much. They don’t tell us what was run. For now, the answer is “join the proxy’s per-connection summary to Cockroach’s query log by application_name,” which works but is awkward. We’ve talked about exposing a “tail” mode where Waypoint streams per-connection summaries (rows out, byte counts, query family fingerprint) into a write-ahead log without parsing semantics. That would let us reconstruct the picture without making the proxy understand queries.

Self-service tier requests. Today a tier change is a Pulumi PR to the ACL. That’s the right shape for the long tail, but for “I’m running a backfill, can I have 5 GiB for the next hour” it’s annoying. A Slack workflow that bumps a tier for a fixed window with manager approval would be a thin wrapper over the Tailscale ACL API.

Per-PR ephemeral grants. Tag a CI runner’s tsnet node with tag:pr-12345 and give the ACL a rule that grants that node a scoped capability for as long as the PR is open. When the PR closes, the node tears down and the grant vanishes with it. The primitives all exist; the missing piece is the Tailscale-side automation that tags ephemeral CI nodes consistently.

Lifting the Redis dependency for small deployments. Redis is load-bearing for both the locking story and the bandwidth flush. For a single-replica deployment, you don’t need Redis at all; an in-process store would do. We’d take a clean PR for this. It’s been on the issue tracker for a while.

What I’d change if I started over

I’d reach for the hierarchical Redis keys earlier. The first cut had a flat per-user counter and a flat per-(user, listener) counter, and the endpoint check happened in Go after two GETs. Collapsing both into one Lua script with KEYS[1..N] was a ~150-line patch that made all of the limit math feel obvious — and it was the only place where the second backend’s shape forced the first backend’s abstraction to get cleaner.

I’d also budget the MongoDB driver-conformance work as a multiple of the protocol work, not a fraction. The handshake state machine went through three revisions before the official Go driver stopped finding edge cases. The protocol parsing was a week; the “make the real driver happy” loop was the rest of the month. If you’re considering this for another protocol with a strict reference driver, plan for the same shape.

If you want to read the actual code, it’s on GitHub. The two files I’d start in are internal/restrict/redis.go (the Lua scripts) and internal/provision/groups.go (the group-role bootstrap). The MongoDB handshake is in internal/mongowire/, but only go there if you want to be sad. The rest is mostly wire-protocol bookkeeping and tsnet setup.

I’ll write a follow-up when one of these trade-offs bites us in production. That hasn’t happened yet, but it’s a question of when.