Tailscale identity instead of DB passwords (and a per-user bandwidth cap, while we're here)
Most teams running a production Postgres cluster end up at roughly the same place: every engineer has their own database user, those users are created by hand or by Terraform, and rotation and offboarding are things humans remember to do. It works. Postgres is designed for this. The catch is operational — the database doesn’t know anything about your org chart, so every change at the people level has to be mirrored into the database out of band, and the credential itself is one more thing the engineer has to hold and the database has to verify.
The standard ways to skip the parallel identity step are all real: IAM database authentication on managed Postgres, a sidecar token broker, mTLS with a SPIFFE-ish identity document, a full service mesh. They all assume you’re willing to stand up an identity plane next to your apps. If the only thing between you and a better story is “we’d have to operate one more thing,” the per-engineer-user model with manual rotation tends to win on inertia.
The thing Redo had that most teams running on a managed database don’t is that every engineer was already on a Tailscale tailnet, and every database was already behind it. The identity question was already answered, by something that knew about people in a way Postgres never will. The only question was whether to keep maintaining a parallel set of PG users and a tailnet, or put something small on the path that turned one into the other.
Once that small thing exists, the next observation is free. The database
has no idea who’s running SELECT * FROM events without a LIMIT and
burning the egress budget, because to the database it’s just “the
engineer user.” But the proxy does. The proxy can count.
That proxy is in production now. It’s called Waypoint, it sits on our tailnet in front of CockroachDB and MongoDB Atlas, and it does the two things that fall out of having identity in the path:
- It refuses to take a password. It identifies the caller from their Tailscale identity, looks up an ACL capability grant, and mints a scoped per-session database role on demand.
- It counts every byte. Per user, across every replica, in a sliding window backed by Redis. When a user crosses their bandwidth tier, the connection drops mid-query.
The auth half is what motivated building anything. The bandwidth half is what becomes interesting once you have a proxy that already knows who’s on the other end of every connection. This post covers both halves, plus the parts of the implementation I think are worth stealing.
Waypoint is MIT-licensed on GitHub and the docs are up. It’s been in production at Redo since early May 2026; this post is written against v0.6.7.
TL;DR
A Tailscale-aware database proxy that (a) turns Tailscale identity into a per-connection backend role and (b) coordinates per-user bandwidth across N replicas via a Lua sliding-window in Redis. Two replicas in front of one CockroachDB cluster and two MongoDB clusters. No passwords on the client. Every byte accounted for. Open source.
A note on Border0
If you don’t need per-user bandwidth limiting, take a look at Border0 before you read any further. Tailscale acquired them and ships Border0 as part of the platform now; it puts Tailscale identity in front of a longer list of databases than Waypoint touches, and the audit-log UI is the one we explicitly chose not to build. It’s good.
Waypoint mostly exists because we’d already built it by the time Border0 became a real option. The reason we haven’t migrated since is that Border0 doesn’t do per-user bandwidth limiting, and that’s the half we lean on hardest.
Why the auth and bandwidth halves belong in the same binary
The auth story and the bandwidth story are the same observation at two levels. Both want the same thing in the middle of every connection: an identifier you can trust, and a place to put state about that identifier that every proxy replica can see.
Once you’ve built that scaffolding for one, the other is mostly free. If
you build it twice — once in pg_hba.conf and again in a sidecar rate
limiter — you’ve built two different identity models that drift, and you
get to debug both of them at 3am.
This is also approximately the argument for service meshes. We don’t run one. The mesh-y answer to the auth half is mTLS with a SPIFFE-ish identity document; the mesh-y answer to the bandwidth half is an Envoy filter pinned to that identity. We didn’t want to operate a mesh just for this, and Tailscale was already in the path.
The auth half: WhoIs instead of pg_hba
The Postgres protocol expects an authentication handshake. PG’s wire protocol
opens with a startup message announcing a username and database, the server
sends back an AuthenticationRequest, and the client answers with a SCRAM
exchange or, in clients that haven’t been updated since the 2010s, an MD5
hash. None of this is interesting and all of it is something Waypoint has to
handle.
In practice this means Waypoint links against github.com/jackc/pgx/v5/pgproto3
and re-implements the server side of the handshake, then turns around and
performs the client side against the real backend with credentials it just
minted. The reference SCRAM flow is RFC 5802 §3; the bits that matter for a
proxy live in internal/pgwire/auth.go.
What’s actually useful is what happens before any of that. The client’s TCP
connection comes in over a Tailscale subnet, which means Waypoint can ask
tsnet’s local API who the caller is:
who, err := lc.WhoIs(ctx, remoteAddr)
if err != nil { /* ... */ }
rules, err := tailcfg.UnmarshalCapJSON[CapRule](
	who.CapMap, "redo.com/cap/waypoint",
)
WhoIs returns the caller’s Tailscale login name, node name, tags, and
crucially their CapMap — the capability grants the tailnet’s ACL policy
hands them. We define a single capability key, redo.com/cap/waypoint, and
hang everything off it.
Here’s the actual grant our engineer group gets in production, copied from
our Pulumi config:
{
  "src": ["group:engineer"],
  "dst": ["tag:waypoint"],
  "app": {
    "redo.com/cap/waypoint": [{
      "backends": {
        "pg-production": {
          "pg": { "databases": { "redo": { "permissions": ["readonly"] } } },
          "limits": { "bandwidth": [{ "bytes": 1073741824, "period": "1h" }] }
        },
        "mongo-production": {
          "mongo": { "databases": { "*": { "permissions": ["readonly"] } } },
          "limits": { "bandwidth": [{ "bytes": 1073741824, "period": "1h" }] }
        }
      }
    }]
  }
}
1 GiB per hour, readonly, against the two backends we actually want
engineers reaching. The engineeringProductionAccess group gets the same
shape with readwrite and the same 1 GiB budget. Both grants live in a
single TypeScript file in our infra repo, and the Tailscale ACL is the
single source of truth.
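For readers who want to see the grant from the Go side, here is a minimal sketch of a struct tree that decodes the value shown above. The JSON field names come from the grant; the Go type names (Backend, Engine, DBGrant, Limits, Tier) and the parseRule helper are hypothetical, and this uses plain encoding/json rather than tailcfg.UnmarshalCapJSON so it runs without a tailnet.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// CapRule mirrors one value under "redo.com/cap/waypoint".
// Type names below are illustrative, not Waypoint's real ones.
type CapRule struct {
	Backends map[string]Backend `json:"backends"`
}

type Backend struct {
	PG     *Engine `json:"pg,omitempty"`
	Mongo  *Engine `json:"mongo,omitempty"`
	Limits Limits  `json:"limits"`
}

type Engine struct {
	Databases map[string]DBGrant `json:"databases"`
}

type DBGrant struct {
	Permissions []string `json:"permissions"`
}

type Limits struct {
	Bandwidth []Tier `json:"bandwidth"`
}

type Tier struct {
	Bytes  int64  `json:"bytes"`
	Period string `json:"period"`
}

// sample is the pg-production half of the production grant.
const sample = `{"backends":{"pg-production":{
  "pg":{"databases":{"redo":{"permissions":["readonly"]}}},
  "limits":{"bandwidth":[{"bytes":1073741824,"period":"1h"}]}}}}`

// parseRule decodes a single capability value.
func parseRule(raw string) (CapRule, error) {
	var r CapRule
	err := json.Unmarshal([]byte(raw), &r)
	return r, err
}

func main() {
	r, err := parseRule(sample)
	if err != nil {
		panic(err)
	}
	b := r.Backends["pg-production"]
	fmt.Println(b.PG.Databases["redo"].Permissions, b.Limits.Bandwidth[0].Bytes)
}
```

Using pointers for the pg and mongo blocks lets a backend entry carry only the engine it applies to, which matches how the two backends in the grant differ.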
The schema is the moral equivalent of an iam:Policy for databases.
Tailscale capability grants are still under-documented compared to ACL
rules; the format is whatever shape you want on the value side, as long as
you can parse it back out with
tailcfg.UnmarshalCapJSON.
That flexibility is most of why this fits at all.
Minting a scoped role without minting a hot path
The naive version of this is: every connection gets a fresh PG role with the
right GRANTs, then a DROP ROLE when the connection closes. This is
catastrophic. CREATE ROLE on CockroachDB is a cluster-wide write that
serializes with other DDL, and applying the full
preset GRANTs (a couple of GRANT ON ALL TABLES IN SCHEMA public ...
statements) on every connection means a fresh database connection becomes a
~second-long operation under load.
This is a Cockroach quirk; vanilla Postgres lets you CREATE ROLE more
cheaply, but object-level GRANT statements still write to the system
catalogs and contend with one another on the same objects. Either way, you
don’t want it in your hot path.
What Waypoint does instead is separate the user role from the privilege role:
- For every preset–schema–database tuple, Waypoint creates one shared group role named like wp_grp_readonly_public_redo, lazily, the first time someone wants it. The group holds the actual GRANTs.
- Per-connection, Waypoint creates a thin per-user LOGIN role (wp_alice_macbook_redo) and GRANTs it membership in the relevant group. Group membership is cheap. Object-level GRANTs are not.
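The split between the two roles can be sketched as a pair of naming functions and the statements each side would run. This is an illustration under assumptions: the sanitize rule, the helper names, and the exact SQL (including treating readonly as USAGE plus SELECT) are mine, and credential handling for the LOGIN role is left out.

```go
package main

import (
	"fmt"
	"strings"
)

// sanitize lowercases s and replaces anything outside [a-z0-9_] with '_'.
// The real naming rule is not shown in this post; this one is an assumption.
func sanitize(s string) string {
	var b strings.Builder
	for _, r := range strings.ToLower(s) {
		if (r >= 'a' && r <= 'z') || (r >= '0' && r <= '9') || r == '_' {
			b.WriteRune(r)
		} else {
			b.WriteByte('_')
		}
	}
	return b.String()
}

// groupRole names the shared role that holds the object-level GRANTs.
func groupRole(preset, schema, database string) string {
	return "wp_grp_" + sanitize(preset) + "_" + sanitize(schema) + "_" + sanitize(database)
}

// userRole names the thin per-user LOGIN role.
func userRole(user, node, database string) string {
	return "wp_" + sanitize(user) + "_" + sanitize(node) + "_" + sanitize(database)
}

// groupBootstrap lists the expensive statements, run once per group.
func groupBootstrap(group, schema string) []string {
	return []string{
		fmt.Sprintf("CREATE ROLE %s NOLOGIN", group),
		fmt.Sprintf("GRANT USAGE ON SCHEMA %s TO %s", schema, group),
		fmt.Sprintf("GRANT SELECT ON ALL TABLES IN SCHEMA %s TO %s", schema, group),
	}
}

// userProvision lists the cheap statements, run when a new user role is needed.
func userProvision(user, group string) []string {
	return []string{
		fmt.Sprintf("CREATE ROLE %s LOGIN", user), // password handling omitted
		fmt.Sprintf("GRANT %s TO %s", group, user),
	}
}

func main() {
	g := groupRole("readonly", "public", "redo")
	u := userRole("alice", "macbook", "redo")
	for _, s := range append(groupBootstrap(g, "public"), userProvision(u, g)...) {
		fmt.Println(s + ";")
	}
}
```

The point of the shape is visible in the statement lists: everything that touches tables lives in groupBootstrap, so the per-user path never issues an object-level GRANT.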
The group role’s bootstrap is idempotent and gated by a Redis key:
const groupReadyTTL = 24 * time.Hour
// ...
if ok, _ := p.store.IsGroupReady(ctx, name); ok {
	return name, nil // skip the GRANTs entirely
}
A cache miss is harmless — re-applying the GRANTs is a no-op on both PG and
Cockroach — but the Redis check makes the steady state one GET instead of a
handful of catalog writes.
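To make the gate concrete, here is a small self-contained sketch of the same control flow with an in-memory store standing in for Redis. IsGroupReady matches the call in the snippet above (minus the context argument, dropped for brevity); MarkGroupReady, memStore, and the applyGrants hook are hypothetical names, and the 24-hour TTL is not modelled.

```go
package main

import "fmt"

// readyStore is the slice of the store the gate needs.
type readyStore interface {
	IsGroupReady(name string) (bool, error)
	MarkGroupReady(name string) error
}

// memStore is an in-memory stand-in for the Redis-backed store (no TTL).
type memStore struct{ ready map[string]bool }

func (m *memStore) IsGroupReady(name string) (bool, error) { return m.ready[name], nil }
func (m *memStore) MarkGroupReady(name string) error       { m.ready[name] = true; return nil }

type provisioner struct {
	store       readyStore
	applyGrants func(group string) error // stands in for CREATE ROLE + GRANTs
}

// ensureGroup skips the GRANTs when the store says the group is ready.
// A store error is treated like a miss, because re-applying is harmless.
func (p *provisioner) ensureGroup(name string) (string, error) {
	if ok, _ := p.store.IsGroupReady(name); ok {
		return name, nil
	}
	if err := p.applyGrants(name); err != nil {
		return "", err
	}
	_ = p.store.MarkGroupReady(name) // best effort; a failure only costs a repeat
	return name, nil
}

func main() {
	applied := 0
	p := &provisioner{
		store:       &memStore{ready: map[string]bool{}},
		applyGrants: func(string) error { applied++; return nil },
	}
	for i := 0; i < 3; i++ {
		if _, err := p.ensureGroup("wp_grp_readonly_public_redo"); err != nil {
			panic(err)
		}
	}
	fmt.Println("grant runs:", applied)
}
```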
There’s a second path for grants that include arbitrary SQL fragments
(REVOKE, ALTER DEFAULT PRIVILEGES, etc., which only operators can write
into the ACL). Those get a content-addressed composite group named
wp_grp_perms_<sha256-prefix>_<database>. Same idea — every user wanting
that exact permission set joins the same group — but the hash means a one-
character edit to the SQL produces a brand new group, which is what you
want. The code is in internal/provision/groups.go if you’re curious.
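A content-addressed name of that shape takes only a few lines. This sketch assumes the hash covers the SQL fragments joined by newlines and that the prefix is 12 hex characters; both choices are mine, not taken from groups.go.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
	"strings"
)

// permsGroupName derives wp_grp_perms_<sha256-prefix>_<database> from the
// exact SQL text, so any edit to the fragments yields a different group.
// The 12-character prefix and the newline join are assumptions.
func permsGroupName(database string, stmts []string) string {
	sum := sha256.Sum256([]byte(strings.Join(stmts, "\n")))
	return "wp_grp_perms_" + hex.EncodeToString(sum[:])[:12] + "_" + database
}

func main() {
	a := permsGroupName("redo", []string{"REVOKE SELECT ON secrets FROM {role}"})
	b := permsGroupName("redo", []string{"REVOKE SELECT ON secret FROM {role}"})
	fmt.Println(a)
	fmt.Println(b)
	fmt.Println("same group:", a == b)
}
```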
Locking, because CockroachDB
Two replicas of Waypoint can race to mint the same user role at the same
instant. CockroachDB doesn’t ship a reliable CREATE ROLE IF NOT EXISTS (the
syntax varies by version, the semantics by phase of moon), and the obvious
“just retry on uniqueness violation” loop ends up wedged behind a contended
DDL queue.
So Waypoint takes a 30-second SET NX EX lock in Redis keyed by the role
name, with a unique token. The release script is the standard atomic
check-and-delete:
if redis.call("GET", KEYS[1]) == ARGV[1] then
  return redis.call("DEL", KEYS[1])
end
return 0
If the lock can’t be acquired within 10 × 100ms of retries, the connection
gets a clean FATAL 53300 rejection rather than queueing forever. We measure
this; in production, contention is near zero because almost every connection
hits an already-provisioned user and skips the lock path entirely.
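The acquire, retry, and release cycle looks roughly like this. It is a sketch with an in-memory locker standing in for Redis: TryLock plays the role of SET NX EX and Unlock mirrors the check-and-delete script. The locker interface, memLocker (which ignores the TTL), withLock, and ErrLockTimeout are hypothetical names; only the 30-second TTL and the 10 × 100ms retry budget come from the text.

```go
package main

import (
	"errors"
	"fmt"
	"sync"
	"time"
)

var ErrLockTimeout = errors.New("provision lock not acquired")

const (
	lockTTL       = 30 * time.Second
	retryAttempts = 10
)

var retryWait = 100 * time.Millisecond

type locker interface {
	TryLock(key, token string, ttl time.Duration) bool // like SET key token NX EX ttl
	Unlock(key, token string) bool                      // delete only if token matches
}

// memLocker is an in-memory stand-in for Redis; it does not expire entries.
type memLocker struct {
	mu   sync.Mutex
	held map[string]string
}

func (m *memLocker) TryLock(key, token string, _ time.Duration) bool {
	m.mu.Lock()
	defer m.mu.Unlock()
	if _, taken := m.held[key]; taken {
		return false
	}
	m.held[key] = token
	return true
}

func (m *memLocker) Unlock(key, token string) bool {
	m.mu.Lock()
	defer m.mu.Unlock()
	if cur, ok := m.held[key]; !ok || cur != token {
		return false
	}
	delete(m.held, key)
	return true
}

// withLock runs fn under the lock, or gives up after the retry budget so the
// caller can reject the connection instead of queueing.
func withLock(l locker, key, token string, fn func() error) error {
	for i := 0; i < retryAttempts; i++ {
		if l.TryLock(key, token, lockTTL) {
			defer l.Unlock(key, token)
			return fn()
		}
		time.Sleep(retryWait)
	}
	return ErrLockTimeout
}

func main() {
	l := &memLocker{held: map[string]string{}}
	err := withLock(l, "lock:wp_alice_macbook_redo", "token-1", func() error {
		fmt.Println("provisioning role")
		return nil
	})
	fmt.Println("err:", err)
}
```

The unique token is what makes the release safe: a replica whose lock expired mid-provision cannot delete a lock that another replica has since acquired.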
MongoDB on Atlas: static users, not dynamic ones
The PG path mints a fresh LOGIN role per connection. Atlas does not let
you do that. There’s no client-facing db.createUser() in Atlas’s hosted
mode, and the Atlas Admin API is its own thing that lives behind a public
API key with its own rate limits. Trying to provision users on every
connection would mean either storing an Atlas API key in the proxy (no) or
pre-minting users out of band and having the proxy pick one (fine).
We picked one. There are exactly two static users on each MongoDB cluster:
atlas_readonly and atlas_readwrite, created by our Atlas IaC. Waypoint
authenticates the client by Tailscale identity, looks at their grant, and
originates a SCRAM-SHA-256 handshake to the backend as the appropriate
static user. The client never sees the static user’s password and never
sends one of their own.
On paper this is a downgrade from the per-connection roles on PG, but in practice the thing we actually care about — which human did that thing — is recovered from the proxy’s logs, not the database’s. The database’s logs only ever see two usernames; Waypoint’s logs see the Tailscale identity. Bandwidth accounting, connection accounting, and the ACL gate all run on the Tailscale identity. The Atlas user is a backend implementation detail.
This is also why we don’t bother provisioning Atlas users at a finer grain than read/write. Atlas would happily let us create one user per role-set or one per database; every extra shape is another thing that can drift from the ACL, and the ACL is already the source of truth for the granularity we care about.
The bandwidth half: a sliding window inside Redis
This is the part I find more interesting. Per-replica rate limiting is easy and broken. Coordinating a budget across N replicas of a proxy without sending every byte’s worth of metadata to a coordinator is the actual problem.
The broken way is to take a 1 GiB/hour limit and divide by 2 because you
have 2 replicas. Then a user who happens to land all their connections on
replica A gets cut off at 512 MiB while replica B sits idle. You can also
take a token bucket per replica and gossip, but you’ve now invented half of
Envoy. The other broken way is to put Redis on the path of every byte —
every relayed packet calls INCRBY before it goes upstream — and your
throughput is bottlenecked on Redis round trips instead of the database.
What we do instead keeps the Redis I/O bounded: every connection accumulates
pending bytes in-process, and every 10 seconds (or on close) flushes the
delta to Redis via a Lua script that both increments the user’s current
window and returns the new total. If the new total exceeds the user’s tier,
the next call into the relay loop sees ErrBandwidthLimitExceeded and tears
the connection down.
The script
The script is a sliding window over a Redis Hash, broken into sub-buckets:
-- KEYS[1] = bw:3600:{alice}/pg-production
-- ARGV = bytes, current_bucket, min_bucket, ttl
local bytes_to_add = tonumber(ARGV[1])
redis.call('HINCRBY', KEYS[1], ARGV[2], bytes_to_add)
redis.call('EXPIRE', KEYS[1], tonumber(ARGV[4]))
local all = redis.call('HGETALL', KEYS[1])
local total, expired = 0, {}
for i = 1, #all, 2 do
  local bucket_id = tonumber(all[i])
  if bucket_id >= tonumber(ARGV[3]) then
    total = total + tonumber(all[i + 1])
  else
    expired[#expired + 1] = all[i]
  end
end
if #expired > 0 then redis.call('HDEL', KEYS[1], unpack(expired)) end
return total
Each tier’s window is split into about 360 sub-buckets. For the
1-hour tier that’s a 10-second bucket; for a daily tier it’d be a
4-minute one. The point is to keep the
window smooth: at the edges of an hour you don’t suddenly forgive an
hour of traffic the way a flat counter with EXPIRE 3600 would. Memory
is cheap here: a few hundred small hash fields per (user, tier), and at
24 bytes per field that’s a few KB of Redis RAM per user regardless of
throughput. We watch the total across all users in
redis_memory_used_bytes and it sits well under a single ElastiCache
node’s tier.
We picked 360 because it’s the smallest number where I stopped being able to perceive the staircase in monitoring graphs. There is no science here. If you want the math: with N sub-buckets, the worst-case error is one bucket’s worth, which is 1/N of the limit, which at N=360 is ~0.28%.
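The script takes current_bucket and min_bucket as arguments, so the caller has to derive them. Here is one plausible way to do that, plus an in-memory mirror of the script’s loop for experimenting with the arithmetic. How Waypoint actually computes the IDs is not shown in this post, so bucketIDs and the window type are assumptions.

```go
package main

import "fmt"

const subBuckets = 360

// bucketIDs returns the bucket that nowUnix falls into and the oldest bucket
// still inside the window, for a tier with the given period in seconds.
func bucketIDs(nowUnix, periodSec int64) (current, oldest int64) {
	width := periodSec / subBuckets
	if width < 1 {
		width = 1
	}
	current = nowUnix / width
	return current, current - subBuckets + 1
}

// window mirrors the Redis hash: bucket ID -> bytes.
type window map[int64]int64

// add does what the Lua script does: increment the current bucket, sum the
// live buckets, and drop the expired ones.
func (w window) add(bytes, current, oldest int64) int64 {
	w[current] += bytes
	var total int64
	for id, n := range w {
		if id >= oldest {
			total += n
		} else {
			delete(w, id)
		}
	}
	return total
}

func main() {
	w := window{}
	start := int64(1_000_000)
	for _, step := range []struct{ offset, bytes int64 }{{0, 100}, {1800, 50}, {3600, 25}} {
		cur, old := bucketIDs(start+step.offset, 3600)
		fmt.Printf("t+%ds total=%d\n", step.offset, w.add(step.bytes, cur, old))
	}
	fmt.Printf("worst-case error: %.2f%% of the limit\n", 100.0/subBuckets)
}
```

With a one-hour period the bucket width is 10 seconds, so the bytes recorded at the start fall out of the total exactly one hour later rather than at the next top-of-hour boundary.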
That KEYS[1] is the leaf — per-user, per-listener — but the full
operation walks two keys at once: bw:3600:{alice}/pg-production and
bw:3600:{alice}. Every flush updates both, which lets a grant with a
global bandwidth cap and a tighter per-backend cap enforce both levels
in the same Redis round-trip. The leaf usually
bites first, because per-backend tiers are how operators express “do not
exfiltrate the events table.” The global cap is the backstop for the
“summed across every backend I touched” case.
The Lua snippet above is the single-key version, for readability. The real
one (hierarchicalSlidingWindowScript in
internal/restrict/redis.go) takes KEYS[1..N] from leaf to root and
does the same work at each level, returning only the leaf total because
the leaf is always the most restrictive.
The {alice} curly braces in those keys aren’t decorative. Redis Cluster
routes a key to a slot based on whatever’s inside the braces, so every
key for Alice lands on the same slot and we can run a multi-key Lua
script against them without a CROSSSLOT error. We don’t run Cluster
yet, but I’d rather not revisit every key in the codebase when we do.
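To check the hash-tag claim without a cluster, the slot calculation is easy to reproduce. This sketch builds the two keys from the text and computes their slots with the CRC16 variant the Redis Cluster specification describes (XMODEM polynomial 0x1021, modulo 16384). The helper names are hypothetical.

```go
package main

import (
	"fmt"
	"strings"
)

// bandwidthKeys returns the keys for one flush, ordered leaf to root.
func bandwidthKeys(periodSec int, user, listener string) []string {
	root := fmt.Sprintf("bw:%d:{%s}", periodSec, user)
	return []string{root + "/" + listener, root}
}

// hashTag returns the part of the key that Redis Cluster hashes: the text
// between the first '{' and the next '}', if non-empty, else the whole key.
func hashTag(key string) string {
	open := strings.IndexByte(key, '{')
	if open < 0 {
		return key
	}
	end := strings.IndexByte(key[open+1:], '}')
	if end <= 0 { // no closing brace, or an empty "{}"
		return key
	}
	return key[open+1 : open+1+end]
}

// crc16 is CRC-16/XMODEM, computed bit by bit.
func crc16(data []byte) uint16 {
	var crc uint16
	for _, b := range data {
		crc ^= uint16(b) << 8
		for i := 0; i < 8; i++ {
			if crc&0x8000 != 0 {
				crc = (crc << 1) ^ 0x1021
			} else {
				crc <<= 1
			}
		}
	}
	return crc
}

// slot maps a key to one of the 16384 cluster slots.
func slot(key string) int {
	return int(crc16([]byte(hashTag(key)))) % 16384
}

func main() {
	for _, k := range bandwidthKeys(3600, "alice", "pg-production") {
		fmt.Printf("%-32s tag=%s slot=%d\n", k, hashTag(k), slot(k))
	}
}
```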
Nothing in any of this flushes per byte. Each connection holds a single
atomic.Int64 counter in process; the hot path is one atomic add per
packet, and a tickered goroutine drains the counter to Redis every ten
seconds. That makes us maybe ten seconds late on enforcement at the
tail, which is fine for a 1-hour budget. For a 1-second tier we’d flush
faster, but we don’t have any of those.
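The accumulate-then-flush pattern can be sketched in a few dozen lines. ErrBandwidthLimitExceeded is the name used earlier in this post; the meter type, its fields, and the flushFn hook (which stands in for the Lua call) are hypothetical, and error handling is simplified.

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"sync/atomic"
	"time"
)

var ErrBandwidthLimitExceeded = errors.New("bandwidth limit exceeded")

type meter struct {
	pending  atomic.Int64 // bytes relayed since the last flush
	exceeded atomic.Bool  // set once a flush reports the tier is blown
	limit    int64
	flushFn  func(delta int64) (total int64, err error) // stands in for the Lua script
}

// record is the hot path: one atomic load and one atomic add per packet.
func (m *meter) record(n int) error {
	if m.exceeded.Load() {
		return ErrBandwidthLimitExceeded
	}
	m.pending.Add(int64(n))
	return nil
}

// flush drains the counter and compares the shared total with the limit.
func (m *meter) flush() error {
	delta := m.pending.Swap(0)
	if delta == 0 {
		return nil
	}
	total, err := m.flushFn(delta)
	if err != nil {
		m.pending.Add(delta) // keep the bytes for the next attempt
		return err
	}
	if total > m.limit {
		m.exceeded.Store(true)
	}
	return nil
}

// run flushes on a ticker and once more when the connection closes.
func (m *meter) run(ctx context.Context, every time.Duration) {
	t := time.NewTicker(every)
	defer t.Stop()
	for {
		select {
		case <-ctx.Done():
			_ = m.flush()
			return
		case <-t.C:
			_ = m.flush()
		}
	}
}

func main() {
	var shared int64 // pretend this lives in Redis
	m := &meter{limit: 100, flushFn: func(d int64) (int64, error) {
		shared += d
		return shared, nil
	}}
	_ = m.record(60)
	_ = m.flush()
	_ = m.record(60)
	_ = m.flush()
	fmt.Println("next record:", m.record(1))
}
```

The overshoot described above falls out of the structure: record only fails after a flush has seen the total cross the limit, so up to one flush interval of traffic can slip through.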
Connection counts, same shape
The bandwidth scaffolding picks up per-user connection counts almost for
free. The grant schema allows the same limits block to specify a
maximum-concurrent number alongside the bandwidth tier, and every new
accept runs a short Lua script against conn:{alice}/pg-production (and
conn:{alice} for the global cap). The script increments at every level,
checks against the configured limits, and returns a decision code the proxy
can act on without a follow-up round-trip:
local leaf = tonumber(redis.call('GET', KEYS[1]) or '0')
local root = tonumber(redis.call('GET', KEYS[#KEYS]) or '0')
if endpoint_max > 0 and leaf >= endpoint_max then return {1, leaf, root} end
if global_max > 0 and root >= global_max then return {2, leaf, root} end
for i = 1, #KEYS do redis.call('INCRBY', KEYS[i], 1) end
return {0, leaf + 1, root + 1}
The decrement is a deferred call on connection close. The same hash-tag trick keeps everything in one Redis slot when we eventually run Cluster.
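For reasoning about the decision codes, here is a Go mirror of the admission script over an in-memory map. The codes 0, 1, and 2 and the leaf-to-root key order follow the Lua above; the counters type, the constant names, and the release helper are hypothetical.

```go
package main

import "fmt"

const (
	admitOK        = 0
	rejectEndpoint = 1
	rejectGlobal   = 2
)

// counters stands in for the Redis keys: key -> open connections.
type counters map[string]int64

// admit checks both levels before incrementing either, like the script.
// keys are ordered leaf to root; a max of 0 means "no limit".
func (c counters) admit(keys []string, endpointMax, globalMax int64) (code int, leaf, root int64) {
	leaf = c[keys[0]]
	root = c[keys[len(keys)-1]]
	if endpointMax > 0 && leaf >= endpointMax {
		return rejectEndpoint, leaf, root
	}
	if globalMax > 0 && root >= globalMax {
		return rejectGlobal, leaf, root
	}
	for _, k := range keys {
		c[k]++
	}
	return admitOK, leaf + 1, root + 1
}

// release is the deferred decrement on connection close.
func (c counters) release(keys []string) {
	for _, k := range keys {
		if c[k] > 0 {
			c[k]--
		}
	}
}

func main() {
	c := counters{}
	pg := []string{"conn:{alice}/pg-production", "conn:{alice}"}
	mongo := []string{"conn:{alice}/mongo-production", "conn:{alice}"}
	for _, keys := range [][]string{pg, pg, pg, mongo, mongo} {
		code, leaf, root := c.admit(keys, 2, 3)
		fmt.Printf("%-30s code=%d leaf=%d root=%d\n", keys[0], code, leaf, root)
	}
}
```

Checking before incrementing is what keeps a rejected accept from leaving a stray count behind at either level.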
The connection-count path is more sensitive to Redis latency than the
bandwidth path, because it lives on every accept, not on a 10s flush.
We measured it — at our connection rate, the budget is a sub-millisecond
Redis round-trip per accept, and we run the proxy in the same AZ as Redis
so this stays cheap. If it ever stops being cheap, the same in-process
accumulator pattern fits: keep a tentative count locally, flush at coarser
intervals, eat a small amount of over-shoot at the boundary.
What we watch
The proxy exports Prometheus metrics scoped by listener, user, and tier:
waypoint_bytes_total{user="alice", listener="pg-production"},
waypoint_bandwidth_window_bytes{...}, waypoint_connections_active{...},
and a counter for each decision code the limit scripts return. The
dashboard we actually look at when something’s wrong has three panels:
- Top users by bandwidth in the current window. When egress on CockroachDB Cloud spikes, this tells us within seconds who’s responsible. In the worst case it’s a coworker; in the better case it’s a misconfigured Looker connection holding open a result set.
- Decision-code rate over time. A growing rate of endpoint limit rejections means a user is regularly hitting their cap and someone should either talk to them or bump their tier. A growing rate of global limit rejections means the whole-user cap is too tight. A spike of lock-acquisition rejections (the auth path) means PG role provisioning is contended and we should look at why.
- Flush age. The 10-second flush is the only thing between in-process counters and the cross-replica truth. If the flush age starts climbing, Redis is slow and the limit is about to drift. We’ve alerted on this once, when our managed Redis vendor took maintenance during business hours.
We don’t graph “Redis ops/sec.” We graph the proxy’s view of Redis. A Redis hiccup that isn’t visible to the proxy isn’t a problem; one that is, is.
What it took to get MongoDB working
The Postgres path was the easy half. MongoDB took most of the calendar time, and the parts that took the longest had nothing to do with auth. The auth story we already covered above; everything else lives at the wire-protocol layer, and that’s what made this hard.
The wire protocol itself is fine. It’s a length-prefixed envelope
(OP_MSG in the modern protocol; OP_QUERY and OP_REPLY in the legacy
handshake) carrying BSON bodies. The official Go driver ships an
encoder/decoder that’s usable as a library. The protocol is not what
makes this hard.
What makes this hard is what the protocol carries. Two things specifically.
Topology rewriting in hello responses
A MongoDB client doesn’t trust the address you handed it. The first thing
it does after TLS is send a hello command (or the legacy isMaster
for older drivers), and the server responds with the entire replica-set
topology: who’s primary, who are the secondaries, what their network
addresses are, what the set is called.
The driver then opens direct connections to those advertised addresses.
A transparent proxy that doesn’t rewrite the topology gets bypassed the
moment the client receives its first hello response, because the client
has just been handed a list of Atlas hostnames it can route to directly.
Waypoint speaks enough of the MongoDB wire protocol to find the hosts
array in hello responses and rewrite each entry to point at the proxy.
That sounds like one line of code. It is not, because the topology can
come back through several different command shapes (hello, isMaster,
replSetGetStatus), and because the client also caches the topology
between connections and uses internal inconsistencies as a signal that
the server is unhealthy. If you rewrite hosts but forget to also rewrite
primary and me, the driver eventually decides this server is lying and
marks it down.
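The consistency requirement is easier to see in code. This sketch rewrites a decoded hello-style reply held as a plain Go map, so it needs no BSON library. The field list comes from this post; rewriteTopology, the mapAddr callback, the de-duplication of list entries, and the sample hostnames and port are all assumptions for illustration.

```go
package main

import "fmt"

// rewriteTopology rewrites every address-bearing field in place, so the list
// fields and the scalar fields can never disagree with each other.
func rewriteTopology(doc map[string]any, mapAddr func(string) string) {
	for _, field := range []string{"hosts", "passives", "arbiters"} {
		list, ok := doc[field].([]any)
		if !ok {
			continue
		}
		seen := map[string]bool{}
		out := make([]any, 0, len(list))
		for _, v := range list {
			s, isString := v.(string)
			if !isString {
				out = append(out, v) // leave unexpected values untouched
				continue
			}
			if r := mapAddr(s); !seen[r] {
				seen[r] = true
				out = append(out, r)
			}
		}
		doc[field] = out
	}
	for _, field := range []string{"primary", "me"} {
		if s, ok := doc[field].(string); ok {
			doc[field] = mapAddr(s)
		}
	}
}

func main() {
	proxy := "mongo-production.waypoint.redo.run:27017" // port is an assumption
	reply := map[string]any{
		"hosts":   []any{"shard-00.example.mongodb.net:27017", "shard-01.example.mongodb.net:27017"},
		"primary": "shard-00.example.mongodb.net:27017",
		"me":      "shard-01.example.mongodb.net:27017",
		"setName": "atlas-example-shard-0",
	}
	rewriteTopology(reply, func(string) string { return proxy })
	fmt.Println(reply["hosts"], reply["primary"], reply["me"], reply["setName"])
}
```

Driving the lists and the scalars from one mapAddr function is the design choice that matters: a driver that cross-checks primary and me against hosts sees a coherent document.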
The state machine for the handshake lives in
internal/mongowire/handshake.go. The rewrite itself is in rewrite.go.
Both are heavily tested against the official Go driver, because the driver
is aggressive about validating responses and finds bugs nothing else
does.
If you ever have to write a partial wire-protocol implementation for a database, do it against the official driver as your conformance suite from day one. Driver behavior is the spec in the places where the actual spec is silent, and the silent places are where the bugs live.
Threading SNI through the rewrite
When Waypoint rewrites a hosts entry, the new value has to be a hostname
the client can resolve and connect to, and that hostname has to round-trip
back to the right backend on the next connection. We do that with
per-backend subdomains: a backend named mongo-production is reached at
mongo-production.waypoint.redo.run; the rewrite substitutes that string
in; the next connection’s SNI tells Waypoint which backend the client
thinks it’s talking to.
The SNI lookup happens in the same crypto/tls GetCertificate callback
that handles *.ts.net cert provisioning for tailnet names. That callback
is the single point where backend routing for both Postgres and MongoDB
fans out from — having one routing point meant the MongoDB rewrite story
was a configuration thing rather than a code thing, and it’s the
single biggest reason both backends are in the same binary instead of two
sibling proxies.
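A stripped-down version of SNI-based routing inside a GetCertificate callback might look like the following. The crypto/tls types are real; backendForSNI, newTLSConfig, the onRoute hook, and the single shared certificate are simplifications of my own, and the *.ts.net branch is left out.

```go
package main

import (
	"crypto/tls"
	"errors"
	"fmt"
	"strings"
)

const baseDomain = ".waypoint.redo.run"

// backendForSNI maps "<backend>.waypoint.redo.run" to a known backend name.
func backendForSNI(serverName string, known map[string]bool) (string, bool) {
	name := strings.ToLower(strings.TrimSuffix(serverName, "."))
	if !strings.HasSuffix(name, baseDomain) {
		return "", false
	}
	label := strings.TrimSuffix(name, baseDomain)
	if label == "" || strings.Contains(label, ".") || !known[label] {
		return "", false
	}
	return label, true
}

// newTLSConfig routes on SNI inside the certificate callback, so choosing the
// certificate and choosing the backend happen at the same point.
func newTLSConfig(cert *tls.Certificate, known map[string]bool, onRoute func(backend string)) *tls.Config {
	return &tls.Config{
		GetCertificate: func(hello *tls.ClientHelloInfo) (*tls.Certificate, error) {
			backend, ok := backendForSNI(hello.ServerName, known)
			if !ok {
				return nil, errors.New("unknown server name: " + hello.ServerName)
			}
			onRoute(backend)
			return cert, nil
		},
	}
}

func main() {
	known := map[string]bool{"pg-production": true, "mongo-production": true}
	cfg := newTLSConfig(&tls.Certificate{}, known, func(b string) {
		fmt.Println("routing to", b)
	})
	// Invoke the callback directly, the way crypto/tls would during a handshake.
	_, err := cfg.GetCertificate(&tls.ClientHelloInfo{ServerName: "mongo-production.waypoint.redo.run"})
	fmt.Println("err:", err)
}
```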
Wire-protocol surface area
For anyone considering this for another database: the surface area of the MongoDB wire protocol you have to implement to make a proxy work is bounded. We handle:
- Envelope parsing: OP_MSG (sections 0 and 1), enough of OP_QUERY and OP_REPLY to get a legacy driver through the handshake.
- Command introspection: hello, isMaster, saslStart, saslContinue, replSetGetStatus. Every other command is opaque BSON that we pass through.
- SCRAM-SHA-256: server side toward the client, client side toward the backend. Both are short.
- Topology rewriting: hosts, passives, arbiters, primary, me, setName. Everything else in the topology document is preserved.
That’s the whole list. We don’t parse query bodies, we don’t look at writes, we don’t implement aggregation. The proxy is a wire-protocol router with an auth boundary and a byte counter, not a Mongo-knowing database tool. Keeping it that small is the single most important constraint — the moment Waypoint starts understanding queries, it owns the database’s correctness story, and that’s a different kind of project.
What runs in production
Two replicas of waypoint-proxy on EKS, behind a Tailscale Service
(svc:waypoint-db) so every replica advertises the same name and clients
don’t care which one they land on. CPU request 100m, memory 128Mi; in
practice the proxy idles. Everything Tailscale-y is a tsnet node tagged
tag:waypoint, authed via Kubernetes Workload Identity Federation into
Tailscale, so there are no static auth keys on disk.
This part is its own small post: Tailscale’s
FederatedIdentity resource
lets you trade a projected K8s service account token for a fresh tsnet
auth key without ever materializing a long-lived secret. The token comes
in via a projected volume scoped to a 1-hour audience.
The Postgres listener fronts our CockroachDB Cloud cluster:
[[listeners]]
name = "pg-production"
listen = ":5432"
mode = "postgres"
backend = "internal-redoproduction-cockroach-tmg.aws-us-east-1.cockroachlabs.cloud:26257"
tls = true
tls_mode = "optional"
use_tailscale_tls = true
service = "svc:waypoint-db"
[listeners.postgres]
admin_user = "${PG_ADMIN_USER}"
admin_password = "${PG_ADMIN_PASSWORD}"
admin_database = "redo"
user_prefix = "wp_"
user_ttl = "24h"
tls_mode = "optional" because we want psql and DataGrip to both work
without surprises. When a client does send a PG SSLRequest, Waypoint
terminates TLS with the right cert based on SNI — a *.ts.net cert from
Tailscale’s HTTPS provisioning for tailnet names, or a file-based cert for
the custom waypoint.redo.run name.
The custom-name story is in the
Postgres TLS docs. The
short version: the SNI lookup happens in the same crypto/tls
GetCertificate callback that handles *.ts.net lookups, so you can
route the same listener to multiple hostnames without code changes.
The MongoDB listener has the same shape with mode = "mongo", a backend
pointing at the Atlas SRV record, and a mongo admin block naming the
two static users described above. The deployment story is identical: same
pod, same Tailscale tag, same Redis. Adding the second backend was almost
entirely a listeners block in config and a switch in the routing
callback.
The trade-offs we made
The brand rules at Redo say every “weird trick” should explain what it costs. A handful of explicit ones:
Redis is now a tier-zero dependency. If Redis is down, Waypoint refuses new connections. We picked our managed Redis (Valkey, actually) precisely because we trust it more than the database we’re protecting. If you’re running Redis on the same nodes as your app, do not do this.
10-second flush latency means burst-over. A user with a 1 GiB/hour budget who opens a fat pipe can pull a sliver past their limit before the next flush lands and the connection dies. For a 1-hour tier we don’t care. For a 1-second tier you’d need a tighter flush, and at that point you’re paying Redis round-trips per byte and you’ve lost most of the win.
One extra hop. Engineers’ queries now pay an extra ~5ms for the proxy
plus the WhoIs lookup (which tsnet caches locally). Backend connections
are reused, so it’s a startup cost, not a per-statement cost. Worth it.
The proxy sees everything. Bytes are bytes; the proxy does not parse SQL or inspect Mongo documents and we don’t intend to. That keeps Waypoint a byte counter and an auth boundary, not a DLP system, and it keeps the latency story honest.
This is also the reason the proxy gets to be boringly small: under 10k
lines of Go for the core, no SQL parser, no query rewriter, no policy
engine beyond the JSON grant. If we ever want a query-shape rate limit, it
goes in a separate process that reads from the same Redis.
Atlas auth is static, by necessity. The MongoDB backend can’t do per-connection users, and we’ve decided that’s fine because the identity that matters lives on the Tailscale side. If your audit story requires the database’s own logs to attribute every query to a human, this trade doesn’t work for you, and you’d want either a different backend or a log-shipping shim that joins proxy logs to backend logs.
What we explicitly didn’t build
- A query log. Logs of who ran what go via CockroachDB and Atlas, not the proxy. We do export bytes.read/bytes.written by user, which is what we actually look at when something’s wrong.
- A management UI. There’s a waypoint-monitor TUI you SSH into over Tailscale (ssh waypoint-monitor), which is good enough.
- Anything that does NAT for non-Tailscale clients. Tailscale is the identity story; if you’re not on the tailnet you don’t exist.
- HTTP-level routing or virtual hosts. Waypoint is L4/wire-protocol-shaped on purpose.
Where Waypoint goes next
The thing about a proxy you trust is that you start wanting it to do more things. A handful of things on the roadmap I think are worth flagging:
A third backend. Kafka is the obvious next step: SASL/SCRAM is already in the toolkit from the Mongo work, there’s no topology to rewrite, and the per-connection cost is fixed. Snowflake or Redshift would be the more interesting test, because both speak protocols our backend abstraction doesn’t quite fit yet. We’ll do whichever one our team needs first.
Per-query-shape limits. The brand of rate limiting we have now is
byte-level. It can’t tell the difference between a Looker dashboard
fetching 1 GiB of summary rows and one engineer running
SELECT * FROM events and canceling after 1 GiB. A side-channel process
that subscribes to a query-fingerprint stream from the proxy and
rate-limits by query family would catch the second case before the
first byte goes out. Anything that touches the hot path is a hard no;
anything we can do asynchronously is on the table.
Audit-log integration. Bandwidth metrics tell us who used how much.
They don’t tell us what was run. For now, the answer is “join the proxy’s
per-connection summary to Cockroach’s query log by application_name,”
which works but is awkward. We’ve talked about exposing a “tail” mode
where Waypoint streams per-connection summaries (rows out, byte counts,
query family fingerprint) into a write-ahead log without parsing
semantics. That would let us reconstruct the picture without making the
proxy understand queries.
Self-service tier requests. Today a tier change is a Pulumi PR to the ACL. That’s the right shape for the long tail, but for “I’m running a backfill, can I have 5 GiB for the next hour” it’s annoying. A Slack workflow that bumps a tier for a fixed window with manager approval would be a thin wrapper over the Tailscale ACL API.
Per-PR ephemeral grants. Tag a CI runner’s tsnet node with
tag:pr-12345 and give the ACL a rule that grants that node a scoped
capability for as long as the PR is open. When the PR closes, the node
tears down and the grant vanishes with it. The primitives all exist; the
missing piece is the Tailscale-side automation that tags ephemeral CI
nodes consistently.
Lifting the Redis dependency for small deployments. Redis is load-bearing for both the locking story and the bandwidth flush. For a single-replica deployment, you don’t need Redis at all; an in-process store would do. We’d take a clean PR for this. It’s been on the issue tracker for a while.
What I’d change if I started over
I’d reach for the hierarchical Redis keys earlier. The first cut had a
flat per-user counter and a flat per-(user, listener) counter, and the
endpoint check happened in Go after two GETs. Collapsing both into one
Lua script with KEYS[1..N] was a ~150-line patch that made all of the
limit math feel obvious — and it was the only place where the second
backend’s shape forced the first backend’s abstraction to get cleaner.
I’d also budget the MongoDB driver-conformance work as a multiple of the protocol work, not a fraction. The handshake state machine went through three revisions before the official Go driver stopped finding edge cases. The protocol parsing was a week; the “make the real driver happy” loop was the rest of the month. If you’re considering this for another protocol with a strict reference driver, plan for the same shape.
If you want to read the actual code, it’s
on GitHub. The two files I’d start
in are internal/restrict/redis.go (the Lua scripts) and
internal/provision/groups.go (the group-role bootstrap). The MongoDB
handshake is in internal/mongowire/, but only go there if you want to be
sad. The rest is mostly wire-protocol bookkeeping and tsnet setup.
I’ll write a follow-up when one of these trade-offs bites us in production. That hasn’t happened yet, but it’s a question of when.