RelayGate · programmable middleware for AI traffic · v1.x · production

Programmable middleware for AI traffic.

One binary between your agent and every model provider. Run scripts and R1 agents inline. CEL for routing. Signed receipts for everything.

RelayGate adds one capability the other AI gateways do not: your code runs inside the request. ContextWorkers mutate, enrich, block, or audit inline at sub-millisecond overhead. Route with CEL. Sign every receipt. Deploy one binary anywhere.

request flow · live
ContextWorker: ON
request POST /v1/chat/completions inbound: OpenAI · 2.1 KB
ContextWorker cw_pii_scrub · active pre-request · 0.14 ms · 2 redactions
backend anthropic-direct claude-sonnet-4 · streaming
worker ran
worker_idcw_pii_scrub_v2
modepre-request
latency0.14 ms
fields_modified[prompt]
redactions2
receipt
idrcpt_01HY...
ed25519signed
total overhead0.34 ms
backendanthropic-direct
status200
~238 µs overhead
·
CGO=0 binary
·
3 inbound formats
·
10 backend drivers
·
signed receipt per request
§ demo 01 / 04
ContextWorker · 01 / 04

Run a worker inside the request.

Drop a script or an R1 agent into the middle of a request. Mutate, enrich, block, audit. All inline. Scrub through one request's lifecycle and see what happens at each millisecond.

liverequest-lifecycle · /v1/chat/completions
scene 1 / 10
total duration: 1220 ms · worker overhead: 0.34 ms · signed events: 2
Request received t = 0.00s
state
event
contextworker
Illustrative. ContextWorker latencies vary with worker implementation. Benchmarks at /benchmarks.
§ demo 02 / 04
One canonical shape · 02 / 04

OpenAI, Anthropic, or Gemini in. ChatRequest out.

RelayGate translates every inbound format to a canonical ChatRequest before routing. Your code doesn't know or care which client protocol arrived. Switch providers without rewrites.

same logical request · three wire formats
inbound translator
inbound wire formatopenai
POST /v1/chat/completions
RelayGate inbound translator
0.03 ms
internal/inbound/openai.go
canonical outputstable
ChatRequest v1
{
  "model":    "<normalized>",
  "messages": "<canonical>",
  "tools":    []
}
output is identical across all three inbound formats
Your code doesn't know which format arrived. Three translators, one internal shape.
§ demo 03 / 04
How we differ · 03 / 04

When all backends are down, we tell you.

Most gateways queue your request internally and hope. RelayGate returns 503 plus Retry-After. Your SDK's retry logic takes over. You stay in control of timing, budget, and fallback.

simulation · all backends: circuit open
idle
idle — press trigger to simulate a full-provider outage
Typical gateway under outage
all backends circuit-open · queueing enabled
silent failures
RelayGate under outage
all backends circuit-open · honest backpressure
honest backpressure
Illustrative outage simulation. Underlying circuit-breaker benchmark: ~18.7 µs, zero allocations, from /benchmarks.
§ demo 04 / 04
One engine · 04 / 04

One expression language. Routing. Policy. Rate limits. Budgets.

CEL for routing decisions. CEL for policy gates. CEL for rate-limits and budget checks. Same engine, same syntax, one thing to learn. Type-checked expressions with predictable error messages.

config · /etc/relaygate/rules.cel
CEL v1
6 / 8 rules active routing · policy · rate-limit
Permitted action surface 6 active
All rules are CEL. One engine handles all four surfaces. Type-checked at config-load.
§ demo 05 / 08
Compose · 05 / 08

Chain ContextWorkers into a pipeline.

Pick from the gallery. Drop them into the request path in any order. The pipeline executes top-to-bottom, each worker can pass, mutate, enrich, block, or branch. Drag to reorder. The total overhead is the sum.

pipelinerelaygate.pipeline.compose
total: 0 workers · 0.00 ms
REQ · as the client sent itraw input
POST /v1/chat/completions
tenant: org_01HX
model:  claude-sonnet-4
tokens_max: 4000
prompt: "Review this PR for compliance. Contact the
         author at [email protected] or 555-12-3456 with
         issues. Reference: PR #4821."
pipeline · executes top → bottomdrop workers below
empty pipeline · request passes through at 0 ms
OUT · what the backend actually receivesafter pipeline
waiting for pipeline run…
auto-loop
worker action logidle
run a pipeline to see what each worker did to the request, step by step.
Workers execute in order. Mutations are passed to the next worker. Block aborts the chain and returns to caller. Every run emits a signed receipt listing every worker hit.
§ demo 06 / 08
In the suite · 06 / 08

Five flows through the Good Ventures Lab stack.

RelayGate composes with R1, TrueCom, RelayOne, DeepTap, Veritize, Actium, CloudSwarm, and Heroa. Step through a canonical scenario. Watch the request move across products in real time.

SaaS customer, fully managed path
LLM client → RelayOne → RelayGate → model. Grounding and verification inline.
step 0 / 0
trace · eventsidle
products in this flow
§ demo 07 / 08
In production · 07 / 08

Live traffic, live decisions. Watch the edge work.

Every request passes through CEL evaluation, ContextWorker execution, routing, and receipt signing. This is a simulated stream of ~8 rps. Click a request to see the full decision tree. Inject chaos to watch the system respond.

liverelaygate.console · tail -f /edge
● 0.0 rps
request stream0 / 0
decision tree
click a request in the stream to inspect its decision tree
inject:
§ demo 08 / 08
Shape · 08 / 08

The shape of one binary.

Reveal the stack. Every capability below ships inside the same ~18 MB statically-compiled binary. No sidecars, no C dependencies, no runtime per-feature install.

inbound translators
3 formats
OpenAI · Anthropic · Gemini
backend drivers
10 providers
OpenAI · Anthropic · Google · Groq · DeepSeek · Together · Mistral · Cohere · OpenRouter · +1 slot
CEL engine
4 surfaces
routing · policy · rate-limits · budgets
ContextWorker runtime
inline exec
scripts · R1 agents · sub-millisecond spawn · ~12 µs
circuit breakers
~18.7 µs
zero allocations · per-backend
receipt signer
Ed25519
TrueCom-compatible · < 0.3 ms
install paths
6 package managers
brew · deb · rpm · tar · Docker · Helm
binary
~18 MB
CGO=0 · no runtime deps · static
Every cell is a feature that ships inside the single binary. No per-feature install, no sidecar. Total router overhead per request: ~238 µs.
Measured, not marketing

What RelayGate actually costs per request.

Published benchmarks. Updated on each release. Reproducible from the repo.

Benchmarks update with each release. Reproduce from the /benchmarks page.
View full benchmarks
What's rare here

Five things that don't exist in the rest of the AI gateway market.

Not features. Specific architectural decisions that make RelayGate a different product, not just a faster one.

01 / 05

ContextWorker: programmable middleware, not dressed-up routing.

Every AI gateway offers routing, caching, and retries. RelayGate is the only one that lets you drop a script or a full R1 agent into the middle of a request, run it inline at sub-millisecond overhead, and shape the request or response before anything reaches your backend. A PII scrub, a code-review pass, an inline tool-call round-trip: all run as ContextWorkers, not as out-of-band webhook callbacks with multi-second latency.

02 / 05

Honest backpressure, by design.

When every backend is down, most gateways queue your request internally and silently consume your timeout budget while trying to recover. RelayGate immediately returns 503 with a Retry-After header. Your SDK's retry logic takes over. You stay in control of timing, fallback, and budget. Silent failure is the feature you did not choose; honest backpressure is the one you wanted.

03 / 05

CEL for everything.

Most AI gateways invent a routing DSL, then a different policy DSL, then a separate rate-limit grammar, then a budget mini-language. RelayGate uses Common Expression Language across all four surfaces. One engine. Type-checked at config-load time, not at request time. Predictable error messages. Your platform team learns one thing.

04 / 05

One binary, CGO=0, no runtime dependencies.

RelayGate is a single statically-compiled Go binary with no C dependencies. Under twenty megabytes. Deploys on bare metal, container, Helm, Docker, or a Raspberry Pi. No glibc incompatibilities, no libstdc++ version pins, no "it worked in dev." The operations story is: ship the binary, set some environment variables, run.

05 / 05

Signed receipts per request, TrueCom-compatible.

Every request produces an Ed25519-signed receipt suitable for audit, billing, or dispute resolution. The receipt format matches TrueCom's commerce substrate, so the same receipt can land in your finance pipeline without a second integration. Compliance and finance see the same signed event.

Each claim corresponds to a measurable capability or a demo above. Benchmarks at /benchmarks.
In the portfolio · 9 products

RelayGate composes with the rest of the Good Ventures Lab suite.

RelayGate stands alone. It also turns into something more when it runs inside the suite. Click a product to see how it plugs into the request path.

products in the suitehover to inspect
RelayGate is open source at the core. Self-host is free forever. Managed and fleet features are commercial. · Part of Good Ventures Lab.
Pricing

Self-host is free. Managed starts when you want it to.

Full pricing and feature matrix on /pricing. Quick preview below.

Self-host
for teams running their own infrastructure
free forever · Apache 2.0 core
  • Single binary, full feature set
  • ContextWorker, CEL, all 10 backends, all 3 inbound formats
  • Community support, GitHub issues
Download binary
Enterprise
for regulated or fleet-scale deployments
Custom SLA · sovereign region options
  • SSO, SCIM, fleet deployment via RelayOne
  • SLA, dedicated success engineer
  • On-prem connect, sovereign region options
Contact sales
Pricing is request-rate based. See full schedule on /pricing.

Install RelayGate.

One binary. Every major platform. No runtime dependencies.

brew install relayone/tap/relaygate
No runtime dependencies. CGO=0. Apache 2.0 core, MIT drivers, Managed tier commercial.