RelayGate · programmable middleware for AI traffic · v1.x · production

Programmable middleware for AI traffic.

One binary between your agent and every model provider. Run scripts and R1 agents inline. CEL for routing. Signed receipts for everything.

RelayGate adds one capability the other AI gateways do not: your code runs inside the request. ContextWorkers mutate, enrich, block, or audit inline at sub-millisecond overhead. Route with CEL. Sign every receipt. Deploy one binary anywhere.

Download binary · Read the docs →

request flow · live

ContextWorker: ON

request · POST /v1/chat/completions · inbound: OpenAI · 2.1 KB

ContextWorker · cw_pii_scrub · active · pre-request · 0.14 ms · 2 redactions

backend · anthropic-direct · claude-sonnet-4 · streaming

worker ran

worker_id: cw_pii_scrub_v2
mode: pre-request
latency: 0.14 ms
fields_modified: [prompt]
redactions: 2

receipt

id: rcpt_01HY...
ed25519: signed
total overhead: 0.34 ms
backend: anthropic-direct
status: 200

~238 µs overhead

CGO=0 binary

3 inbound formats

10 backend drivers

signed receipt per request

§ demo 01 / 04

ContextWorker · 01 / 04

Run a worker inside the request.

Drop a script or an R1 agent into the middle of a request. Mutate, enrich, block, audit. All inline. Scrub through one request's lifecycle and see what happens at each millisecond.

live · request-lifecycle · /v1/chat/completions

scene 1 / 10

total duration: 1220 ms · worker overhead: 0.34 ms · signed events: 2

Request received t = 0.00s

Illustrative. ContextWorker latencies vary with worker implementation. Benchmarks at /benchmarks.
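
A minimal sketch of what a PII-scrub worker like the one in the demo could look like. The ContextWorker API is not shown on this page, so the `ChatRequest`, `Result`, and `ScrubPII` names below are assumptions made for illustration, and the two regular expressions are deliberately simple.

```go
package main

import (
	"fmt"
	"regexp"
)

// ChatRequest is a hypothetical stand-in for the request a worker receives.
type ChatRequest struct {
	Model  string
	Prompt string
}

// Result reports what the worker did. The field names mirror the
// "worker ran" panel above; the real type is not documented here.
type Result struct {
	FieldsModified []string
	Redactions     int
}

var (
	emailRe = regexp.MustCompile(`[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}`)
	ssnRe   = regexp.MustCompile(`\b\d{3}-\d{2}-\d{4}\b`)
)

// ScrubPII replaces email addresses and SSN-shaped numbers in the prompt
// and counts how many replacements it made.
func ScrubPII(req *ChatRequest) Result {
	n := 0
	replace := func(re *regexp.Regexp, token string) {
		req.Prompt = re.ReplaceAllStringFunc(req.Prompt, func(string) string {
			n++
			return token
		})
	}
	replace(emailRe, "[EMAIL]")
	replace(ssnRe, "[SSN]")

	res := Result{Redactions: n}
	if n > 0 {
		res.FieldsModified = []string{"prompt"}
	}
	return res
}

func main() {
	req := &ChatRequest{
		Model:  "claude-sonnet-4",
		Prompt: "Contact jane@example.com or 555-12-3456 with issues. Reference: PR #4821.",
	}
	res := ScrubPII(req)
	fmt.Println(req.Prompt)
	fmt.Println(res.Redactions, res.FieldsModified)
}
```

The worker mutates the request in place and returns a small summary, which is the kind of data a receipt could record. A production scrubber would need far more careful patterns than these two.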

§ demo 02 / 04

One canonical shape · 02 / 04

OpenAI, Anthropic, or Gemini in. ChatRequest out.

RelayGate translates every inbound format to a canonical ChatRequest before routing. Your code doesn't know or care which client protocol arrived. Switch providers without rewrites.

same logical request · three wire formats

inbound translator

inbound wire format: openai

POST /v1/chat/completions

RelayGate inbound translator

0.03 ms

internal/inbound/openai.go

canonical output: stable

ChatRequest v1

{
  "model":    "<normalized>",
  "messages": "<canonical>",
  "tools":    []
}

output is identical across all three inbound formats

Your code doesn't know which format arrived. Three translators, one internal shape.

§ demo 03 / 04

How we differ · 03 / 04

When all backends are down, we tell you.

Most gateways queue your request internally and hope. RelayGate returns 503 plus Retry-After. Your SDK's retry logic takes over. You stay in control of timing, budget, and fallback.

simulation · all backends: circuit open

Typical gateway under outage

all backends circuit-open · queueing enabled

silent failures

RelayGate under outage

all backends circuit-open · honest backpressure

honest backpressure

Illustrative outage simulation. Underlying circuit-breaker benchmark: ~18.7 µs, zero allocations, from /benchmarks.

§ demo 04 / 04

One engine · 04 / 04

One expression language. Routing. Policy. Rate limits. Budgets.

CEL for routing decisions. CEL for policy gates. CEL for rate-limits and budget checks. Same engine, same syntax, one thing to learn. Type-checked expressions with predictable error messages.

config · /etc/relaygate/rules.cel

CEL v1

6 / 8 rules active routing · policy · rate-limit

Permitted action surface 6 active

All rules are CEL. One engine handles all four surfaces. Type-checked at config-load.

§ demo 05 / 08

Compose · 05 / 08

Chain ContextWorkers into a pipeline.

Pick from the gallery. Drop them into the request path in any order. The pipeline executes top-to-bottom, and each worker can pass, mutate, enrich, block, or branch. Drag to reorder. The total overhead is the sum of the individual worker latencies.

pipeline · relaygate.pipeline.compose

total: 0 workers · 0.00 ms

library · 10 workers

REQ · as the client sent it · raw input

POST /v1/chat/completions
tenant: org_01HX
model:  claude-sonnet-4
tokens_max: 4000
prompt: "Review this PR for compliance. Contact the
         author at [email protected] or 555-12-3456 with
         issues. Reference: PR #4821."

pipeline · executes top → bottom · drop workers below

empty pipeline · request passes through at 0 ms

OUT · what the backend actually receives · after pipeline

Workers execute in order. Mutations are passed to the next worker. Block aborts the chain and returns to caller. Every run emits a signed receipt listing every worker hit.
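
The execution rules above can be sketched in a few lines. The types here (`Worker`, `Action`, `Hit`) and the worker IDs are hypothetical; they only illustrate the ordering, mutation, and block semantics, not the actual pipeline API.

```go
package main

import (
	"fmt"
	"strings"
	"time"
)

type Request struct {
	Prompt    string
	MaxTokens int
}

type Action string

const (
	Pass   Action = "pass"
	Mutate Action = "mutate"
	Block  Action = "block"
)

// Worker is a hypothetical pipeline stage.
type Worker struct {
	ID  string
	Run func(*Request) Action
}

// Hit is one line of the worker log a receipt could carry.
type Hit struct {
	WorkerID string
	Action   Action
	Took     time.Duration
}

// RunPipeline executes workers top to bottom. Mutations stay on req for
// the next worker; Block stops the chain and reports back to the caller.
func RunPipeline(req *Request, workers []Worker) (hits []Hit, blocked bool) {
	for _, w := range workers {
		start := time.Now()
		action := w.Run(req)
		hits = append(hits, Hit{WorkerID: w.ID, Action: action, Took: time.Since(start)})
		if action == Block {
			return hits, true
		}
	}
	return hits, false
}

// CapTokens lowers MaxTokens to limit when the client asked for more.
func CapTokens(limit int) Worker {
	return Worker{ID: "cw_token_cap", Run: func(r *Request) Action {
		if r.MaxTokens > limit {
			r.MaxTokens = limit
			return Mutate
		}
		return Pass
	}}
}

// BlockIfContains aborts the chain when the prompt contains term.
func BlockIfContains(term string) Worker {
	return Worker{ID: "cw_block_term", Run: func(r *Request) Action {
		if strings.Contains(r.Prompt, term) {
			return Block
		}
		return Pass
	}}
}

func main() {
	req := &Request{Prompt: "Review this PR for compliance.", MaxTokens: 4000}
	hits, blocked := RunPipeline(req, []Worker{CapTokens(2000), BlockIfContains("secret")})

	var total time.Duration
	for _, h := range hits {
		total += h.Took // total overhead is the sum of worker latencies
		fmt.Println(h.WorkerID, h.Action)
	}
	fmt.Println("blocked:", blocked, "max_tokens:", req.MaxTokens, "overhead:", total)
}
```

Because each worker receives the same `*Request`, reordering the slice changes what later workers see, which is why order matters in the composer.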

§ demo 06 / 08

In the suite · 06 / 08

Five flows through the Good Ventures Lab stack.

RelayGate composes with R1, TrueCom, RelayOne, DeepTap, Veritize, Actium, CloudSwarm, and Heroa. Step through a canonical scenario. Watch the request move across products in real time.

SaaS customer, fully managed path

LLM client → RelayOne → RelayGate → model. Grounding and verification inline.

§ demo 07 / 08

In production · 07 / 08

Live traffic, live decisions. Watch the edge work.

Every request passes through CEL evaluation, ContextWorker execution, routing, and receipt signing. This is a simulated stream of ~8 rps. Click a request to see the full decision tree. Inject chaos to watch the system respond.

live · relaygate.console · tail -f /edge

§ demo 08 / 08

Shape · 08 / 08

The shape of one binary.

Reveal the stack. Every capability below ships inside the same ~18 MB statically-compiled binary. No sidecars, no C dependencies, no per-feature runtime install.

inbound translators

3 formats

OpenAI · Anthropic · Gemini

backend drivers

10 providers

OpenAI · Anthropic · Google · Groq · DeepSeek · Together · Mistral · Cohere · OpenRouter · +1 slot

CEL engine

4 surfaces

routing · policy · rate-limits · budgets

ContextWorker runtime

inline exec

scripts · R1 agents · sub-millisecond spawn · ~12 µs

circuit breakers

~18.7 µs

zero allocations · per-backend

receipt signer

Ed25519

TrueCom-compatible · < 0.3 ms

install paths

6 package managers

brew · deb · rpm · tar · Docker · Helm

binary

~18 MB

CGO=0 · no runtime deps · static

Every cell is a feature that ships inside the single binary. No per-feature install, no sidecar. Total router overhead per request: ~238 µs.

Measured, not marketing

What RelayGate actually costs per request.

Published benchmarks, updated with each release. Reproducible from the repo and the /benchmarks page.

View full benchmarks

What's rare here

Five things that don't exist in the rest of the AI gateway market.

Not features. Specific architectural decisions that make RelayGate a different product, not just a faster one.

01 / 05

ContextWorker: programmable middleware, not dressed-up routing.

Every AI gateway offers routing, caching, and retries. RelayGate is the only one that lets you drop a script or a full R1 agent into the middle of a request, run it inline at sub-millisecond overhead, and shape the request or response before anything reaches your backend. A PII scrub, a code-review pass, an inline tool-call round-trip: all run as ContextWorkers, not as out-of-band webhook callbacks with multi-second latency.

02 / 05

Honest backpressure, by design.

When every backend is down, most gateways queue your request internally and silently consume your timeout budget while trying to recover. RelayGate immediately returns 503 with a Retry-After header. Your SDK's retry logic takes over. You stay in control of timing, fallback, and budget. Silent failure is the feature you did not choose; honest backpressure is the one you wanted.

03 / 05

CEL for everything.

Most AI gateways invent a routing DSL, then a different policy DSL, then a separate rate-limit grammar, then a budget mini-language. RelayGate uses Common Expression Language across all four surfaces. One engine. Type-checked at config-load time, not at request time. Predictable error messages. Your platform team learns one thing.

04 / 05

One binary, CGO=0, no runtime dependencies.

RelayGate is a single statically-compiled Go binary with no C dependencies. Under twenty megabytes. Deploys on bare metal, in a Docker container, on Kubernetes via Helm, or on a Raspberry Pi. No glibc incompatibilities, no libstdc++ version pins, no "it worked in dev." The operations story is: ship the binary, set some environment variables, run.

05 / 05

Signed receipts per request, TrueCom-compatible.

Every request produces an Ed25519-signed receipt suitable for audit, billing, or dispute resolution. The receipt format matches TrueCom's commerce substrate, so the same receipt can land in your finance pipeline without a second integration. Compliance and finance see the same signed event.

Each claim corresponds to a measurable capability or a demo above. Benchmarks at /benchmarks.

In the portfolio · 9 products

RelayGate composes with the rest of the Good Ventures Lab suite.

RelayGate stands alone. It also turns into something more when it runs inside the suite. Click a product to see how it plugs into the request path.

RelayGate is open source at the core. Self-host is free forever. Managed and fleet features are commercial. · Part of Good Ventures Lab.

Pricing

Self-host is free. Managed starts when you want it to.

Full pricing and feature matrix on /pricing. Quick preview below.

Self-host

for teams running their own infrastructure

free forever · Apache 2.0 core

Single binary, full feature set
ContextWorker, CEL, all 10 backends, all 3 inbound formats
Community support, GitHub issues

Download binary

Managed

for teams that want someone else to run it

{{TBD-managed-price}}/mo starting at {{TBD-base-rate}} requests

Managed deployment, health monitoring, credential rotation
Managed ContextWorker library, priority support
Usage dashboards, quarterly review

Start managed trial

Enterprise

for regulated or fleet-scale deployments

Custom SLA · sovereign region options

SSO, SCIM, fleet deployment via RelayOne
SLA, dedicated success engineer
On-prem connect, sovereign region options

Contact sales

Pricing is request-rate based. See full schedule on /pricing.

Install RelayGate.

One binary. Every major platform. No runtime dependencies.

brew install relayone/tap/relaygate

No runtime dependencies. CGO=0. Apache 2.0 core, MIT drivers, Managed tier commercial.

Read the docs → · Browse the repo