TTU — Response-Aware Inference Router

An automatic gearbox for AI inference. Routes every query to the right-sized model based on the quality of each response, not on manual rules. Provider-agnostic drop-in proxy.

The right gear for every question

Like an automatic gearbox selecting the right gear for every driving condition: easy queries stay in low gear and are handled quickly by an efficient model, while complex queries shift up and are escalated to a powerful model. TTU makes this decision by measuring the model's own quality signal on each response. No manual rules, no guesswork.

Existing inference routers let users or developers choose which model handles each query, based on price, latency, or manual rules. TTU is different: it measures quality after the response is generated, not before.

Response-aware routing

Each response is assessed individually using a proprietary quality estimation method. If quality is high, the efficient model's answer is used. If not, the query is escalated to a more powerful model. No manual rules needed.
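The escalation flow can be sketched as follows. This is a minimal illustration, not TTU's implementation: the function and model names are hypothetical, and the quality scorer is a placeholder for the proprietary estimation method described above.

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class RoutingDecision:
    model: str       # which tier produced the final answer
    answer: str      # the response returned to the caller
    escalated: bool  # whether the query was re-run on the large model

def route(query: str,
          small_model: Callable[[str], str],
          large_model: Callable[[str], str],
          quality_score: Callable[[str, str], float],
          threshold: float = 0.8) -> RoutingDecision:
    """Try the efficient model first; escalate only when the quality
    estimate of its own response falls below the threshold."""
    draft = small_model(query)
    if quality_score(query, draft) >= threshold:
        return RoutingDecision("small", draft, escalated=False)
    return RoutingDecision("large", large_model(query), escalated=True)
```

The key property is that the decision happens after the efficient model has answered, so the router never needs hand-written rules about which prompts are "hard".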

Drop-in proxy

Provider-agnostic API proxy. Compatible with OpenAI, Anthropic, and any LLM API. One line change in your application code. Supports streaming and all standard parameters.
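What the one-line change amounts to, sketched below: an application keeps its existing OpenAI-style request format and only swaps the base URL to point at the proxy. The endpoint URL here is an assumed placeholder, and the proxy route shape (`/chat/completions`) is an assumption based on the OpenAI-compatible API described above.

```python
import json

# Hypothetical TTU proxy endpoint: the only line that changes
# when migrating from a direct provider call.
TTU_BASE_URL = "https://ttu.example.com/v1"  # was: https://api.openai.com/v1

def build_chat_request(base_url: str, model: str, messages: list) -> tuple[str, str]:
    """Build the URL and JSON body for an OpenAI-style chat completion call.
    The body is unchanged from a direct provider request, including
    standard parameters such as streaming."""
    url = f"{base_url}/chat/completions"
    body = json.dumps({"model": model, "messages": messages, "stream": True})
    return url, body

url, body = build_chat_request(TTU_BASE_URL, "gpt-4o-mini",
                               [{"role": "user", "content": "What is 2+2?"}])
```

Because the request and response formats are untouched, existing SDKs that accept a custom base URL should work without further changes.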

Safety routing

For safety-critical domains (medical, legal, financial), TTU includes domain detection and quality monitoring. Queries flagged as safety-critical are routed through additional verification before reaching the user.

Routing dashboard

See which model answers each query, why it was routed that way, and cumulative cost savings. Full audit trail for every routing decision.

1,000 queries, verified results

MMLU benchmark. TTU routes queries between a small and a large model. Results are from a single verified benchmark run.

Quality retained
99.8%
Cost reduction
51%
Queries tested
1,000
Routing overhead
0.16μs

Six verified scenarios across different query types and model pairs. Safety routing adds domain detection and quality monitoring. 28 tests passing (9 proxy + 10 safety + 9 validation).

Routing based on the response, not the prompt

Existing inference routers decide which model handles each query by user choice or static rules: price, latency, or provider. TTU instead measures the model's own quality signal on each response.

| Approach | How it works | Assesses response? |
| --- | --- | --- |
| API gateways | Multi-provider proxy, user selects model | No, manual selection |
| Open-source proxies | Unified API layer with routing rules | No, rule-based |
| Observability platforms | Logging, monitoring, cost tracking | No, monitoring only |
| Provider auto-routing | Vendor-native model selection | Partial, limited to own models |
| TTU | Response-aware routing + safety | Yes, proprietary quality assessment |