TTU — Response-Aware Inference Router
An automatic gearbox for AI inference. Routes every query to the right-sized model based on the quality of each response, not on manual rules. Provider-agnostic drop-in proxy.
The right gear for every question
Like an automatic gearbox that selects the right gear for every driving condition: easy queries stay in low gear, handled quickly by an efficient model. Complex queries shift up, escalated to a powerful model. TTU measures the model's own quality signal on each response to make this decision. No manual rules, no guesswork.
Every existing inference router lets users or developers choose which model handles each query, whether by price, latency, or manual rules. TTU is different: it measures quality after the response, not before.
Response-aware routing
Each response is assessed individually using a proprietary quality estimation method. If quality is high, the efficient model's answer is used. If not, the query is escalated to a more powerful model. No manual rules needed.
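The escalation loop described above can be sketched in a few lines. TTU's actual quality estimation method is proprietary; `quality_score` below is a deliberately naive stand-in (response length), and the threshold value is an assumption for illustration only.

```python
def quality_score(response: str) -> float:
    """Placeholder for TTU's proprietary quality estimation.

    Here we use response length as a toy stand-in signal so the
    routing logic is runnable; the real signal is not length-based.
    """
    return min(len(response) / 100.0, 1.0)

def route(query: str, small_model, large_model, threshold: float = 0.7) -> str:
    """Try the efficient model first; escalate only when quality is low."""
    answer = small_model(query)
    if quality_score(answer) >= threshold:
        return answer             # high quality: keep the efficient model's answer
    return large_model(query)     # low quality: escalate to the powerful model
```

In practice `small_model` and `large_model` would be API calls; here they can be any callables, which makes the routing decision easy to unit-test in isolation.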
Drop-in proxy
Provider-agnostic API proxy. Compatible with OpenAI, Anthropic, and any LLM API. One line change in your application code. Supports streaming and all standard parameters.
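The "one line change" is typically just the base URL your OpenAI-compatible client points at. The proxy URL below is a hypothetical placeholder, not a documented endpoint; this sketch only shows where that one line lives in a client configuration.

```python
DEFAULT_BASE = "https://api.openai.com/v1"
TTU_BASE = "https://ttu.example.com/v1"  # hypothetical TTU proxy endpoint

def client_config(api_key: str, use_ttu: bool = False) -> dict:
    """Build client settings; routing through TTU changes only base_url.

    Equivalent to e.g. OpenAI(base_url=TTU_BASE) in an SDK that
    supports a base-URL override.
    """
    return {
        "base_url": TTU_BASE if use_ttu else DEFAULT_BASE,
        "headers": {"Authorization": f"Bearer {api_key}"},
    }
```

Because the proxy speaks the same API as the upstream provider, the rest of the application code, including streaming and standard parameters, stays unchanged.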
Safety routing
For safety-critical domains (medical, legal, financial), TTU includes domain detection and quality monitoring. Queries flagged as safety-critical are routed through additional verification before reaching the user.
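As a rough illustration of the gate described above: keyword matching below is a toy stand-in for domain detection (TTU's actual method is not described here), and the domain lists are invented for the example.

```python
# Illustrative keyword sets; real domain detection would be more robust.
SAFETY_DOMAINS = {
    "medical": {"diagnosis", "dosage", "symptom"},
    "legal": {"contract", "liability", "lawsuit"},
    "financial": {"investment", "loan", "tax"},
}

def detect_domain(query: str):
    """Return the first safety-critical domain the query matches, else None."""
    words = set(query.lower().split())
    for domain, keywords in SAFETY_DOMAINS.items():
        if words & keywords:
            return domain
    return None

def needs_verification(query: str) -> bool:
    """Flag safety-critical queries for the extra verification step."""
    return detect_domain(query) is not None
```

A flagged query would then pass through the additional verification stage before its response reaches the user; unflagged queries follow the normal routing path.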
Routing dashboard
See which model answers each query, why it was routed that way, and cumulative cost savings. Full audit trail for every routing decision.
1,000 queries, verified results
MMLU benchmark. TTU routes queries between a small and a large model. Results are from a single run over a statistically robust sample of 1,000 queries.
Six verified scenarios across different query types and model pairs. Safety routing adds domain detection and quality monitoring. 28 tests passing (9 proxy + 10 safety + 9 validation).
Routing based on the response, not the prompt
Existing inference routers decide which model handles a query before the model answers, using price, latency, or provider rules. TTU decides after: it measures the model's own quality signal on each response.
| Approach | How it works | Assesses response? |
|---|---|---|
| API gateways | Multi-provider proxy, user selects model | No, manual selection |
| Open-source proxies | Unified API layer with routing rules | No, rule-based |
| Observability platforms | Logging, monitoring, cost tracking | No, monitoring only |
| Provider auto-routing | Vendor-native model selection | Partial, limited to own models |
| TTU | Response-aware routing + safety | Yes, proprietary quality assessment |