Route with Precision
Smart LLM Routing
Automatically route every prompt to the right model — optimized for cost, complexity, latency, and availability. FloTorch's multi-model routing engine activates the moment you configure more than one model, with zero changes to your existing application code.
[Figure: Time or cost based LLM routing. User prompts pass through a time based or cost based router to LLM A, LLM B, or LLM C.]
FOUR STRATEGIES. ONE INTELLIGENT GATEWAY
Precision Multi-Model Routing Built for Production AI

Fallback Routing — Always Deliver a Response
Define a primary model and an ordered fallback chain. If the primary model fails, times out, or hits a rate limit, FloTorch automatically routes the request to the next model in sequence — continuing down the chain until a response is returned. Best for mission-critical workloads that require high availability and zero dropped requests.
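A fallback chain can be sketched as a loop that tries each model in order and stops at the first success. This is an illustration, not FloTorch's implementation. `ModelError`, `route_with_fallback`, and the demo models are hypothetical, and the sketch assumes that failures, timeouts, and rate-limit responses all surface as `ModelError`.

```python
from typing import Callable, List, Sequence, Tuple


class ModelError(Exception):
    """Hypothetical error for a failed, timed-out, or rate-limited model call."""


# A model is represented as (name, callable that takes a prompt and returns text).
Model = Tuple[str, Callable[[str], str]]


def route_with_fallback(prompt: str, chain: Sequence[Model]) -> Tuple[str, str]:
    """Try each model in order and return (model_name, response) from the first success."""
    failures: List[str] = []
    for name, call in chain:
        try:
            return name, call(prompt)
        except ModelError as exc:
            # Record why this model failed, then move on to the next one in the chain.
            failures.append(f"{name}: {exc}")
    raise RuntimeError("every model in the fallback chain failed: " + "; ".join(failures))


# Demo models (hypothetical): the primary is rate-limited, the fallback answers.
def primary(prompt: str) -> str:
    raise ModelError("rate limited")


def secondary(prompt: str) -> str:
    return f"B:{prompt}"


print(route_with_fallback("hi", [("llm-a", primary), ("llm-b", secondary)]))
```

Only errors the sketch classifies as `ModelError` trigger a fallback; anything else propagates, so a bug in the calling code is not silently masked by the next model.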

Weighted Routing — Proportional Traffic Distribution
Assign a weight to each model and FloTorch distributes traffic in proportion to those weights. For example, with two models weighted 70 and 30, the first handles roughly 70% of requests and the second handles the rest. Adjust weights at any time to shift traffic without touching your application. Best for A/B testing, gradual rollouts, and directing volume toward preferred or cost-efficient models.
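A minimal sketch of proportional selection, assuming nothing about FloTorch's internals: `pick_weighted` and the model names `llm-a` and `llm-b` are hypothetical. It relies on the standard library's `random.choices`, which draws an item with probability equal to its weight divided by the total weight.

```python
import random
from collections import Counter
from typing import Dict


def pick_weighted(weights: Dict[str, float], rng: random.Random) -> str:
    """Pick one model name; each model's chance is its weight divided by the total weight."""
    names = list(weights)
    return rng.choices(names, weights=[weights[n] for n in names], k=1)[0]


# Simulate 10,000 requests against a 70/30 split with a fixed seed for reproducibility.
rng = random.Random(42)
counts = Counter(pick_weighted({"llm-a": 70, "llm-b": 30}, rng) for _ in range(10_000))
print(counts["llm-a"] / 10_000, counts["llm-b"] / 10_000)
```

Because each share is weight divided by total weight, weights of 7 and 3 would yield the same 70/30 split; 70 and 30 read as percentages only because they sum to 100.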

Cost, Keyword & Schedule Routing Controls
Attach per-model routing configurations to any strategy. Set monthly budgets with alert thresholds to prevent overspend and auto-exclude models when limits are hit. Define keyword conditions using operators like Equals, Contains, and Regex to route domain-specific workloads to the right model. Restrict model availability to defined time windows with schedule-based routing. All three settings stack and are evaluated in priority order: Cost → Keywords → Schedule.
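One way to read the Cost → Keywords → Schedule order is as three successive filters over the model list. The sketch below makes that concrete under stated assumptions: `ModelConfig` and its fields are hypothetical, a model with keyword rules stays eligible if any one rule matches, a model with no rules or no time window is always eligible, and crossing the alert threshold only flags a model while an exhausted budget excludes it.

```python
import re
from dataclasses import dataclass, field
from datetime import time
from typing import List, Optional, Tuple


@dataclass
class ModelConfig:
    """Hypothetical per-model routing settings; field names are illustrative only."""
    name: str
    monthly_budget: Optional[float] = None   # None means no budget cap
    spent_this_month: float = 0.0
    alert_threshold: float = 0.8             # fraction of budget that triggers an alert
    keyword_rules: List[Tuple[str, str]] = field(default_factory=list)  # (operator, value)
    window: Optional[Tuple[time, time]] = None  # (start, end); None means always available


def keyword_matches(prompt: str, operator: str, value: str) -> bool:
    if operator == "Equals":
        return prompt == value
    if operator == "Contains":
        return value in prompt
    if operator == "Regex":
        return re.search(value, prompt) is not None
    raise ValueError(f"unknown operator: {operator}")


def in_window(now: time, window: Tuple[time, time]) -> bool:
    start, end = window
    # Support windows that wrap past midnight, e.g. 22:00-06:00.
    return start <= now < end if start <= end else (now >= start or now < end)


def budget_alert(m: ModelConfig) -> bool:
    """True once spend reaches the alert threshold (the model is still eligible)."""
    return m.monthly_budget is not None and m.spent_this_month >= m.alert_threshold * m.monthly_budget


def eligible_models(prompt: str, models: List[ModelConfig], now: time) -> List[str]:
    # 1. Cost: exclude models whose monthly budget is exhausted.
    pool = [m for m in models
            if m.monthly_budget is None or m.spent_this_month < m.monthly_budget]
    # 2. Keywords: models with rules stay only if at least one rule matches.
    pool = [m for m in pool
            if not m.keyword_rules
            or any(keyword_matches(prompt, op, v) for op, v in m.keyword_rules)]
    # 3. Schedule: models with a window stay only inside that window.
    pool = [m for m in pool if m.window is None or in_window(now, m.window)]
    return [m.name for m in pool]


models = [
    ModelConfig("llm-a", monthly_budget=100.0, spent_this_month=100.0),
    ModelConfig("llm-b", keyword_rules=[("Contains", "invoice"), ("Regex", r"\bPO-\d+\b")]),
    ModelConfig("llm-c", window=(time(9, 0), time(17, 0))),
    ModelConfig("llm-d"),
]
print(eligible_models("Summarize this invoice", models, time(10, 0)))
```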

Round Robin Routing — Balanced Load Across Models
Distribute requests sequentially across all configured models in rotation. Each incoming request goes to the next model in line; after the last model, the cycle starts again. All models receive an approximately equal share of traffic over time. Best for load balancing across equivalent models or provider endpoints.
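Round robin rotation fits in a few lines with `itertools.cycle`. The `RoundRobinRouter` class and model names are hypothetical and only illustrate the rotation described above, not how FloTorch stores its rotation state.

```python
import itertools
import threading
from typing import Iterable


class RoundRobinRouter:
    """Hands out model names in a fixed rotation; the lock keeps it safe across threads."""

    def __init__(self, models: Iterable[str]) -> None:
        names = list(models)
        if not names:
            raise ValueError("round robin needs at least one model")
        self._cycle = itertools.cycle(names)
        self._lock = threading.Lock()

    def next_model(self) -> str:
        # After the last model, itertools.cycle starts again from the first.
        with self._lock:
            return next(self._cycle)


router = RoundRobinRouter(["llm-a", "llm-b", "llm-c"])
print([router.next_model() for _ in range(5)])
```

The lock matters only when several threads share one router; a gateway spread across many processes would need shared state, which this sketch does not attempt.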

Smart Routing — Match Prompt Complexity to Model Capability
FloTorch's built-in complexity identification engine scores each incoming prompt from Very Low to Very High and routes it to the model assigned to that complexity tier — sending simple queries to lightweight models and complex reasoning tasks to high-capability ones. When multiple models qualify, routing falls back to cost, keyword, and schedule conditions in priority order.
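The sketch below shows the general shape of tier-based routing. Its scorer is a deliberately toy heuristic (prompt length plus a few reasoning cue words) standing in for FloTorch's complexity identification engine, whose method is not described here; `complexity_tier`, `smart_route`, the cue words, and the model names are all hypothetical. The `tie_break` argument marks where cost, keyword, and schedule checks would plug in when more than one model shares a tier.

```python
from typing import Callable, Dict, List

TIERS = ["Very Low", "Low", "Medium", "High", "Very High"]
# Toy signal words; a real complexity engine would use a far richer signal.
REASONING_HINTS = ("prove", "derive", "step by step", "analyze", "compare", "design")


def complexity_tier(prompt: str) -> str:
    """Toy heuristic: longer prompts and reasoning cues push the score up a tier."""
    score = min(len(prompt.split()) // 50, 3)   # 0-3 points from length
    if any(hint in prompt.lower() for hint in REASONING_HINTS):
        score += 1                              # +1 for an explicit reasoning cue
    return TIERS[score]


def smart_route(prompt: str,
                tier_models: Dict[str, List[str]],
                tie_break: Callable[[List[str]], str] = lambda c: c[0]) -> str:
    tier = complexity_tier(prompt)
    candidates = tier_models.get(tier, [])
    if not candidates:
        raise LookupError(f"no model assigned to tier {tier!r}")
    # Several models share this tier: defer to a tie-breaker (e.g. cost, keyword, schedule checks).
    return candidates[0] if len(candidates) == 1 else tie_break(candidates)


tier_models = {
    "Very Low": ["small-llm"], "Low": ["small-llm"], "Medium": ["mid-llm"],
    "High": ["large-llm"], "Very High": ["large-llm", "reasoning-llm"],
}
print(smart_route("What is 2+2?", tier_models))
```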

Full Observability Across Every Routing Decision
Every routing decision FloTorch makes is logged and traceable. Access request logs and trace waterfalls to see which model was selected, why it was selected, and how it performed in terms of tokens, cost, and latency. Routing observability applies across all strategies and all models, including models used by agents built on FloTorch and models running in evaluation pipelines.
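To make "which model, why, and how it performed" concrete, here is a hypothetical structured record for a single routing decision. The `RoutingDecision` fields, `estimate_cost`, and the per-1K-token prices are illustrative assumptions, not FloTorch's log schema or pricing.

```python
import json
import time
from dataclasses import asdict, dataclass
from typing import Any, Callable, Tuple


@dataclass
class RoutingDecision:
    """Hypothetical log record; field names are illustrative, not FloTorch's schema."""
    request_id: str
    strategy: str          # e.g. "fallback", "weighted", "round_robin", "smart"
    selected_model: str
    reason: str            # why this model was chosen
    input_tokens: int
    output_tokens: int
    cost_usd: float
    latency_ms: float


def timed_call(fn: Callable[..., Any], *args: Any) -> Tuple[Any, float]:
    """Run fn(*args) and return (result, elapsed milliseconds)."""
    start = time.perf_counter()
    result = fn(*args)
    return result, (time.perf_counter() - start) * 1000


def estimate_cost(input_tokens: int, output_tokens: int,
                  in_price_per_1k: float, out_price_per_1k: float) -> float:
    """Cost from hypothetical per-1K-token input and output prices."""
    return input_tokens / 1000 * in_price_per_1k + output_tokens / 1000 * out_price_per_1k


def log_decision(d: RoutingDecision) -> str:
    line = json.dumps(asdict(d), sort_keys=True)
    print(line)  # in practice, ship this to a log or tracing backend
    return line


_, elapsed = timed_call(lambda p: p.upper(), "hello")
line = log_decision(RoutingDecision(
    request_id="req-001", strategy="smart", selected_model="small-llm",
    reason="complexity tier Very Low", input_tokens=1000, output_tokens=500,
    cost_usd=estimate_cost(1000, 500, 0.5, 1.5), latency_ms=elapsed))
```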

