LLMRouter: An Open-Source Library for Intelligent LLM Routing
LLMRouter is a comprehensive, modular framework that dynamically routes queries to the most suitable large language model, balancing performance, cost, and task complexity. It ships with 16+ pre-built routing strategies, a unified CLI, and a flexible plugin system for custom routers and tasks. With a full data-generation pipeline and robust API-key handling, the library empowers researchers and practitioners to deploy cost-effective, high-quality LLM services at scale.
## Introduction
LLMRouter is a next-generation routing framework that optimises LLM inference by selecting the most appropriate model for each individual query. By leveraging a variety of supervised and unsupervised routing algorithms, the system can deliver low-latency, cost-aware responses without compromising accuracy.
### Core Features
* **Smart Routing**: Automatically directs requests to the optimal LLM based on task complexity, monetary cost, and target performance.
* **Extensive Router Library**: More than 16 built-in routers spanning single-round, multi-round, agentic, and personalized categories. Techniques include K-Nearest Neighbors, Support Vector Machines, Multi-Layer Perceptrons, Matrix Factorisation, Elo Rating, graph-based methods, BERT-augmented strategies, hybrid probabilistic approaches, and transformer-score routers.
* **Unified Command-Line Interface**: Train, evaluate, and chat with LLMRouter through a single CLI. The Gradio-based UI provides a quick way to experiment interactively.
* **End-to-End Data Pipeline**: Generates training data from 11 public benchmarks, creates embeddings, calls LLM APIs, evaluates responses, and produces a comprehensive routing dataset.
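To make the idea behind one of the listed techniques concrete, here is a minimal sketch of K-Nearest-Neighbors routing. It is an illustration of the general technique, not the library's implementation: the function names, the toy two-dimensional embeddings, and the `"small"`/`"large"` model labels are all assumptions made for this example.

```python
import math
from collections import Counter


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def knn_route(query_vec, train, k=3):
    """Pick a model by majority vote among the k most similar training queries.

    `train` is a list of (embedding, best_model) pairs, where `best_model`
    is the model that performed best on that training query (hypothetical labels).
    """
    nearest = sorted(train, key=lambda row: cosine(query_vec, row[0]), reverse=True)[:k]
    votes = Counter(model for _, model in nearest)
    return votes.most_common(1)[0][0]


# Toy training set: embeddings near [1, 0] were best served by "small",
# embeddings near [0, 1] by "large".
train = [
    ([1.0, 0.0], "small"),
    ([0.9, 0.1], "small"),
    ([0.0, 1.0], "large"),
    ([0.1, 0.9], "large"),
    ([0.2, 0.8], "large"),
]
choice = knn_route([1.0, 0.05], train, k=3)
```

A real router would use learned query embeddings and could weigh cost as well as quality when labelling the training queries.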
## Getting Started
### Installation
Clone and install in editable mode to experiment with the source code:
```bash
git clone https://github.com/ulab-uiuc/LLMRouter.git
cd LLMRouter
conda create -n llmrouter python=3.10
conda activate llmrouter
pip install -e .
```
Optional GPU-enabled extras:
```bash
pip install -e ".[router-r1]" # requires vllm==0.6.3, torch==2.4.0
pip install -e ".[all]" # pulls in every optional dependency
```
Or install the PyPI package for a minimal setup:
```bash
pip install llmrouter-lib
```
### API Key Configuration
LLMRouter requires credentials for all LLM services used in inference or data generation. Set the `API_KEYS` environment variable before running any command:
```bash
# Service-specific dictionary (recommended for multi-provider setups)
export API_KEYS='{"NVIDIA": "nvidia-key-1,nvidia-key-2", "OpenAI": ["openai-key-1","openai-key-2"], "Anthropic": "anthropic-key-1"}'
```
Each value can be a single key string, a comma-separated string of keys, or a JSON array of keys. The system matches the `service` field in your candidate JSON to the corresponding dictionary entry and cycles through that service's keys in round-robin fashion. For legacy single-provider configurations, `API_KEYS` may instead be a comma-separated list or a bare string.
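The accepted formats and the round-robin behaviour can be sketched as follows. This is a minimal illustration under stated assumptions, not the library's parsing code: `parse_api_keys`, `make_key_cycles`, and the `"default"` service name used for legacy values are hypothetical.

```python
import itertools
import json


def parse_api_keys(raw: str) -> dict[str, list[str]]:
    """Normalise an API_KEYS value into {service: [key, ...]}.

    Legacy values (a bare key or a comma-separated list) are stored under
    the hypothetical service name "default".
    """
    def as_list(value) -> list[str]:
        if isinstance(value, list):
            return [str(k).strip() for k in value]
        # A string holds one key or several separated by commas.
        # An empty string yields [""], i.e. a single empty key.
        return [k.strip() for k in str(value).split(",")]

    try:
        parsed = json.loads(raw)
    except json.JSONDecodeError:
        parsed = raw  # legacy format: not JSON at all
    if isinstance(parsed, dict):
        return {service: as_list(value) for service, value in parsed.items()}
    return {"default": as_list(parsed)}


def make_key_cycles(keys: dict[str, list[str]]) -> dict:
    """Create one endless round-robin iterator per service."""
    return {service: itertools.cycle(ks) for service, ks in keys.items()}


raw = (
    '{"NVIDIA": "nvidia-key-1,nvidia-key-2", '
    '"OpenAI": ["openai-key-1", "openai-key-2"], '
    '"Anthropic": "anthropic-key-1"}'
)
cycles = make_key_cycles(parse_api_keys(raw))
first_three = [next(cycles["NVIDIA"]) for _ in range(3)]
print(first_three)  # ['nvidia-key-1', 'nvidia-key-2', 'nvidia-key-1']
```

Rotating keys per service spreads requests across rate limits without letting one provider's keys leak into calls to another.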
### Endpoint Configuration
Per-model endpoints have the highest priority. If a model does not specify its own `api_endpoint`, the router-level default in the YAML file is used. The following two snippets illustrate the layering:
*Candidate JSON* (default_llm.json):
```json
{
  "qwen2.5-7b-instruct": {
    "model": "qwen/qwen2.5-7b-instruct",
    "api_endpoint": "https://integrate.api.nvidia.com/v1"
  }
}
```
*Router YAML* (fallback):
```yaml
api_endpoint: 'https://integrate.api.nvidia.com/v1'
```
If neither is present, the CLI aborts with an explanatory error.
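The precedence rule above amounts to a two-level lookup. The sketch below shows one way to express it; `resolve_endpoint` and the dictionary shapes are assumptions for illustration (they mirror the two snippets), and the second candidate entry and the fallback URL are made up for the example.

```python
def resolve_endpoint(model_name: str, candidates: dict, router_config: dict) -> str:
    """Return the endpoint for a model: per-model value first, router default second."""
    model_entry = candidates.get(model_name, {})
    endpoint = model_entry.get("api_endpoint") or router_config.get("api_endpoint")
    if not endpoint:
        # Mirrors the documented behaviour of aborting with an explanatory error.
        raise ValueError(
            f"No api_endpoint for model '{model_name}': set it in the candidate "
            "JSON or provide a router-level default in the YAML config."
        )
    return endpoint


candidates = {
    "qwen2.5-7b-instruct": {
        "model": "qwen/qwen2.5-7b-instruct",
        "api_endpoint": "https://integrate.api.nvidia.com/v1",
    },
    # Hypothetical entry with no endpoint of its own.
    "model-without-endpoint": {"model": "example/model"},
}
router_config = {"api_endpoint": "https://router-default.example/v1"}  # hypothetical URL

per_model = resolve_endpoint("qwen2.5-7b-instruct", candidates, router_config)
fallback = resolve_endpoint("model-without-endpoint", candidates, router_config)
```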
### Local LLM Support
LLMRouter works natively with OpenAI-compatible local servers such as Ollama, vLLM, or SGLang. For local providers, specify an empty string for the API key:
```bash
export API_KEYS='{"Ollama": ""}'
```
The router will automatically detect localhost endpoints and authenticate using the provided empty key.
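How such localhost detection might work can be sketched with the standard library. This is an assumption about the general approach, not the library's code; `is_local_endpoint` and the `LOCAL_HOSTS` set are hypothetical names.

```python
from urllib.parse import urlparse

# Hostnames treated as local in this sketch (an assumption, not the library's list).
LOCAL_HOSTS = {"localhost", "127.0.0.1", "::1", "0.0.0.0"}


def is_local_endpoint(api_endpoint: str) -> bool:
    """Return True when the endpoint URL points at the local machine."""
    host = urlparse(api_endpoint).hostname
    return host in LOCAL_HOSTS


local = is_local_endpoint("http://localhost:11434/v1")
remote = is_local_endpoint("https://integrate.api.nvidia.com/v1")
```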
## Data Generation Pipeline
LLMRouter's pipeline transforms raw benchmark datasets into ready-to-train routing examples. The three-step workflow is:
1. **Extract Query Data**: Pulls questions from datasets like MMLU, GSM8K, HumanEval, etc., producing `query_data_train.jsonl` and `query_data_test.jsonl`.
2. **Compute Embeddings**: Generates embeddings for each candidate model from its metadata.
3. **Call LLM APIs & Evaluate**: Concurrently requests responses, measures performance metrics, and aggregates embeddings.
```bash
# Step 1: Query data
python llmrouter/data/data_generation.py --config llmrouter/data/sample_config.yaml
# Step 2: Model embeddings
python llmrouter/data/generate_llm_embeddings.py --config llmrouter/data/sample_config.yaml
# Step 3: API calling & evaluation (requires API_KEYS)
python llmrouter/data/api_calling_evaluation.py --config llmrouter/data/sample_config.yaml --workers 100
```
The output files include training/testing splits, embedding dictionaries, and finalized routing logs in JSONL format.
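The concurrent calling in step 3 and the JSONL output can be pictured with a small thread-pool sketch. Everything here is illustrative: `fake_llm_call` stands in for a real API request, and the record fields (`query`, `model`, `response`) are hypothetical rather than the pipeline's actual schema.

```python
import json
from concurrent.futures import ThreadPoolExecutor


def fake_llm_call(query: str, model: str) -> str:
    """Stand-in for a network request to an LLM API."""
    return f"{model} answer to: {query}"


def collect_responses(queries, models, workers=4):
    """Query every model with every query, using a pool of worker threads."""
    jobs = [(q, m) for q in queries for m in models]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # pool.map returns results in the same order as `jobs`.
        responses = list(pool.map(lambda job: fake_llm_call(*job), jobs))
    return [
        {"query": q, "model": m, "response": r}
        for (q, m), r in zip(jobs, responses)
    ]


def to_jsonl(records) -> str:
    """Serialise records as JSON Lines: one JSON object per line."""
    return "\n".join(json.dumps(rec) for rec in records)


records = collect_responses(["What is 2 + 2?"], ["model-a", "model-b"], workers=2)
jsonl_text = to_jsonl(records)
```

Threads suit this step because the work is dominated by waiting on network responses, which is why a large `--workers` value is practical.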
## Training Routers
After generating data, you can train any built-in router. Commands are short and mirror the router's name:
```bash
# KNN router
llmrouter train --router knnrouter --config configs/model_config_train/knnrouter.yaml
# MLP router on GPU
CUDA_VISIBLE_DEVICES=2 llmrouter train --router mlprouter --config configs/model_config_train/mlprouter.yaml --device cuda
# Matrix Factorisation router on GPU, in quiet mode
CUDA_VISIBLE_DEVICES=1 llmrouter train --router mfrouter --config configs/model_config_train/mfrouter.yaml --device cuda --quiet
```
Training scripts expose `--device`, `--quiet`, and hyper-parameter overrides, giving fine-grained control.
## Inference & Chat
Inference can be performed on single queries, batches from file, or via an interactive chat UI.
```bash
# Single query
llmrouter infer --router knnrouter --config config.yaml --query "What is machine learning?"
# Batch from text file
llmrouter infer --router knnrouter --config config.yaml --input queries.txt --output results.json
# Route only: no external API call
llmrouter infer --router knnrouter --config config.yaml --query "Hello" --route-only
```
Chat interface (requires API keys):
```bash
llmrouter chat --router knnrouter --config config.yaml
# Custom host/port
llmrouter chat --router knnrouter --config config.yaml --host 0.0.0.0 --port 7860
# Public sharing link
llmrouter chat --router knnrouter --config config.yaml --share
# Choose the query mode
llmrouter chat --router knnrouter --config config.yaml --mode full_context --top_k 5
```
Supported query modes:
* `current_only`: routes based solely on the current input.
* `full_context`: concatenates the full chat history with the current query.
* `retrieval`: pulls the top-k similar past queries to inform routing.
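The three modes differ only in what text is handed to the router. The sketch below illustrates that difference; `build_routing_query` is a hypothetical helper, and word-overlap (Jaccard) similarity is swapped in as a simple stand-in for whatever similarity measure the library actually uses in `retrieval` mode.

```python
def jaccard(a: str, b: str) -> float:
    """Word-overlap similarity between two strings (a simple stand-in metric)."""
    ta, tb = set(a.lower().split()), set(b.lower().split())
    return len(ta & tb) / len(ta | tb) if ta | tb else 0.0


def build_routing_query(mode: str, current: str, history: list[str], top_k: int = 5) -> str:
    """Assemble the text the router sees, depending on the query mode."""
    if mode == "current_only":
        return current
    if mode == "full_context":
        return "\n".join([*history, current])
    if mode == "retrieval":
        # Keep only the top_k past queries most similar to the current one.
        ranked = sorted(history, key=lambda past: jaccard(past, current), reverse=True)
        return "\n".join([*ranked[:top_k], current])
    raise ValueError(f"Unknown mode: {mode}")


history = [
    "How do I bake bread?",
    "What is supervised learning?",
    "Explain machine learning basics",
]
current = "What is machine learning?"
retrieved = build_routing_query("retrieval", current, history, top_k=1)
```

With `top_k=1`, only the single most similar past query accompanies the current one, which keeps the routing input short in long conversations.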
## Extending LLMRouter
### Custom Routers
Create a new router in `custom_routers/` with a `router.py` implementation that inherits from `MetaRouter`. The framework automatically discovers routers in the project's `custom_routers/` directory, in the user-level plugin directory under the home directory, and in any paths listed in `LLMROUTER_PLUGINS`.
Example structure:
```
custom_routers/my_router/
├── router.py
└── config.yaml
```
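As a rough picture of what `router.py` could contain, here is a self-contained sketch. The `MetaRouter` class below is a local stand-in defined only so the example runs on its own; the real base class lives in the library and its interface may differ. `LengthRouter`, its `route` method, and the `"large-model"` name are hypothetical.

```python
from abc import ABC, abstractmethod


class MetaRouter(ABC):
    """Stand-in for the library's base class (assumed interface, may differ)."""

    @abstractmethod
    def route(self, query: str) -> str:
        """Return the name of the model that should answer `query`."""


class LengthRouter(MetaRouter):
    """Toy rule: short queries go to a small model, longer ones to a large model."""

    def __init__(self, small_model: str, large_model: str, max_short_words: int = 20):
        self.small_model = small_model
        self.large_model = large_model
        self.max_short_words = max_short_words

    def route(self, query: str) -> str:
        word_count = len(query.split())
        return self.small_model if word_count <= self.max_short_words else self.large_model


router = LengthRouter("qwen2.5-7b-instruct", "large-model", max_short_words=3)
short_choice = router.route("Hi there")
long_choice = router.route("Explain the proof of this theorem step by step")
```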
After implementation, use it exactly like built-in routers:
```bash
llmrouter infer --router my_router --config custom_routers/my_router/config.yaml --query "What is machine learning?"
```
The library also ships example routers such as `RandomRouter` and `ThresholdRouter`, which illustrate baseline strategies and more sophisticated, trainable approaches.
### Custom Tasks
Define a new task by registering a prompt template, a prompt formatter, and optionally a metric. The framework automatically loads these components during data generation, enabling new benchmarks or domain-specific evaluation protocols without modifying core code.
### Plugin Discovery
The discovery order is:
1. `./custom_routers/`: project-specific.
2. `~/.llmrouter/plugins/`: user-level.
3. Paths in the `$LLMROUTER_PLUGINS` environment variable.
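The search order can be expressed as a short function. This is a sketch under assumptions: `plugin_search_paths` is a hypothetical helper, and splitting `LLMROUTER_PLUGINS` on the platform's path separator (`os.pathsep`) is a guess at the variable's format, not documented behaviour.

```python
import os
from pathlib import Path


def plugin_search_paths(env=None, cwd=None, home=None) -> list[Path]:
    """List plugin directories in discovery order: project, user, then env paths."""
    env = os.environ if env is None else env
    cwd = Path.cwd() if cwd is None else Path(cwd)
    home = Path.home() if home is None else Path(home)

    paths = [
        cwd / "custom_routers",          # 1. project-specific
        home / ".llmrouter" / "plugins",  # 2. user-level
    ]
    # 3. extra directories from the environment (separator is an assumption)
    extra = env.get("LLMROUTER_PLUGINS", "")
    paths.extend(Path(p) for p in extra.split(os.pathsep) if p)
    return paths


example_env = {"LLMROUTER_PLUGINS": os.pathsep.join(["/opt/a", "/opt/b"])}
search_order = plugin_search_paths(env=example_env, cwd="/proj", home="/home/u")
```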
## Roadmap & Contributions
Future work includes enhancing personalized routing with richer user profiling, integrating multimodal routing for images and audio, and enabling continual online learning to adapt to domain drift. Contributions that implement these features, new routing algorithms, or additional evaluation metrics are welcome.
## Acknowledgements
LLMRouter builds on seminal community research, including RouteLLM, RouterDC, AutoMix, Hybrid LLM, GraphRouter, GMTRouter, and others. The library's extensible architecture rewards contributions that push the state of the art in LLM routing.
## Citation
If you use LLMRouter in your research, please cite:
```bibtex
@misc{llmrouter2025,
title = {LLMRouter: An Open-Source Library for LLM Routing},
author = {Tao Feng and Haozhen Zhang and Zijie Lei and Haodong Yue and Chongshan Lin and Jiaxuan You},
year = {2025},
howpublished = {\url{https://github.com/ulab-uiuc/LLMRouter}},
note = {GitHub repository}
}
```