Jan 5, 2026

LLMRouter: An Open‑Source Library for Intelligent LLM Routing

LLMRouter is a comprehensive, modular framework that dynamically routes queries to the most suitable large language model, balancing performance, cost, and task complexity. It ships with 16+ pre‑built routing strategies, a unified CLI, and a flexible plugin system for custom routers and tasks. With a full data‑generation pipeline and robust API‑key handling, the library empowers researchers and practitioners to deploy cost‑effective, high‑quality LLM services at scale.

## Introduction

LLMRouter is a next-generation routing framework that optimises LLM inference by selecting the most appropriate model for each individual query. By leveraging a variety of supervised and unsupervised routing algorithms, the system can deliver low-latency, cost-aware responses without compromising accuracy.

### Core Features

* **Smart Routing** – Automatically directs requests to the optimal LLM based on task complexity, monetary cost, and target performance.
* **Extensive Router Library** – More than 16 built-in routers spanning single-round, multi-round, agentic, and personalized categories. Techniques include K-Nearest Neighbors, Support Vector Machines, Multi-Layer Perceptrons, Matrix Factorisation, Elo Rating, graph-based methods, BERT-augmented strategies, hybrid probabilistic approaches, and transformer-score routers.
* **Unified Command-Line Interface** – Train, evaluate, and chat with LLMRouter through a single CLI. The Gradio-based UI provides a quick way to experiment interactively.
* **End-to-End Data Pipeline** – Generates training data from 11 public benchmarks, creates embeddings, calls LLM APIs, evaluates responses, and produces a comprehensive routing dataset.

## Getting Started

### Installation

Clone and install in editable mode to experiment with the source code:

```
git clone https://github.com/ulab-uiuc/LLMRouter.git
cd LLMRouter
conda create -n llmrouter python=3.10
conda activate llmrouter
pip install -e .
```

Optional GPU-enabled extras:

```
pip install -e ".[router-r1]"  # requires vllm==0.6.3, torch==2.4.0
pip install -e ".[all]"        # pulls in every optional dependency
```

Or install the PyPI package for a minimal setup:

```
pip install llmrouter-lib
```

### API Key Configuration

LLMRouter requires credentials for all LLM services used in inference or data generation. Set the `API_KEYS` environment variable before running any command:

```bash
# Service-specific dictionary (recommended for multi-provider setups)
export API_KEYS='{"NVIDIA": "nvidia-key-1,nvidia-key-2", "OpenAI": ["openai-key-1","openai-key-2"], "Anthropic": "anthropic-key-1"}'
```

Each service's value can be a comma-separated string, a JSON array, or a single bare string. The system matches the `service` field in your candidate JSON to the corresponding dict entry and cycles through keys in a round-robin fashion (a sketch of this cycling appears at the end of Getting Started). For legacy single-provider configurations you may use a comma-separated list or a bare string.

### Endpoint Configuration

Per-model endpoints have the highest priority. If a model does not specify its own `api_endpoint`, the router-level default in the YAML file is used. The following two snippets illustrate the layering:

*Candidate JSON* (default_lmm.json):

```json
{
  "qwen2.5-7b-instruct": {
    "model": "qwen/qwen2.5-7b-instruct",
    "api_endpoint": "https://integrate.api.nvidia.com/v1"
  }
}
```

*Router YAML* (fallback):

```yaml
api_endpoint: 'https://integrate.api.nvidia.com/v1'
```

If neither is present, the CLI aborts with an explanatory error.

### Local LLM Support

LLMRouter natively works with OpenAI-compatible local servers such as Ollama, vLLM, or SGLang. For local providers, specify an empty string for the API key:

```bash
export API_KEYS='{"Ollama": ""}'
```

The router automatically detects localhost endpoints and uses the provided empty key for authentication.
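For illustration, a local candidate entry could then look like the snippet below. The entry name and model identifier are placeholders, and the endpoint reflects Ollama's default OpenAI-compatible server on port 11434; the `service` field is what gets matched against the `API_KEYS` dictionary, as described under API Key Configuration:

```json
{
  "llama3.1-8b-local": {
    "model": "llama3.1:8b",
    "service": "Ollama",
    "api_endpoint": "http://localhost:11434/v1"
  }
}
```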
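For intuition, the key handling described above boils down to per-service round-robin cycling. The following standalone Python sketch illustrates the idea; it is a hypothetical re-implementation written for this article, not LLMRouter's actual code, and the `KeyPool` class is invented for the example:

```python
import itertools
import json
import os


class KeyPool:
    """Hypothetical round-robin key pool (not LLMRouter's actual code).

    Accepts the value shapes described above: a comma-separated string,
    a JSON array, or a single bare string.
    """

    def __init__(self, raw_keys):
        if isinstance(raw_keys, str):
            # "k1,k2" -> ["k1", "k2"]; a bare "k1" -> ["k1"]; "" -> []
            keys = [k.strip() for k in raw_keys.split(",") if k.strip()]
        else:
            keys = list(raw_keys)  # JSON array case
        # Local providers use an empty key; fall back to one empty credential.
        self._cycle = itertools.cycle(keys or [""])

    def next_key(self):
        return next(self._cycle)


# Build one pool per service from the API_KEYS environment variable.
pools = {
    service: KeyPool(raw)
    for service, raw in json.loads(os.environ.get("API_KEYS", "{}")).items()
}

# With API_KEYS='{"OpenAI": "k1,k2"}', successive calls alternate:
# pools["OpenAI"].next_key() -> "k1", then "k2", then "k1", ...
```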
## Data Generation Pipeline

LLMRouter's pipeline transforms raw benchmark datasets into ready-to-train routing examples. The three-step workflow is:

1. **Extract Query Data** – Pulls questions from datasets like MMLU, GSM8K, HumanEval, etc., producing `query_data_train.jsonl` and `query_data_test.jsonl`.
2. **Compute Embeddings** – Generates embeddings for each candidate model from its metadata.
3. **Call LLM APIs & Evaluate** – Concurrently requests responses, measures performance metrics, and aggregates embeddings.

```bash
echo "[Step 1] Generating query data"
python llmrouter/data/data_generation.py --config llmrouter/data/sample_config.yaml

# Step 2: Model embeddings
python llmrouter/data/generate_llm_embeddings.py --config llmrouter/data/sample_config.yaml

# Step 3: API calling & evaluation (requires API_KEYS)
python llmrouter/data/api_calling_evaluation.py --config llmrouter/data/sample_config.yaml --workers 100
```

The output files include training/testing splits, embedding dictionaries, and finalized routing logs in JSONL format.

## Training Routers

After generating data, you can train any built-in router. Commands are short and mirror the router's name:

```bash
# KNN router
llmrouter train --router knnrouter --config configs/model_config_train/knnrouter.yaml

# MLP router on GPU
CUDA_VISIBLE_DEVICES=2 llmrouter train --router mlprouter --config configs/model_config_train/mlprouter.yaml --device cuda

# Matrix Factorisation router, with output suppressed
CUDA_VISIBLE_DEVICES=1 llmrouter train --router mfrouter --config configs/model_config_train/mfrouter.yaml --device cuda --quiet
```

Training scripts expose `--device`, `--quiet`, and hyper-parameter overrides, giving fine-grained control.

## Inference & Chat

Inference can be performed on single queries, batches from file, or via an interactive chat UI.

```bash
# Single query
llmrouter infer --router knnrouter --config config.yaml --query "What is machine learning?"

# Batch from text file
llmrouter infer --router knnrouter --config config.yaml --input queries.txt --output results.json

# Route only – no external API call
llmrouter infer --router knnrouter --config config.yaml --query "Hello" --route-only
```

Chat interface (requires API keys):

```bash
llmrouter chat --router knnrouter --config config.yaml

# Custom host/port
llmrouter chat --router knnrouter --config config.yaml --host 0.0.0.0 --port 7860

# Public sharing link
llmrouter chat --router knnrouter --config config.yaml --share

# Choose context window mode
llmrouter chat --router knnrouter --config config.yaml --mode full_context --top_k 5
```

Supported query modes:

* `current_only` – routes based solely on the current input.
* `full_context` – concatenates the full chat history with the current query.
* `retrieval` – pulls the top-k similar past queries to inform routing.

## Extending LLMRouter

### Custom Routers

Create a new router in `custom_routers/` with a `router.py` implementation that inherits from `MetaRouter`. The framework automatically discovers routers via the project directory structure, the user home directory, and any paths listed in `LLMROUTER_PLUGINS`. Example structure:

```
custom_routers/my_router/
├── router.py
└── config.yaml
```

After implementation, use it exactly like built-in routers:

```bash
llmrouter infer --router my_router --config custom_routers/my_router/config.yaml --query "What is machine learning?"
```

The library also ships example routers such as `RandomRouter` and `ThresholdRouter`, which illustrate baseline strategies and more sophisticated, trainable approaches.
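As a starting point, a `router.py` might look like the sketch below. Inheriting from `MetaRouter` is the documented contract, but the import path, the `route` method name, and the `candidates` attribute are assumptions made for illustration; consult the shipped `RandomRouter` example for the actual interface.

```python
# custom_routers/my_router/router.py -- illustrative sketch only.
# The MetaRouter interface details (import path, attribute names,
# required methods) are assumptions; see the bundled RandomRouter
# example for the real contract.
import random

from llmrouter.routers import MetaRouter  # assumed import path


class MyRouter(MetaRouter):
    """Toy heuristic: send short queries to an assumed cheap model,
    otherwise pick a random candidate."""

    def route(self, query: str) -> str:
        candidates = list(self.candidates)  # assumed attribute of model names
        if len(query) < 80 and "qwen2.5-7b-instruct" in candidates:
            return "qwen2.5-7b-instruct"
        return random.choice(candidates)
```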
### Custom Tasks

Define a new task by registering a prompt template, a prompt formatter, and optionally a metric. The framework automatically loads these components during data generation, enabling new benchmarks or domain-specific evaluation protocols without modifying core code.

### Plugin Discovery

The discovery order is:

1. `./custom_routers/` – project-specific.
2. `~/.llmrouter/plugins/` – user-level.
3. Paths in the `$LLMROUTER_PLUGINS` environment variable.

## Roadmap & Contributions

Future work includes enhancing personalized routing with richer user profiling, integrating multimodal routing for images and audio, and enabling continual online learning to adapt to domain drift. Contributions that implement these features, new routing algorithms, or additional evaluation metrics are welcome.

## Acknowledgements

LLMRouter builds on seminal community research, including RouteLLM, RouterDC, AutoMix, Hybrid LLM, GraphRouter, GMTRouter, and others. The library's extensible architecture is designed to make contributions that push the state of the art in LLM routing straightforward.

## Citation

If you use LLMRouter in your research, please cite:

```
@misc{llmrouter2025,
  title        = {LLMRouter: An Open-Source Library for LLM Routing},
  author       = {Tao Feng and Haozhen Zhang and Zijie Lei and Haodong Yue and Chongshan Lin and Jiaxuan You},
  year         = {2025},
  howpublished = {\url{https://github.com/ulab-uiuc/LLMRouter}},
  note         = {GitHub repository}
}
```