Jan 7, 2026

RepoReaper: Autonomous Architectural Analysis and Bilingual Semantic Search via Dynamic RAG Cache

RepoReaper redefines “Chat with Code” by treating an LLM as the CPU and a vector store as a high‑speed L2 cache, enabling autonomous traversal and on‑demand enrichment of a repository’s context. Leveraging AST‑aware chunking, hybrid BM25/vector retrieval, and a ReAct‑based just‑in‑time agent, it delivers precise, multilingual code‑level insights without static indexing. Designed for production, the system ships in Docker, supports the DeepSeek and SiliconFlow APIs, and offers a polished live demo with intelligent language handling.

# RepoReaper: Autonomous Architectural Analysis and Bilingual Semantic Search via Dynamic RAG Cache

## Introduction

Modern software maintenance increasingly demands tooling that can understand large codebases in real time, without the latency of full indexing or the brittleness of static search pipelines. **RepoReaper** addresses this challenge by treating the Large Language Model (LLM) itself as the CPU while the vector store functions as an adaptive L2 cache. The framework parses the repository’s Abstract Syntax Tree (AST) to build a lightweight symbol map, dynamically pre‑fetches the most architecturally relevant files, and employs a ReAct loop to fetch missing context on demand.

## Core Philosophy: RAG as a Dynamic Cache

Unlike conventional Retrieval‑Augmented Generation (RAG) systems, which perform static look‑ups, RepoReaper’s RAG layer acts as a real‑time, just‑in‑time cache:

* **Cold Start – Repo Map**: A one‑time AST traversal generates a global map of classes, functions, and modules, enabling instant navigation of the code tree.
* **Prefetching – Analysis Phase**: The agent autonomously selects the 10–20 files most impactful for architectural comprehension, parses them, and pre‑loads their embeddings into the cache.
* **Cache‑Miss Handling – ReAct Loop**: During user queries, if BM25 + vector retrieval returns insufficient context, the agent triggers a tool invocation to pull the missing files via the GitHub API, updates the cache, and regenerates the answer seamlessly.

## Architectural Innovations

1. **AST‑Aware Semantic Chunking**
   * **Logical Boundaries** – Code is split by class and method definitions rather than by raw token windows, preserving logical cohesion.
   * **Context Injection** – Parent class signatures and docstrings are embedded in each method chunk, giving the LLM insight into both purpose (“why”) and implementation (“how”).
2. **Asynchronous Concurrency Pipeline**
   * Built atop *asyncio* and *httpx*, the system performs repository parsing, AST extraction, and vector embedding in a non‑blocking fashion.
   * Deployment uses *Gunicorn* with *Uvicorn* workers; the *VectorStoreManager* synchronizes context via persistent ChromaDB instances, ensuring stateless workers without race conditions.
3. **Just‑In‑Time ReAct Agent**
   * **Query Rewrite** – The LLM translates ambiguous or bilingual queries into canonical, English‑only technical terms for optimal BM25/vector search.
   * **Self‑Correction** – When context is insufficient, the agent emits a tool invocation, fetches the exact file snippets, re‑indexes them, and re‑invokes the model within the same inference cycle.
4. **Hybrid Search Mechanism**
   * Dense retrieval applies BAAI/bge‑m3 embeddings to capture conceptual similarity.
   * Sparse retrieval (BM25Okapi via Rank‑BM25) preserves exact name matching for function signatures and error codes.
   * Results are fused via Reciprocal Rank Fusion (RRF) to rank the most relevant snippets for the LLM.
5. **Native Bilingual Support**
   * The prompt‑engineering module detects the language of user input and swaps the system prompt accordingly, ensuring that tone, terminology, and output format honor the user’s locale.
   * A language toggle in the UI propagates through the entire pipeline, including the initial architectural report and subsequent Q&A.

## Technical Stack

| Layer | Technology |
|---|---|
| Core | Python 3.10+, FastAPI, AsyncIO |
| LLM | OpenAI SDK (DeepSeek/SiliconFlow) |
| Vector DB | ChromaDB (persistent disk) |
| Search | BM25Okapi, Rank‑BM25, RRF |
| Parsing | Python `ast` |
| Frontend | HTML5, Server‑Sent Events, Mermaid.js |
| Deployment | Docker, Gunicorn, Uvicorn |

## Performance & Reliability

* **Session Management** – Combines browser `sessionStorage` with server‑side persistence, allowing warm cache state to survive page refreshes.
* **Network Resilience** – Graceful handling of GitHub API throttling (403/429) and timeouts ensures a consistent user experience.
* **Memory Efficiency** – The VectorStoreManager keeps its state on disk only, preventing memory leaks in long‑running containers.

## Quick Start Guide

> **Prerequisites** – Python 3.9+, a valid GitHub Personal Access Token, and LLM API keys (DeepSeek‑V3 or SiliconFlow recommended).
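The throttling behavior mentioned under Performance & Reliability (backing off when GitHub answers 403/429) can be sketched as a small retry helper. `fetch_with_backoff` and its injectable `fetch` callable are hypothetical names for illustration, not the project's real HTTP client:

```python
import time

# Status codes GitHub uses for rate limiting and abuse detection.
RETRYABLE = {403, 429}

def fetch_with_backoff(fetch, url, max_retries=4, base_delay=1.0, sleep=time.sleep):
    """Call fetch(url) -> (status, body), retrying with exponential
    backoff whenever the API throttles with a retryable status."""
    for attempt in range(max_retries + 1):
        status, body = fetch(url)
        if status not in RETRYABLE:
            return status, body
        if attempt == max_retries:
            break
        sleep(base_delay * (2 ** attempt))  # 1s, 2s, 4s, ...
    raise RuntimeError(f"still throttled after {max_retries} retries: {url}")
```

Injecting `fetch` and `sleep` keeps the helper trivially testable without touching the network; a production version would also honor GitHub's `Retry-After` header.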

1. **Clone**

   ```bash
   git clone https://github.com/tzzp1224/RepoReaper.git
   cd RepoReaper
   ```

2. **Create a virtual environment** (recommended)

   ```bash
   python -m venv venv
   source venv/bin/activate  # Windows: venv\Scripts\activate
   ```

3. **Install dependencies**

   ```bash
   pip install -r requirements.txt
   ```

4. **Configure the environment** – create `.env` in the project root:

   ```dotenv
   GITHUB_TOKEN=ghp_your_token_here
   DEEPSEEK_API_KEY=sk_your_key_here
   SILICON_API_KEY=sk_your_key_here
   ```

5. **Run locally** – universal option:

   ```bash
   python -m app.main
   ```

   For production, you may use Gunicorn:

   ```bash
   gunicorn -c gunicorn_conf.py app.main:app
   ```

6. **Docker deployment**

   ```bash
   docker build -t reporeaper .
   docker run -d -p 8000:8000 --env-file .env --name reporeaper reporeaper
   ```

7. **Access** – Open the app in your browser and enter a GitHub repository URL to trigger autonomous analysis.

## Live Demo & Availability

The project hosts a public demo; however, shared API quotas may cause rate‑limit errors (403/429). For a smooth experience, especially for users in mainland China reaching the Seoul‑hosted demo server, clone the repository and run it locally.

---

RepoReaper exemplifies how an LLM‑driven analysis agent can move beyond static indexing to provide a real‑time, bilingual, architecture‑aware code‑exploration experience, making it a potent tool for senior technical leads and platform teams seeking deeper repository introspection without the overhead of traditional tooling.
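As a closing illustration, the Reciprocal Rank Fusion step at the heart of the hybrid search mechanism fits in a few lines of pure Python. The `rrf_fuse` name and signature are illustrative assumptions, and `k = 60` is a common default from the RRF literature rather than a value confirmed by the project:

```python
from collections import defaultdict

def rrf_fuse(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Fuse several ranked result lists with Reciprocal Rank Fusion:
    score(doc) = sum over lists of 1 / (k + rank_of_doc_in_list)."""
    scores: dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first.
    return sorted(scores, key=scores.get, reverse=True)
```

A document that appears near the top of both the BM25 list and the dense-embedding list accumulates two large reciprocal terms and outranks documents favored by only one retriever, which is exactly why RRF is a robust fusion choice when the two scores live on incomparable scales.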