← BackJan 6, 2026

Uncovering a Faulty HTTP Retry in Rust: Lessons from the Bors Merge Bot

During the rigorous integration testing of the Bors GitHub merge bot, a mysterious empty‑body failure exposed a subtle bug in the octocrab crate's retry mechanism. By dissecting the request cloning logic, the issue was traced to a shallow‑clone of the request body, which swallowed the payload on retries. Fixing the clone to perform a true deep copy (or avoiding retries for unbuffered bodies) restored correct behavior and revealed broader insights about handling HTTP bodies, retry design, and debugging Rust dependencies.

### Background The Bors merge bot is a lightweight, opinionated GitHub bot that automates PR processing as part of the **bors** project. Recent work focused on moving the bot into a production‑ready configuration: race‑condition fixes, a more robust HTTP client, and a dramatically expanded integration‑test suite. The test harness is intentionally heavy‑weight—each test boots a full Bors web instance, connects to a live PostgreSQL database, and drives the bot using real GitHub webhook emulation via WireMock. ### The Mysterious Failure After a large refactor, a handful of tests started to panic within the mocked GitHub endpoint: a PATCH request that should carry a JSON body returned empty, causing a deserialization failure. ```rust let data: SetRefRequest = req.body_json().unwrap(); // panics ``` The bot uses **octocrab** to send the PATCH. Code that builds the request clearly includes a body: ```rust let res = repo .client() ._patch( url.as_str(), Some(&serde_json::json!({ "sha": sha.as_ref(), "force": matches!(force, ForcePush::Yes), })), ).await?; ``` The body was present on the first server hit, but disappeared on subsequent retries, which the bot executed automatically because the server replied with 5xx codes. The problem surfaced only after several retries, a phenomenon that made the bug look like a subtle race in the bot’s own code. ### Diving into Dependencies Because almost all Rust code is safe, the initial hypothesis was that the issue lied inside a dependency, but not necessarily in the bot itself. The debugging flow progressed as follows: 1. **WireMock** was inspected by cloning the crate locally and sprinkling debug logs. No anomalies surfaced. 2. **Hyper** (used by WireMock) never modified the incoming request bodies, ruling out a server‑side bug. 3. **Wireshark** inspection of the local HTTP traffic confirmed the client sent an empty body on retries. 4. Disabling octocrab’s built‑in retry mechanism (`add_retry_config(RetryConfig::None)`) eliminated the empty‑body problem. Thus the culprit was octocrab’s retry policy. ### Root Cause: Shallow Body Clone Octocrab’s request body type, `OctoBody`, was defined as: ```rust #[derive(Debug)] pub struct OctoBody(Arc>); ``` The `Clone` implementation merely cloned the `Arc`, producing a *shallow copy* that shared the same `RwLock` instance. ```rust impl Clone for OctoBody { fn clone(&self) -> Self { OctoBody(Arc::clone(&self.0)) } } ``` The request‑body is consumed on the first send: Hyper polls the body via `is_end_stream`, returning `true` once consumed. On retry, the same body is reused; Hyper sees the `is_end_stream` flag as `true` and sends an empty body, leading to the observed failures. This bug hinges on two Rust idioms that often clash: *reference‑counted* data (*Arc*) and *interior mutability* (*RwLock*). If the body had no interior mutability, a shallow clone would merely yield a second reference to the same buffer, incurring a negligible correctness impact. The combination, however, caused the logical state to be lost on retry. ### Fix Implemented in octocrab 0.49.1 The fix introduces a dedicated, byte‑sized, immutable copy of the body used only for retries: ```rust #[derive(Debug)] pub struct OctoBody { body: Arc>, // original body buffered: Option, // deep copy for retries } ``` During construction, when the underlying payload is available as a `Vec`, a `Bytes` clone is stored in `buffered`. The `try_clone` helper then performs a *deep* copy of this buffer for use in a new request: ```rust fn try_clone(&self) -> Option { self.buffered.as_ref().map(|b| { Self::create(http_body_util::Full::from(b.clone()), Some(b.clone())) }) } ``` If a request body cannot be buffered (e.g. a streaming body), the retry mechanism simply aborts the retry, avoiding the empty‑body mishap. ### Implications for the Bors Project By disabling octocrab’s retry policy in Bors, the upstream bug was masked, but the bot still suffered from double‑retry logic—once from its own retry mechanism and once from octocrab. Removing octocrab’s automatic retries both restored correct behavior and simplified the test logic. The incident also highlighted the risk of heavy integration tests: the Bors test harness triggers a rare code path (retrying a PATCH with a body) that is uncommon in everyday bot usage but surfaced because the tests explicitly simulate transient failures. ### Lessons Learned 1. **Trust but Verify** – Even well‑reviewed crates can harbor subtle bugs in edge cases; a disciplined fault‑recovery strategy in tests helps surface them. 2. **Understand Body Ownership** – HTTP bodies are often streamed; shallow cloning of body objects that use interior mutability can break retries. 3. **Avoid Implicit Retries** – Third‑party clients may enable retries behind the scenes. If your business logic already implements retries, consider disabling auto‑retries to eliminate duplicate effort. 4. **LLM Assistance** – The author tested a modern language model to locate the bug. While the model quickly flagged the retry bug and the shallow clone issue, it got sidetracked by misinterpretations of dependency features. Human guidance remains essential. ### Conclusion The Bors merge bot story illustrates how a seemingly innocuous bug—the absence of a request body after an automatic retry—revealed a deeper flaw in the octocrab crate’s body cloning logic. The fix not only solved two bugs in Bors but also improved the robustness of octocrab for all users. Such lessons reinforce the importance of comprehensive integration testing, careful design of retry strategies, and a nuanced understanding of Rust’s ownership and concurrency primitives. For those facing similar challenges, consider inspecting how your HTTP client clones request bodies and whether your retry logic overlaps with the client’s built‑in mechanisms. Sharing findings—like the Bors team did—strengthens the ecosystem and benefits everyone who relies on these crates in production.