Jan 4, 2026

Redesigning Error Handling in Rust: From Forwarding to Purposeful Communication

Rust’s current error‑handling ecosystem often devolves into opaque forwarding, stripping actionable insight. This article surveys the shortcomings of std::error::Error, backtraces, the Provider/Request API, and popular libraries like thiserror and anyhow, then proposes a dual‑audience model that balances machine‑readable, retry‑friendly errors with human‑friendly, context‑rich debugging information.

# Error Forwarding: A Silent Saboteur

It’s 3 am, the servers are down, and a log entry shouts:

```
Error: serialization error: expected ',' or '}' at line 3, column 7
```

That error bubbled up through twenty layers of your stack, preserving its message but losing every shred of meaning. The problem is not a single bug; it’s the way we propagate errors. We receive an error, optionally wrap it, and pass it up the stack as fast as possible, treating errors like hot potatoes rather than useful messages.

## The Limitations of Rust’s Standard Abstractions

### std::error::Error – A Chain, Not a Tree

The `Error` trait assumes a linear chain of causes: `fn source(&self) -> Option<&(dyn Error + 'static)>` returns at most one underlying error. Most errors fit this model, but many real-world failures form trees. A validation step might fail on several fields at once; a timeout might leave partial results. The standard trait offers no way to model these branching causes.

### Backtraces – Expensive and Misleading

Rust’s `std::backtrace::Backtrace` improves observability, but:

* In async code it is dominated by dozens of compiler-generated `poll()` frames (such as `GenFuture::poll()`), which hide the logical path.
* It only tells you where an error was created, not the application-level flow that led there.
* Capturing one is a potentially expensive runtime operation, which is why `Backtrace::capture()` returns a disabled backtrace unless the `RUST_BACKTRACE` or `RUST_LIB_BACKTRACE` environment variable is set.

The result is a stack trace that reads like a breadcrumb trail of compiler-generated frames, not a story of where the user’s request died.

### Provider/Request API – Over-engineering

RFC 3192 introduces `provide` and `request` so that errors can expose arbitrary typed data (HTTP status codes, backtraces, and so on). The API is flexible but unpredictable: a caller cannot know in advance whether a given error provides a status code. It is also subtle enough that the compiler struggles to optimize it. In practice it adds indirection rather than clarity.
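
Returning to the `source()` chain described above, a minimal std-only sketch shows why the standard trait yields a chain rather than a tree. The two error types here are hypothetical; because `source()` can return at most one cause, walking it visits exactly one path, and there is nowhere to put a sibling failure.

```rust
use std::error::Error;
use std::fmt;

// Hypothetical lower-level error, used only for illustration.
#[derive(Debug)]
struct ConnectionRefused;

impl fmt::Display for ConnectionRefused {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(f, "connection refused")
    }
}

impl Error for ConnectionRefused {}

// Hypothetical higher-level error that wraps exactly one cause.
#[derive(Debug)]
struct FetchUserError {
    source: ConnectionRefused,
}

impl fmt::Display for FetchUserError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(f, "failed to fetch user")
    }
}

impl Error for FetchUserError {
    // The trait allows only ONE cause here; there is no slot for siblings.
    fn source(&self) -> Option<&(dyn Error + 'static)> {
        Some(&self.source)
    }
}

/// Walks the linear cause chain and collects each level's message.
fn chain_messages(err: &dyn Error) -> Vec<String> {
    let mut messages = vec![err.to_string()];
    let mut current = err.source();
    while let Some(cause) = current {
        messages.push(cause.to_string());
        current = cause.source();
    }
    messages
}

fn main() {
    let err = FetchUserError { source: ConnectionRefused };
    for (depth, msg) in chain_messages(&err).iter().enumerate() {
        println!("{depth}: {msg}");
    }
}
```

Modeling a validation failure on several fields would require a custom method outside the trait, which generic reporting code cannot discover.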

## Popular Libraries – Good Intent, Bad Outcome

### thiserror – Origin-Centric Enums

`thiserror` makes defining error enums painless:

```rust
// ConnectionError and QueryError are assumed to be defined elsewhere in the crate.
#[derive(Debug, thiserror::Error)]
pub enum DatabaseError {
    #[error("connection failed: {0}")]
    Connection(#[from] ConnectionError),
    #[error("query failed: {0}")]
    Query(#[from] QueryError),
    #[error("serialization failed: {0}")]
    Serde(#[from] serde_json::Error),
}
```

This enum tells you *where* the failure happened, but not *what to do* about it. A calling layer cannot decide whether to retry, report, or swallow the error without custom logic that knows which underlying failures are transient.

### anyhow – Optional Context

`anyhow` erases concrete types behind `anyhow::Error` (usually through `anyhow::Result`) and offers `.context()` as an opt-in. The compiler never forces you to add context, so chains of `?` propagate shallow, ambiguous errors. Developers routinely forget the rich context that would be invaluable during a 3 am incident.

## The Fundamental Disconnect

Error handling has traditionally focused on type safety and satisfying the compiler. But errors have two real audiences:

| Audience | Goal | Needs |
|----------|------|-------|
| Machines | Automated recovery | Flat, kind-based, predictable codes |
| Humans | Debugging | Rich context, call path, business information |

Existing patterns fail both: they are either too deeply nested for machines or too shallow for humans.

## A Dual-Audience Error Design

### Structured, Actionable Errors for Machines

Borrowing from Apache OpenDAL’s design, we can use a flat struct that categorises errors by *what the caller can do* rather than *where they originated*:

```rust
pub struct Error {
    kind: ErrorKind,          // what the caller can do about it
    message: String,
    status: ErrorStatus,      // whether retrying makes sense
    operation: &'static str,  // e.g. "read" or "write"
    context: Vec<(&'static str, String)>,
    source: Option<Box<dyn std::error::Error + Send + Sync>>, // underlying cause, if any
}
// Accessors such as `kind()` and `is_temporary()` are omitted for brevity.

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum ErrorKind {
    NotFound,
    PermissionDenied,
    RateLimited,
    // ...
}

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum ErrorStatus {
    Permanent,  // Do not retry
    Temporary,  // Safe to retry
    Persistent, // Retried, still failing
}
```

Callers can now write concise decision logic:

```rust
// `retry` and `create_default` stand in for the caller's own recovery paths.
match err.kind() {
    ErrorKind::RateLimited if err.is_temporary() => {
        sleep(Duration::from_secs(1)).await; // e.g. an async runtime's sleep
        retry().await
    }
    ErrorKind::NotFound => create_default().await,
    _ => Err(err),
}
```

### Context-Rich Errors for Humans

For debugging, context should be collected automatically, at the boundaries where modules interact. The small `exn` library demonstrates a tree-structured frame approach:

```rust
// `Task`, `Output`, `ExecutorError`, `fetch_user`, and `process` belong to the application.
pub async fn execute(&self, task: Task) -> Result<Output, Exn<ExecutorError>> {
    // One closure builds the boundary error; it is cloned so both calls can use it.
    let make_error = || ExecutorError(format!("failed to execute task {}", task.id));

    let user = self
        .fetch_user(task.user_id)
        .await
        .or_raise(make_error.clone())?;

    let result = self.process(user).await.or_raise(make_error)?;

    Ok(result)
}
```

`or_raise` forces context at *module boundaries*: because `execute` returns an `ExecutorError` report, the compiler will not let a lower-level error pass through with a bare `?`. If you forget to provide context, the code simply won’t compile. When an error surfaces at night, the report looks like:

```
failed to execute task 7829, at src/executor.rs:45:12
||-> failed to fetch user "John Doe", at src/executor.rs:52:10
||-> connection refused, at src/client.rs:89:24
```

A human can immediately see the task ID, the user involved, and the underlying network failure.

## Putting It All Together

In practice, you combine the two models:

1. **Machine-oriented base** – a flat `Error` struct with kind, status, and minimal fields.
2. **Human-oriented wrapper** – `Exn`, which tracks context trees and enforces adding context at boundaries.
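
To make the machine-oriented base from item 1 concrete, here is a dependency-free sketch showing that a retry decision can be made from `kind` and `status` alone. It is an illustration, not OpenDAL’s actual API: the builder methods (`new`, `temporary`, `with_context`, `persist`), the `Action` enum, `decide`, and the bounded `with_retry` helper are all hypothetical names. The retry loop is synchronous and omits the sleep or backoff a real caller would add.

```rust
use std::fmt;

// Hypothetical flat error modeled on the struct above; not OpenDAL's actual API.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum ErrorKind {
    NotFound,
    PermissionDenied,
    RateLimited,
}

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum ErrorStatus {
    Permanent,  // Do not retry
    Temporary,  // Safe to retry
    Persistent, // Retried, still failing
}

#[derive(Debug)]
pub struct Error {
    kind: ErrorKind,
    status: ErrorStatus,
    message: String,
    context: Vec<(&'static str, String)>,
}

impl Error {
    /// New errors start out `Permanent`; retryability must be opted into.
    pub fn new(kind: ErrorKind, message: impl Into<String>) -> Self {
        let message = message.into();
        Self { kind, status: ErrorStatus::Permanent, message, context: Vec::new() }
    }
    pub fn temporary(mut self) -> Self {
        self.status = ErrorStatus::Temporary;
        self
    }
    pub fn with_context(mut self, key: &'static str, value: impl Into<String>) -> Self {
        self.context.push((key, value.into()));
        self
    }
    /// Once retries are exhausted, a temporary error becomes persistent.
    pub fn persist(mut self) -> Self {
        if self.status == ErrorStatus::Temporary {
            self.status = ErrorStatus::Persistent;
        }
        self
    }
    pub fn kind(&self) -> ErrorKind { self.kind }
    pub fn status(&self) -> ErrorStatus { self.status }
    pub fn is_temporary(&self) -> bool { self.status == ErrorStatus::Temporary }
}

impl fmt::Display for Error {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(f, "{:?} ({:?}): {}", self.kind, self.status, self.message)?;
        for (key, value) in &self.context {
            write!(f, ", {key}={value}")?;
        }
        Ok(())
    }
}

impl std::error::Error for Error {}

/// What the caller should do, decided from kind and status alone.
#[derive(Debug, PartialEq, Eq)]
pub enum Action { Retry, CreateDefault, Fail }

pub fn decide(err: &Error) -> Action {
    match err.kind() {
        ErrorKind::RateLimited if err.is_temporary() => Action::Retry,
        ErrorKind::NotFound => Action::CreateDefault,
        _ => Action::Fail,
    }
}

/// Calls `op` up to `max_attempts` times, retrying only temporary errors.
pub fn with_retry<T>(max_attempts: u32, mut op: impl FnMut() -> Result<T, Error>) -> Result<T, Error> {
    let mut attempt = 1;
    loop {
        match op() {
            Ok(value) => return Ok(value),
            Err(e) if e.is_temporary() && attempt < max_attempts => attempt += 1,
            // Non-temporary errors pass through; exhausted temporary ones become persistent.
            Err(e) => return Err(e.persist()),
        }
    }
}

fn main() {
    let mut calls = 0;
    let result: Result<(), Error> = with_retry(3, || {
        calls += 1;
        Err(Error::new(ErrorKind::RateLimited, "too many requests")
            .temporary()
            .with_context("path", "docs/a.json"))
    });
    let err = result.unwrap_err();
    let action = decide(&err);
    println!("{err} after {calls} calls -> {action:?}");
}
```

Because exhausted retries flip `Temporary` to `Persistent`, the same `decide` function stops recommending a retry without any extra bookkeeping. The human-oriented `Exn` layer from item 2 sits on top of a type like this.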

Propagation looks like:

```rust
// `Document`, `serialize`, `storage`, and the `StorageError::perm`/`temp`
// constructors are application-level names.
pub async fn save_document(doc: Document) -> Result<(), Exn<StorageError>> {
    // Serializing the same document again fails the same way: permanent.
    let data = serialize(&doc)
        .or_raise(|| StorageError::perm("serialization failed"))?;

    // A failed write (e.g. a network hiccup) may succeed later: temporary.
    storage
        .write(&doc.path, data)
        .await
        .or_raise(|| StorageError::temp("write failed"))?;

    Ok(())
}
```

At a caller boundary, you log the full context for human debugging, then walk the tree to extract the typed `StorageError` for machine-level handling:

```rust
// `find_error`, `queue_for_retry`, and `map_to_http_status` are helpers not shown here.
match save_document(doc).await {
    Ok(_) => Ok(()),
    Err(report) => {
        // Human audience: the full context tree, with source locations.
        log::error!("{:?}", report);

        // Machine audience: the first typed StorageError found in the tree.
        if let Some(err) = find_error::<StorageError>(&report) {
            if err.is_temporary() {
                return queue_for_retry(report);
            }
            return Err(map_to_http_status(err.kind()));
        }
        Err(StatusCode::INTERNAL_SERVER_ERROR)
    }
}
```

### Why This Works

* **Machines** get a concise, actionable kind and status to decide on retries or failure modes.
* **Humans** receive a fully captured call path with file/line/column stamps, without paying for backtrace capture.
* **The compiler** enforces context at module boundaries, so context is no longer silently lost.

## Conclusion

In Rust, errors should be *messages* that convey intent, not just failure markers. Design error types around *what the caller should do* rather than around the call stack, and pair them with an ergonomic context layer that forces meaningful annotations. Stop forwarding errors; start designing them.

## Resources

* OpenDAL Error Design RFC
* OpenDAL’s Error Handling Practices
* `exn`: Context-aware errors for Rust
* “Error Handling in Large Rust Projects” (GreptimeDB)
* “A Guide to Error Handling that Just Works”
* Study of `std::io::Error`
* “Error Handling In Rust – A Deep Dive”
* Tracking Issue for Provider API
* Async Stack Traces Working Group