Redesigning Error Handling in Rust: From Forwarding to Purposeful Communication
Rust's current error-handling ecosystem often devolves into opaque forwarding, stripping actionable insight. This article surveys the shortcomings of std::error::Error, backtraces, the Provider/Request API, and popular libraries like thiserror and anyhow, then proposes a dual-audience model that balances machine-readable, retry-friendly errors with human-friendly, context-rich debugging information.
# Error Forwarding: A Silent Saboteur
It's 3 am, the servers are down, and a log entry shouts:
```
Error: serialization error: expected ',' or '}' at line 3, column 7
```
What you have is a message that has bubbled through twenty layers, preserving the text but losing every shred of meaning. The problem is not a single bug; it's the way we propagate errors. We catch them, optionally wrap them, and throw them up the stack as fast as possible, treating errors like hot potatoes rather than useful messages.
## The Limitations of Rust's Standard Abstractions
### std::error::Error: A Chain, Not a Tree
The `Error` trait assumes a linear chain of causes: `fn source(&self) -> Option<&(dyn Error + 'static)>` returns at most one underlying error. Most errors fit this model, but many real-world failures form trees. A validation step might fail on several fields; a timeout might leave partial results. The standard trait offers no way to model these branching causes.
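To make the linear shape concrete, here is a minimal sketch that walks a cause chain through `source()`. The `ParseError` and `ConfigError` types and the `cause_chain` helper are hypothetical names introduced only for illustration; only the `std::error::Error` trait itself comes from the standard library.

```rust
use std::error::Error;
use std::fmt;

/// Hypothetical low-level failure, used only for illustration.
#[derive(Debug)]
struct ParseError;

impl fmt::Display for ParseError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(f, "expected ',' or '}}' at line 3, column 7")
    }
}

impl Error for ParseError {}

/// Hypothetical wrapper. It can expose exactly one cause.
#[derive(Debug)]
struct ConfigError {
    source: ParseError,
}

impl fmt::Display for ConfigError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(f, "failed to load config")
    }
}

impl Error for ConfigError {
    // `source` returns a single optional cause, so the structure is a list.
    fn source(&self) -> Option<&(dyn Error + 'static)> {
        Some(&self.source)
    }
}

/// Walks the cause chain and collects one message per link.
fn cause_chain(err: &(dyn Error + 'static)) -> Vec<String> {
    let mut messages = Vec::new();
    let mut current = Some(err);
    while let Some(e) = current {
        messages.push(e.to_string());
        current = e.source();
    }
    messages
}
```

Because each link yields at most one `source()`, a wrapper that wanted to report two failed fields at once would have to pick one of them or flatten both into a string.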
### Backtraces: Cheap but Misleading
Rust's `std::backtrace::Backtrace` improves observability, but:
* In async contexts it contains dozens of compiler-generated `GenFuture::poll()` frames that obscure the logical path.
* It only tells you where an error was created, not the application-level flow that led there.
* Capturing a backtrace can be expensive at runtime.
The result is a stack trace that looks like a breadcrumb trail of compiler-generated frames, not a story of where the user's request died.
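As a small illustration of the second and third points, the sketch below stores a backtrace in an error at construction time. `TracedError` is a hypothetical type; `Backtrace::capture` and `BacktraceStatus` are the standard library APIs.

```rust
use std::backtrace::{Backtrace, BacktraceStatus};

/// Hypothetical error type that records where it was constructed.
#[derive(Debug)]
struct TracedError {
    message: String,
    backtrace: Backtrace,
}

impl TracedError {
    /// `Backtrace::capture` only walks the stack when the `RUST_BACKTRACE`
    /// or `RUST_LIB_BACKTRACE` environment variable enables it; otherwise
    /// it returns a cheap placeholder with status `Disabled`.
    fn new(message: impl Into<String>) -> Self {
        Self {
            message: message.into(),
            backtrace: Backtrace::capture(),
        }
    }

    /// True only if frames were actually recorded.
    fn has_backtrace(&self) -> bool {
        matches!(self.backtrace.status(), BacktraceStatus::Captured)
    }
}
```

The captured frames describe the call stack at `TracedError::new`, which is the creation site. Nothing in them records which request, tenant, or business operation was in flight.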
### Provider/Request API: Over-engineering
RFC 3192 introduces `provide` and `request` to let errors expose arbitrary typed data (HTTP status codes, backtraces, etc.). While flexible, the API is unpredictable (an error may or may not provide a status) and subtle enough that the compiler struggles to optimize it. In practice it adds indirection rather than clarity.
## Popular Libraries: Good Intent, Bad Outcome
### thiserror: Origin-Centric Enums
`thiserror` makes defining error enums painless:
```rust
#[derive(Debug, thiserror::Error)]
pub enum DatabaseError {
    #[error("connection failed: {0}")]
    Connection(#[from] ConnectionError),
    #[error("query failed: {0}")]
    Query(#[from] QueryError),
    #[error("serialization failed: {0}")]
    Serde(#[from] serde_json::Error),
}
```
This enum tells you *where* the failure happened, but not *what to do* about it. A calling layer cannot decide whether to retry, report, or swallow the error without custom logic.
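The sketch below shows what that custom logic tends to look like. To stay free of external crates it uses a hand-written mirror of the enum; the inner variants (`Refused`, `TimedOut`, `Syntax`, `Deadlock`), the `String` payload for `Serde`, and the `is_retryable` helper are all assumptions made for illustration, not part of any library.

```rust
/// Hypothetical stand-ins for the wrapped error types.
#[derive(Debug)]
enum ConnectionError {
    Refused,
    TimedOut,
}

#[derive(Debug)]
enum QueryError {
    Syntax,
    Deadlock,
}

/// Origin-centric enum with the same shape as the `thiserror` example.
#[derive(Debug)]
enum DatabaseError {
    Connection(ConnectionError),
    Query(QueryError),
    Serde(String),
}

/// The caller has to understand every origin's internals to pick an action.
fn is_retryable(err: &DatabaseError) -> bool {
    match err {
        DatabaseError::Connection(ConnectionError::TimedOut) => true,
        DatabaseError::Connection(ConnectionError::Refused) => false,
        DatabaseError::Query(QueryError::Deadlock) => true,
        DatabaseError::Query(QueryError::Syntax) => false,
        DatabaseError::Serde(_) => false,
    }
}
```

Every layer that wraps `DatabaseError` in its own enum needs another function like this one, and each of them must be updated whenever an inner variant is added.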
### anyhow: Silly Context
`anyhow` erases types behind `anyhow::Result` and makes `.context()` calls optional. The compiler never forces you to add context, so chains of `?` propagate shallow, ambiguous errors. Developers often forget to add the rich context that would be invaluable during a 3 am incident.
## The Fundamental Disconnect
Error handling has traditionally focused on type safety and compiler satisfaction. The real users of errors are twofold:
| Audience | Goal | Needs |
|----------|------|-------|
| Machines | Automated recovery | Flat, kind-based, predictable codes |
| Humans | Debugging | Rich context, call-path, business information |
Existing patterns fail both: they're either too deep for machines or too shallow for humans.
## A Dual-Audience Error Design
### Structured, Actionable Errors for Machines
Borrowing from Apache OpenDAL's design, we can use a flat struct that categorises errors by *what the caller can do* rather than *where they originated*:
```rust
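pub struct Error {
    kind: ErrorKind,
    message: String,
    status: ErrorStatus,
    operation: &'static str,
    context: Vec<(&'static str, String)>,
    // Underlying cause, if any (assumed here to be a boxed error).
    source: Option<Box<dyn std::error::Error + Send + Sync>>,
}

// `PartialEq` lets callers compare kinds with `==`.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum ErrorKind {
    NotFound,
    PermissionDenied,
    RateLimited,
    // ...
}

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum ErrorStatus {
    Permanent,  // Do not retry
    Temporary,  // Safe to retry
    Persistent, // Retried, still failing
}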
```
Callers can now write concise decision logic:
```rust
match err {
    e if e.kind() == ErrorKind::RateLimited && e.is_temporary() => {
        sleep(Duration::from_secs(1)).await;
        retry().await
    }
    e if e.kind() == ErrorKind::NotFound => { create_default().await },
    _ => Err(err),
}
```
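The snippet above relies on accessors such as `kind()` and `is_temporary()` that the struct definition does not show. The following is a self-contained, synchronous sketch of how they might look. It is not OpenDAL's actual API: the builder methods (`temporary`, `with_operation`, `with_context`), the `Action` enum, the `decide` function, and the `Display` format are all assumptions, and the `source` field is omitted for brevity.

```rust
use std::fmt;

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum ErrorKind {
    NotFound,
    PermissionDenied,
    RateLimited,
}

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum ErrorStatus {
    Permanent,
    Temporary,
    Persistent,
}

#[derive(Debug)]
pub struct Error {
    kind: ErrorKind,
    message: String,
    status: ErrorStatus,
    operation: &'static str,
    context: Vec<(&'static str, String)>,
}

impl Error {
    /// Errors start out permanent; callers opt in to retryability.
    pub fn new(kind: ErrorKind, message: impl Into<String>) -> Self {
        Self {
            kind,
            message: message.into(),
            status: ErrorStatus::Permanent,
            operation: "",
            context: Vec::new(),
        }
    }

    pub fn temporary(mut self) -> Self {
        self.status = ErrorStatus::Temporary;
        self
    }

    pub fn with_operation(mut self, operation: &'static str) -> Self {
        self.operation = operation;
        self
    }

    pub fn with_context(mut self, key: &'static str, value: impl Into<String>) -> Self {
        self.context.push((key, value.into()));
        self
    }

    pub fn kind(&self) -> ErrorKind {
        self.kind
    }

    pub fn is_temporary(&self) -> bool {
        self.status == ErrorStatus::Temporary
    }
}

impl fmt::Display for Error {
    // Hypothetical one-line format: kind, status, operation, context, message.
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(f, "{:?} ({:?}) at {}", self.kind, self.status, self.operation)?;
        for (key, value) in &self.context {
            write!(f, ", {}: {}", key, value)?;
        }
        write!(f, " => {}", self.message)
    }
}

impl std::error::Error for Error {}

/// What the caller does next, decided from kind and status alone.
#[derive(Debug, PartialEq, Eq)]
pub enum Action {
    Retry,
    CreateDefault,
    GiveUp,
}

pub fn decide(err: &Error) -> Action {
    match (err.kind(), err.is_temporary()) {
        (ErrorKind::RateLimited, true) => Action::Retry,
        (ErrorKind::NotFound, _) => Action::CreateDefault,
        _ => Action::GiveUp,
    }
}
```

Separating the decision (`decide`) from its execution keeps the policy testable without an async runtime; the `sleep` and `retry` calls from the snippet above would live in whatever code consumes the returned `Action`.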
### Context-Rich Errors for Humans
For debugging, context should be collected automatically and at the boundaries where modules interact. The tiny `exn` library demonstrates a tree-structured frame approach:
```rust
pub async fn execute(task: Task) -> Result