Error Handling
Steps can fail. An HTTP request times out, an LLM returns an error, a permission is denied, a called machine throws an exception. By default, a failure halts the machine and returns an error. The on failure block lets you catch failures and recover: provide a fallback, log the error, notify someone, or retry with a different strategy.
on failure basics
Place on failure at the end of a flow. If any step in that flow fails, execution jumps to the on failure block:
    implements
      ask fetch_weather, from: "@mashin/actions/http/get"
        url: "https://api.weather.com/current?city=${input.city}"
        assuming
          body: {temp: 72, conditions: "sunny"}
          status: 200

      compute format
        {temperature: steps.fetch_weather.body.temp, source: "live"}

      on failure
        compute fallback
          {
            temperature: null,
            source: "unavailable",
            error: error.message
          }

If the HTTP request fails (timeout, 500 error, DNS failure), the on failure block runs. The machine returns the fallback response instead of an error.
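To make the control flow concrete, here is a minimal Python sketch of the behavior described above. It is not mashin code and not the runtime's implementation; `run_flow`, the step functions, and the shape of the `error` dictionary are hypothetical names chosen for illustration.

```python
from typing import Any, Callable, Dict, List, Optional, Tuple

Step = Tuple[str, Callable[[Dict[str, Any]], Any]]
Handler = Callable[[Dict[str, str]], Any]


def run_flow(steps: List[Step], on_failure: Optional[Handler] = None) -> Any:
    """Run steps in order; on the first failure, jump to the handler."""
    results: Dict[str, Any] = {}
    last: Any = None
    for name, fn in steps:
        try:
            last = fn(results)
            results[name] = last
        except Exception as exc:
            if on_failure is None:
                raise  # default behavior: the failure halts the flow
            # Remaining steps are skipped; the handler's result is returned.
            return on_failure({"step": name, "message": str(exc)})
    return last


def fetch_weather(_results: Dict[str, Any]) -> Dict[str, Any]:
    raise TimeoutError("request timed out")  # simulated HTTP failure


def format_step(results: Dict[str, Any]) -> Dict[str, Any]:
    return {"temperature": results["fetch_weather"]["temp"], "source": "live"}


def fallback(error: Dict[str, str]) -> Dict[str, Any]:
    return {"temperature": None, "source": "unavailable", "error": error["message"]}


response = run_flow(
    [("fetch_weather", fetch_weather), ("format", format_step)],
    on_failure=fallback,
)
```

Because `fetch_weather` raises, `format_step` never runs and `response` holds the fallback value rather than an exception.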
Error context
Inside on failure, the error is available through:
| Reference | Description |
|---|---|
| error.message | Human-readable error description |
| error.step | Name of the step that failed |
| error.type | Error category: "timeout", "permission_denied", "runtime" |
| error.details | Additional error-specific data |

    on failure
      compute error_report
        {
          failed_step: error.step,
          error_type: error.type,
          message: error.message,
          status: "failed"
        }

Fallback to a different model
A common pattern: try a primary model, fall back to a smaller one if it fails.
    machine resilient_classifier

    accepts
      text as text, is required

    responds with
      category as text
      model_used as text

    ensures
      permissions
        allowed to llm_call

    implements
      flow classify
        ask primary, using: "anthropic:claude-sonnet-4-6"
          with task "Classify this text.\n\nText: ${input.text}"
          returns
            category as text
          assuming
            category: "general"

        compute result
          {category: steps.primary.category, model_used: "primary"}

        on failure
          ask fallback, using: "anthropic:claude-haiku-4-5"
            with task "Classify this text into a category.\n\nText: ${input.text}"
            returns
              category as text
            assuming
              category: "general"

          compute result
            {category: steps.fallback.category, model_used: "fallback"}

Notify on failure
Fire off an alert when something goes wrong, then return a graceful error:
    on failure
      launch send_alert
        machine: "@mashin/actions/notifications/send"
        channel: "slack"
        message: "Pipeline failed at " + error.step + ": " + error.message

      compute error_response
        {status: "failed", step: error.step, message: error.message}

launch is fire-and-forget: the alert is sent asynchronously and does not block the error response.
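The fire-and-forget behavior can be pictured with a short Python sketch. It is an analogy using a background thread, not a description of how launch is implemented; `send_alert`, `on_failure`, and the `delivered` list are hypothetical stand-ins.

```python
import threading
import time
from typing import Dict, List, Tuple

delivered: List[str] = []  # stands in for the Slack channel


def send_alert(message: str) -> None:
    time.sleep(0.05)  # simulated network latency
    delivered.append(message)


def on_failure(error: Dict[str, str]) -> Tuple[Dict[str, str], threading.Thread]:
    text = "Pipeline failed at " + error["step"] + ": " + error["message"]
    # Start the alert in the background and do not wait for it to finish.
    worker = threading.Thread(target=send_alert, args=(text,))
    worker.start()
    response = {"status": "failed", "step": error["step"], "message": error["message"]}
    return response, worker


response, worker = on_failure({"step": "fetch", "message": "timeout"})
worker.join()  # only so this demo can inspect the delivered alert afterwards
```

The error response is built without waiting on `send_alert`; the `join()` call exists purely so the example can check the result.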
Per-flow error handling
In multi-flow machines, each flow has its own on failure:
    implements
      flows
        flow extract
          ask fetch, from: "@mashin/actions/http/get"
            url: input.url
            assuming
              body: {}
              status: 200
          on failure
            compute extract_error
              {stage: "extract", error: error.message}

        flow transform
          compute process
            {data: steps.fetch.body}
          on failure
            compute transform_error
              {stage: "transform", error: error.message}

A failure in extract does not trigger transform’s error handler. Each flow manages its own recovery.
Runtime retry configuration
For automatic retry of transient failures, use runtime configuration:
    implements
      runtime
        max_retries: 3
        timeout: 30000

This retries any failed step up to 3 times before triggering on failure. Use this for transient errors (network timeouts, rate limits) rather than permanent failures (invalid input, permission denied).
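The ordering of retries and on failure can be sketched in Python as follows. This is a conceptual model, not the runtime's code: the function names are hypothetical, and it assumes max_retries counts retries after the first attempt (so up to four attempts in total), which the text above does not spell out. Timeouts and any delay between attempts are left out.

```python
from typing import Any, Callable, Dict


def run_step_with_retries(step: Callable[[], Any], max_retries: int = 3) -> Any:
    """Call step; on failure, retry up to max_retries more times."""
    attempts = 0
    while True:
        attempts += 1
        try:
            return step()
        except Exception:
            if attempts > max_retries:
                raise  # retries exhausted: let the failure reach on_failure


def run_with_recovery(step, on_failure, max_retries: int = 3) -> Any:
    try:
        return run_step_with_retries(step, max_retries)
    except Exception as exc:
        return on_failure(exc)  # runs only after every retry has failed


calls: Dict[str, int] = {"flaky": 0, "down": 0}


def flaky() -> str:
    calls["flaky"] += 1
    if calls["flaky"] < 3:
        raise TimeoutError("transient")  # fails twice, then succeeds
    return "ok"


def down() -> str:
    calls["down"] += 1
    raise ConnectionError("still unavailable")  # never succeeds


flaky_result = run_with_recovery(flaky, lambda exc: "fallback")
down_result = run_with_recovery(down, lambda exc: f"fallback: {exc}")
```

Under these assumptions the flaky step succeeds on its third attempt and the handler is never invoked, while the permanently failing step is attempted four times before the handler runs.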
What gets recorded
Both the failure and the recovery are recorded in the behavioral ledger:
- The original step failure (step name, error type, message)
- Each step in the on failure block (with normal step recording)
- The final output of the recovery path
This makes error-and-recovery sequences fully auditable. You can answer: “What failed? What did the machine do about it?”
If the on failure block itself fails, the machine halts with the secondary error. There is no nested error handling.
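The absence of nested handling resembles a Python handler with no try/except of its own. The sketch below is an analogy with hypothetical names, not mashin code.

```python
from typing import Any, Callable


def run_with_handler(step: Callable[[], Any], handler: Callable[[Exception], Any]) -> Any:
    try:
        return step()
    except Exception as exc:
        # Nothing guards the handler: if it raises, that secondary error propagates.
        return handler(exc)


def failing_step() -> None:
    raise TimeoutError("primary failure")


def broken_handler(exc: Exception) -> None:
    raise RuntimeError("handler failed too")


try:
    run_with_handler(failing_step, broken_handler)
    outcome = "recovered"
except RuntimeError as secondary:
    outcome = f"halted: {secondary}"
```

Here the caller sees the handler's error, which mirrors the machine halting with the secondary error.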
Try it
Build a machine that calls an HTTP API. Add an on failure block that returns cached data when the API is unavailable. Use error.type to distinguish between timeouts and other failures, returning different messages for each.
Next steps
- Flows - Multi-flow architecture
- Governance - Permission denials as failures
- on failure reference - Full specification