Retry Vs Release
Retry is failure pressure; release is business waiting.
Retry and release both make a run continue later, but they mean different things.
Retry
Retry means the task failed transiently and should be attempted again. It carries failure semantics.
Use retry for infrastructure or dependency failures such as timeouts, deadlocks, rate limits, and temporary provider outages. Retries increment failure and retry counters. Operators should see retry pressure because it indicates work is failing before eventually succeeding or exhausting its budget.
Retry policy controls the retry budget and backoff timing. maxAttempts counts total attempts, including the first execution. If a retryable failure happens before the budget is exhausted, core records run.retry_scheduled, persists the sanitized RunFailure, increments failure and retry counters, clears the lease, sets status to retrying, and stores the next runAt. When the retry budget is exhausted, core records run.failed and the run becomes terminally failed.
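The bookkeeping in the previous paragraph can be modeled as a small state transition. This is a minimal sketch under stated assumptions: RunSketch, applyRetryableFailure, and every field name are hypothetical, times are epoch milliseconds, and clearing the lease on terminal failure is assumed rather than documented.
```typescript
// Hypothetical model of retry bookkeeping; not Runlane's storage schema.
type RunStatus = 'running' | 'retrying' | 'failed'

interface RunFailure {
  code: string
  message: string
}

interface RunSketch {
  status: RunStatus
  attempts: number
  failures: number
  retries: number
  leaseOwner: string | null
  runAt: number | null // epoch milliseconds when the run is next due
  lastFailure: RunFailure | null
}

type RetryEvent = 'run.retry_scheduled' | 'run.failed'

function applyRetryableFailure(
  run: RunSketch,
  failure: RunFailure,
  maxAttempts: number, // total budget, including the first execution
  now: number,
  backoffMs: number,
): { run: RunSketch; event: RetryEvent } {
  // The attempt that just failed counts toward the budget.
  const attempts = run.attempts + 1
  const failed = { ...run, attempts, failures: run.failures + 1, lastFailure: failure }

  if (attempts < maxAttempts) {
    // Budget left: clear the lease, mark the run retrying, and store the next runAt.
    return {
      run: { ...failed, retries: run.retries + 1, leaseOwner: null, status: 'retrying', runAt: now + backoffMs },
      event: 'run.retry_scheduled',
    }
  }
  // Budget exhausted: terminal failure. Clearing the lease here is an assumption.
  return { run: { ...failed, leaseOwner: null, status: 'failed' }, event: 'run.failed' }
}
```
With maxAttempts: 3, the first two failures schedule retries and the third records run.failed, which matches "total attempts, including the first execution".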
Failure records use stable ErrorCode values.
| Failure source | Durable code behavior |
|---|---|
| Unknown user-code throw | Becomes ErrorCode.TaskFailed with the generic message Task failed. |
| Structured RunlaneError | Keeps its code while core maps the message to task-facing text. |
| Provider or business detail | Belongs in failure meta; keep code reserved for Runlane-owned classification. |
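The table above amounts to a normalization step between a thrown value and the persisted failure. The sketch below illustrates it under assumptions: the ErrorCode string values, RunlaneErrorSketch, the task-facing message table, and toRunFailure are all hypothetical stand-ins, not exports of @runlane/core.
```typescript
// Hypothetical codes; the real ErrorCode values are owned by Runlane.
const ErrorCode = {
  TaskFailed: 'task_failed',
  ConfigurationInvalid: 'configuration_invalid',
} as const
type ErrorCode = (typeof ErrorCode)[keyof typeof ErrorCode]

// Stand-in for a structured RunlaneError carrying a code and optional meta.
class RunlaneErrorSketch extends Error {
  code: ErrorCode
  meta?: Record<string, unknown>
  constructor(code: ErrorCode, message: string, meta?: Record<string, unknown>) {
    super(message)
    this.name = 'RunlaneError'
    this.code = code
    this.meta = meta
  }
}

interface RunFailure {
  code: ErrorCode
  message: string
  meta?: Record<string, unknown> // provider or business detail lives here
}

// Hypothetical task-facing text per code.
const TASK_FACING_MESSAGES: Record<ErrorCode, string> = {
  [ErrorCode.TaskFailed]: 'Task failed.',
  [ErrorCode.ConfigurationInvalid]: 'Task configuration is invalid.',
}

function isRunlaneError(value: unknown): value is RunlaneErrorSketch {
  return value instanceof Error && value.name === 'RunlaneError'
}

function toRunFailure(thrown: unknown): RunFailure {
  if (isRunlaneError(thrown)) {
    // Structured errors keep their code; the raw message is replaced by task-facing text.
    return { code: thrown.code, message: TASK_FACING_MESSAGES[thrown.code], meta: thrown.meta }
  }
  // Any other throw collapses to the generic TaskFailed code and message.
  return { code: ErrorCode.TaskFailed, message: TASK_FACING_MESSAGES[ErrorCode.TaskFailed] }
}
```
The property worth preserving is that the raw text of an unknown throw never reaches the persisted failure; only structured errors choose their own code.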
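```ts
import { RetryBackoffType, task } from '@runlane/core'

// schema and emailProvider are defined elsewhere in your application.
const sendEmail = task({
  id: 'emails.send',
  retry: {
    // Three attempts in total: the first execution plus up to two retries.
    maxAttempts: 3,
    // Starts at 30s and doubles per retry, never exceeding 5m.
    backoff: { type: RetryBackoffType.Exponential, delay: '30s', maxDelay: '5m' },
  },
  schema,
  async run(payload) {
    await emailProvider.send(payload)
  },
})
```
An omitted backoff uses contractDefaults.retry.backoff. A string backoff is a fixed delay. Exponential backoff doubles from the base delay on each retry and is capped at maxDelay when one is set.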
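To make the backoff rules concrete, here is a sketch of how the delay for the nth retry could be computed. Assumptions: duration strings take the form of an integer followed by ms, s, m, or h; the first retry waits exactly the base delay; Backoff, parseDuration, and retryDelayMs are hypothetical names, and plain string literals stand in for RetryBackoffType.
```typescript
// Hypothetical backoff shape; string literals stand in for RetryBackoffType.
type Backoff =
  | string
  | { type: 'fixed' | 'exponential'; delay: string; maxDelay?: string }

// Mirrors the documented contractDefaults.retry.backoff default of 30s.
const DEFAULT_BACKOFF = '30s'

const UNIT_MS: Record<string, number> = { ms: 1, s: 1000, m: 60000, h: 3600000 }

// Assumed duration format: "<integer><unit>", e.g. "30s" or "5m".
function parseDuration(value: string): number {
  const match = /^(\d+)(ms|s|m|h)$/.exec(value)
  if (!match) throw new Error(`Unsupported duration: ${value}`)
  return Number(match[1]) * UNIT_MS[match[2]]
}

// retryNumber is 1 for the first retry, 2 for the second, and so on.
function retryDelayMs(backoff: Backoff | undefined, retryNumber: number): number {
  const effective = backoff ?? DEFAULT_BACKOFF
  // A plain string is a fixed delay.
  if (typeof effective === 'string') return parseDuration(effective)
  const base = parseDuration(effective.delay)
  if (effective.type === 'fixed') return base
  // Exponential: base, 2x base, 4x base, ... for retries 1, 2, 3, ...
  const raw = base * 2 ** (retryNumber - 1)
  return effective.maxDelay === undefined ? raw : Math.min(raw, parseDuration(effective.maxDelay))
}
```
Under these assumptions, the sendEmail policy (maxAttempts: 3) gets two retries that wait 30s and then 1m; the 5m cap only takes effect from the fifth retry onward.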
Retry options:
| Option | Required | Default | What it controls |
|---|---|---|---|
| maxAttempts | Yes | None | Total attempt budget, including the first execution. |
| backoff | No | contractDefaults.retry.backoff (30s) | Delay before the next retry. A string means fixed backoff. |
| backoff.type | When backoff is an object | None | RetryBackoffType.Fixed or RetryBackoffType.Exponential. |
| backoff.delay | When backoff is an object | None | Base delay as a Runlane duration string. |
| backoff.maxDelay | No | No cap | Maximum delay for exponential backoff. Must be greater than or equal to delay. |
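The constraints in the options table above can be checked up front. This validator is a sketch, not Runlane's own validation: requiring maxAttempts to be a positive integer and the ms/s/m/h duration format are assumptions, and RetryOptionsInput, validateRetryOptions, and the error strings are invented for illustration.
```typescript
// Hypothetical, loosely typed input so that invalid options can be represented.
type RetryOptionsInput = {
  maxAttempts?: number
  backoff?: string | { type?: string; delay?: string; maxDelay?: string }
}

const UNIT_MS: Record<string, number> = { ms: 1, s: 1000, m: 60000, h: 3600000 }

// Assumed duration format; returns undefined when the string does not parse.
function parseDuration(value: string): number | undefined {
  const match = /^(\d+)(ms|s|m|h)$/.exec(value)
  return match ? Number(match[1]) * UNIT_MS[match[2]] : undefined
}

function validateRetryOptions(options: RetryOptionsInput): string[] {
  const problems: string[] = []
  const { maxAttempts, backoff } = options
  if (maxAttempts === undefined) problems.push('maxAttempts is required')
  else if (!Number.isInteger(maxAttempts) || maxAttempts < 1) problems.push('maxAttempts must be a positive integer')

  // Omitted backoff falls back to contractDefaults.retry.backoff.
  if (backoff === undefined) return problems
  if (typeof backoff === 'string') {
    if (parseDuration(backoff) === undefined) problems.push('backoff must be a duration string')
    return problems
  }

  if (backoff.type !== 'fixed' && backoff.type !== 'exponential') problems.push('backoff.type must be fixed or exponential')
  const delay = backoff.delay === undefined ? undefined : parseDuration(backoff.delay)
  if (delay === undefined) problems.push('backoff.delay must be a duration string')
  if (backoff.maxDelay !== undefined) {
    const maxDelay = parseDuration(backoff.maxDelay)
    if (maxDelay === undefined) problems.push('backoff.maxDelay must be a duration string')
    else if (delay !== undefined && maxDelay < delay) problems.push('backoff.maxDelay must be >= backoff.delay')
  }
  return problems
}
```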
Retry outcomes are durable attempt outcomes:
| Attempt outcome | Status after the attempt | Counter changes | Event |
|---|---|---|---|
| Retryable failure with budget left | retrying | attempts + 1, failures + 1, retries + 1 | run.retry_scheduled |
| Retryable failure with no budget left | failed | attempts + 1, failures + 1 | run.failed |
| Non-retryable structured failure | failed | attempts + 1, failures + 1 | run.failed |
runNow() follows the same outcome model. It executes exactly one inline attempt, then returns the persisted run. If that attempt schedules a retry, runNow() returns a retrying run; it does not sleep and execute the retry inline.
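A simplified model of runNow() shows why a retrying run comes back immediately instead of blocking. The sketch assumes a synchronous handler for brevity, treats every throw as retryable, and uses hypothetical names (RunState, runNowSketch); the documented runNow() returns the persisted run, which this sketch does not model.
```typescript
// Hypothetical, in-memory model of runNow(); not Runlane's implementation.
type Status = 'queued' | 'retrying' | 'failed' | 'succeeded'

interface RunState {
  status: Status
  attempts: number
  runAt: number | null
}

function runNowSketch(
  run: RunState,
  handler: () => void,
  maxAttempts: number,
  now: number,
  retryDelayMs: number,
): RunState {
  // Exactly one inline attempt.
  const attempts = run.attempts + 1
  try {
    handler()
    return { status: 'succeeded', attempts, runAt: null }
  } catch {
    // Budget left: record when the retry is due and return. No sleep, no second attempt.
    if (attempts < maxAttempts) return { status: 'retrying', attempts, runAt: now + retryDelayMs }
    return { status: 'failed', attempts, runAt: null }
  }
}
```
The later retry is left for maintenance and workers to pick up once runAt is due.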
Release
Release means the task is not done yet, but nothing failed. It is business waiting.
Use release when the next correct action is to wait for external state: a report is still processing, a payment is pending, an invoice is not ready, or a provider tells you to check again later.
Releases increment release counters, not failure counters. Core records run.released, clears the lease, sets status to released, and stores the resume time in runAt. Released runs become due again at that resume time.
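The release transition mirrors the retry one but touches different counters. A minimal sketch, with ReleasableRun, applyRelease, and isDue as hypothetical names and times as epoch milliseconds:
```typescript
// Hypothetical model of a release transition; names are illustrative only.
interface ReleasableRun {
  status: 'running' | 'released'
  attempts: number
  failures: number
  releases: number
  leaseOwner: string | null
  runAt: number | null
  releaseReason: string | null
}

function applyRelease(
  run: ReleasableRun,
  delayMs: number,
  now: number,
  reason: string | null = null,
): { run: ReleasableRun; event: 'run.released' } {
  return {
    run: {
      ...run,
      status: 'released',
      attempts: run.attempts + 1,
      releases: run.releases + 1, // failures are deliberately left untouched
      leaseOwner: null,
      runAt: now + delayMs, // resume time
      releaseReason: reason,
    },
    event: 'run.released',
  }
}

// A released run becomes due again once its resume time is reached.
function isDue(run: ReleasableRun, now: number): boolean {
  return run.runAt !== null && run.runAt <= now
}
```
Because the failures counter never moves, a run that polls a slow provider many times still shows zero failure pressure.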
```ts
import { task } from '@runlane/core'

// schema, reports, and storeReport are defined elsewhere in your application.
const pollReport = task({
  id: 'reports.poll',
  schema,
  async run(payload, context) {
    const report = await reports.get(payload.reportId)
    if (report.status === 'processing') {
      // Not a failure: return the release so the run is checked again in five minutes.
      return context.release('5m', { reason: 'provider_not_ready' })
    }
    await storeReport(report)
  },
})
```
context.release(delay, options) returns a TaskRelease. Task handlers must return that value to release the run. Core validates release-shaped handler results before persisting them; malformed release results fail the attempt with ErrorCode.ConfigurationInvalid instead of silently succeeding.
context.release(delay, options) fields:
| Value | Required | Default | What it records |
|---|---|---|---|
| delay | Yes | None | Runlane duration string until the released run becomes due again. |
| options.reason | No | None | Short public reason for operator views, such as provider_not_ready. |
| options.meta | No | None | JSON object with structured domain detail. |
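Core's validation of handler results can be pictured as a classifier over whatever the handler returned. Everything here is an assumption made for illustration: the real TaskRelease shape is not documented in this section, so the kind marker, releaseSketch, and classifyHandlerResult are hypothetical, and a plain-object check stands in for full JSON validation of meta.
```typescript
// Hypothetical marker that identifies a release-shaped result.
const RELEASE_BRAND = 'runlane.release'

interface TaskReleaseSketch {
  kind: typeof RELEASE_BRAND
  delay: string
  reason?: string
  meta?: Record<string, unknown>
}

// Stand-in for context.release(delay, options).
function releaseSketch(delay: string, options: { reason?: string; meta?: Record<string, unknown> } = {}): TaskReleaseSketch {
  return { kind: RELEASE_BRAND, delay, ...options }
}

type Outcome =
  | { type: 'succeeded' }
  | { type: 'released'; delay: string; reason?: string }
  | { type: 'failed'; code: 'configuration_invalid' }

// Assumed Runlane duration format.
const DURATION = /^\d+(ms|s|m|h)$/

function isPlainObject(value: unknown): value is Record<string, unknown> {
  return typeof value === 'object' && value !== null && !Array.isArray(value)
}

function classifyHandlerResult(result: unknown): Outcome {
  // Only values carrying the release marker are treated as releases.
  if (!isPlainObject(result) || result.kind !== RELEASE_BRAND) return { type: 'succeeded' }
  const { delay, reason, meta } = result
  const valid =
    typeof delay === 'string' &&
    DURATION.test(delay) &&
    (reason === undefined || typeof reason === 'string') &&
    (meta === undefined || isPlainObject(meta))
  // A malformed release fails the attempt instead of silently succeeding.
  if (!valid) return { type: 'failed', code: 'configuration_invalid' }
  return { type: 'released', delay: delay as string, reason: reason as string | undefined }
}
```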
Release outcomes are durable attempt outcomes:
| Attempt outcome | Status after the attempt | Counter changes | Event |
|---|---|---|---|
| Handler returns context.release(...) | released | attempts + 1, releases + 1 | run.released |
| Handler later completes after resuming | succeeded | attempts + 1 for that resumed attempt | run.succeeded |
For releases, runNow() likewise executes only one attempt inline. If the inline handler returns context.release(...), runNow() returns a released run; maintenance and workers drive the later attempt.
Maintenance wakeups
Retry and release both rely on maintenance after the run is waiting. When a retrying or released run's runAt is due, tick() appends run.delivery_requested and storage creates the matching outbox row. Transport publishes the wakeup, and a worker or delivered-wakeup handler executes the next attempt.
tick() does not run user task code. It only materializes schedules, finalizes abandoned cancellations, requests delivery for due waiting or expired-lease runs, and flushes due outbox rows. For bounded queues, maintenance first reserves queue capacity through the dispatch path before appending the delivery request.
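The delivery-request step of tick() can be sketched as a pass over stored runs. This in-memory model is hypothetical (StoredRun, Store, the deliveryRequested flag, and the per-queue capacity map are invented for illustration) and omits schedule materialization and abandoned-cancellation handling; in real storage the event and outbox row would presumably be written together.
```typescript
// Hypothetical in-memory model of the delivery-request part of tick().
interface StoredRun {
  id: string
  queue: string
  status: 'retrying' | 'released' | 'running' | 'succeeded' | 'failed'
  runAt: number | null
  leaseExpiresAt: number | null
  deliveryRequested: boolean // invented guard against requesting twice
}

interface OutboxRow {
  runId: string
  queue: string
}

interface Store {
  runs: StoredRun[]
  events: { runId: string; type: 'run.delivery_requested' }[]
  outbox: OutboxRow[]
  queueCapacity: Map<string, number> // remaining slots, bounded queues only
}

function isDueForDelivery(run: StoredRun, now: number): boolean {
  if (run.deliveryRequested) return false
  const waiting = (run.status === 'retrying' || run.status === 'released') && run.runAt !== null && run.runAt <= now
  const expiredLease = run.status === 'running' && run.leaseExpiresAt !== null && run.leaseExpiresAt <= now
  return waiting || expiredLease
}

function requestDueDeliveries(store: Store, now: number): number {
  let requested = 0
  for (const run of store.runs) {
    if (!isDueForDelivery(run, now)) continue
    // Bounded queues: reserve capacity before appending the delivery request.
    const remaining = store.queueCapacity.get(run.queue)
    if (remaining !== undefined) {
      if (remaining <= 0) continue
      store.queueCapacity.set(run.queue, remaining - 1)
    }
    // Append the event and its matching outbox row. No task handler is called here.
    store.events.push({ runId: run.id, type: 'run.delivery_requested' })
    store.outbox.push({ runId: run.id, queue: run.queue })
    run.deliveryRequested = true
    requested++
  }
  return requested
}

// Transport publishes each wakeup; a worker then executes the next attempt.
function flushOutbox(store: Store, publish: (row: OutboxRow) => void): void {
  for (const row of store.outbox.splice(0)) publish(row)
}
```
A run skipped for lack of capacity stays due, so a later tick can request it once a slot frees up.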
Why the distinction matters
Using retry for polling makes operator data noisy. It turns normal business waiting into apparent system failure and spends retry budget on expected provider state. Using release for actual failures is also wrong: it hides failure pressure from counters, alerts, and manual recovery paths.
Runlane keeps these outcomes separate so dashboards, alerts, and recovery actions can tell the difference between broken execution and expected continuation.