Retry Vs Release

Retry and release both make a run continue later, but they mean different things.

Retry

Retry means the task failed transiently and should be attempted again. It has failure semantics.

Use retry for infrastructure or dependency failures such as timeouts, deadlocks, rate limits, and temporary provider outages. Retries increment failure and retry counters. Operators should see retry pressure because it indicates work is failing before eventually succeeding or exhausting its budget.

Retry policy controls the retry budget and backoff timing. maxAttempts counts total attempts, including the first execution. If a retryable failure happens before the budget is exhausted, core records run.retry_scheduled, persists the sanitized RunFailure, increments failure and retry counters, clears the lease, sets status to retrying, and stores the next runAt. When the retry budget is exhausted, core records run.failed and the run becomes terminally failed.

Failure records use stable ErrorCode values.

Failure source	Durable code behavior
Unknown user-code throw	Becomes `ErrorCode.TaskFailed` with the generic message `Task failed.`
Structured `RunlaneError`	Keeps its code while core maps the message to task-facing text.
Provider or business detail	Belongs in failure `meta`; keep `code` reserved for Runlane-owned classification.
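To make the table concrete, here is a minimal sketch of how a thrown value might be normalized into a durable failure record. `StructuredTaskError`, `toFailureRecord`, and the `'task_failed'` string are hypothetical stand-ins for `RunlaneError`, core's internal mapping, and `ErrorCode.TaskFailed`. Runlane's actual classes, codes, and message mapping may differ.

```typescript
// Hypothetical sketch: none of these names are Runlane exports.
type Json = null | boolean | number | string | Json[] | { [key: string]: Json }
type JsonObject = { [key: string]: Json }

interface RunFailureRecord {
  code: string
  message: string
  meta?: JsonObject
}

// Assumed string value standing in for ErrorCode.TaskFailed.
const TASK_FAILED_CODE = 'task_failed'

// Stand-in for a structured RunlaneError: a Runlane-owned classification code
// plus optional provider or business detail in `meta`.
class StructuredTaskError extends Error {
  readonly code: string
  readonly meta?: JsonObject

  constructor(code: string, message: string, meta?: JsonObject) {
    super(message)
    this.code = code
    this.meta = meta
  }
}

function toFailureRecord(thrown: unknown): RunFailureRecord {
  if (thrown instanceof StructuredTaskError) {
    // Structured errors keep their code; detail stays in `meta`, never in `code`.
    return { code: thrown.code, message: thrown.message, meta: thrown.meta }
  }
  // Unknown throws (Error or otherwise) are recorded generically so raw
  // exception text is not persisted.
  return { code: TASK_FAILED_CODE, message: 'Task failed.' }
}
```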

import { RetryBackoffType, task } from '@runlane/core'

// `schema` (the payload schema) and `emailProvider` are application code, assumed in scope.
const sendEmail = task({
  id: 'emails.send',
  retry: {
    maxAttempts: 3,
    backoff: { type: RetryBackoffType.Exponential, delay: '30s', maxDelay: '5m' },
  },
  schema,
  async run(payload) {
    await emailProvider.send(payload)
  },
})

An omitted backoff uses contractDefaults.retry.backoff. A string backoff means a fixed delay. Exponential backoff starts from the base delay and doubles for each retry, capped at maxDelay when one is set.
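The delay rules above can be sketched as a small function. This assumes duration strings are an integer followed by `ms`, `s`, `m`, `h`, or `d`, models `RetryBackoffType.Fixed` and `RetryBackoffType.Exponential` as the plain strings `'fixed'` and `'exponential'`, and assumes the first retry uses the base delay unchanged. `parseDuration` and `retryDelayMs` are hypothetical helpers, not Runlane exports.

```typescript
// Hypothetical helpers, not Runlane exports. Assumes duration strings are an
// integer followed by ms, s, m, h, or d (for example '30s' or '5m').
type BackoffOption =
  | string
  | { type: 'fixed' | 'exponential'; delay: string; maxDelay?: string }

const UNIT_MS: Record<string, number> = {
  ms: 1,
  s: 1_000,
  m: 60_000,
  h: 3_600_000,
  d: 86_400_000,
}

function parseDuration(value: string): number {
  const match = /^(\d+)(ms|s|m|h|d)$/.exec(value)
  if (!match) throw new Error(`Unsupported duration: ${value}`)
  return Number(match[1]) * UNIT_MS[match[2]]
}

// `retry` is 1-based: 1 is the first retry, i.e. the second attempt overall.
function retryDelayMs(backoff: BackoffOption, retry: number): number {
  // A plain string means the same fixed delay before every retry.
  if (typeof backoff === 'string') return parseDuration(backoff)

  const base = parseDuration(backoff.delay)
  if (backoff.type === 'fixed') return base

  // Exponential: base, 2x base, 4x base, ... capped at maxDelay when present.
  const uncapped = base * 2 ** (retry - 1)
  return backoff.maxDelay === undefined
    ? uncapped
    : Math.min(uncapped, parseDuration(backoff.maxDelay))
}
```

With the `emails.send` settings (`delay: '30s'`, `maxDelay: '5m'`), successive retries under these assumptions would wait 30s, 60s, 120s, 240s, and then 300s once the cap applies. Because `maxAttempts: 3` allows only two retries, that task would only ever see the 30s and 60s delays.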

Retry options:

Option	Required	Default	What it controls
`maxAttempts`	Yes	None	Total attempt budget, including the first execution.
`backoff`	No	`contractDefaults.retry.backoff` (`30s`)	Delay before the next retry. A string means fixed backoff.
`backoff.type`	When `backoff` is an object	None	`RetryBackoffType.Fixed` or `RetryBackoffType.Exponential`.
`backoff.delay`	When `backoff` is an object	None	Base delay as a Runlane duration string.
`backoff.maxDelay`	No	No cap	Maximum delay for exponential backoff. Must be greater than or equal to `delay`.

Retry outcomes are durable attempt outcomes:

Attempt outcome	Status after the attempt	Counter changes	Event
Retryable failure with budget left	`retrying`	`attempts + 1`, `failures + 1`, `retries + 1`	`run.retry_scheduled`
Retryable failure with no budget left	`failed`	`attempts + 1`, `failures + 1`	`run.failed`
Non-retryable structured failure	`failed`	`attempts + 1`, `failures + 1`	`run.failed`
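The outcome table can be read as a small state transition. The sketch below is a hypothetical model of that bookkeeping, not core's implementation; it shows why `maxAttempts: 3` yields two `run.retry_scheduled` events followed by `run.failed`.

```typescript
// Hypothetical model of the retry bookkeeping in the table above; the names
// are illustrative, not core's implementation.
interface RunCounters {
  attempts: number
  failures: number
  retries: number
}

interface FailedAttemptOutcome {
  status: 'retrying' | 'failed'
  event: 'run.retry_scheduled' | 'run.failed'
  counters: RunCounters
}

function applyFailedAttempt(
  before: RunCounters,
  maxAttempts: number,
  retryable: boolean,
): FailedAttemptOutcome {
  // Every failed attempt counts as an attempt and as a failure.
  const attempts = before.attempts + 1
  const failures = before.failures + 1

  // maxAttempts includes the first execution, so another attempt is allowed
  // only while the attempts made so far are below the budget.
  if (retryable && attempts < maxAttempts) {
    return {
      status: 'retrying',
      event: 'run.retry_scheduled',
      counters: { attempts, failures, retries: before.retries + 1 },
    }
  }

  // Budget exhausted, or a non-retryable structured failure: terminal.
  return {
    status: 'failed',
    event: 'run.failed',
    counters: { attempts, failures, retries: before.retries },
  }
}
```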

runNow() follows the same outcome model. It executes exactly one inline attempt, then returns the persisted run. If that attempt schedules a retry, runNow() returns a retrying run; it does not sleep and execute the retry inline.

Release

Release means the task is not done yet, but nothing failed. It is business waiting.

Use release when the next correct action is to wait for external state: a report is still processing, a payment is pending, an invoice is not ready, another run must finish first, or a provider tells you to check again later.

Releases increment release counters, not failure counters. Core records run.released, clears the lease, sets status to released, and stores a durable wait condition. A time release stores runAt; signal and run-completion releases wait for the matching event and may also carry a timeout fallback.

import { task } from '@runlane/core'

// `schema`, `reports`, and `storeReport` are application code, assumed in scope.
const pollReport = task({
  id: 'reports.poll',
  schema,
  async run(payload, context) {
    const report = await reports.get(payload.reportId)

    if (report.status === 'processing') {
      return context.release('5m', { reason: 'provider_not_ready' })
    }

    await storeReport(report)
  },
})

If the first polling attempt starts a one-time external operation, wrap that submission in context.step.run() and then release while the provider is still working. Release controls when the run wakes up again; the durable step prevents the submission callback from running again after a retry, release, or crash. See Durable Steps.

context.release(delay, options) returns a TaskReleaseType.Release result. Task handlers must return that result to release the run. Core reserves type: 'task_release' as the release discriminant, so successful output objects must not use that exact type value.
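A hypothetical sketch of why the discriminant is reserved: after the handler returns, core has to tell a release result apart from ordinary output by inspecting the returned value. Only the `type: 'task_release'` field comes from this page; `ReleaseResultLike`, its other fields, and `classifyHandlerResult` are illustrative assumptions. The same model shows why the handler must return the result: a release object that is created but not returned never reaches core.

```typescript
// Hypothetical sketch: only the `type: 'task_release'` discriminant comes from
// the Runlane docs; the other fields and names are illustrative assumptions.
interface ReleaseResultLike {
  type: 'task_release'
  delay?: string
  reason?: string
}

type HandlerOutcome =
  | { kind: 'released'; release: ReleaseResultLike }
  | { kind: 'succeeded'; output: unknown }

function isReleaseResult(value: unknown): value is ReleaseResultLike {
  return (
    typeof value === 'object' &&
    value !== null &&
    (value as { type?: unknown }).type === 'task_release'
  )
}

// Core only sees what the handler returns, so the check runs on the return value.
function classifyHandlerResult(value: unknown): HandlerOutcome {
  return isReleaseResult(value)
    ? { kind: 'released', release: value }
    : { kind: 'succeeded', output: value }
}
```

Under this model, a successful output shaped like `{ type: 'task_release', ... }` would be misread as a release, which is the collision the reserved value rules out.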

context.release(delay, options) fields:

Value	Required	Default	What it records
`delay`	Yes	None	Runlane duration string until the released run becomes due again.
`options.reason`	No	None	Short public reason for operator views, such as `provider_not_ready`.
`options.meta`	No	None	JSON object with structured domain detail.

Handlers can also release until a signal key is sent or another run reaches a terminal state:

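import { task } from '@runlane/core'

// `schema` and `reportIsReady` are application code, assumed in scope.
const waitForWebhook = task({
  id: 'reports.wait_for_webhook',
  schema,
  async run(payload, context) {
    if (!(await reportIsReady(payload.reportId))) {
      return context.waitForSignal(`report_ready_${payload.reportId}`, {
        reason: 'provider_webhook_pending',
        timeout: '1h',
      })
    }

    // The report is ready: continue processing it here.
  },
})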

context.waitForSignal(signalKey, options) and context.waitForRun(runId, options) return the same TaskReleaseType.Release result. options.reason and options.meta are recorded for operators. options.timeout is optional; when supplied, the released run is also eligible for normal maintenance delivery after that time if the signal or source run completion never arrives.

A wait timeout is not an attempt timeout. It does not record ErrorCode.TaskTimedOut, does not increment failure counters, and does not consume retry budget. It only gives the released run a fallback wakeup time. When the fallback fires, the next attempt reruns the handler against the same payload and current application state.

Applications resume signal waits with runlane.signals.send(signalKey, { limit? }). Run-completion waits resume automatically when the source run reaches a terminal state through the same runtime; if the source run is already terminal when the waiter releases, core appends the resume delivery request with the release. Duplicate signal sends are safe: once the first delivery request clears the wait condition, later sends do not target that run again.

Signal Resume API

Use signal waits when an external fact decides when the run should continue: a webhook arrives, a human approves work, a provider callback completes, or an application event says a resource is ready. The signal key names that fact. It is not a pub/sub topic and it does not carry a payload.

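import { task } from '@runlane/core'

// `schema`, `reportIsReady`, and `storeCompletedReport` are application code, assumed in scope.
const waitForReport = task({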
  id: 'reports.wait',
  schema,
  async run(payload, context) {
    if (!(await reportIsReady(payload.reportId))) {
      return context.waitForSignal(`report_ready_${payload.reportId}`, {
        reason: 'provider_webhook_pending',
        timeout: '1h',
      })
    }

    await storeCompletedReport(payload.reportId)
  },
})

// `runlane` is the application's Runlane runtime; `persistWebhookResult` is application code.
async function handleReportWebhook(request: Request) {
  const webhook = await request.json()

  await persistWebhookResult(webhook)

  const resumedRuns = await runlane.signals.send(`report_ready_${webhook.reportId}`)

  return Response.json({ resumed: resumedRuns.length })
}

runlane.signals.send(signalKey, options):

Field	Required	Meaning
`signalKey`	Yes	Runlane id value for the external fact. It must be a non-empty string and must not contain `:`.
`options.limit`	No	Maximum matching released runs to resume in one scan. Defaults to the runtime maintenance delivery-request limit.
return value	N/A	`Promise<readonly RunRecord[]>` for runs where core durably appended a resume delivery request. These runs have not executed yet.
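Because the waiting handler and the code that calls `runlane.signals.send(...)` must produce identical keys, it can help to build them in one place. This is a minimal sketch of a hypothetical helper that enforces the two documented rules (non-empty, no `:`); the `_` separator mirrors the `report_ready_${reportId}` keys in the examples and is a convention, not a Runlane requirement.

```typescript
// Hypothetical helper, not a Runlane export: joins parts with '_' and enforces
// the documented signal-key rules (non-empty, no ':').
function signalKey(...parts: readonly (string | number)[]): string {
  const key = parts.map(String).join('_')
  if (key.length === 0) {
    throw new Error('Signal key must be a non-empty string.')
  }
  if (key.includes(':')) {
    throw new Error(`Signal key must not contain ':' (got "${key}").`)
  }
  return key
}

// Both sides build the key the same way, e.g.:
//   context.waitForSignal(signalKey('report_ready', payload.reportId), { ... })
//   runlane.signals.send(signalKey('report_ready', webhook.reportId))
```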

send() appends run.delivery_requested to each still-released run waiting on that signal. Projection clears run.wait, sets the run back to queued, and records the delivery as available now. A worker, delivered-wakeup consumer, or later execution call runs the next attempt through the normal lease path.

Duplicate sends are safe. If the first send already cleared the wait, later sends return no run for that completed wait. If two callers race to send the same signal, event-sequence fencing lets one append win and the loser skips that run.
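The resume and race behavior can be modeled in memory. This sketch is not Runlane's storage layer: `InMemoryRunStore`, `sendSignal`, and the per-run `sequence` counter are hypothetical, and the default `limit` of 100 is a placeholder for the runtime's maintenance delivery-request limit. Each resume is an append that succeeds only if the run's event sequence is unchanged since it was read, so a duplicate send finds no waiting run and a racing loser is fenced out.

```typescript
// Hypothetical in-memory model, not Runlane's storage code.
type RunStatus = 'released' | 'queued'

interface RunState {
  id: string
  status: RunStatus
  sequence: number // number of events appended to this run so far
  wait?: { signalKey: string }
}

class InMemoryRunStore {
  private readonly runs = new Map<string, RunState>()

  add(run: RunState): void {
    this.runs.set(run.id, { ...run })
  }

  // Snapshot of released runs waiting on `signalKey`, at most `limit` of them.
  findWaiting(signalKey: string, limit: number): RunState[] {
    return [...this.runs.values()]
      .filter((run) => run.status === 'released' && run.wait?.signalKey === signalKey)
      .slice(0, limit)
      .map((run) => ({ ...run }))
  }

  // Append run.delivery_requested only if no event was appended since
  // `expectedSequence` was read; otherwise another caller already won.
  appendDeliveryRequested(runId: string, expectedSequence: number): RunState | undefined {
    const run = this.runs.get(runId)
    if (!run || run.sequence !== expectedSequence) return undefined
    // Projection: clear the wait and mark the run queued for delivery now.
    const next: RunState = { id: run.id, status: 'queued', sequence: run.sequence + 1 }
    this.runs.set(runId, next)
    return { ...next }
  }
}

// Returns only the runs this call actually resumed; none of them has executed yet.
function sendSignal(store: InMemoryRunStore, signalKey: string, limit = 100): RunState[] {
  const resumed: RunState[] = []
  for (const candidate of store.findWaiting(signalKey, limit)) {
    const updated = store.appendDeliveryRequested(candidate.id, candidate.sequence)
    if (updated) resumed.push(updated)
  }
  return resumed
}
```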

Signal data belongs in application storage. Persist webhook payloads, approval records, provider job results, or domain facts before calling runlane.signals.send(...); the resumed handler should reread the authoritative application state. Do not encode unbounded provider payloads in the signal key, and do not expect Runlane to pass signal payloads to the handler.

Bounded queues keep their capacity rules. A signal resume records durable intent to continue, but it does not steal capacity from currently active bounded-queue work. The queued run waits for the normal dispatch reservation path.

Signal waits can have timeout fallbacks. If the signal arrives before the timeout, the signal clears the wait and the timeout later has nothing to resume. If the signal never arrives, maintenance can request delivery after the timeout so the handler can decide whether to keep waiting, succeed, fail, or retry.

Release outcomes are durable attempt outcomes:

Attempt outcome	Status after the attempt	Counter changes	Event
Handler returns a release result	`released`	`attempts + 1`, `releases + 1`	`run.released`
Handler later completes after resuming	`succeeded`	`attempts + 1` for that resumed attempt	`run.succeeded`

runNow() likewise executes only one attempt inline. If the inline handler returns context.release(...), runNow() returns a released run; maintenance and workers drive the later attempt.

Maintenance wakeups

Retry and time-based release rely on maintenance after the run is waiting. When a retrying or time-released run's runAt is due, tick() appends run.delivery_requested. Transport-delivery lanes also request a matching outbox row for publication; storage-polling lanes keep the delivery intent in run state for polling workers. A worker or delivered-wakeup handler executes the next attempt.

Signal and run-completion waits are event-driven. runlane.signals.send() and terminal source-run completion append the next durable delivery request directly. That clears run.wait and records the run as queued. For bounded queues, resume does not steal capacity from already active work; the queued run waits for the normal capacity-reserving dispatch path. Timeout fallbacks still rely on tick() once the timeout is due.

tick() does not run user task code. It only materializes schedules, finalizes abandoned cancellations and timed-out attempts, requests delivery for due waiting or expired-lease runs, and flushes due outbox rows. For bounded queues, maintenance first reserves queue capacity through the dispatch path before appending the delivery request.
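A hypothetical sketch of the wakeup decision `tick()` makes for waiting runs. It models retry `runAt` and time releases as a `time` wait, shows that signal and run-completion waits only become due through `tick()` once an optional timeout passes, and reduces capacity reservation and delivery requests to callbacks. Schedule materialization, cancellation and attempt-timeout finalization, expired leases, and outbox flushing are deliberately left out, and no handler code runs. All names are illustrative assumptions.

```typescript
// Hypothetical model of one maintenance pass over waiting runs; not core's code.
type Wait =
  | { kind: 'time'; runAt: number }
  | { kind: 'signal'; signalKey: string; timeoutAt?: number }
  | { kind: 'run'; sourceRunId: string; timeoutAt?: number }

interface WaitingRun {
  id: string
  status: 'retrying' | 'released'
  wait: Wait
  boundedQueue: boolean
}

interface TickDeps {
  reserveCapacity(runId: string): boolean // consulted for bounded queues only
  requestDelivery(runId: string): void // stands in for appending run.delivery_requested
}

// Time waits are due at runAt; event waits are due here only via their timeout.
function dueAt(wait: Wait): number | undefined {
  return wait.kind === 'time' ? wait.runAt : wait.timeoutAt
}

function tick(runs: readonly WaitingRun[], now: number, deps: TickDeps): string[] {
  const requested: string[] = []
  for (const run of runs) {
    const due = dueAt(run.wait)
    if (due === undefined || due > now) continue
    // Bounded queues reserve capacity first; without it the run keeps waiting.
    if (run.boundedQueue && !deps.reserveCapacity(run.id)) continue
    deps.requestDelivery(run.id)
    requested.push(run.id)
  }
  return requested
}
```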

Why the distinction matters

Using retry for polling makes operator data noisy. It turns normal business waiting into apparent system failure and spends retry budget on expected provider state. Using release for actual failures is also wrong: it hides failure pressure from counters, alerts, and manual recovery paths.

Runlane keeps these outcomes separate so dashboards, alerts, and recovery actions can tell the difference between broken execution and expected continuation.