Retry rules configure how Ctrlplane handles failed jobs. You can control the number of retry attempts, which failure types trigger retries, and the backoff strategy between attempts.

Overview

Why Use Retry Rules?

Retry rules help you:
  • Handle transient failures - Automatically recover from temporary issues
  • Reduce manual intervention - Let the system retry before alerting
  • Configure per-environment - More retries in dev, fewer in production
  • Control retry behavior - Set backoff strategies to avoid thundering herd

Configuration

Add a retry rule to your policy:
policies:
  - name: retry-on-failure
    selectors:
      - environment: environment.name == "production"
    rules:
      - retry:
          maxRetries: 3

Properties

Property           Type     Required  Default            Description
maxRetries         integer  Yes       -                  Maximum retry attempts (0 = no retries)
retryOnStatuses    array    No        failure, invalid*  Job statuses that trigger retry
backoffSeconds     integer  No        0                  Seconds to wait between retries
backoffStrategy    string   No        linear             Backoff strategy: linear or exponential
maxBackoffSeconds  integer  No        -                  Maximum backoff cap (for exponential)
* Default statuses: failure, invalidIntegration, invalidJobAgent
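
For reference, a single rule can combine all of the properties above. The snippet below is an illustrative sketch (the policy name and values are made up):
policies:
  - name: full-retry-example
    rules:
      - retry:
          maxRetries: 3
          retryOnStatuses:
            - failure
            - invalidJobAgent
          backoffSeconds: 10
          backoffStrategy: exponential
          maxBackoffSeconds: 120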

Job Statuses

The following job statuses can be used in retryOnStatuses:
Status              Description
failure             Job failed during execution
successful          Job completed successfully
cancelled           Job was manually cancelled
skipped             Job was skipped
invalidIntegration  Integration configuration error
invalidJobAgent     Job agent configuration error
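
For reference, spelling out the default statuses explicitly is equivalent to omitting retryOnStatuses entirely (the policy name here is illustrative):
policies:
  - name: explicit-default-statuses
    rules:
      - retry:
          maxRetries: 3
          retryOnStatuses:
            - failure
            - invalidIntegration
            - invalidJobAgent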

Common Patterns

Basic Retry

Retry failed jobs up to 3 times:
policies:
  - name: basic-retry
    rules:
      - retry:
          maxRetries: 3

Retry with Backoff

Wait between retry attempts:
policies:
  - name: retry-with-delay
    rules:
      - retry:
          maxRetries: 3
          backoffSeconds: 30 # Wait 30 seconds between retries

Exponential Backoff

Increase wait time with each retry:
policies:
  - name: exponential-retry
    rules:
      - retry:
          maxRetries: 5
          backoffSeconds: 10
          backoffStrategy: exponential
          maxBackoffSeconds: 300 # Cap at 5 minutes
With exponential backoff, the wait times are: 10s → 20s → 40s → 80s → 160s; maxBackoffSeconds would cap any longer waits at 300 seconds.

No Retries (Strict)

Disable retries for critical deployments:
policies:
  - name: no-retry-production
    selectors:
      - environment: environment.name == "production"
    rules:
      - retry:
          maxRetries: 0

Retry Specific Statuses

Only retry on specific failure types:
policies:
  - name: retry-transient-only
    rules:
      - retry:
          maxRetries: 3
          retryOnStatuses:
            - failure
            - invalidIntegration
          backoffSeconds: 60

Environment-Specific Retry

Different retry behavior per environment:
policies:
  # Development: Many quick retries
  - name: dev-retry
    selectors:
      - environment: environment.name == "development"
    rules:
      - retry:
          maxRetries: 5
          backoffSeconds: 5

  # Staging: Moderate retries
  - name: staging-retry
    selectors:
      - environment: environment.name == "staging"
    rules:
      - retry:
          maxRetries: 3
          backoffSeconds: 30

  # Production: Limited retries with longer backoff
  - name: production-retry
    selectors:
      - environment: environment.name == "production"
    rules:
      - retry:
          maxRetries: 2
          backoffSeconds: 60
          backoffStrategy: exponential

Backoff Strategies

Linear Backoff

Constant wait time between retries:
Attempt 1: immediate
Attempt 2: wait 30s
Attempt 3: wait 30s
Attempt 4: wait 30s
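
The schedule above corresponds to a rule along these lines (the policy name is illustrative; the values match the example):
policies:
  - name: linear-backoff-example
    rules:
      - retry:
          maxRetries: 3
          backoffSeconds: 30
          backoffStrategy: linear # linear is the default strategy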

Exponential Backoff

Doubling wait time with each retry:
Attempt 1: immediate
Attempt 2: wait 10s  (10 * 2^0)
Attempt 3: wait 20s  (10 * 2^1)
Attempt 4: wait 40s  (10 * 2^2)
Attempt 5: wait 80s  (10 * 2^3)
Use maxBackoffSeconds to cap the maximum wait time.

Retry Lifecycle

1. Job Fails

A job completes with a status in retryOnStatuses.

2. Retry Check

Ctrlplane checks if retries remain (attempt < maxRetries + 1).

3. Backoff Wait

If backoffSeconds is configured, Ctrlplane waits before the next attempt.

4. Retry Attempt

A new job is created for the retry attempt.

5. Success or Exhausted

The process continues until success or all retries are exhausted.

Best Practices

Retry Guidelines

Scenario              Max Retries  Backoff  Strategy
Transient network     3-5          10-30s   exponential
Rate limiting         3            60s      exponential
Resource contention   2-3          30s      linear
Critical production   1-2          60s      linear
Flaky tests (dev/qa)  5            5s       linear
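
As an illustration, the rate-limiting row above could be expressed as the following rule (the policy name is made up; tune the values to your workload):
policies:
  - name: retry-rate-limited
    rules:
      - retry:
          maxRetries: 3
          backoffSeconds: 60
          backoffStrategy: exponential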

Recommendations

  • ✅ Use exponential backoff for external service failures
  • ✅ Set maxBackoffSeconds to avoid excessive wait times
  • ✅ Use fewer retries in production than in development
  • ✅ Monitor retry rates to identify systemic issues
  • ✅ Combine with alerting on final failure
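
Putting several of these recommendations together, a production policy for deployments that call external services might look like the following sketch (the name and numbers are illustrative):
policies:
  - name: production-external-service-retry
    selectors:
      - environment: environment.name == "production"
    rules:
      - retry:
          maxRetries: 2 # fewer retries in production
          backoffSeconds: 30
          backoffStrategy: exponential # exponential backoff for external service failures
          maxBackoffSeconds: 120 # cap to avoid excessive wait times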

Anti-Patterns

  • ❌ Infinite retries (always set maxRetries)
  • ❌ No backoff for rate-limited APIs
  • ❌ Same retry config across all environments
  • ❌ Retrying on non-transient failures

Next Steps