Retry rules configure how Ctrlplane handles failed jobs. You can control the number of retry attempts, which failure types trigger retries, and the backoff strategy between attempts.
Overview
Why Use Retry Rules?
Retry rules help you:
- Handle transient failures - Automatically recover from temporary issues
- Reduce manual intervention - Let the system retry before alerting
- Configure per-environment - More retries in dev, fewer in production
- Control retry behavior - Set backoff strategies to avoid thundering herd
Configuration
Retry rules are not yet supported in the Terraform provider. Use the REST API
to configure retry behavior.
Properties
- maxRetries - Maximum retry attempts. 0 means no retries (1 attempt total), 3 means up to 4 attempts (1 initial + 3 retries).
- retryOnStatuses - Job statuses that trigger a retry. Defaults to ["failure", "invalidIntegration", "invalidJobAgent"] when maxRetries > 0. When maxRetries = 0, also includes "successful" to enforce deploy-once semantics.
- backoffSeconds - Seconds to wait between retry attempts. If not set, retries are allowed immediately after job completion.
- Backoff strategy - linear (constant delay) or exponential (doubling delay with each retry using backoffSeconds * 2^(attempt-1)).
- maxBackoffSeconds - Maximum backoff cap in seconds (for exponential backoff). If not set, no maximum is enforced.
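The defaulting rules described above can be illustrated with a small model. This is a minimal sketch, not Ctrlplane's own types: the `RetryRule` class, its snake_case attribute names, the `backoff_strategy` field name, and its `"linear"` default are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import List, Optional

# Default statuses as listed in the property descriptions.
DEFAULT_RETRY_STATUSES = ["failure", "invalidIntegration", "invalidJobAgent"]


@dataclass
class RetryRule:
    """Hypothetical in-memory model of a retry rule (illustration only)."""
    max_retries: int                               # maxRetries
    retry_on_statuses: Optional[List[str]] = None  # retryOnStatuses
    backoff_seconds: Optional[int] = None          # backoffSeconds
    backoff_strategy: str = "linear"               # assumed field name and default
    max_backoff_seconds: Optional[int] = None      # maxBackoffSeconds

    def total_attempts(self) -> int:
        # One initial attempt plus maxRetries retries.
        return 1 + self.max_retries

    def effective_statuses(self) -> List[str]:
        # An explicit list wins; otherwise apply the documented defaults.
        if self.retry_on_statuses is not None:
            return list(self.retry_on_statuses)
        statuses = list(DEFAULT_RETRY_STATUSES)
        if self.max_retries == 0:
            # Deploy-once: a successful job also counts as an attempt.
            statuses.append("successful")
        return statuses


print(RetryRule(max_retries=3).total_attempts())      # 4
print(RetryRule(max_retries=0).effective_statuses())
```

With `max_retries=0` the effective list gains `"successful"`, which is what makes a single successful job use up the only allowed attempt.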
Job Statuses
The following job statuses can be used in retryOnStatuses:
| Status | Description |
|---|---|
| failure | Job failed during execution |
| successful | Job completed successfully |
| cancelled | Job was manually cancelled |
| skipped | Job was skipped |
| invalidIntegration | Integration configuration error |
| invalidJobAgent | Job agent configuration error |
Cancelled and skipped jobs never count toward the retry limit by default,
allowing redeployment after manual cancellation.
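One way to picture why cancelled and skipped jobs leave the retry budget untouched: only jobs whose status appears in retryOnStatuses are counted as attempts. The sketch below assumes that counting model; `count_attempts` and `can_create_job` are hypothetical helpers, not Ctrlplane functions.

```python
from typing import Iterable, List


def count_attempts(job_statuses: Iterable[str], retry_on_statuses: List[str]) -> int:
    """Count only jobs whose final status is listed in retryOnStatuses."""
    return sum(1 for status in job_statuses if status in retry_on_statuses)


def can_create_job(job_statuses: Iterable[str],
                   retry_on_statuses: List[str],
                   max_retries: int) -> bool:
    # Allowed while counted attempts stay below 1 initial attempt + maxRetries.
    return count_attempts(job_statuses, retry_on_statuses) < max_retries + 1


defaults = ["failure", "invalidIntegration", "invalidJobAgent"]
history = ["failure", "cancelled", "failure", "skipped"]

print(count_attempts(history, defaults))     # 2 (cancelled and skipped are ignored)
print(can_create_job(history, defaults, 2))  # True, because 2 < 3
```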
Common Patterns
Basic Retry
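Retry failed jobs up to 3 times by setting maxRetries to 3.
Retry with Backoff
Wait between retry attempts by setting backoffSeconds.
Exponential Backoff
Increase the wait time with each retry by selecting the exponential backoff strategy.
No Retries (Deploy-Once)
Disable retries for critical deployments by setting maxRetries to 0. When maxRetries is 0, the default retryOnStatuses also includes "successful", enforcing deploy-once semantics.
Retry Specific Statuses
Only retry on specific failure types by listing them in retryOnStatuses.
Environment-Specific Retry
Use different retry behavior per environment, for example more retries in dev and fewer in production.
Backoff Strategies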
Linear Backoff
Constant wait time between retries: each retry waits backoffSeconds.
Exponential Backoff
Doubling wait time with each retry: the wait is backoffSeconds * 2^(attempt-1). Use maxBackoffSeconds to cap the maximum wait time.
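To make the two strategies concrete, the following sketch computes the wait before each retry. It assumes `attempt` is the number of attempts completed so far (starting at 1) and that the cap applies only to exponential backoff, as the property descriptions suggest; `backoff_delay` and `schedule` are hypothetical helper names.

```python
from typing import List, Optional


def backoff_delay(attempt: int,
                  backoff_seconds: int,
                  strategy: str = "linear",
                  max_backoff_seconds: Optional[int] = None) -> int:
    """Seconds to wait after `attempt` completed attempts (attempt >= 1)."""
    if attempt < 1:
        raise ValueError("attempt must be >= 1")
    if strategy == "linear":
        # Linear means a constant delay between retries.
        return backoff_seconds
    if strategy == "exponential":
        # backoffSeconds * 2^(attempt-1), optionally capped.
        delay = backoff_seconds * 2 ** (attempt - 1)
        if max_backoff_seconds is not None:
            delay = min(delay, max_backoff_seconds)
        return delay
    raise ValueError(f"unknown strategy: {strategy}")


def schedule(retries: int,
             backoff_seconds: int,
             strategy: str = "linear",
             max_backoff_seconds: Optional[int] = None) -> List[int]:
    """Delays before retry 1..retries."""
    return [backoff_delay(a, backoff_seconds, strategy, max_backoff_seconds)
            for a in range(1, retries + 1)]


print(schedule(4, 10, "linear"))           # [10, 10, 10, 10]
print(schedule(4, 10, "exponential"))      # [10, 20, 40, 80]
print(schedule(4, 10, "exponential", 30))  # [10, 20, 30, 30]
```

Without a cap, the exponential delays sum to 150 seconds over four retries and keep doubling from there, which is why a cap matters for rules with many retries.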
Retry Lifecycle
1. Job Fails
A job completes with a status in retryOnStatuses.
2. Retry Check
Ctrlplane checks if retries remain (attempt < maxRetries + 1).
3. Backoff Wait
If backoffSeconds is configured, Ctrlplane waits before the next attempt. The
nextEvaluationTime is set to indicate when the retry will be allowed.
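Steps 2 and 3 can be sketched as follows. This is an illustration under assumptions: it treats nextEvaluationTime as the job's completion time plus a constant (linear) delay, and `retries_remain` and `next_evaluation_time` are hypothetical helpers rather than Ctrlplane APIs.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional


def retries_remain(attempt: int, max_retries: int) -> bool:
    # The check from step 2: attempt < maxRetries + 1.
    return attempt < max_retries + 1


def next_evaluation_time(completed_at: datetime,
                         backoff_seconds: Optional[int]) -> datetime:
    # With no backoff configured, a retry is allowed immediately.
    if not backoff_seconds:
        return completed_at
    return completed_at + timedelta(seconds=backoff_seconds)


completed = datetime(2024, 1, 1, 12, 0, 0, tzinfo=timezone.utc)

if retries_remain(attempt=1, max_retries=3):
    print(next_evaluation_time(completed, 30).isoformat())
    # 2024-01-01T12:00:30+00:00
```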
4. Retry Attempt
A new job is created for the retry attempt.
5. Success or Exhausted
The process continues until success or all retries are exhausted.
Best Practices
Retry Guidelines
| Scenario | Max Retries | Backoff | Strategy |
|---|---|---|---|
| Transient network | 3-5 | 10-30s | exponential |
| Rate limiting | 3 | 60s | exponential |
| Resource contention | 2-3 | 30s | linear |
| Critical production | 1-2 | 60s | linear |
| Flaky tests (dev/qa) | 5 | 5s | linear |
Recommendations
- ✅ Use exponential backoff for external service failures
- ✅ Set maxBackoffSeconds to avoid excessive wait times
- ✅ Use fewer retries in production than in development
- ✅ Monitor retry rates to identify systemic issues
- ✅ Combine with alerting on final failure
Anti-Patterns
- ❌ Infinite retries (always set maxRetries)
- ❌ No backoff for rate-limited APIs
- ❌ Same retry config across all environments
- ❌ Retrying on non-transient failures
Next Steps
- Policies Overview - Learn about policy structure
- Environment Progression - Control promotion flow
- Version Cooldown - Batch frequent releases