Overview
Why Use Retry Rules?
Retry rules help you:
- Handle transient failures - Automatically recover from temporary issues
- Reduce manual intervention - Let the system retry before alerting
- Configure per-environment - More retries in dev, fewer in production
- Control retry behavior - Set backoff strategies to avoid thundering herd
Configuration
Retry rules are not yet supported in the Terraform provider. Use the REST API
to configure retry behavior.
Properties
- maxRetries - Maximum retry attempts. 0 means no retries (1 attempt total); 3 means up to 4 attempts (1 initial + 3 retries).
- retryOnStatuses - Job statuses that trigger a retry. Defaults to ["failure", "invalidIntegration", "invalidJobAgent"] when maxRetries > 0. When maxRetries = 0, the default also includes "successful" to enforce deploy-once semantics.
- backoffSeconds - Seconds to wait between retry attempts. If not set, retries are allowed immediately after job completion.
- Backoff strategy - linear (constant delay) or exponential (the delay doubles with each retry, computed as backoffSeconds * 2^(attempt-1)).
- maxBackoffSeconds - Maximum backoff cap in seconds (for exponential backoff). If not set, no maximum is enforced.
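The properties above might combine into a rule like the one below. This is a minimal sketch, not the actual REST payload: the dict shape, the `backoffStrategy` key name, and the helpers `default_retry_on_statuses` and `total_attempts` are assumptions made for illustration. Only `maxRetries`, `retryOnStatuses`, `backoffSeconds`, and `maxBackoffSeconds` are names used on this page.

```python
# Hypothetical retry rule expressed as a plain dict. The exact payload shape
# is not shown on this page, so treat the structure as an assumption.
retry_rule = {
    "maxRetries": 3,                   # up to 4 attempts: 1 initial + 3 retries
    "backoffSeconds": 30,              # wait before a retry is allowed
    "backoffStrategy": "exponential",  # assumed key name for the strategy
    "maxBackoffSeconds": 300,          # cap on the exponential delay
}


def default_retry_on_statuses(max_retries: int) -> list:
    """Return the documented default statuses for a given maxRetries."""
    statuses = ["failure", "invalidIntegration", "invalidJobAgent"]
    if max_retries == 0:
        # Deploy-once: a successful job also counts toward the limit.
        statuses.append("successful")
    return statuses


def total_attempts(max_retries: int) -> int:
    """One initial attempt plus the configured number of retries."""
    return 1 + max_retries


print(default_retry_on_statuses(retry_rule["maxRetries"]))
print(total_attempts(retry_rule["maxRetries"]))  # 4
```

Leaving `retryOnStatuses` out of the rule, as done here, relies on the defaults described above.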
Job Statuses
The following job statuses can be used in retryOnStatuses:
| Status | Description |
|---|---|
| failure | Job failed during execution |
| successful | Job completed successfully |
| cancelled | Job was manually cancelled |
| skipped | Job was skipped |
| invalidIntegration | Integration configuration error |
| invalidJobAgent | Job agent configuration error |
Cancelled and skipped jobs never count toward the retry limit by default,
allowing redeployment after manual cancellation.
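One way to read this rule is that only jobs whose final status appears in retryOnStatuses consume an attempt. The sketch below illustrates that reading; `counted_attempts` is a hypothetical helper, not Ctrlplane's implementation.

```python
# Default statuses that count toward the retry limit when maxRetries > 0.
DEFAULT_RETRY_STATUSES = {"failure", "invalidIntegration", "invalidJobAgent"}


def counted_attempts(job_statuses, retry_on_statuses):
    """Count only the jobs whose final status is listed in retry_on_statuses."""
    return sum(1 for status in job_statuses if status in retry_on_statuses)


# Two failures count; the cancelled and skipped jobs are ignored.
history = ["failure", "cancelled", "failure", "skipped"]
print(counted_attempts(history, DEFAULT_RETRY_STATUSES))  # 2
```

Under this reading, a release whose only job was cancelled still has its full retry budget available.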
Common Patterns
Basic Retry
Retry failed jobs up to 3 times by setting maxRetries to 3.
Retry with Backoff
Wait between retry attempts by setting backoffSeconds.
Exponential Backoff
Increase the wait time with each retry by using the exponential backoff strategy.
No Retries (Deploy-Once)
Disable retries for critical deployments. When maxRetries is 0, the default
retryOnStatuses also includes "successful", enforcing deploy-once semantics.
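A minimal sketch of deploy-once semantics, assuming that a job counts toward the limit when its status is in retryOnStatuses and that a new job is allowed while the counted attempts stay below maxRetries + 1. The dict shape and the helper `may_create_job` are hypothetical.

```python
# Hypothetical deploy-once rule; the dict shape is an assumption.
deploy_once_rule = {"maxRetries": 0}

# Effective statuses for maxRetries = 0, per the defaults described on this page.
effective_statuses = ["failure", "invalidIntegration", "invalidJobAgent", "successful"]


def may_create_job(previous_statuses, rule, statuses):
    """Allow a new job only while counted attempts stay below 1 + maxRetries."""
    counted = sum(1 for s in previous_statuses if s in statuses)
    return counted < rule["maxRetries"] + 1


print(may_create_job([], deploy_once_rule, effective_statuses))              # True
print(may_create_job(["successful"], deploy_once_rule, effective_statuses))  # False
print(may_create_job(["cancelled"], deploy_once_rule, effective_statuses))   # True
```

The first deployment is allowed, a successful one blocks any further job, and a cancelled one leaves room for a redeploy.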
Retry Specific Statuses
Only retry on specific failure types by listing them in retryOnStatuses.
Environment-Specific Retry
Use different retry behavior per environment, for example more retries in dev and fewer in production.
Backoff Strategies
Linear Backoff
Constant wait time between retries.
Exponential Backoff
Doubling wait time with each retry.
Use maxBackoffSeconds to cap the maximum wait time.
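The two strategies can be compared with a small worked example. This is a sketch under stated assumptions: `backoff_delay` is a hypothetical helper, `attempt` is taken to be the 1-based retry number, and the cap is applied only to exponential backoff, as the maxBackoffSeconds description suggests.

```python
from typing import Optional


def backoff_delay(attempt: int, backoff_seconds: int, strategy: str = "linear",
                  max_backoff_seconds: Optional[int] = None) -> int:
    """Seconds to wait before retry number `attempt` (1-based, assumed)."""
    if strategy == "exponential":
        # Documented formula: backoffSeconds * 2^(attempt-1).
        delay = backoff_seconds * 2 ** (attempt - 1)
        if max_backoff_seconds is not None:
            delay = min(delay, max_backoff_seconds)
        return delay
    if strategy == "linear":
        # Constant delay regardless of the attempt number.
        return backoff_seconds
    raise ValueError(f"unknown strategy: {strategy}")


print([backoff_delay(n, 30, "linear") for n in range(1, 5)])
# [30, 30, 30, 30]
print([backoff_delay(n, 30, "exponential") for n in range(1, 5)])
# [30, 60, 120, 240]
print([backoff_delay(n, 30, "exponential", 100) for n in range(1, 5)])
# [30, 60, 100, 100]
```

With a 30-second base, the uncapped exponential delay reaches 4 minutes by the fourth retry, which is why a cap is worth setting when maxRetries is large.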
Retry Lifecycle
1. Job Fails
A job completes with a status in retryOnStatuses.
2. Retry Check
Ctrlplane checks if retries remain (attempt < maxRetries + 1).
3. Backoff Wait
If backoffSeconds is configured, Ctrlplane waits before the next attempt. The
nextEvaluationTime is set to indicate when the retry will be allowed.
4. Retry Attempt
A new job is created for the retry attempt.
5. Success or Exhausted
The process continues until success or all retries are exhausted.
Best Practices
Retry Guidelines
| Scenario | Max Retries | Backoff | Strategy |
|---|---|---|---|
| Transient network | 3-5 | 10-30s | exponential |
| Rate limiting | 3 | 60s | exponential |
| Resource contention | 2-3 | 30s | linear |
| Critical production | 1-2 | 60s | linear |
| Flaky tests (dev/qa) | 5 | 5s | linear |
Recommendations
- ✅ Use exponential backoff for external service failures
- ✅ Set maxBackoffSeconds to avoid excessive wait times
- ✅ Use fewer retries in production than in development
- ✅ Monitor retry rates to identify systemic issues
- ✅ Combine with alerting on final failure
Anti-Patterns
- ❌ Infinite retries (always set maxRetries)
- ❌ No backoff for rate-limited APIs
- ❌ Same retry config across all environments
- ❌ Retrying on non-transient failures
Next Steps
- Policies Overview - Learn about policy structure
- Environment Progression - Control promotion flow
- Version Cooldown - Batch frequent releases