| Category | Status | Created | Author |
|---|---|---|---|
| Job Agents | Draft | 2026-03-13 | Justin Brooks |
Summary
Add a new argo-workflows job agent type that submits Argo Workflow CRDs to
Kubernetes, templates workflow specs from the dispatch context, and monitors
workflow execution to completion. This enables teams to use Argo Workflows as a
native deployment execution engine within ctrlplane’s promotion lifecycle.
Motivation
Ctrlplane’s job agent model today covers three execution patterns:
- ArgoCD: declarative GitOps sync of Kubernetes Applications
- GitHub Actions: CI-triggered workflow dispatch via the GitHub API
- Terraform Cloud: speculative and apply runs via the TFC API
The gap: orchestrated multi-step deployments
Many deployment procedures involve multiple steps that must execute in sequence or in a DAG structure on a Kubernetes cluster:
- Database migrations before application rollout
- Canary analysis with traffic splitting, metric collection, and rollback
- Blue/green cutover with health checks between steps
- Infrastructure provisioning (create namespace, install CRDs, deploy app)
- Integration test suites that run against a freshly deployed environment
- Custom scripts (data backfills, cache warming, feature flag toggling)

The existing options fall short for these procedures:
- GitHub Actions: The workflow runs outside the cluster, requiring kubeconfig secrets, network access to the cluster API, and manual status reporting back to ctrlplane. Multi-cluster deployments need per-cluster credentials. The workflow has no native access to in-cluster resources.
- ArgoCD sync hooks: ArgoCD supports PreSync/Sync/PostSync hooks, but these are limited to single Jobs or Pods with linear ordering. Complex DAGs, conditional branching, retries with backoff, artifact passing between steps, and parameterized templates are not expressible.
Why not use GitHub Actions for everything?
GitHub Actions can technically orchestrate any deployment, but it operates outside the cluster boundary, which requires:
- Kubeconfig or service account token stored as GitHub secrets
- Network connectivity from GitHub’s runners to the cluster API
- Per-cluster credential management for multi-cluster deployments
- Manual status reporting back to ctrlplane’s API
Why not extend the ArgoCD agent?
ArgoCD and Argo Workflows are separate projects with different APIs, CRDs, and operational models:
- ArgoCD is declarative: you describe a desired state (Application CRD) and ArgoCD continuously reconciles toward it. The agent upserts an Application and verifies it reaches Healthy+Synced.
- Argo Workflows is imperative: you submit a workflow (Workflow CRD) and it runs to completion. Each submission is a discrete execution with a start and end.

The ArgoCD agent’s UpsertApplication → poll health model does not map to Argo
Workflows’ submit → watch completion model. Combining them in one agent would
conflate two distinct execution semantics behind a single argo-cd type, making
configuration confusing and error handling ambiguous.
Proposal
Agent type and config
Register a new agent type argo-workflows in the workspace engine’s job agent
registry. The job agent config provides cluster access and a workflow template:
| Field | Required | Description |
|---|---|---|
| serverUrl | Yes | Argo Workflows server URL (API endpoint) |
| token | Yes | Bearer token or service account token for authentication |
| namespace | No | Default namespace for workflow submission (default: argo) |
| template | Yes | Go template rendering an Argo Workflow YAML |

The template field follows the same pattern as the ArgoCD agent’s template: a
Go template string that receives the dispatch context and produces a valid Argo
Workflow CRD. This is rendered at dispatch time using the templatefuncs
pipeline with custom delimiters {[ / ]} instead of Go’s default {{ / }}
(see “Template delimiters” below).
Workflow template
The template renders a complete Argo Workflow spec from the dispatch context. The dispatch context provides deployment, environment, resource, version, and variable data, the same data available to all job agent templates. Example template for a database migration + deploy workflow:

Template delimiters
Argo Workflows uses {{ / }} for its own parameter substitution at runtime
(e.g., {{workflow.parameters.target-namespace}}). Go’s text/template uses
the same delimiters by default. With standard Go templates, users must escape
every Argo expression, which is error-prone and hard to read.
Instead, ctrlplane templates use {[ / ]} as delimiters. This is a clean
separation: {[ ]} is ctrlplane’s template language, {{ }} is Argo’s. Both
coexist in the same YAML file without escaping:
The templatefuncs package already provides a New function that configures
template options. The custom delimiters are set via Go’s Delims method:
This also benefits the other agents: templates that embed literal {{ }}
expressions currently require escaping, and custom delimiters eliminate that.
Existing templates using {{ / }} would be migrated to {[ / ]} as part of this
change.
Implementation
Go types
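As a rough sketch, the config fields from the table in "Agent type and config" could map to a Go struct like the one below. The Config and ParseConfig names are hypothetical; only the JSON field names and the argo namespace default come from the proposal.

```go
package main

import (
	"encoding/json"
	"errors"
	"fmt"
)

// Config mirrors the job agent config table. The struct and function names
// are assumptions for illustration; the JSON keys follow the proposal.
type Config struct {
	ServerURL string `json:"serverUrl"`
	Token     string `json:"token"`
	Namespace string `json:"namespace,omitempty"`
	Template  string `json:"template"`
}

// defaultNamespace is the documented default for workflow submission.
const defaultNamespace = "argo"

// ParseConfig decodes raw JSON, rejects missing required fields, and
// applies the namespace default.
func ParseConfig(raw []byte) (Config, error) {
	var c Config
	if err := json.Unmarshal(raw, &c); err != nil {
		return Config{}, fmt.Errorf("decode argo-workflows config: %w", err)
	}
	switch {
	case c.ServerURL == "":
		return Config{}, errors.New("serverUrl is required")
	case c.Token == "":
		return Config{}, errors.New("token is required")
	case c.Template == "":
		return Config{}, errors.New("template is required")
	}
	if c.Namespace == "" {
		c.Namespace = defaultNamespace
	}
	return c, nil
}

func main() {
	raw := []byte(`{"serverUrl":"https://argo.internal.example.com","token":"t","template":"kind: Workflow"}`)
	cfg, err := ParseConfig(raw)
	if err != nil {
		panic(err)
	}
	fmt.Println(cfg.Namespace)
}
```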
Dispatchable implementation
Workflow templating
The template rendering follows the same pattern as ArgoCD but uses {[ / ]}
delimiters and produces an unstructured Kubernetes object instead of a typed
Application CRD. This avoids importing Argo Workflows’ full type system as a
dependency:
Completion polling
After submission, the agent polls the Argo Workflows API for workflow status. The polling follows an exponential backoff pattern capped at 30 seconds:

Verification
The agent implements Verifiable to provide a health check that monitors the
Workflow CRD’s status. Unlike ArgoCD’s continuous health check (which polls
indefinitely because ArgoCD applications are long-lived), the Argo Workflows
verification checks that the workflow reaches a terminal state:
Cancellation
When ctrlplane cancels a job, the agent should stop the Argo Workflow. The WorkflowSubmitter interface includes a stop method:

StopWorkflow calls the PUT /api/v1/workflows/{namespace}/{name}/stop endpoint, which terminates
running nodes and marks the workflow as failed. This integrates with the
existing job cancellation flow: when the reconciler transitions a job to
cancelled, the agent’s poll loop detects this and calls StopWorkflow.
Registry registration
The agent is registered alongside the existing agents in the job dispatch controller:

API communication
The agent communicates with Argo Workflows via its REST API rather than Kubernetes client-go. This matches the pattern established by the ArgoCD agent (which uses the ArgoCD API, not the Kubernetes API) and avoids requiring in-cluster access from the workspace engine:

| Operation | Method | Endpoint |
|---|---|---|
| Submit workflow | POST | /api/v1/workflows/{namespace} |
| Get status | GET | /api/v1/workflows/{namespace}/{name} |
| Stop workflow | PUT | /api/v1/workflows/{namespace}/{name}/stop |
| Get logs | GET | /api/v1/workflows/{namespace}/{name}/log |

The HTTPWorkflowSubmitter implementation makes standard HTTP calls with the
bearer token:
TRPC and UI integration
Job agent config type
Add the argo-workflows type to the job agent config discriminated union in the
TRPC router:
Deployment configuration
In the Terraform provider and CLI:

Plannable implementation (RFC 0002 integration)
The Argo Workflows agent can optionally implement Plannable by performing a
dry-run submission. Argo Workflows supports --dry-run and --server-dry-run
flags that validate and render the workflow without executing it. The rendered
output includes the fully resolved template with all parameter substitutions
applied:
Examples
Database migration + application deploy
A deployment uses Argo Workflows to run a database migration before updating the application. The workflow has a DAG structure: migrate → deploy → smoke test.

When version v2.3.1 is deployed to the us-east-1-cluster resource in the
production environment:
- Ctrlplane creates a job and dispatches it to the argo-workflows agent.
- The agent renders the template with the dispatch context (version tag v2.3.1, resource config, environment, variables).
- The rendered Workflow CRD is submitted to Argo Workflows at https://argo.internal.example.com.
- Argo Workflows executes: migrate → deploy → smoke test.
- The agent polls workflow status. On Succeeded, it marks the ctrlplane job as successful.
- Ctrlplane’s promotion lifecycle advances to the next resource.
Canary deployment with traffic splitting
A more complex workflow that performs a canary rollout with metric-based validation:

Lifecycle bracket hooks (RFC 0003 integration)
Argo Workflows is well-suited for the lifecycle hook deployments described in RFC 0003. A “drain” deployment can use an Argo Workflow that runs kubectl drain with proper PDB handling and timeout logic:
Migration
- No schema changes required. The job_agent table already supports arbitrary agent types via the type column.
- The new agent type is registered in the workspace engine’s controller. No changes to the API, reconciler, or promotion lifecycle.
- The argo-workflows type is added to the TRPC job agent config union. This is an additive change; existing types are unaffected.
- No dependency on the Argo Workflows Go SDK. The agent uses the REST API via standard net/http and k8s.io/apimachinery/pkg/apis/meta/v1/unstructured for CRD manipulation.
- Existing deployments using ArgoCD are unaffected. The argo-cd and argo-workflows types are fully independent.
- Template delimiter migration. Changing from {{ }} to {[ ]} delimiters affects all existing job agent templates. Existing ArgoCD and GitHub Actions deployment configs that use {{ }} must be updated to {[ ]}. This can be done in two phases: (1) add {[ ]} support alongside {{ }} with auto-detection of which delimiter style is present, (2) deprecate {{ }} after a migration window. The auto-detection checks whether the template contains {[. If so, it uses {[ ]} delimiters; otherwise it falls back to {{ }}.
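The phase-one auto-detection could be as small as the sketch below. The delimsFor and render helpers are hypothetical names, and the example uses text/template directly rather than the templatefuncs package.

```go
package main

import (
	"fmt"
	"strings"
	"text/template"
)

// delimsFor picks the delimiter pair for a template: {[ ]} if the template
// contains "{[", otherwise the legacy {{ }} pair.
func delimsFor(tmpl string) (left, right string) {
	if strings.Contains(tmpl, "{[") {
		return "{[", "]}"
	}
	return "{{", "}}"
}

// render parses tmpl with the detected delimiters and executes it.
func render(tmpl string, data any) (string, error) {
	left, right := delimsFor(tmpl)
	t, err := template.New("job").Delims(left, right).Parse(tmpl)
	if err != nil {
		return "", fmt.Errorf("parse template: %w", err)
	}
	var out strings.Builder
	if err := t.Execute(&out, data); err != nil {
		return "", fmt.Errorf("render template: %w", err)
	}
	return out.String(), nil
}

func main() {
	data := map[string]any{"Name": "app"}

	// Legacy template: rendered with {{ }}.
	legacy, err := render("name: {{ .Name }}", data)
	if err != nil {
		panic(err)
	}
	fmt.Println(legacy)

	// Migrated template: {[ ]} is rendered, the Argo expression is kept.
	migrated, err := render("name: {[ .Name ]} tag: {{workflow.parameters.tag}}", data)
	if err != nil {
		panic(err)
	}
	fmt.Println(migrated)
}
```

One limitation of a substring check is that a legacy template containing a literal {[ in its text would be detected as the new style, so the migration window may need a way to override detection.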
Open Questions
- Authentication model. The proposal uses a bearer token for Argo Workflows API access. In production, teams often use Kubernetes service account tokens with RBAC, OIDC tokens via Dex, or SSO. Should the agent support multiple auth methods (bearer token, kubeconfig, OIDC client credentials), or is bearer token sufficient as the initial implementation with others added later?
- Workflow cleanup. Argo Workflows supports TTL-based cleanup (ttlStrategy) on the Workflow CRD. Should ctrlplane set a default TTL on submitted workflows to prevent accumulation, or leave this to the user’s template? A sensible default (e.g., 24 hours) prevents resource leaks but may surprise users who want to inspect completed workflows.
- Log streaming. The Argo Workflows API supports streaming logs from individual workflow nodes. Should the agent capture and surface step-level logs in ctrlplane’s job detail view, or is linking to the Argo Workflows UI sufficient? Log streaming adds complexity but improves observability for teams that don’t have direct access to the Argo Workflows dashboard.
- Restorable semantics. The workspace engine supports Restorable agents that can re-establish in-flight jobs after a process restart. The Argo Workflows agent should implement this by querying workflow status on restore and resuming the poll loop. Should the initial implementation include restore support, or is it acceptable to mark orphaned jobs as failed on restart?
- Template validation. The ArgoCD agent validates templates at dispatch time. Should the Argo Workflows agent provide pre-submission validation (e.g., via Argo’s --dry-run API) to catch template errors before submission, or is post-submission error handling sufficient?
- Cluster-scoped vs namespaced Workflows. Argo Workflows supports both Workflow (namespaced) and ClusterWorkflowTemplate (cluster-scoped) resources. The proposal only supports Workflow. Should WorkflowTemplate references be supported, where the template field specifies a workflowTemplateRef instead of inline specs? This would enable reuse of pre-defined cluster workflow templates.
- Template delimiter migration scope. The {[ / ]} delimiter change benefits all agents but requires migrating every existing template. Should this be scoped to the Argo Workflows agent only (using a per-agent delimiter config), or applied globally across all agents? A global change is cleaner long-term but has a larger migration surface. The auto-detection fallback mitigates breakage, but dual-delimiter support adds complexity to the template engine.