| Category | Status | Created | Author |
|---|---|---|---|
| Policies | Draft | 2026-03-13 | Justin Brooks |
Summary
Add an ephemeral plan API that CI pipelines call on pull requests to compute full rendered diffs for each release target, showing exactly what Kubernetes manifests, Terraform resources, or other deployed artifacts would change, like terraform plan output. Results can optionally be posted back to
GitHub as PR comments or check runs. No version is created; the plan is computed
on the fly and returned to the caller.
Motivation
When a developer opens a pull request that will eventually become a new deployment version, two questions arise before merging:
- Which release targets will this version affect?
- What exactly will change on each affected target?
Answering both gives developers the kind of preview that terraform plan provides, before committing to a
deployment.
RFC 0002 introduces the Plannable interface on job agents, which can compute
rendered output without dispatching a job. But RFC 0002 focuses on the
reconciler: plans are computed during the promotion lifecycle to detect no-diff
targets and fast-track them. There is no way to trigger a plan before a
version exists.
The PR workflow gap
In the typical CI workflow for ctrlplane today, nothing shows what a pull request would change until a version has been created from it.
terraform plan solved this for infrastructure: before applying, you see the
full execution plan with resource-level diffs. The same pattern should exist for
ctrlplane deployments.
What “plan” means in this context
A dry-run plan computes, for each release target, the full rendered output that the external system (ArgoCD, Terraform Cloud, etc.) would produce for the proposed version, then diffs it against the current deployed state. This is not a hash comparison (RFC 0002) or an affected/unaffected classification. It is the actual diff content:
- For ArgoCD: the per-resource Kubernetes manifest diff (like argocd app diff)
- For Terraform Cloud: the resource-level before/after diff (like terraform plan)
- For Helm: the rendered template diff (like helm diff upgrade)
The goal is a result that reads the way terraform plan output does today.
Relationship to prior RFCs
- RFC 0001 (Scoped Versions): The deployer declares which targets a version affects. Dry-run plans can inform that decision: review the plan on the PR, then create the version with a targetSelector that matches only the affected targets.
- RFC 0002 (Plan-Based Diff Detection): Provides the Plannable interface and agent implementations that this RFC consumes. RFC 0002 runs plans inside the reconciler; this RFC exposes plans via an API endpoint before any version exists.
Proposal
API
Add a new endpoint that accepts proposed version data and returns rendered diffs per release target. Nothing is persisted: the plan is ephemeral.
For each release target, the response reports:
- hasChanges: whether the rendered output differs from the current state
- diff.raw: human-readable unified diff of the full rendered output
- diff.resources: structured breakdown of per-resource changes with kind, name, namespace, action (add/modify/delete), and a per-resource diff
The caller polls
GET /v1/workspaces/{workspaceId}/deployments/{deploymentId}/plan/{planId}
until status transitions to completed or failed.
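As an illustration of the response fields named above, the following is a minimal sketch of how a Go caller might decode a plan response and decide when to stop polling. The envelope (a status string plus a targets array) and the type names are assumptions made for this sketch; only hasChanges, diff.raw, diff.resources and the per-resource fields come from this RFC.

```go
package main

import "encoding/json"

// ResourceChange mirrors one entry of diff.resources.
type ResourceChange struct {
	Kind      string `json:"kind"`
	Name      string `json:"name"`
	Namespace string `json:"namespace"`
	Action    string `json:"action"` // "add", "modify", or "delete"
	Diff      string `json:"diff"`
}

// PlanDiff mirrors the diff object (diff.raw and diff.resources).
type PlanDiff struct {
	Raw       string           `json:"raw"`
	Resources []ResourceChange `json:"resources"`
}

// TargetPlan is a hypothetical per-target entry. Diff is a pointer so that
// a JSON null (agent cannot produce a diff) decodes to nil.
type TargetPlan struct {
	HasChanges bool      `json:"hasChanges"`
	Diff       *PlanDiff `json:"diff"`
}

// PlanResponse is a hypothetical envelope; the "targets" key is assumed.
type PlanResponse struct {
	Status  string       `json:"status"`
	Targets []TargetPlan `json:"targets"`
}

// decodePlan parses a response body into a PlanResponse.
func decodePlan(body []byte) (PlanResponse, error) {
	var p PlanResponse
	err := json.Unmarshal(body, &p)
	return p, err
}

// finished reports whether the caller can stop polling.
func finished(status string) bool {
	return status == "completed" || status == "failed"
}
```

A CI client would call decodePlan on each poll response and loop until finished returns true.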
Extended PlanResult type
RFC 0002 defines PlanResult with ContentHash, HasChanges, and a simple
Diff string. The dry-run plan requires richer diff data, so the type is extended
with a RenderedOutput field and a structured Diff.
The reconciler path still reads only ContentHash and HasChanges. The
additional fields (RenderedOutput, Diff) are populated by agents when called
through the dry-run plan API.
How agents produce diffs
The Plannable interface from RFC 0002 is unchanged: agents return a
PlanResult. The difference is what the caller does with it:
- Reconciler (RFC 0002): Only inspects ContentHash and HasChanges.
- Dry-run plan API (this RFC): Inspects the full PlanDiff and returns it to the caller.
Agents that can compute a diff populate the Diff field.
Agents that only implement hash-based comparison (no diff capability) can still
participate: the API response will show hasChanges: true/false but diff
will be null.
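The two consumers can be sketched as follows. This is an illustrative shape only: the field names ContentHash, HasChanges, RenderedOutput and Diff come from this RFC, while the field types, the APITarget type, and the helper names shouldFastTrack and toAPITarget are assumptions.

```go
package main

// PlanDiff carries the rich diff (per-resource entries omitted for brevity).
type PlanDiff struct {
	Raw string
}

// PlanResult sketches the extended type. ContentHash and HasChanges come
// from RFC 0002; RenderedOutput and the structured Diff are the additions.
// The exact field types are assumptions.
type PlanResult struct {
	ContentHash    string
	HasChanges     bool
	RenderedOutput string
	Diff           *PlanDiff // nil for hash-only agents
}

// shouldFastTrack is the reconciler-style view: a target with no changes
// can be fast-tracked, and the rich fields are never read.
func shouldFastTrack(r PlanResult) bool {
	return !r.HasChanges
}

// APITarget is a hypothetical response entry for the dry-run plan API.
type APITarget struct {
	HasChanges bool
	Diff       *PlanDiff
}

// toAPITarget is the dry-run view: the diff is passed through unchanged and
// stays nil when the agent did not populate it.
func toAPITarget(r PlanResult) APITarget {
	return APITarget{HasChanges: r.HasChanges, Diff: r.Diff}
}
```

Keeping Diff as a pointer lets a hash-only agent participate without any special casing: its nil diff serializes as null in the response.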
ArgoCD
The ArgoCD agent calls the ArgoCD API to produce a real diff. The in-process TemplateApplication function only renders the Application CRD (which always
differs because targetRevision changes). The actual diff lives in the
Kubernetes manifests that ArgoCD produces after fetching the git repo and
rendering the Helm chart or kustomize overlay.
The Plan implementation uses a temporary Application strategy. Calling
GetManifests on the existing Application only overrides the revision — it does
not pick up changes to Helm values, parameters, kustomize patches, or any other
spec field derived from deployment variables. To get a fully accurate manifest
diff for any kind of change (revision, variables, config), the agent creates a
short-lived Application with auto-sync disabled, waits for ArgoCD to render
manifests for it, fetches those manifests, then cleans it up.
The flow:
- Renders the proposed Application CRD from the dispatch context (same as dispatch time). This CRD reflects all variable and config changes.
- Strips any auto-sync policy and sets the sync policy to manual, so the temporary Application will never deploy to the cluster.
- Creates the temporary Application in ArgoCD with a deterministic plan-scoped name (e.g., <original-name>-plan-<short-hash>).
- Waits for ArgoCD to compute the desired manifests for the temporary Application. ArgoCD fetches the git repo, renders Helm/kustomize with the full proposed spec (including new values, parameters, revisions), and populates the manifest cache.
- Calls GetManifests on the temporary Application to retrieve the fully rendered proposed manifests.
- Calls GetManifests on the original Application to retrieve the current manifests.
- Deletes the temporary Application.
- Computes a per-resource unified diff between the two manifest sets.
ArgoCD supports spec.sources (plural) for
Applications that pull from multiple Git repos or Helm charts (e.g., a chart
from one repo and values from another). Because the temporary Application is
created from the full rendered spec, multi-source applications are handled
naturally: the proposed spec’s sources list (with all target revisions) is
preserved as-is.
The waitForManifests helper polls the temporary Application until ArgoCD
reports a non-empty manifest set or the context deadline expires.
The temporary Application is deleted in a defer with
cascade: false (the Application never synced, so there are no cluster
resources to remove). If the agent crashes before cleanup, orphaned Applications
remain in ArgoCD. ArgoCD has no native TTL mechanism for Applications, so cleanup
of orphans is ctrlplane’s responsibility.
Every temporary Application is labelled ctrlplane.dev/plan: "true" and
annotated with ctrlplane.dev/plan-created-at: <RFC3339 timestamp>. A
background goroutine in the workspace engine periodically lists Applications
matching the plan label, parses the created-at annotation, and deletes any older
than planTTL (default 10 minutes).
The workspace engine runs one PlanAppGC instance for each configured ArgoCD connection.
The diffManifestSets function parses each manifest as a Kubernetes resource,
matches resources by apiVersion/kind/namespace/name, and produces a
ResourceChange for each:
- Resources in proposed but not current → action: "add"
- Resources in current but not proposed → action: "delete"
- Resources in both with different content → action: "modify" with unified diff
- Resources in both with identical content → omitted (no-op)
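The classification rules above can be sketched as follows. To stay self-contained, this sketch assumes the manifests have already been parsed into a map keyed by apiVersion/kind/namespace/name, and it uses a naive removed-lines/added-lines listing (naiveDiff, a placeholder) where a real implementation would produce a unified diff.

```go
package main

import (
	"sort"
	"strings"
)

// ResourceKey identifies a resource by apiVersion/kind/namespace/name.
type ResourceKey struct{ APIVersion, Kind, Namespace, Name string }

func (k ResourceKey) String() string {
	return strings.Join([]string{k.APIVersion, k.Kind, k.Namespace, k.Name}, "/")
}

// ResourceChange describes one changed resource.
type ResourceChange struct {
	Key    ResourceKey
	Action string // "add", "modify", or "delete"
	Diff   string // only set for "modify"
}

// naiveDiff is a placeholder, not a unified diff: it lists every old line
// as removed and every new line as added.
func naiveDiff(prev, next string) string {
	var b strings.Builder
	for _, l := range strings.Split(prev, "\n") {
		b.WriteString("-" + l + "\n")
	}
	for _, l := range strings.Split(next, "\n") {
		b.WriteString("+" + l + "\n")
	}
	return b.String()
}

// diffManifestSets classifies resources; identical ones are omitted.
func diffManifestSets(current, proposed map[ResourceKey]string) []ResourceChange {
	var changes []ResourceChange
	for k, next := range proposed {
		prev, ok := current[k]
		switch {
		case !ok:
			changes = append(changes, ResourceChange{Key: k, Action: "add"})
		case prev != next:
			changes = append(changes, ResourceChange{Key: k, Action: "modify", Diff: naiveDiff(prev, next)})
		}
	}
	for k := range current {
		if _, ok := proposed[k]; !ok {
			changes = append(changes, ResourceChange{Key: k, Action: "delete"})
		}
	}
	// Sort by key so output is stable despite random map iteration order.
	sort.Slice(changes, func(i, j int) bool {
		return changes[i].Key.String() < changes[j].Key.String()
	})
	return changes
}
```

Sorting matters in practice: without it, the same plan could render resources in a different order on every run, which makes PR comments noisy.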
Terraform Cloud
Terraform Cloud speculative plans already produce structured diff output. The Plan implementation triggers a speculative plan run and maps the result into a PlanResult.
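One possible mapping is sketched below, assuming the agent has obtained the plan in Terraform's JSON plan representation, whose resource_changes entries carry an address, a type, and change.actions. How the JSON is fetched from Terraform Cloud is out of scope here. The ResourceChange shape, the function names, and the choice to report a replace as "modify" are assumptions.

```go
package main

import "encoding/json"

// tfPlan picks out the parts of Terraform's JSON plan used by this sketch.
type tfPlan struct {
	ResourceChanges []struct {
		Address string `json:"address"`
		Type    string `json:"type"`
		Change  struct {
			Actions []string `json:"actions"`
		} `json:"change"`
	} `json:"resource_changes"`
}

// ResourceChange is a simplified per-resource entry for the plan response.
type ResourceChange struct {
	Kind   string
	Name   string
	Action string // "add", "modify", or "delete"
}

// mapActions converts Terraform's action list to a single action.
// It returns "" for entries that should be omitted ("no-op", "read").
func mapActions(actions []string) string {
	if len(actions) == 2 {
		// ["delete","create"] or ["create","delete"] is a replacement;
		// reporting it as "modify" is an assumption of this sketch.
		return "modify"
	}
	if len(actions) != 1 {
		return ""
	}
	switch actions[0] {
	case "create":
		return "add"
	case "update":
		return "modify"
	case "delete":
		return "delete"
	}
	return ""
}

// mapTerraformPlan decodes a JSON plan and keeps only real changes.
func mapTerraformPlan(planJSON []byte) ([]ResourceChange, error) {
	var p tfPlan
	if err := json.Unmarshal(planJSON, &p); err != nil {
		return nil, err
	}
	var out []ResourceChange
	for _, rc := range p.ResourceChanges {
		if action := mapActions(rc.Change.Actions); action != "" {
			out = append(out, ResourceChange{Kind: rc.Type, Name: rc.Address, Action: action})
		}
	}
	return out, nil
}
```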
GitHub Actions / unsupported agents
Agents that do not implement Plannable return nil from the registry’s Plan
method. The dry-run plan endpoint still lists these targets, but without a diff.
Plan execution flow
The plan endpoint does not create a version or trigger the reconciler. It constructs the necessary context in-memory and calls agents directly. The endpoint:
- Creates a plan record in a lightweight deployment_plan table with status: "computing".
- Enqueues plan computation as background work.
- Returns the plan ID immediately.
The CI then polls GET .../plan/{planId} until status is completed.
Plan records are short-lived: an expires_at column enables periodic cleanup. No
long-term storage is needed.
GitHub integration
When the plan request includes a github field, ctrlplane posts results back to
the PR using the GitHub App that is already configured for workflow dispatch.
Results can be posted as a PR comment or as a check run. A check run carries a conclusion of
success/neutral/failure and structured annotations per changed resource.
Check runs integrate with branch protection rules, allowing teams to require a
passing plan before merge.
Implementation: The existing GitHub App integration in the workspace engine
uses the ArgoCD Go client pattern for API calls. The PR comment/check run
posting uses the GitHub App’s installation token (the same token acquisition
flow used by GoGitHubWorkflowDispatcher in
apps/workspace-engine/svc/controllers/jobdispatch/jobagents/github/).
Optional: pull_request webhook handler
As a convenience layer, ctrlplane can optionally react to GitHub pull_request
webhook events to auto-trigger plans without CI changes.
The GitHub webhook handler in apps/api/src/routes/github/index.ts currently
only handles workflow_run events. This RFC proposes adding a
pull_request handler alongside it. Supporting types already exist in
packages/validators/src/github/index.ts
(GithubPullRequestVersion, PullRequestMetadataKey, PullRequestConfigKey)
but are not wired up to any handler.
The handlePullRequestEvent function would:
- Extract the repo owner/name and head SHA from the event payload.
- Find deployments whose job agent config references this repo (by matching owner and repo fields in the GitHub job agent config).
- For each matching deployment, trigger a plan using the head SHA as the proposed version tag.
- Post results back as a PR comment or check run.
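The matching step can be sketched as below. The actual handler would live in the TypeScript API; Go is used here only to keep this document's examples in one language. All type and function names are hypothetical, and matching owner and repo case-insensitively is an assumption based on GitHub treating those names as case-insensitive.

```go
package main

import "strings"

// PullRequestEvent holds the fields extracted from the webhook payload.
type PullRequestEvent struct {
	Owner, Repo, HeadSHA string
}

// Deployment is a minimal view of a deployment and the owner/repo fields
// of its GitHub job agent config.
type Deployment struct {
	ID         string
	AgentOwner string
	AgentRepo  string
}

// PlanRequest is a hypothetical request to trigger one dry-run plan.
type PlanRequest struct {
	DeploymentID string
	VersionTag   string
}

// plansForPullRequest returns one plan request per deployment whose agent
// config references the event's repo, using the head SHA as version tag.
func plansForPullRequest(ev PullRequestEvent, deployments []Deployment) []PlanRequest {
	var reqs []PlanRequest
	for _, d := range deployments {
		if strings.EqualFold(d.AgentOwner, ev.Owner) && strings.EqualFold(d.AgentRepo, ev.Repo) {
			reqs = append(reqs, PlanRequest{DeploymentID: d.ID, VersionTag: ev.HeadSHA})
		}
	}
	return reqs
}
```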
Examples
ArgoCD: Helm chart change on a PR
A deployment manages 20 clusters across 4 environments using ArgoCD with a monorepo Helm chart. A developer opens a PR that modifies charts/payment/values.yaml.
When a plan is requested for the PR, ctrlplane:
- Builds a transient version with the PR’s head commit.
- For each of the 20 release targets, calls ArgoCD’s GetManifests API with the PR commit as the target revision.
- Diffs the proposed manifests against the currently deployed manifests.
- Returns: 4 targets show changes (the clusters running the payment service), 16 show no changes.
- Posts a PR comment showing the diff table with expandable per-target diffs.
Terraform Cloud: Infrastructure PR
A deployment manages Terraform infrastructure across 3 regions. A PR changes an IAM policy module.
GitHub Actions: Unsupported agent
A deployment uses GitHub Actions (no Plannable implementation). The plan
endpoint still runs but cannot produce diffs.
Migration
- The deployment_plan table is new and requires no data migration.
- Plans are ephemeral with a 1-hour TTL by default. No long-term storage concerns.
- The Plannable interface (RFC 0002) is unchanged. Agents that already implement it gain dry-run plan support automatically; they only need to populate the Diff field for rich output.
- The pull_request webhook handler is additive. The existing workflow_run handler is unchanged.
- No changes to the version creation flow, reconciler, or promotion lifecycle.
Open Questions
- Rate limiting. Plans involve external API calls (ArgoCD manifest rendering, Terraform speculative plans). For deployments with many release targets, a single PR could trigger hundreds of external calls. Should there be a per-deployment or per-workspace rate limit on plan requests? Should callers be able to scope the plan to specific environments or resources?
- Plan scope. The proposal plans against all release targets. For large deployments, the caller may want to plan only for specific environments or resources. Should the request body accept an optional filter (environmentSelector, resourceSelector) to narrow the plan scope?
- Diff format standardization. ArgoCD produces YAML diffs, Terraform produces HCL-style diffs. Should the raw field in PlanDiff be agent-specific (each agent returns its native format), or should ctrlplane normalize to a common diff format?
- Cost of plans. Each plan consumes Terraform Cloud compute resources. For deployments with many targets across many PRs, this could become expensive. Should Terraform plans require explicit opt-in per deployment?
- Temporary Application permissions. Creating and deleting Applications requires write access to the ArgoCD API. Some teams restrict Application creation to specific ArgoCD projects or RBAC roles. Should the plan Application be created in a dedicated ArgoCD project (e.g., ctrlplane-plans) with limited permissions, or inherit the project from the original Application?
- ArgoCD rendering latency. After creating the temporary Application, the agent polls until ArgoCD renders manifests. For large Helm charts or slow git repos this could take significant time. Should there be a configurable timeout per agent, and how should the plan endpoint report rendering timeouts vs. real errors?