Engine: CSI snapshots and VolSync¶

Status: Alpha
Last Updated: 2026-05-30

For storage-centric work — PV-level snapshots and PV-level replication — the operator integrates with the CSI snapshot controller (snapshots) and VolSync (replication). The two engines share this chapter because they are storage-level primitives: neither captures Kubernetes objects, both rely on the underlying StorageClass and a CSI driver, and the choice between them is primarily a function of the required RPO/RTO targets.

This document covers when to pick CSI snapshots versus VolSync, how an Operation materializes the relevant CRDs, a worked example for each, status mapping, and the RPO/RTO trade-offs.

When to use each engine¶

Question	CSI snapshot	VolSync
Capture a single PV at a point in time?	Yes (engine: snapshot)	Indirectly (sync at interval)
Replicate a PV continuously to another cluster or storage?	No	Yes (engine: volsync)
Capture Kubernetes objects (Secrets, ConfigMaps, etc.) along with PV?	No (use Velero)	No (use Velero)
Crash-consistent vs application-consistent?	Crash; app-consistent via quiesce hook	Async replication; consistency follows the mover (Restic, Kopia, rsync, rclone)
Achievable RPO	Equals the snapshot cadence	Minutes (Restic), seconds (rclone over fast storage)
Achievable RTO	Reattach time of the cloned PV	Restore time of the replicated snapshot / repo
Storage requirements	CSI driver with snapshot support, `VolumeSnapshotClass` installed	CSI driver with `VolumeSnapshotClass` (most VolSync flows snapshot first), plus a remote backend
Cross-cluster	No (snapshots live where the PV lives)	Yes (the entire point of VolSync)

In short: CSI snapshots answer "freeze this volume now and let me roll back to it"; VolSync answers "keep this volume mirrored somewhere else, with a bounded lag I can tolerate".

The Velero engine (velero.md) is complementary: Velero captures Kubernetes objects and can drive CSI snapshots underneath. If the work is "back up an application", use Velero; if the work is "I want a volume-level snapshot I can mount as a sibling PVC", use the CSI snapshot engine; if the work is "this PV must be continuously mirrored to a DR location", use VolSync.

How an `Operation` materializes a `VolumeSnapshot`¶

When the reconciler admits an Operation of engine: snapshot, it:

Resolves the target ApplicationInstance and the PVCs it owns (declared via spec.persistence in the operation template, typically a single named PVC such as data-<release>).
Constructs a snapshot.storage.k8s.io/VolumeSnapshot in the target's namespace, with spec.source.persistentVolumeClaimName set to the named PVC and spec.volumeSnapshotClassName set to the requested class.
Sets ownership labels (app.vworkspace.io/managed-by, app.vworkspace.io/cluster-id, ops.vworkspace.io/operation) on the VolumeSnapshot.
Watches VolumeSnapshot.status.readyToUse and VolumeSnapshot.status.restoreSize, and updates Operation.status on each transition.

When the ApplicationInstance advertises a quiesce hook (ops.vworkspace.io/quiesce: exec), the operator can invoke it before creating the VolumeSnapshot and reverse it once readyToUse=true. The hook is opt-in and is configured through the operation template's parameters.quiesce block.

Worked example: CSI snapshot¶

A VolumeSnapshot of the Nextcloud data PVC, taken as a pre-migration safety net (also visible inline in the Argo Workflow example in argo-workflows.md):

The `Operation`¶

apiVersion: ops.vworkspace.io/v1alpha1
kind: Operation
metadata:
  name: nextcloud-myteam-snapshot-2026-05-28
  namespace: org-myteam
spec:
  targetRef:
    apiVersion: apps.vworkspace.io/v1alpha1
    kind: ApplicationInstance
    name: nextcloud-myteam
  type: Backup
  engine: snapshot
  parameters:
    pvc: data-nextcloud-myteam
    volumeSnapshotClassName: csi-rbd
    quiesce:
      enabled: true
      timeoutSeconds: 60

The materialized `VolumeSnapshot`¶

apiVersion: snapshot.storage.k8s.io/v1
kind: VolumeSnapshot
metadata:
  name: nextcloud-myteam-snapshot-2026-05-28
  namespace: org-myteam
  labels:
    app.vworkspace.io/managed-by: vworkspace-operator
    app.vworkspace.io/cluster-id: cluster-prod-1
    ops.vworkspace.io/operation: 2c4d...
spec:
  volumeSnapshotClassName: csi-rbd
  source:
    persistentVolumeClaimName: data-nextcloud-myteam

The `Operation.status` after completion¶

status:
  phase: Succeeded
  startedAt: "2026-05-28T11:00:00Z"
  finishedAt: "2026-05-28T11:00:14Z"
  conditions:
    - type: Accepted
      status: "True"
      reason: TemplateValidated
    - type: Succeeded
      status: "True"
      reason: VolumeSnapshotReadyToUse
      message: "VolumeSnapshot is ReadyToUse; restoreSize 250Gi"
  outputs:
    volumeSnapshotName: nextcloud-myteam-snapshot-2026-05-28
    volumeSnapshotContentName: snapcontent-...
    restoreSize: "250Gi"

Status mapping (CSI snapshot)¶

`VolumeSnapshot.status`	`Operation.status.phase`	Conditions
no `status` yet	`Pending`	`Accepted=True/TemplateValidated`, `Running=False/Pending`.
`readyToUse: false`, `error` not set	`Running`	`Running=True/VolumeSnapshotInProgress`.
`readyToUse: true`	`Succeeded`	`Succeeded=True/VolumeSnapshotReadyToUse`. `outputs.volumeSnapshotName`, `outputs.restoreSize` populated.
`error.message` set	`Failed`	`Failed=True/VolumeSnapshotFailed`. Message is mirrored verbatim.
Source PVC missing	`Failed`	`Failed=True/SourcePvcNotFound` (caught at admission where possible).
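To make the mapping concrete, the sketch below walks through the failure row. The status.readyToUse and status.error fields belong to the snapshot.storage.k8s.io/v1 API; the error text, the timestamp, and the exact shape of the mirrored condition are illustrative assumptions, not output captured from the operator.

```yaml
# VolumeSnapshot.status as reported by the snapshot controller (illustrative values).
status:
  readyToUse: false
  error:
    time: "2026-05-28T11:00:05Z"
    message: "failed to take snapshot of the volume: example driver error"  # hypothetical text
---
# Operation.status the operator would derive from it, per the table above
# (assumed shape, following the conventions of the success example).
status:
  phase: Failed
  conditions:
    - type: Accepted
      status: "True"
      reason: TemplateValidated
    - type: Failed
      status: "True"
      reason: VolumeSnapshotFailed
      message: "failed to take snapshot of the volume: example driver error"  # mirrored verbatim
```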

VolSync: when and how¶

VolSync is the right tool when "have a copy elsewhere" is a continuous requirement, not a point-in-time event. Concretely:

Replicating a PV to a remote object store (Restic or Kopia repository) on a schedule.
Replicating a PV to a different cluster's PVC (RsyncTLS or rclone-based) so a warm-standby application can be brought up quickly.
Restoring an application by pointing a fresh PVC at a VolSync ReplicationDestination that already has the data.

The operator integrates VolSync by materializing volsync.backube/ReplicationSource and volsync.backube/ReplicationDestination resources on the relevant clusters. Both sides are expressed as Operation resources in the operator's own model:

`ReplicationSource` example (origin cluster)¶

apiVersion: ops.vworkspace.io/v1alpha1
kind: Operation
metadata:
  name: nextcloud-myteam-replicate-2026-05-28
  namespace: org-myteam
spec:
  targetRef:
    apiVersion: apps.vworkspace.io/v1alpha1
    kind: ApplicationInstance
    name: nextcloud-myteam
  type: Backup
  engine: volsync
  parameters:
    direction: source
    pvc: data-nextcloud-myteam
    schedule: "*/15 * * * *"
    repository:
      secretName: restic-myteam
      type: restic
    retain:
      hourly: 24
      daily: 7
      weekly: 4

The materialized ReplicationSource:

apiVersion: volsync.backube/v1alpha1
kind: ReplicationSource
metadata:
  name: nextcloud-myteam-replicate-2026-05-28
  namespace: org-myteam
  labels:
    app.vworkspace.io/managed-by: vworkspace-operator
    app.vworkspace.io/cluster-id: cluster-prod-1
    ops.vworkspace.io/operation: 3e5f...
spec:
  sourcePVC: data-nextcloud-myteam
  trigger:
    schedule: "*/15 * * * *"
  restic:
    repository: restic-myteam
    copyMethod: Snapshot
    volumeSnapshotClassName: csi-rbd
    retain:
      hourly: 24
      daily: 7
      weekly: 4

The "Snapshot" copy method has VolSync use a CSI snapshot under the hood, so the running application keeps writing while the snapshot is replicated. That is the single most important reason to enable a CSI snapshot class on the cluster even when the headline use case is replication, not snapshot-as-product.
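The destination side is not shown in the example above. As a rough sketch, a restore from the same Restic repository into fresh storage could materialize a ReplicationDestination like the one below. The fields under spec.restic are part of VolSync's v1alpha1 API; the resource name, the manual trigger value, the capacity, and the choice of a one-shot restore are assumptions made for illustration, and the operator's actual output may differ.

```yaml
apiVersion: volsync.backube/v1alpha1
kind: ReplicationDestination
metadata:
  name: nextcloud-myteam-restore        # hypothetical name
  namespace: org-myteam
  labels:
    app.vworkspace.io/managed-by: vworkspace-operator
spec:
  trigger:
    manual: restore-once                # arbitrary tag; a sync runs when the value changes
  restic:
    repository: restic-myteam           # same repository Secret as the source
    copyMethod: Snapshot                # expose the restored data as a VolumeSnapshot
    volumeSnapshotClassName: csi-rbd
    accessModes:
      - ReadWriteOnce
    capacity: 250Gi                     # assumed; should cover the source PVC size
```

With copyMethod: Snapshot, VolSync records the resulting snapshot in status.latestImage, which a new PVC can then reference as its dataSource.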

Status mapping (VolSync)¶

The operator follows ReplicationSource.status.lastSyncTime, lastSyncDuration, and conditions[]:

`ReplicationSource.status`	`Operation.status.phase`	Conditions
First sync pending	`Running`	`Running=True/VolSyncFirstSyncInProgress`.
`Synchronizing=True`	`Running`	`Running=True/VolSyncSynchronizing`.
Recent successful sync (`lastSyncTime` within one `schedule` interval)	`Succeeded` (recurring)	`Running=True/VolSyncIdle`, `Succeeded=True/VolSyncLastSyncSucceeded`. `outputs.lastSyncTime` populated.
`conditions[Reconciled].status=False`	`Degraded`	`Degraded=True/VolSyncDegraded`. Message mirrors VolSync's reason.
Suspended	`Running` (suspended)	`Blocked=True/AwaitingResume`.

Because a ReplicationSource is a recurring resource, the parent Operation does not finalize after a single sync. The operator treats it as a long-lived recurring operation; the Succeeded condition reflects "the most recent sync window completed".
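As an illustration of the idle state between two scheduled syncs, the sketch below pairs a ReplicationSource.status with the Operation.status the table implies. The VolSync field names (lastSyncTime, lastSyncDuration, nextSyncTime, the Synchronizing condition) come from its v1alpha1 API; all values, and the exact shape of the operator's status, are assumptions.

```yaml
# ReplicationSource.status between two scheduled syncs (illustrative values).
status:
  lastSyncTime: "2026-05-28T11:15:42Z"
  lastSyncDuration: 42.3s
  nextSyncTime: "2026-05-28T11:30:00Z"
  conditions:
    - type: Synchronizing
      status: "False"
      reason: WaitingForSchedule
---
# Operation.status derived from it (assumed shape, per the table above).
status:
  phase: Succeeded
  conditions:
    - type: Running
      status: "True"
      reason: VolSyncIdle
    - type: Succeeded
      status: "True"
      reason: VolSyncLastSyncSucceeded
  outputs:
    lastSyncTime: "2026-05-28T11:15:42Z"
```

Read together with the schedule, these fields give a rough upper bound on data exposure. Assuming the source snapshot is taken when a sync starts, a write that lands just after that point is not protected until the following sync finishes, so the worst case is about one schedule interval plus one sync duration (here roughly 15 minutes plus 42 seconds).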

RPO and RTO¶

RPO and RTO are properties of the storage layer and the cadence, not of the operator. The table below is the vocabulary we use to reason about a given application's data-protection posture.

Posture	Mechanism	Typical RPO	Typical RTO
Periodic Velero backup	`velero.io/Backup` on a schedule with CSI snapshots	Equal to schedule (1h–24h common)	Restore time of the Backup (minutes for object restore; longer for PV restore depending on driver)
Manual snapshot before risky change	`Operation` `engine: snapshot` ad hoc	Near zero at the moment of the change (the snapshot is taken right before it)	Time to restore the snapshot into a PVC; depends on driver
Recurring VolSync to remote repo	`ReplicationSource` with `schedule`	Minutes (lower bound is the CSI snapshot rate)	Time to restore a snapshot from the remote repo
Continuous VolSync RsyncTLS to a warm-standby PVC	`ReplicationSource` + `ReplicationDestination`	Seconds–single-digit minutes	Time to point a fresh application at the destination PVC

The operator does not guarantee an RPO or RTO; the choice of engine and parameters determines them. The control plane catalog publishes recommended defaults per application (Nextcloud: hourly Velero + nightly VolSync; OnlyOffice: nightly VolSync only; WordPress: hourly VolSync). Organizations can override the defaults per ApplicationInstance.

Practical notes¶

The VolumeSnapshotClass must exist on the cluster. The operator picks a class explicitly via parameters.volumeSnapshotClassName; the cluster bootstrap doc additionally encourages marking one class as the default for backups. Missing classes are caught at admission with Blocked=True/MissingVolumeSnapshotClass.
VolSync is optional on the cluster. Operation requests with engine: volsync are admission-rejected if VolSync is not installed.
Restoring from a CSI snapshot is a type: Restore, engine: snapshot operation; it materializes a new PVC with dataSource pointing at the snapshot and waits for the new PVC to bind. The application then needs to be reconfigured to use the new PVC, which is typically done by editing the ApplicationInstance.spec.values accordingly.
Snapshot lifecycle (TTL, garbage collection) is enforced by the CSI driver and the VolumeSnapshotClass.deletionPolicy. The operator does not delete snapshots on the user's behalf unless an explicit type: Delete, engine: snapshot operation requests it.
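Two of the notes above can be sketched with plain Kubernetes objects: a VolumeSnapshotClass the operator can reference by name, and the kind of PVC a type: Restore, engine: snapshot operation would create from the snapshot in the worked example. The API fields are standard Kubernetes; the driver name, the StorageClass name, and the restored PVC name are assumptions that depend on the cluster.

```yaml
# A VolumeSnapshotClass the operator can reference by name.
apiVersion: snapshot.storage.k8s.io/v1
kind: VolumeSnapshotClass
metadata:
  name: csi-rbd
  annotations:
    snapshot.storage.kubernetes.io/is-default-class: "true"   # optional default marker
driver: rbd.csi.ceph.com        # assumed driver; use the cluster's CSI driver name
deletionPolicy: Delete          # or Retain, to keep the backing snapshot after deletion
---
# A PVC restored from the snapshot taken in the worked example.
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: data-nextcloud-myteam-restored   # hypothetical name
  namespace: org-myteam
spec:
  storageClassName: ceph-rbd             # assumed; must be served by the same CSI driver
  dataSource:
    apiGroup: snapshot.storage.k8s.io
    kind: VolumeSnapshot
    name: nextcloud-myteam-snapshot-2026-05-28
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 250Gi                     # at least the snapshot's restoreSize
```

The requested storage must be at least the snapshot's restoreSize, which the operator surfaces in outputs.restoreSize.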

velero.md — Velero engine; the right tool when Kubernetes objects must travel alongside the PV.
../operation-templates.md — How snapshot.snapshot and volsync.volsync templates are defined.
../backups-and-restores.md — End-to-end backup-and-restore narrative.
../../install/prerequisites.md — Cluster prerequisites for snapshots and replication.
