# Engine: CSI snapshots and VolSync
**Status:** Alpha · **Last Updated:** 2026-05-30
For storage-centric work (PV-level snapshots and PV-level replication), the operator integrates with the CSI snapshot controller for snapshots and with VolSync for replication. The two engines share this chapter because they are storage-level primitives: neither captures Kubernetes objects, both rely on the underlying StorageClass and a CSI driver, and the choice between them is primarily a function of the required RPO/RTO targets.

This document covers when to pick CSI snapshots versus VolSync, how an Operation materializes the relevant CRDs, a worked example for each, status mapping, and the RPO/RTO trade-offs.
## When to use each engine
| Question | CSI snapshot | VolSync |
|---|---|---|
| Capture a single PV at a point in time? | Yes (`engine: snapshot`) | Indirectly (sync at interval) |
| Replicate a PV continuously to another cluster or storage? | No | Yes (`engine: volsync`) |
| Capture Kubernetes objects (Secrets, ConfigMaps, etc.) along with the PV? | No (use Velero) | No (use Velero) |
| Crash-consistent or application-consistent? | Crash-consistent; application-consistent via a quiesce hook | Asynchronous replication; consistency follows the mover (Restic, Kopia, rsync, rclone) |
| Achievable RPO | Equal to the snapshot cadence | Minutes (Restic), seconds (rclone over fast storage) |
| Achievable RTO | Time to attach a PVC restored from the snapshot | Time to restore the replicated snapshot or repository |
| Storage requirements | CSI driver with snapshot support and a `VolumeSnapshotClass` installed | CSI driver with a `VolumeSnapshotClass` (most VolSync flows snapshot first), plus a remote backend |
| Cross-cluster | No (snapshots live where the PV lives) | Yes (the entire point of VolSync) |

In short: CSI snapshots answer "freeze this volume now and let me roll back to it"; VolSync answers "keep this volume mirrored to somewhere else with a continuous lag I can tolerate".
The Velero engine (velero.md) is complementary: Velero captures Kubernetes objects and can drive CSI snapshots underneath. If the work is "back up an application", use Velero; if the work is "I want a volume-level snapshot I can mount as a sibling PVC", use the CSI snapshot engine; if the work is "this PV must be continuously mirrored to a DR location", use VolSync.
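The rule of thumb in the previous paragraph can be written down as a small decision function. This is a hypothetical sketch, not part of the operator. `Engine` and `choose_engine` are illustrative names, and the precedence (a need for Kubernetes objects wins over continuous mirroring) is our assumption.

```python
from enum import Enum


class Engine(str, Enum):
    """Hypothetical names mirroring the `engine:` values used in Operations."""

    VELERO = "velero"
    SNAPSHOT = "snapshot"
    VOLSYNC = "volsync"


def choose_engine(needs_k8s_objects: bool, continuous_mirror: bool) -> Engine:
    """Pick an engine following the rule of thumb (assumed precedence)."""
    if needs_k8s_objects:
        # Only Velero captures Secrets, ConfigMaps, and other objects with the PV.
        return Engine.VELERO
    if continuous_mirror:
        # "This PV must be continuously mirrored to a DR location."
        return Engine.VOLSYNC
    # A point-in-time copy that can be mounted as a sibling PVC.
    return Engine.SNAPSHOT
```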
## How an Operation materializes a VolumeSnapshot

When the reconciler admits an `Operation` with `engine: snapshot`, it:

- Resolves the target `ApplicationInstance` and the PVCs it owns (declared via `spec.persistence` in the operation template, typically a single named PVC such as `data-<release>`).
- Constructs a `snapshot.storage.k8s.io/VolumeSnapshot` in the target's namespace, with `spec.source.persistentVolumeClaimName` set to the named PVC and `spec.volumeSnapshotClassName` set to the requested class.
- Sets ownership labels (`app.vworkspace.io/managed-by`, `app.vworkspace.io/cluster-id`, `ops.vworkspace.io/operation`) on the `VolumeSnapshot`.
- Watches `VolumeSnapshot.status.readyToUse` and `VolumeSnapshot.status.restoreSize` and rewrites `Operation.status` on each transition.

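The construction and labelling steps above can be sketched as a pure function that returns the manifest as a dictionary. This is a minimal sketch under stated assumptions, not the operator's code. `build_volume_snapshot` is a hypothetical helper, and we assume the `ops.vworkspace.io/operation` label carries the Operation's UID (the examples below elide it as `2c4d...`).

```python
from typing import Any


def build_volume_snapshot(
    operation_name: str,
    operation_uid: str,
    namespace: str,
    pvc_name: str,
    snapshot_class: str,
    cluster_id: str,
) -> dict[str, Any]:
    """Return a VolumeSnapshot manifest for one PVC owned by the target (hypothetical helper)."""
    return {
        "apiVersion": "snapshot.storage.k8s.io/v1",
        "kind": "VolumeSnapshot",
        "metadata": {
            # Reuse the Operation name so the two objects are easy to correlate.
            "name": operation_name,
            # VolumeSnapshots are namespaced and live next to their source PVC.
            "namespace": namespace,
            "labels": {
                "app.vworkspace.io/managed-by": "vworkspace-operator",
                "app.vworkspace.io/cluster-id": cluster_id,
                # Assumption: the label value is the Operation's UID.
                "ops.vworkspace.io/operation": operation_uid,
            },
        },
        "spec": {
            "volumeSnapshotClassName": snapshot_class,
            "source": {"persistentVolumeClaimName": pvc_name},
        },
    }
```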
When the `ApplicationInstance` advertises a quiesce hook (`ops.vworkspace.io/quiesce: exec`), the operator can optionally invoke it before creating the `VolumeSnapshot` and release it once `readyToUse=true`. The hook is opt-in and is configured through the `parameters.quiesce` block of the operation template.
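The ordering of the optional quiesce hook matters more than its mechanics: the hook must be released even when snapshot creation fails or the wait times out. Below is a minimal sketch. It assumes the freeze/thaw hook, the snapshot creation and the readiness wait are exposed as plain callables; all of these names are hypothetical. It also assumes that `parameters.quiesce.timeoutSeconds` bounds the readiness wait.

```python
from typing import Callable


def snapshot_with_quiesce(
    freeze: Callable[[], None],
    thaw: Callable[[], None],
    create_snapshot: Callable[[], None],
    wait_ready: Callable[[float], bool],
    timeout_seconds: float = 60.0,
) -> bool:
    """Freeze, create the snapshot, wait for readyToUse, then always thaw."""
    freeze()
    try:
        create_snapshot()
        # Returns True if readyToUse=true was observed within the timeout.
        return wait_ready(timeout_seconds)
    finally:
        # Release the hook even if creation raised or the wait timed out,
        # so the application is never left frozen.
        thaw()
```

Putting `thaw()` in a `finally` clause is the design point: any error path still releases the application.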
## Worked example: CSI snapshot
A VolumeSnapshot of the Nextcloud data PVC, taken as a pre-migration safety net (also visible inline in the Argo Workflow example in argo-workflows.md):
### The Operation
```yaml
apiVersion: ops.vworkspace.io/v1alpha1
kind: Operation
metadata:
  name: nextcloud-myteam-snapshot-2026-05-28
  namespace: org-myteam
spec:
  targetRef:
    apiVersion: apps.vworkspace.io/v1alpha1
    kind: ApplicationInstance
    name: nextcloud-myteam
  type: Backup
  engine: snapshot
  parameters:
    pvc: data-nextcloud-myteam
    volumeSnapshotClassName: csi-rbd
    quiesce:
      enabled: true
      timeoutSeconds: 60
```
### The materialized VolumeSnapshot
```yaml
apiVersion: snapshot.storage.k8s.io/v1
kind: VolumeSnapshot
metadata:
  name: nextcloud-myteam-snapshot-2026-05-28
  namespace: org-myteam
  labels:
    app.vworkspace.io/managed-by: vworkspace-operator
    app.vworkspace.io/cluster-id: cluster-prod-1
    ops.vworkspace.io/operation: 2c4d...
spec:
  volumeSnapshotClassName: csi-rbd
  source:
    persistentVolumeClaimName: data-nextcloud-myteam
```
### The `Operation.status` after completion
```yaml
status:
  phase: Succeeded
  startedAt: "2026-05-28T11:00:00Z"
  finishedAt: "2026-05-28T11:00:14Z"
  conditions:
    - type: Accepted
      status: "True"
      reason: TemplateValidated
    - type: Succeeded
      status: "True"
      reason: VolumeSnapshotReadyToUse
      message: "VolumeSnapshot is ReadyToUse; restoreSize 250Gi"
  outputs:
    volumeSnapshotName: nextcloud-myteam-snapshot-2026-05-28
    volumeSnapshotContentName: snapcontent-...
    restoreSize: "250Gi"
```
## Status mapping (CSI snapshot)
| `VolumeSnapshot.status` | `Operation.status.phase` | Conditions |
|---|---|---|
| No status yet | `Pending` | `Accepted=True/TemplateValidated`, `Running=False/Pending` |
| `readyToUse: false`, `error` not set | `Running` | `Running=True/VolumeSnapshotInProgress` |
| `readyToUse: true` | `Succeeded` | `Succeeded=True/VolumeSnapshotReadyToUse`; `outputs.volumeSnapshotName` and `outputs.restoreSize` populated |
| `error.message` set | `Failed` | `Failed=True/VolumeSnapshotFailed`; the message is mirrored verbatim |
| Source PVC missing | `Failed` | `Failed=True/SourcePvcNotFound` (caught at admission where possible) |

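The table above can be read as a small pure function from `VolumeSnapshot.status` to an `Operation` phase and conditions. This is a hedged sketch with three assumptions:

- The `Condition` dataclass and `map_snapshot_status` are illustrative names.
- The check order (an `error.message` before `readyToUse`) is our own choice.
- The `SourcePvcNotFound` row is left out because it is raised at admission rather than derived from snapshot status.

```python
from dataclasses import dataclass
from typing import Any, Optional


@dataclass(frozen=True)
class Condition:
    type: str
    status: str
    reason: str
    message: str = ""


def map_snapshot_status(
    status: Optional[dict[str, Any]],
) -> tuple[str, list[Condition]]:
    """Translate VolumeSnapshot.status into an Operation phase and conditions."""
    if not status:
        # The snapshot controller has not written any status yet.
        return "Pending", [
            Condition("Accepted", "True", "TemplateValidated"),
            Condition("Running", "False", "Pending"),
        ]
    message = (status.get("error") or {}).get("message")
    if message:
        # Assumed precedence: a reported error is checked before readyToUse.
        # The controller's message is mirrored verbatim.
        return "Failed", [Condition("Failed", "True", "VolumeSnapshotFailed", message)]
    if status.get("readyToUse") is True:
        return "Succeeded", [Condition("Succeeded", "True", "VolumeSnapshotReadyToUse")]
    # readyToUse is false or not yet reported, and no error is set.
    return "Running", [Condition("Running", "True", "VolumeSnapshotInProgress")]
```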
## VolSync: when and how
VolSync is the right tool when "have a copy elsewhere" is a continuous requirement, not a point-in-time event. Concretely:
- Replicating a PV to a remote object store (Restic or Kopia repository) on a schedule.
- Replicating a PV to a different cluster's PVC (RsyncTLS or rclone-based) so a warm-standby application can be brought up quickly.
- Restoring an application by pointing a fresh PVC at a VolSync `ReplicationDestination` that already has the data.

The operator integrates VolSync by materializing `volsync.backube/ReplicationSource` and `volsync.backube/ReplicationDestination` resources on the relevant clusters. In the operator's own model, both sides are expressed as `Operation` resources.
### ReplicationSource example (origin cluster)
```yaml
apiVersion: ops.vworkspace.io/v1alpha1
kind: Operation
metadata:
  name: nextcloud-myteam-replicate-2026-05-28
  namespace: org-myteam
spec:
  targetRef:
    apiVersion: apps.vworkspace.io/v1alpha1
    kind: ApplicationInstance
    name: nextcloud-myteam
  type: Backup
  engine: volsync
  parameters:
    direction: source
    pvc: data-nextcloud-myteam
    schedule: "*/15 * * * *"
    repository:
      secretName: restic-myteam
      type: restic
    retain:
      hourly: 24
      daily: 7
      weekly: 4
```
The materialized `ReplicationSource`:
```yaml
apiVersion: volsync.backube/v1alpha1
kind: ReplicationSource
metadata:
  name: nextcloud-myteam-replicate-2026-05-28
  namespace: org-myteam
  labels:
    app.vworkspace.io/managed-by: vworkspace-operator
    app.vworkspace.io/cluster-id: cluster-prod-1
    ops.vworkspace.io/operation: 3e5f...
spec:
  sourcePVC: data-nextcloud-myteam
  trigger:
    schedule: "*/15 * * * *"
  restic:
    repository: restic-myteam
    copyMethod: Snapshot
    volumeSnapshotClassName: csi-rbd
    retain:
      hourly: 24
      daily: 7
      weekly: 4
```
With `copyMethod: Snapshot`, VolSync takes a CSI snapshot of the source PVC and replicates from that point-in-time copy, so the running application keeps writing while the data is transferred. That is the single most important reason to install a CSI `VolumeSnapshotClass` on the cluster even when the headline use case is replication rather than snapshot-as-product.
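The translation from the `Operation` parameters to the materialized `ReplicationSource` above is mostly a field-by-field copy. This minimal sketch covers the Restic mover only. It assumes the caller supplies the `VolumeSnapshotClass` name, because the `Operation` example does not set one. `build_replication_source` is a hypothetical helper, and the ownership labels are omitted for brevity.

```python
from typing import Any


def build_replication_source(
    name: str,
    namespace: str,
    params: dict[str, Any],
    snapshot_class: str,
) -> dict[str, Any]:
    """Translate `engine: volsync` source parameters into a ReplicationSource manifest."""
    if params.get("direction") != "source":
        raise ValueError("this sketch only handles direction: source")
    repo = params["repository"]
    if repo.get("type") != "restic":
        raise ValueError("this sketch only covers the Restic mover")
    return {
        "apiVersion": "volsync.backube/v1alpha1",
        "kind": "ReplicationSource",
        "metadata": {"name": name, "namespace": namespace},
        "spec": {
            "sourcePVC": params["pvc"],
            "trigger": {"schedule": params["schedule"]},
            "restic": {
                # Name of the Secret holding the Restic repository URL and credentials.
                "repository": repo["secretName"],
                # Snapshot first so the application keeps writing during the copy.
                "copyMethod": "Snapshot",
                "volumeSnapshotClassName": snapshot_class,
                # Retention policy is copied through unchanged.
                "retain": dict(params.get("retain", {})),
            },
        },
    }
```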
### Status mapping (VolSync)
The operator watches `ReplicationSource.status.lastSyncTime`, `lastSyncDuration`, and `conditions[]`:

| `ReplicationSource.status` | `Operation.status.phase` | Conditions |
|---|---|---|
| First sync pending | `Running` | `Running=True/VolSyncFirstSyncInProgress` |
| `Synchronizing=True` | `Running` | `Running=True/VolSyncSynchronizing` |
| Recent successful sync (`lastSyncTime` within schedule) | `Succeeded` (recurring) | `Running=True/VolSyncIdle`, `Succeeded=True/VolSyncLastSyncSucceeded`; `outputs.lastSyncTime` populated |
| `conditions[Reconciled].status=False` | `Degraded` | `Degraded=True/VolSyncDegraded`; the message mirrors VolSync's reason |
| Suspended | `Running` (suspended) | `Blocked=True/AwaitingResume` |

Because a `ReplicationSource` is a recurring resource, the parent `Operation` does not finalize after a single sync. The operator treats it as a long-lived recurring operation, and its `Succeeded` condition means "the most recent sync window completed".
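The VolSync mapping can be sketched the same way. The sketch makes several assumptions that the table leaves open:

- "Suspended" is taken to mean VolSync's `spec.paused`.
- "Within schedule" is approximated by a fixed `interval` derived from the cron expression.
- Suspension and reconcile failures are checked before the sync-progress rows.
- An overdue sync with no sync in progress is a case the table does not cover. It is reported under a hypothetical `VolSyncSyncOverdue` reason.

`map_replication_status` itself is an illustrative name.

```python
from datetime import datetime, timedelta, timezone
from typing import Any, Optional


def _condition_status(status: dict[str, Any], cond_type: str) -> Optional[str]:
    """Return the status string of the named condition, or None if absent."""
    for cond in status.get("conditions", []):
        if cond.get("type") == cond_type:
            return cond.get("status")
    return None


def map_replication_status(
    status: dict[str, Any],
    paused: bool,
    interval: timedelta,
    now: datetime,
) -> tuple[str, list[str]]:
    """Map ReplicationSource.status to (phase, ["Type=Status/Reason", ...])."""
    if paused:
        return "Running", ["Blocked=True/AwaitingResume"]
    if _condition_status(status, "Reconciled") == "False":
        return "Degraded", ["Degraded=True/VolSyncDegraded"]
    last = status.get("lastSyncTime")
    if last is None:
        return "Running", ["Running=True/VolSyncFirstSyncInProgress"]
    if _condition_status(status, "Synchronizing") == "True":
        return "Running", ["Running=True/VolSyncSynchronizing"]
    # fromisoformat() only accepts a trailing "Z" from Python 3.11 on.
    last_sync = datetime.fromisoformat(last.replace("Z", "+00:00"))
    if last_sync.tzinfo is None:
        # Treat naive timestamps as UTC so they compare with an aware `now`.
        last_sync = last_sync.replace(tzinfo=timezone.utc)
    if now - last_sync <= interval:
        return "Succeeded", [
            "Running=True/VolSyncIdle",
            "Succeeded=True/VolSyncLastSyncSucceeded",
        ]
    # Not covered by the table: hypothetical reason for an overdue sync.
    return "Degraded", ["Degraded=True/VolSyncSyncOverdue"]
```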
## RPO and RTO
RPO and RTO are properties of the storage layer and the cadence, not the operator. The framework below is the language we use to reason about a given application's data-protection posture.

| Posture | Mechanism | Typical RPO | Typical RTO |
|---|---|---|---|
| Periodic Velero backup | `velero.io/Backup` on a schedule with CSI snapshots | Equal to the schedule (1h–24h common) | Restore time of the Backup (minutes for object restore; longer for PV restore, depending on the driver) |
| Manual snapshot before a risky change | `Operation` with `engine: snapshot`, ad hoc | Equal to "right before the change" | Time to reattach the snapshot as a PVC; depends on the driver |
| Recurring VolSync to a remote repository | `ReplicationSource` with a schedule | Minutes (lower bound is the CSI snapshot rate) | Time to restore a snapshot from the remote repository |
| Continuous VolSync RsyncTLS to a warm-standby PVC | `ReplicationSource` + `ReplicationDestination` | Seconds to single-digit minutes | Time to point a fresh application at the destination PVC |

The operator does not promise an RPO or RTO; those follow from the choice of engine and its parameters. The control plane catalog publishes recommended defaults per application (Nextcloud: hourly Velero + nightly VolSync; OnlyOffice: nightly VolSync only; WordPress: hourly VolSync). Organizations can override the defaults per `ApplicationInstance`.
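The "equal to the schedule" RPO figures above are the floor for local snapshots. For a replicated copy, the time the copy takes to land also counts. If a copy is cut every `T` and takes `D` to become usable, a failure just before copy k+1 finishes leaves only copy k, which is `T + D` old.

The sketch below computes that bound under three assumptions:

- Copies start on a fixed cadence.
- Each copy captures the volume at its start, as `copyMethod: Snapshot` does.
- A copy finishes before the next one starts.

`worst_case_rpo` is an illustrative name.

```python
from datetime import timedelta


def worst_case_rpo(interval: timedelta, copy_duration: timedelta = timedelta(0)) -> timedelta:
    """Upper bound on data loss for copies taken every `interval`.

    A failure just before a copy finishes leaves only the previous copy,
    which is one interval plus one copy duration old.
    """
    if copy_duration >= interval:
        # Copies would overlap, so the cadence cannot actually be sustained.
        raise ValueError("copy_duration must be shorter than interval")
    return interval + copy_duration
```

For a local CSI snapshot, `copy_duration` is close to zero and the bound collapses to the cadence. For the Nextcloud default of nightly VolSync, the worst case is a day plus however long the nightly sync runs.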
## Practical notes
- The requested `VolumeSnapshotClass` must exist. The operator names a class explicitly via `parameters.volumeSnapshotClassName`, but the cluster bootstrap doc also encourages marking a default class for backups (the `snapshot.storage.kubernetes.io/is-default-class: "true"` annotation). Missing classes are caught at admission with `Blocked=True/MissingVolumeSnapshotClass`.
- VolSync is optional on the cluster. `Operation` requests with `engine: volsync` are rejected at admission if VolSync is not installed.
- Restoring from a CSI snapshot is a `type: Restore`, `engine: snapshot` operation. It materializes a new PVC whose `dataSource` points at the snapshot and waits for the new PVC to bind. The application then has to be reconfigured to use the new PVC, typically by editing `ApplicationInstance.spec.values`.
- Snapshot lifecycle (TTL, garbage collection) is governed by the CSI driver and by `VolumeSnapshotClass.deletionPolicy`. The policy decides whether deleting a snapshot also removes the underlying storage snapshot (`Delete`) or keeps it (`Retain`). The operator does not delete snapshots on the user's behalf unless an explicit `type: Delete`, `engine: snapshot` operation requests it.

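The restore step in the list above hinges on a PVC whose `dataSource` names the snapshot. The minimal sketch below builds such a manifest; `build_restore_pvc` and the example names are hypothetical. Two constraints come from Kubernetes itself. The snapshot must be in the same namespace as the new PVC, and the requested size must be at least the snapshot's `restoreSize`. With a `WaitForFirstConsumer` StorageClass, the PVC also stays `Pending` until a pod mounts it, which affects the "waits for the new PVC to bind" step.

```python
from typing import Any


def build_restore_pvc(
    pvc_name: str,
    namespace: str,
    snapshot_name: str,
    restore_size: str,
    storage_class: str,
    access_modes: tuple[str, ...] = ("ReadWriteOnce",),
) -> dict[str, Any]:
    """Return a PVC manifest that provisions a new volume from a VolumeSnapshot."""
    return {
        "apiVersion": "v1",
        "kind": "PersistentVolumeClaim",
        # Must be the namespace that holds the VolumeSnapshot.
        "metadata": {"name": pvc_name, "namespace": namespace},
        "spec": {
            # Should be served by the same CSI driver that produced the snapshot.
            "storageClassName": storage_class,
            "accessModes": list(access_modes),
            # The request must be at least the snapshot's restoreSize.
            "resources": {"requests": {"storage": restore_size}},
            # dataSource tells the CSI provisioner to populate the volume from the snapshot.
            "dataSource": {
                "apiGroup": "snapshot.storage.k8s.io",
                "kind": "VolumeSnapshot",
                "name": snapshot_name,
            },
        },
    }
```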
## Related material
- velero.md: Velero engine; the right tool when Kubernetes objects must travel alongside the PV.
- ../operation-templates.md: How the `snapshot.snapshot` and `volsync.volsync` templates are defined.
- ../backups-and-restores.md: End-to-end backup-and-restore narrative.
- ../../install/prerequisites.md: Cluster prerequisites for snapshots and replication.