API group: restore.kymaros.io/v1alpha1
Kind: RestoreTest
Short name: rt
Scope: Namespaced (typically kymaros-system)
A RestoreTest defines a scheduled restore validation job. The controller reads it, restores the specified backup into an isolated sandbox namespace at each scheduled interval, runs health checks, measures RTO, and produces a RestoreReport.
Spec
Top-level fields
| Field | Type | Required | Default | Description |
|---|---|---|---|---|
| backupSource | BackupSource | Yes | — | Identifies the backup to restore and from which provider. |
| schedule | ScheduleConfig | Yes | — | Controls when restore tests run. |
| sandbox | SandboxConfig | Yes | — | Configures the isolated sandbox namespace used during testing. |
| healthChecks | HealthCheckRef | No | — | References a HealthCheckPolicy to run after restore. |
| sla | SLAConfig | No | — | Defines the RTO target and alert behavior. |
| notifications | NotificationConfig | No | — | Configures where pass/fail notifications are sent. |
| timeout | Duration | No | — | Global timeout for the entire test run. |
| historyLimit | int32 | No | 10 | Number of RestoreReport objects to retain. Minimum: 1. |
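As a rough illustration of the required fields, the following is a minimal sketch of a RestoreTest that sets only backupSource, schedule, and sandbox. The resource name minimal-example, the source namespace my-app, and the schedule are placeholders. The empty sandbox object assumes the API accepts an empty value for a required field whose sub-fields are all optional; that behavior is not confirmed by this reference.

```yaml
apiVersion: restore.kymaros.io/v1alpha1
kind: RestoreTest
metadata:
  name: minimal-example        # placeholder name
  namespace: kymaros-system
spec:
  backupSource:
    provider: velero
    backupName: latest         # always select the most recent backup
    namespaces:
      - name: my-app           # placeholder source namespace
  schedule:
    cron: "0 3 * * *"          # 03:00 daily; timezone falls back to its "UTC" default
  sandbox: {}                  # assumption: documented defaults apply (rp-test, 30m, strict)
```

With everything else omitted, historyLimit falls back to 10, and no health checks, SLA target, or notifications are configured.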
BackupSource
| Field | Type | Required | Default | Description |
|---|---|---|---|---|
| provider | string | Yes | — | Backup provider. Accepted values: velero, kasten, trilio. |
| backupName | string | Yes | — | Name of the backup to restore. Use "latest" to always select the most recent backup. |
| namespaces | []NamespaceMapping | Yes | — | Source namespaces to restore. At least one entry required. |
| labelSelector | map[string]string | No | — | Kubernetes label selector applied when identifying the backup. |
NamespaceMapping
| Field | Type | Required | Default | Description |
|---|---|---|---|---|
| name | string | Yes | — | Source namespace name in the backup. |
| sandboxName | string | No | — | Override the generated sandbox namespace name for this source namespace. When omitted, the name is derived from sandbox.namespacePrefix and the source namespace name. |
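To make the override behavior concrete, here is a small spec fragment that mixes one entry relying on the derived name with one that sets sandboxName explicitly. The namespaces orders and orders-cache and the override rp-cache are hypothetical. The exact format of the derived name is not stated in this reference, so the comment deliberately does not show it.

```yaml
backupSource:
  provider: velero
  backupName: latest
  namespaces:
    - name: orders             # no override: name is derived from the prefix and "orders"
    - name: orders-cache       # hypothetical second source namespace
      sandboxName: rp-cache    # explicit override replaces the derived name
sandbox:
  namespacePrefix: rp-test     # only used for entries without sandboxName
```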
ScheduleConfig
| Field | Type | Required | Default | Description |
|---|---|---|---|---|
| cron | string | Yes | — | Standard five-field cron expression (e.g., "0 3 * * *" for 03:00 daily). |
| timezone | string | No | "UTC" | IANA timezone name for cron evaluation (e.g., "Europe/Paris"). |
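For readers less familiar with five-field cron syntax, the fragment below sketches a weekday-only schedule evaluated in a non-UTC timezone. The specific time and timezone are arbitrary examples rather than recommended values.

```yaml
schedule:
  # Field order: minute hour day-of-month month day-of-week
  cron: "30 1 * * 1-5"       # 01:30 on Monday through Friday
  timezone: "Europe/Paris"   # evaluated in Paris local time instead of the UTC default
```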
SandboxConfig
| Field | Type | Required | Default | Description |
|---|---|---|---|---|
| namespacePrefix | string | No | "rp-test" | Prefix for generated sandbox namespace names. |
| ttl | Duration | No | "30m" | How long the sandbox namespace lives after the test completes before automatic deletion. |
| resourceQuota | ResourceQuotaConfig | No | — | Resource limits applied to the sandbox namespace. |
| networkIsolation | string | No | "strict" | Network policy mode. strict: sandbox has no external egress. group: sandbox pods can reach other sandboxes in the same group but not production. |
ResourceQuotaConfig
| Field | Type | Required | Default | Description |
|---|---|---|---|---|
| cpu | string | No | — | CPU limit for the sandbox namespace (e.g., "4"). |
| memory | string | No | — | Memory limit for the sandbox namespace (e.g., "8Gi"). |
| storage | string | No | — | Total storage request limit for the sandbox namespace (e.g., "50Gi"). |
HealthCheckRef
| Field | Type | Required | Default | Description |
|---|---|---|---|---|
| policyRef | string | No | — | Name of a HealthCheckPolicy resource in the same namespace. |
| timeout | Duration | No | — | Maximum time to wait for all health checks defined in the referenced policy to pass. |
SLAConfig
| Field | Type | Required | Default | Description |
|---|---|---|---|---|
| maxRTO | Duration | Yes (if sla set) | — | Maximum acceptable restore time. The measured duration is compared against this value and recorded in the report as rto.withinSLA. |
| alertOnExceed | bool | No | — | When true, a notification is sent through the configured channels when measured RTO exceeds maxRTO. |
NotificationConfig
| Field | Type | Required | Default | Description |
|---|---|---|---|---|
| onFailure | []NotificationChannel | No | — | Channels to notify when the test result is fail. |
| onSuccess | []NotificationChannel | No | — | Channels to notify when the test result is pass. |
NotificationChannel
| Field | Type | Required | Default | Description |
|---|---|---|---|---|
| type | string | Yes | — | Notification backend. Accepted values: slack, pagerduty, webhook. |
| channel | string | No | — | Slack channel name or ID (e.g., "#alerts"). Only applicable when type is slack. |
| webhookSecretRef | string | No | — | Name of a Secret in the same namespace that contains the webhook URL or API token. The secret must have a key named url for webhook type or token for pagerduty. |
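The key requirements for webhookSecretRef can be sketched with two standard Kubernetes Secret manifests, one for the webhook type and one for pagerduty. The secret names, the URL, and the token value are placeholders. The table does not state which key a slack channel expects, so no Slack secret is shown.

```yaml
apiVersion: v1
kind: Secret
metadata:
  name: incident-webhook               # illustrative name, referenced via webhookSecretRef
  namespace: kymaros-system            # must match the RestoreTest namespace
type: Opaque
stringData:
  url: https://example.com/hooks/restore-tests   # placeholder URL; key "url" for type webhook
---
apiVersion: v1
kind: Secret
metadata:
  name: pagerduty-kymaros-token        # illustrative name
  namespace: kymaros-system
type: Opaque
stringData:
  token: REPLACE_WITH_PAGERDUTY_TOKEN  # placeholder; key "token" for type pagerduty
```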
Status
| Field | Type | Description |
|---|---|---|
| phase | string | Current lifecycle phase: Idle, Running, Completed, or Failed. |
| lastRunAt | Time | Timestamp of the most recent completed test run. |
| lastScore | int | Confidence score (0–100) from the most recent run. |
| lastResult | string | Outcome of the most recent run: pass, fail, or partial. |
| lastReportRef | string | Name of the RestoreReport object created by the most recent run. |
| nextRunAt | Time | Scheduled time for the next test run, derived from schedule.cron. |
| sandboxNamespace | string | Name of the active sandbox namespace (populated during Running phase). |
| restoreID | string | Provider-specific restore operation identifier (populated during Running phase). |
| conditions | []Condition | Standard Kubernetes condition array reflecting reconciliation state. |
Print columns exposed by kubectl get rt: Phase, Score, Result, Last Run, Age.
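To show how the status fields fit together, here is an illustrative status block for a test whose last run has finished. Every value is invented for the example, including the report name, whose naming scheme is not documented here. sandboxNamespace and restoreID are left out because the table describes them as populated during the Running phase, and conditions are omitted because their types are not listed in this reference.

```yaml
status:
  phase: Completed
  lastRunAt: "2025-01-15T02:04:37Z"          # illustrative timestamp
  lastScore: 92                              # illustrative score on the 0-100 scale
  lastResult: pass
  lastReportRef: my-app-nightly-report       # hypothetical RestoreReport name
  nextRunAt: "2025-01-16T02:00:00Z"          # next match of cron "0 2 * * *" in UTC
```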
Examples
Simple stateless application
Runs nightly at 02:00 UTC, restores the latest Velero backup of the my-app namespace, applies a 30-minute RTO target, and notifies Slack on failure.
apiVersion: restore.kymaros.io/v1alpha1
kind: RestoreTest
metadata:
  name: my-app-nightly
  namespace: kymaros-system
spec:
  backupSource:
    provider: velero
    backupName: latest
    namespaces:
      - name: my-app
  schedule:
    cron: "0 2 * * *"
    timezone: UTC
  sandbox:
    namespacePrefix: rp-test
    ttl: 30m
    networkIsolation: strict
  sla:
    maxRTO: "30m"
    alertOnExceed: true
  notifications:
    onFailure:
      - type: slack
        channel: "#ops-alerts"
        webhookSecretRef: slack-kymaros-webhook
  historyLimit: 10
Stateful application with database health checks
Restores the orders namespace using Velero, runs a custom HealthCheckPolicy that verifies the database pod and HTTP readiness endpoint, enforces a 15-minute RTO, and sends both success and failure notifications.
apiVersion: restore.kymaros.io/v1alpha1
kind: RestoreTest
metadata:
  name: orders-db-validation
  namespace: kymaros-system
spec:
  backupSource:
    provider: velero
    backupName: orders-daily
    namespaces:
      - name: orders
  schedule:
    cron: "0 3 * * *"
    timezone: "Europe/Paris"
  sandbox:
    namespacePrefix: rp-test
    ttl: 45m
    resourceQuota:
      cpu: "4"
      memory: 8Gi
      storage: 50Gi
    networkIsolation: strict
  healthChecks:
    policyRef: orders-health-policy
    timeout: 10m
  sla:
    maxRTO: "15m"
    alertOnExceed: true
  notifications:
    onFailure:
      - type: pagerduty
        webhookSecretRef: pagerduty-kymaros-token
      - type: slack
        channel: "#platform-alerts"
        webhookSecretRef: slack-kymaros-webhook
    onSuccess:
      - type: slack
        channel: "#platform-ops"
        webhookSecretRef: slack-kymaros-webhook
  timeout: 1h
  historyLimit: 30
Multi-namespace application
Restores three namespaces that together form a single application (frontend, backend API, and shared infrastructure). Each namespace maps to a distinct sandbox name to avoid collisions.
apiVersion: restore.kymaros.io/v1alpha1
kind: RestoreTest
metadata:
  name: platform-full-stack
  namespace: kymaros-system
spec:
  backupSource:
    provider: velero
    backupName: latest
    namespaces:
      - name: platform-frontend
        sandboxName: rp-frontend
      - name: platform-api
        sandboxName: rp-api
      - name: platform-infra
        sandboxName: rp-infra
    labelSelector:
      app.kubernetes.io/part-of: platform
  schedule:
    cron: "0 4 * * 0"
    timezone: "America/New_York"
  sandbox:
    namespacePrefix: rp-test
    ttl: 1h
    resourceQuota:
      cpu: "8"
      memory: 16Gi
      storage: 100Gi
    networkIsolation: group
  healthChecks:
    policyRef: platform-health-policy
    timeout: 15m
  sla:
    maxRTO: "45m"
    alertOnExceed: true
  notifications:
    onFailure:
      - type: webhook
        webhookSecretRef: incident-webhook
  timeout: 2h
  historyLimit: 20
kubectl quick reference
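kubectl get rt -n kymaros-system                        # list all RestoreTests in the namespace
kubectl describe rt my-app-nightly -n kymaros-system    # show spec, status, and events for one RestoreTest
kubectl get rt -n kymaros-system -w                     # watch for status changes
kubectl edit rt my-app-nightly -n kymaros-system        # open the resource in an editor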