# GitOps
RestoreTest resources are Kubernetes manifests. Storing them in Git and applying them through a GitOps tool ensures that your restore testing configuration is version-controlled, peer-reviewed, and consistent across environments.
## Repository structure
The recommended structure is one RestoreTest per namespace, grouped by environment:
```
k8s/
  kymaros/
    base/
      kustomization.yaml
    production/
      webapp-restoretest.yaml
      postgres-restoretest.yaml
      redis-restoretest.yaml
      kustomization.yaml
    staging/
      webapp-restoretest.yaml
      kustomization.yaml
```
This layout allows environment-specific overrides (different schedules, different `minReady` values) using Kustomize patches without duplicating entire manifests.
**Best practice: one RestoreTest per namespace.** Combining multiple application namespaces into a single RestoreTest makes it harder to identify which component caused a failure and harder to tune timeouts independently. A test for your database and your frontend should be separate resources.
## Base Kustomization
```yaml
# k8s/kymaros/base/kustomization.yaml
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
resources: []
```
The base is intentionally empty. Each environment directory references its own manifests and overlays.
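Each environment's `kustomization.yaml` then lists its own manifests. A minimal sketch for production, assuming the filenames from the layout above:

```yaml
# k8s/kymaros/production/kustomization.yaml
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
resources:
  - webapp-restoretest.yaml
  - postgres-restoretest.yaml
  - redis-restoretest.yaml
```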
## ArgoCD

### Application manifest
```yaml
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: kymaros-restore-tests
  namespace: argocd
spec:
  project: default
  source:
    repoURL: https://github.com/yourorg/your-infra-repo
    targetRevision: main
    path: k8s/kymaros/production
  destination:
    server: https://kubernetes.default.svc
    namespace: kymaros-system
  syncPolicy:
    automated:
      prune: true
      selfHeal: true
    syncOptions:
      - CreateNamespace=false
  ignoreDifferences:
    - group: restore.kymaros.io
      kind: RestoreTest
      jsonPointers:
        - /status
```
The `ignoreDifferences` block prevents ArgoCD from treating RestoreTest `status` subresource updates as drift. Without it, every test run would mark the Application as `OutOfSync`.
### Production RestoreTest example
```yaml
# k8s/kymaros/production/webapp-restoretest.yaml
apiVersion: restore.kymaros.io/v1alpha1
kind: RestoreTest
metadata:
  name: webapp-nightly
  namespace: kymaros-system
spec:
  schedule: "0 2 * * *"
  backupSource:
    name: webapp-backup
  namespaces:
    - name: webapp-prod
      sandboxName: sandbox-webapp
  checks:
    - name: pods-ready
      type: podStatus
      podStatus:
        labelSelector:
          app: api-server
        minReady: 2
        timeout: 5m
    - name: health-endpoint
      type: httpGet
      httpGet:
        service: api-server-svc
        port: 8080
        path: /healthz
        expectedStatus: 200
        timeout: 15s
        retries: 3
  notifications:
    onFailure:
      - type: slack
        channel: "#ops-alerts"
        webhookSecretRef:
          name: slack-secret
          namespace: kymaros-system
```
### Staging overlay with Kustomize patch
For staging, run the test more frequently and with a lower minReady:
```yaml
# k8s/kymaros/staging/kustomization.yaml
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
resources:
  - webapp-restoretest.yaml
patches:
  - target:
      group: restore.kymaros.io
      version: v1alpha1
      kind: RestoreTest
      name: webapp-nightly
    patch: |-
      - op: replace
        path: /spec/schedule
        value: "0 */6 * * *"
      - op: replace
        path: /spec/checks/0/podStatus/minReady
        value: 1
```
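Running `kustomize build k8s/kymaros/staging` then emits the manifest with only the patched fields changed. An excerpt, assuming the staging manifest mirrors the production example:

```yaml
apiVersion: restore.kymaros.io/v1alpha1
kind: RestoreTest
metadata:
  name: webapp-nightly
  namespace: kymaros-system
spec:
  schedule: "0 */6 * * *"   # patched: every six hours instead of nightly
  checks:
    - name: pods-ready
      type: podStatus
      podStatus:
        minReady: 1         # patched: staging runs fewer replicas
  # ...remaining fields carried over unchanged
```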
## Flux

### Kustomization resource
```yaml
apiVersion: kustomize.toolkit.fluxcd.io/v1
kind: Kustomization
metadata:
  name: kymaros-restore-tests
  namespace: flux-system
spec:
  interval: 5m
  path: ./k8s/kymaros/production
  prune: true
  sourceRef:
    kind: GitRepository
    name: infra-repo
  targetNamespace: kymaros-system
  healthChecks:
    - apiVersion: restore.kymaros.io/v1alpha1
      kind: RestoreTest
      name: webapp-nightly
      namespace: kymaros-system
```
The `healthChecks` block causes Flux to wait for the RestoreTest resource to exist and for its Ready condition to become True before marking the Kustomization as reconciled. This is fail-safe behavior: if the CRD is not installed, the health check fails and Flux reports the Kustomization as degraded rather than silently succeeding.
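If the Kymaros operator and its CRDs are themselves delivered by Flux, you can make that ordering explicit with `dependsOn`, so the restore tests only reconcile after the operator's Kustomization is ready. A sketch; `kymaros-operator` is a placeholder for whichever Kustomization installs the CRDs:

```yaml
spec:
  dependsOn:
    - name: kymaros-operator
      namespace: flux-system
```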
### GitRepository source
```yaml
apiVersion: source.toolkit.fluxcd.io/v1
kind: GitRepository
metadata:
  name: infra-repo
  namespace: flux-system
spec:
  interval: 1m
  # Deploy-key authentication requires an SSH URL; use https:// only
  # with a username/password or token secret instead.
  url: ssh://git@github.com/yourorg/your-infra-repo
  ref:
    branch: main
  secretRef:
    name: github-deploy-key
```
## Managing Secrets in GitOps
Notification webhook URLs are sensitive and must not be committed to Git in plaintext.
### Option 1: Sealed Secrets
```sh
kubectl create secret generic slack-secret \
  --from-literal=webhook-url=https://hooks.slack.com/services/... \
  --namespace kymaros-system \
  --dry-run=client -o yaml \
  | kubeseal --format yaml > k8s/kymaros/production/slack-secret.sealed.yaml
```
Commit `slack-secret.sealed.yaml`. The SealedSecret controller decrypts it on the cluster.
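The committed file is an encrypted SealedSecret, not a Secret; only the controller's private key can decrypt it, so it is safe in Git. Roughly what the file looks like (ciphertext elided):

```yaml
apiVersion: bitnami.com/v1alpha1
kind: SealedSecret
metadata:
  name: slack-secret
  namespace: kymaros-system
spec:
  encryptedData:
    webhook-url: AgB4...   # ciphertext produced by kubeseal
  template:
    metadata:
      name: slack-secret
      namespace: kymaros-system
```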
### Option 2: External Secrets Operator
```yaml
apiVersion: external-secrets.io/v1beta1
kind: ExternalSecret
metadata:
  name: slack-secret
  namespace: kymaros-system
spec:
  refreshInterval: 1h
  secretStoreRef:
    name: vault-backend
    kind: ClusterSecretStore
  target:
    name: slack-secret
    creationPolicy: Owner
  data:
    - secretKey: webhook-url
      remoteRef:
        key: kymaros/slack
        property: webhook-url
```
Commit this ExternalSecret manifest. The webhook URL stays in Vault or your chosen secrets backend.
## Workflow recommendations
**Review RestoreTest changes like application code.** A change to health check thresholds or schedules should go through a pull request. A bad threshold (for example, `minReady: 0`) silently makes every test pass, defeating the purpose of the tool.
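To catch structural mistakes before merge, you can render and validate the overlays in CI. A sketch of a GitHub Actions step, assuming `kustomize` and `kubeconform` are installed on the runner; `-ignore-missing-schemas` skips the RestoreTest CRD unless you supply its JSON schema:

```yaml
# .github/workflows/validate-restoretests.yaml (sketch, not a complete workflow)
- name: Validate rendered manifests
  run: |
    for env in production staging; do
      kustomize build "k8s/kymaros/$env" | kubeconform -strict -ignore-missing-schemas
    done
```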
**Gate schedule changes behind branch protection and review.** In regulated environments, the test schedule may be an audit control. Treat schedule changes as configuration changes subject to change management.
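On GitHub, a CODEOWNERS entry is one way to require review from the right team for every change under the restore-test tree (the team name here is a placeholder):

```
# .github/CODEOWNERS
k8s/kymaros/ @yourorg/platform-team
```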
**Pin HealthCheckPolicy references by name.** If you use standalone HealthCheckPolicy resources referenced by multiple RestoreTests, version them with names (for example, `postgres-checks-v2`) rather than mutating the same resource. This prevents a HealthCheckPolicy change from simultaneously affecting all tests that reference it.
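As a sketch, a versioned name makes the rollout explicit per test. The `healthCheckPolicyRef` field name below is illustrative; check the RestoreTest CRD schema for the actual reference field:

```yaml
spec:
  healthCheckPolicyRef:
    name: postgres-checks-v2   # bump the version suffix per-test to roll out new checks
```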