GitOps

RestoreTest resources are Kubernetes manifests. Storing them in Git and applying them through a GitOps tool ensures that your restore testing configuration is version-controlled, peer-reviewed, and consistent across environments.

Repository structure

The recommended structure is one RestoreTest per namespace, grouped by environment:

k8s/
  kymaros/
    base/
      kustomization.yaml
    production/
      webapp-restoretest.yaml
      postgres-restoretest.yaml
      redis-restoretest.yaml
      kustomization.yaml
    staging/
      webapp-restoretest.yaml
      kustomization.yaml

This layout allows environment-specific overrides (different schedules, different minReady values) using Kustomize patches without duplicating entire manifests.

Best practice: one RestoreTest per namespace. Combining multiple application namespaces into a single RestoreTest makes it harder to identify which component caused a failure and harder to tune timeouts independently. A test for your database and your frontend should be separate resources.


Base Kustomization

# k8s/kymaros/base/kustomization.yaml
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
resources: []

The base is intentionally empty. Each environment directory references its own manifests and overlays.
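For example, the production kustomization.yaml simply lists the RestoreTest manifests in that directory. A sketch matching the tree above (file names are the hypothetical ones from the layout):

```yaml
# k8s/kymaros/production/kustomization.yaml
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
resources:
  - webapp-restoretest.yaml
  - postgres-restoretest.yaml
  - redis-restoretest.yaml
```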


ArgoCD

Application manifest

apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: kymaros-restore-tests
  namespace: argocd
spec:
  project: default
  source:
    repoURL: https://github.com/yourorg/your-infra-repo
    targetRevision: main
    path: k8s/kymaros/production
  destination:
    server: https://kubernetes.default.svc
    namespace: kymaros-system
  syncPolicy:
    automated:
      prune: true
      selfHeal: true
    syncOptions:
      - CreateNamespace=false
  ignoreDifferences:
    - group: restore.kymaros.io
      kind: RestoreTest
      jsonPointers:
        - /status

The ignoreDifferences block prevents ArgoCD from treating RestoreTest status subresource updates as drift. Without it, every test run would mark the Application as OutOfSync.

Production RestoreTest example

# k8s/kymaros/production/webapp-restoretest.yaml
apiVersion: restore.kymaros.io/v1alpha1
kind: RestoreTest
metadata:
  name: webapp-nightly
  namespace: kymaros-system
spec:
  schedule: "0 2 * * *"
  backupSource:
    name: webapp-backup
  namespaces:
    - name: webapp-prod
      sandboxName: sandbox-webapp
  checks:
    - name: pods-ready
      type: podStatus
      podStatus:
        labelSelector:
          app: api-server
        minReady: 2
      timeout: 5m
    - name: health-endpoint
      type: httpGet
      httpGet:
        service: api-server-svc
        port: 8080
        path: /healthz
        expectedStatus: 200
      timeout: 15s
      retries: 3
  notifications:
    onFailure:
      - type: slack
        channel: "#ops-alerts"
        webhookSecretRef:
          name: slack-secret
          namespace: kymaros-system

Staging overlay with Kustomize patch

For staging, run the test more frequently and with a lower minReady:

# k8s/kymaros/staging/kustomization.yaml
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
resources:
  - webapp-restoretest.yaml
patches:
  - target:
      group: restore.kymaros.io
      version: v1alpha1
      kind: RestoreTest
      name: webapp-nightly
    patch: |-
      - op: replace
        path: /spec/schedule
        value: "0 */6 * * *"
      - op: replace
        path: /spec/checks/0/podStatus/minReady
        value: 1
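Rendering the overlay with `kustomize build k8s/kymaros/staging` should produce the patched fields roughly as follows (abridged sketch, other fields unchanged):

```yaml
spec:
  schedule: "0 */6 * * *"   # patched from "0 2 * * *"
  checks:
    - name: pods-ready
      podStatus:
        minReady: 1         # patched from 2
```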

Flux

Kustomization resource

apiVersion: kustomize.toolkit.fluxcd.io/v1
kind: Kustomization
metadata:
  name: kymaros-restore-tests
  namespace: flux-system
spec:
  interval: 5m
  path: ./k8s/kymaros/production
  prune: true
  sourceRef:
    kind: GitRepository
    name: infra-repo
  targetNamespace: kymaros-system
  healthChecks:
    - apiVersion: restore.kymaros.io/v1alpha1
      kind: RestoreTest
      name: webapp-nightly
      namespace: kymaros-system

The healthChecks block causes Flux to wait for the RestoreTest resource to exist and for its Ready condition to become True before marking the Kustomization as reconciled. This fails safe: if the CRD is not installed, the health check fails and Flux reports the Kustomization as degraded instead of silently succeeding.
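If you would rather have Flux wait on every applied resource instead of enumerating individual health checks, the v1 Kustomization API also supports the `wait` field, which implies health checking of all reconciled objects, bounded by `timeout`:

```yaml
spec:
  wait: true
  timeout: 10m
```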

GitRepository source

apiVersion: source.toolkit.fluxcd.io/v1
kind: GitRepository
metadata:
  name: infra-repo
  namespace: flux-system
spec:
  interval: 1m
  url: https://github.com/yourorg/your-infra-repo
  ref:
    branch: main
  secretRef:
    name: github-deploy-key

Managing Secrets in GitOps

Notification webhook URLs are sensitive and must not be committed to Git in plaintext.

Option 1: Sealed Secrets

kubectl create secret generic slack-secret \
  --from-literal=webhook-url=https://hooks.slack.com/services/... \
  --namespace kymaros-system \
  --dry-run=client -o yaml \
  | kubeseal --format yaml > k8s/kymaros/production/slack-secret.sealed.yaml

Commit slack-secret.sealed.yaml. The SealedSecret controller decrypts it on the cluster.
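The sealed manifest then ships alongside the tests. For example, the production kustomization's resource list would gain one entry (excerpt; file names match the layout above):

```yaml
# k8s/kymaros/production/kustomization.yaml (excerpt)
resources:
  - webapp-restoretest.yaml
  - slack-secret.sealed.yaml
```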

Option 2: External Secrets Operator

apiVersion: external-secrets.io/v1beta1
kind: ExternalSecret
metadata:
  name: slack-secret
  namespace: kymaros-system
spec:
  refreshInterval: 1h
  secretStoreRef:
    name: vault-backend
    kind: ClusterSecretStore
  target:
    name: slack-secret
    creationPolicy: Owner
  data:
    - secretKey: webhook-url
      remoteRef:
        key: kymaros/slack
        property: webhook-url

Commit this ExternalSecret manifest. The webhook URL stays in Vault or your chosen secrets backend.
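On the cluster, the operator materializes a plain Secret that the RestoreTest's webhookSecretRef can point at. Roughly (a sketch; the data value is the base64-encoded URL fetched from the backend):

```yaml
apiVersion: v1
kind: Secret
metadata:
  name: slack-secret
  namespace: kymaros-system
type: Opaque
data:
  webhook-url: # base64-encoded webhook URL, synced from Vault
```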


Workflow recommendations

Review RestoreTest changes like application code. A change to health check thresholds or schedules should go through a pull request. A bad threshold (for example, minReady: 0) silently makes every test pass, defeating the purpose of the tool.
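If you want to enforce that threshold mechanically rather than relying on review alone, an admission policy can reject a zero minReady. A sketch using Kyverno, assuming Kyverno is installed in the cluster (the `=(podStatus)` anchor applies the rule only to checks that have a podStatus block):

```yaml
apiVersion: kyverno.io/v1
kind: ClusterPolicy
metadata:
  name: restoretest-minready
spec:
  validationFailureAction: Enforce
  rules:
    - name: minready-at-least-one
      match:
        any:
          - resources:
              kinds:
                - RestoreTest
      validate:
        message: "podStatus.minReady must be at least 1"
        pattern:
          spec:
            checks:
              - =(podStatus):
                  minReady: ">=1"
```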

Put the schedule field under change control. In regulated environments, the test schedule may be an audit control. Use branch protection and required approvals so that schedule changes go through the same change-management process as other configuration changes.

Pin HealthCheckPolicy references by name. If you use standalone HealthCheckPolicy resources referenced by multiple RestoreTests, version them with names (for example, postgres-checks-v2) rather than mutating the same resource. This prevents a HealthCheckPolicy change from simultaneously affecting all tests that reference it.
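As a sketch of the versioned-name pattern (the exact reference field depends on the kymaros CRD; `healthCheckPolicyRef` here is an illustrative, hypothetical name):

```yaml
# A RestoreTest pinned to a specific policy version
spec:
  healthCheckPolicyRef:       # hypothetical field name
    name: postgres-checks-v2  # versioned name, never mutated in place
```

Rolling out a postgres-checks-v3 then becomes an explicit, reviewable edit to each RestoreTest that opts in, rather than a single change that silently affects every test at once.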