Introduction

Kymaros is an open-source Kubernetes Operator that continuously validates backup restores. It restores your backups into isolated sandbox namespaces, runs configurable health checks, measures restore duration against your SLA, and produces a scored report — so you know your backups actually work before a disaster forces you to find out.

License: Apache 2.0
API group: restore.kymaros.io/v1alpha1
Project: github.com/kymorahq/kymora

The Problem

Backup tools report job status, not restorability. A backup marked Completed tells you the data was written somewhere. It does not tell you whether that data restores into a working application, whether pods come up healthy, whether dependencies resolve, or how long the restore actually takes.

Teams discover this gap at the worst possible moment: during an incident, under pressure, with an untested restore procedure and no confident RTO estimate.

Kymaros closes that gap by treating restore validation as a continuous, automated process — not a manual exercise performed once at audit time.

How It Works

Kymaros operates in four stages:

1. Define

You create a RestoreTest resource that declares which backup to test, how often, and what conditions a successful restore must satisfy:

apiVersion: restore.kymaros.io/v1alpha1
kind: RestoreTest
metadata:
  name: my-app-nightly
  namespace: kymaros-system
spec:
  backupSource:
    provider: velero
    backupName: latest
    namespaces:
      - name: my-app
  schedule:
    cron: "0 3 * * *"
  sandbox:
    namespacePrefix: rp-test
    ttl: 30m
    networkIsolation: strict
  sla:
    maxRTO: "15m"

2. Sandbox

At each scheduled run, the controller creates an isolated namespace (prefixed by sandbox.namespacePrefix). The backup is restored into that namespace using Velero. Network isolation prevents the sandbox from interfering with production workloads. The sandbox is automatically deleted after the TTL expires.
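How Kymaros implements networkIsolation: strict is not described here. As an illustration only, one standard Kubernetes way to isolate a namespace is a default-deny NetworkPolicy. The sketch below builds such a manifest as a plain dictionary; the policy name and the sandbox namespace name rp-test-example are hypothetical.

```python
def default_deny_policy(namespace: str) -> dict:
    """Build a NetworkPolicy manifest that denies all ingress and egress in a namespace."""
    return {
        "apiVersion": "networking.k8s.io/v1",
        "kind": "NetworkPolicy",
        # "default-deny-all" is an illustrative name, not one Kymaros is known to use.
        "metadata": {"name": "default-deny-all", "namespace": namespace},
        "spec": {
            # An empty podSelector matches every pod in the namespace.
            "podSelector": {},
            # Both policy types are listed with no allow rules, so all traffic is denied.
            "policyTypes": ["Ingress", "Egress"],
        },
    }


# Hypothetical sandbox namespace built from the rp-test prefix.
policy = default_deny_policy("rp-test-example")
```

A policy like this only takes effect when the cluster's network plugin enforces NetworkPolicy resources.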

3. Validate

Once the restore completes, Kymaros runs a validation sequence:

Restore integrity — confirms the restore operation itself succeeded
Completeness — checks that the expected resource counts (Deployments, Services, PVCs, etc.) match the source namespace
Pod startup — waits for all pods to reach Running and Ready state
Health checks — executes HTTP probes, exec commands, or custom checks defined in a HealthCheckPolicy
Cross-namespace dependencies — verifies that services expected outside the sandbox are reachable
RTO compliance — records total restore duration and compares it against sla.maxRTO
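The RTO compliance step amounts to comparing a measured duration against the configured limit. A minimal sketch of that comparison follows. The helper names are hypothetical, and the parser assumes single-unit values such as "15m" or "30m" as seen in the example manifest; the duration format Kymaros actually accepts may be broader.

```python
import re
from datetime import timedelta

# Maps a duration suffix to the matching timedelta keyword argument.
_UNITS = {"s": "seconds", "m": "minutes", "h": "hours"}


def parse_duration(text: str) -> timedelta:
    """Parse a single-unit duration such as '15m' (assumed format)."""
    match = re.fullmatch(r"(\d+)([smh])", text.strip())
    if match is None:
        raise ValueError(f"unsupported duration: {text!r}")
    value, unit = match.groups()
    return timedelta(**{_UNITS[unit]: int(value)})


def within_rto(restore_duration: timedelta, max_rto: str) -> bool:
    """Return True when the measured restore time does not exceed the limit."""
    return restore_duration <= parse_duration(max_rto)


print(within_rto(timedelta(minutes=12, seconds=30), "15m"))  # True
print(within_rto(timedelta(minutes=18), "15m"))              # False
```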

4. Report

The controller writes a RestoreReport resource with a confidence score (0–100) and the full breakdown of each validation step. A score of 90 or above means the restore passed. Scores from 70 to 89 indicate a partial pass with degraded confidence. A score below 70 is a failure.
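The score bands can be expressed as a small helper, for example when post-processing reports in a script. This is a sketch: the function name and the returned labels are illustrative, and only the thresholds come from the description above.

```python
def classify_score(score: int) -> str:
    """Map a confidence score (0 to 100) to the outcome band described above."""
    if not 0 <= score <= 100:
        raise ValueError(f"score out of range: {score}")
    if score >= 90:
        return "pass"
    if score >= 70:
        return "partial"  # partial pass with degraded confidence
    return "fail"


for value in (95, 82, 40):
    print(value, classify_score(value))
```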

Reports can be read with kubectl, viewed in the built-in dashboard, or scraped via the Prometheus metrics endpoint.

Quick Links

Resource	Path
Installation	Getting Started: Installation
First test in 5 minutes	Getting Started: Quick Start
Reading reports	Getting Started: Your First Report
Dashboard access	Getting Started: Dashboard Access
CRD reference	RestoreTest
GitHub	kymorahq/kymora