HealthCheckPolicy
API group: restore.kymaros.io/v1alpha1
Kind: HealthCheckPolicy
Short name: hcp
Scope: Namespaced (typically kymaros-system)
A HealthCheckPolicy defines a named, reusable set of probes that Kymaros runs inside a sandbox after a restore completes. A RestoreTest references a policy by name via spec.healthChecks.policyRef. Multiple tests can share the same policy.
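As a rough illustration of the reference described above, a RestoreTest might point at a policy as sketched below. Only the spec.healthChecks.policyRef path and the existence of a HealthCheckRef timeout come from this page. The apiVersion, the resource name, the plain-string form of policyRef, and the placement of timeout next to it are assumptions.

```yaml
# Hypothetical RestoreTest fragment; all other RestoreTest fields are omitted.
apiVersion: restore.kymaros.io/v1alpha1   # assumed to share the HealthCheckPolicy API group
kind: RestoreTest
metadata:
  name: nightly-web-api-restore           # hypothetical name
  namespace: kymaros-system
spec:
  healthChecks:
    policyRef: web-api-health-policy      # name of the HealthCheckPolicy (assumed to be a plain string)
    timeout: 10m                          # assumed location of HealthCheckRef.timeout
```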
Spec
| Field | Type | Required | Description |
|---|---|---|---|
| checks | []HealthCheck | Yes | Ordered list of health checks to execute. All checks run unless a preceding check causes an abort. |
HealthCheck
Each entry in checks defines a single probe. The type field selects the probe type and determines which sub-fields are relevant.
| Field | Type | Required | Description |
|---|---|---|---|
| name | string | Yes | Unique identifier for this check within the policy. Referenced in RestoreReport.status.checks[*].name. |
| type | string | Yes | Probe type. Accepted values: podStatus, httpGet, exec, tcpSocket, resourceExists. |
| podStatus | PodStatusCheck | Conditional | Required when type is podStatus. |
| httpGet | HTTPGetCheck | Conditional | Required when type is httpGet. |
| exec | ExecCheck | Conditional | Required when type is exec. |
| tcpSocket | TCPSocketCheck | Conditional | Required when type is tcpSocket. |
| resourceExists | ResourceExistsCheck | Conditional | Required when type is resourceExists. |
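To illustrate how type pairs with its sub-field, the entry below sets type to tcpSocket and supplies only the tcpSocket block. The check name and Service name are hypothetical.

```yaml
checks:
  - name: cache-port-open      # hypothetical; must be unique within the policy
    type: tcpSocket            # selects the probe type
    tcpSocket:                 # sub-field matching type; the other four sub-fields are left out
      service: cache           # hypothetical Service name
      port: 6379
```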
Check type reference
podStatus
Verifies that pods matching a label selector have reached Running and Ready state. This is the most common first check after a restore.
| Field | Type | Required | Default | Description |
|---|---|---|---|---|
| labelSelector | map[string]string | Yes | — | Kubernetes label selector used to identify the target pods in the sandbox namespace. |
| minReady | int | Yes | — | Minimum number of pods that must be in Ready state for the check to pass. |
| timeout | Duration | No | — | Maximum time to wait for pods to reach Ready state. When omitted, the HealthCheckRef.timeout on the RestoreTest governs. |
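A minimal sketch of a podStatus entry that leaves timeout unset, so that the HealthCheckRef.timeout on the RestoreTest applies instead. The check name and label are hypothetical.

```yaml
- name: frontend-pods-ready    # hypothetical check name
  type: podStatus
  podStatus:
    labelSelector:
      app: frontend            # hypothetical label
    minReady: 3                # at least three pods must be Ready
    # timeout intentionally omitted: HealthCheckRef.timeout on the RestoreTest governs
```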
httpGet
Sends an HTTP GET request to a service within the sandbox and validates the response status code.
| Field | Type | Required | Default | Description |
|---|---|---|---|---|
| service | string | Yes | — | Name of the Kubernetes Service in the sandbox namespace to probe. |
| port | int | Yes | — | Port number on the service. |
| path | string | Yes | — | HTTP path to request (e.g., "/healthz"). |
| expectedStatus | int | Yes | — | HTTP response status code expected for the check to pass (e.g., 200). |
| timeout | Duration | No | — | Per-request timeout. |
| retries | int | No | — | Number of retry attempts before marking the check as failed. |
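Because expectedStatus is an arbitrary status code, a check can also assert a non-200 response. The sketch below assumes a hypothetical /admin endpoint that should reject unauthenticated requests with 401 once the restored application is up. No defaults are documented for timeout and retries, so both are set explicitly here.

```yaml
- name: admin-requires-auth    # hypothetical check name
  type: httpGet
  httpGet:
    service: api-service       # hypothetical Service name
    port: 8080
    path: /admin               # hypothetical protected endpoint
    expectedStatus: 401        # passes only if the endpoint rejects the unauthenticated request
    timeout: 10s
    retries: 2
```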
exec
Executes a command inside a container running in the sandbox and validates the exit code.
| Field | Type | Required | Default | Description |
|---|---|---|---|---|
| podSelector | map[string]string | Yes | — | Label selector for the target pod. The command runs in the first matching pod. |
| container | string | Yes | — | Name of the container within the selected pod in which to run the command. |
| command | []string | Yes | — | Command and arguments to execute (exec form, not shell). Example: ["pg_isready", "-U", "app"]. |
| successExitCode | int | Yes | — | Exit code that indicates success. Typically 0. |
| timeout | Duration | No | — | Maximum time to wait for the command to complete. |
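Since command is in exec form, shell features such as pipes are not interpreted unless a shell is invoked explicitly. The sketch below wraps a pipeline in /bin/sh -c. It assumes the container image ships /bin/sh, and the labels, container name, and /data path are hypothetical.

```yaml
- name: data-dir-not-empty     # hypothetical check name
  type: exec
  exec:
    podSelector:
      app: cache               # hypothetical label
    container: cache           # hypothetical container name
    command:
      - /bin/sh                # explicit shell, needed for the pipe below
      - -c
      - "ls /data | grep -q ." # exits 0 only if /data contains at least one entry
    successExitCode: 0
    timeout: 15s
```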
tcpSocket
Attempts a TCP connection to a port on a service within the sandbox. Passes if the connection is accepted.
| Field | Type | Required | Default | Description |
|---|---|---|---|---|
| service | string | Yes | — | Name of the Kubernetes Service in the sandbox namespace to probe. |
| port | int | Yes | — | TCP port to connect to. |
| timeout | Duration | No | — | Connection timeout. |
resourceExists
Verifies that specific named Kubernetes resources are present in the sandbox after the restore. Useful for confirming that CRDs and custom objects were included in the backup and restored correctly.
| Field | Type | Required | Description |
|---|---|---|---|
| resources | []ResourceRef | Yes | List of resources to verify. |
ResourceRef
| Field | Type | Required | Description |
|---|---|---|---|
| kind | string | Yes | Kubernetes resource kind (e.g., "Deployment", "ConfigMap", "MyCustomResource"). |
| name | string | Yes | Name of the resource to look for in the sandbox namespace. |
Examples
Web API application
Checks that the API pods are running, that the /healthz and /readyz endpoints return HTTP 200, that the in-cluster Redis cache accepts TCP connections, and that the application's ConfigMap and TLS Secret were restored.
apiVersion: restore.kymaros.io/v1alpha1
kind: HealthCheckPolicy
metadata:
  name: web-api-health-policy
  namespace: kymaros-system
spec:
  checks:
    - name: api-pods-ready
      type: podStatus
      podStatus:
        labelSelector:
          app: api
          component: server
        minReady: 2
        timeout: 5m
    - name: api-http-healthz
      type: httpGet
      httpGet:
        service: api-service
        port: 8080
        path: /healthz
        expectedStatus: 200
        timeout: 10s
        retries: 3
    - name: api-http-readyz
      type: httpGet
      httpGet:
        service: api-service
        port: 8080
        path: /readyz
        expectedStatus: 200
        timeout: 10s
        retries: 3
    - name: redis-tcp-reachable
      type: tcpSocket
      tcpSocket:
        service: redis
        port: 6379
        timeout: 5s
    - name: configmap-exists
      type: resourceExists
      resourceExists:
        resources:
          - kind: ConfigMap
            name: api-config
          - kind: Secret
            name: api-tls-cert
Database application
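Verifies that the PostgreSQL StatefulSet pod is ready, runs pg_isready inside the container to confirm the database process is accepting connections, checks that the service port is reachable over TCP, queries the schema_migrations table to confirm the data was restored, and verifies that the data PersistentVolumeClaim exists.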
apiVersion: restore.kymaros.io/v1alpha1
kind: HealthCheckPolicy
metadata:
  name: postgres-health-policy
  namespace: kymaros-system
spec:
  checks:
    - name: postgres-pod-ready
      type: podStatus
      podStatus:
        labelSelector:
          app: postgres
          role: primary
        minReady: 1
        timeout: 8m
    - name: postgres-accepting-connections
      type: exec
      exec:
        podSelector:
          app: postgres
          role: primary
        container: postgres
        command:
          - pg_isready
          - -U
          - app
          - -d
          - orders
        successExitCode: 0
        timeout: 30s
    - name: postgres-tcp-port
      type: tcpSocket
      tcpSocket:
        service: postgres-service
        port: 5432
        timeout: 5s
    - name: postgres-schema-migrated
      type: exec
      exec:
        podSelector:
          app: postgres
          role: primary
        container: postgres
        command:
          - psql
          - -U
          - app
          - -d
          - orders
          - -c
          - "SELECT COUNT(*) FROM schema_migrations;"
        successExitCode: 0
        timeout: 1m
    - name: pvc-exists
      type: resourceExists
      resourceExists:
        resources:
          - kind: PersistentVolumeClaim
            name: postgres-data-postgres-0
Background worker application
Verifies that worker pods are running, that a WorkerQueue custom resource and the worker ConfigMap were restored, that the RabbitMQ job queue accepts TCP connections, and that the worker process is alive inside its container.
apiVersion: restore.kymaros.io/v1alpha1
kind: HealthCheckPolicy
metadata:
  name: worker-health-policy
  namespace: kymaros-system
spec:
  checks:
    - name: worker-pods-ready
      type: podStatus
      podStatus:
        labelSelector:
          app: worker
        minReady: 1
        timeout: 6m
    - name: worker-queue-resource-exists
      type: resourceExists
      resourceExists:
        resources:
          - kind: WorkerQueue
            name: default-queue
          - kind: ConfigMap
            name: worker-config
    - name: rabbitmq-tcp-reachable
      type: tcpSocket
      tcpSocket:
        service: rabbitmq
        port: 5672
        timeout: 10s
    - name: worker-process-alive
      type: exec
      exec:
        podSelector:
          app: worker
        container: worker
        command:
          - /bin/sh
          - -c
          - "ps aux | grep -q '[w]orker' && exit 0 || exit 1"
        successExitCode: 0
        timeout: 10s
kubectl quick reference
# List all HealthCheckPolicy resources
kubectl get hcp -n kymaros-system
# Describe a policy to see all checks
kubectl describe hcp web-api-health-policy -n kymaros-system
# List check names and types for a specific policy
kubectl get hcp web-api-health-policy -n kymaros-system \
  -o jsonpath='{range .spec.checks[*]}{.name}{"\t"}{.type}{"\n"}{end}'