This guide covers production-ready installation of Chaos Mesh with various configuration options.

Prerequisites

  1. Kubernetes Cluster: Kubernetes version 1.12 or later (1.20+ recommended for production)
  2. Container Runtime: one of Docker, containerd, or CRI-O
  3. Access Rights: cluster-admin privileges or appropriate RBAC permissions
  4. Helm (Optional): Helm 3.x for Helm-based installation

For production environments, we recommend using Helm for easier configuration management and upgrades.

Installation Methods

Method 1: Helm Installation

Helm provides the most flexible installation with easy configuration management.

Basic Installation

# Add Chaos Mesh Helm repository
helm repo add chaos-mesh https://charts.chaos-mesh.org
helm repo update

# Create namespace
kubectl create namespace chaos-mesh

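# Install Chaos Mesh
# Without --version, Helm installs the newest chart in the repository.
# To pin a release, add --version <chart-version> ("latest" is not a valid value).
helm install chaos-mesh chaos-mesh/chaos-mesh \
  --namespace chaos-mesh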

Container Runtime Configuration

Set the runtime and socket path to match the container runtime on your nodes. For Docker:
helm install chaos-mesh chaos-mesh/chaos-mesh \
  --namespace chaos-mesh \
  --set chaosDaemon.runtime=docker \
  --set chaosDaemon.socketPath=/var/run/docker.sock
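
The runtime and socket path have to agree with what the nodes actually run. As a rough sketch, the helper below turns the runtime string a node reports (for example from `kubectl get nodes -o jsonpath='{.items[0].status.nodeInfo.containerRuntimeVersion}'`) into the matching `--set` flags. The function name `runtime_flags` is made up for this example. The Docker and containerd socket paths are the ones used on this page; the CRI-O runtime value and socket path are assumptions that should be checked against your nodes and chart version.

```shell
# Hypothetical helper: map a node's containerRuntimeVersion string
# (e.g. "containerd://1.6.8") to Helm --set flags for the chaos daemon.
runtime_flags() {
  case "$1" in
    docker://*)
      echo "--set chaosDaemon.runtime=docker --set chaosDaemon.socketPath=/var/run/docker.sock" ;;
    containerd://*)
      echo "--set chaosDaemon.runtime=containerd --set chaosDaemon.socketPath=/run/containerd/containerd.sock" ;;
    cri-o://*)
      # Assumed CRI-O runtime value and socket path; verify before use.
      echo "--set chaosDaemon.runtime=crio --set chaosDaemon.socketPath=/var/run/crio/crio.sock" ;;
    *)
      echo "unsupported runtime: $1" >&2
      return 1 ;;
  esac
}
```

The output can be spliced into the install command unquoted so that it splits into separate flags, for example `helm install chaos-mesh chaos-mesh/chaos-mesh --namespace chaos-mesh $(runtime_flags "containerd://1.6.8")`.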

Production Configuration

For production deployments, use a values file with custom configurations:
values-production.yaml
# Container runtime settings
chaosDaemon:
  runtime: containerd
  socketPath: /run/containerd/containerd.sock
  
  # Resource profile: light, standard, or intensive
  resourceProfile: standard
  
  # Override specific resources if needed
  resources:
    limits:
      memory: 1Gi
    requests:
      cpu: 250m
      memory: 512Mi
  
  # Run with specific capabilities instead of privileged mode
  privileged: false
  capabilities:
    add:
      - SYS_PTRACE
      - NET_ADMIN
      - NET_RAW
      - MKNOD
      - SYS_CHROOT
      - SYS_ADMIN
      - KILL
      - IPC_LOCK

# Controller Manager settings
controllerManager:
  replicaCount: 3
  
  resources:
    limits:
      cpu: 500m
      memory: 1024Mi
    requests:
      cpu: 100m
      memory: 256Mi
  
  # Enable leader election for HA
  leaderElection:
    enabled: true
    leaseDuration: 15s
    renewDeadline: 10s
    retryPeriod: 2s

# Dashboard settings
dashboard:
  create: true
  
  # Enable security mode (recommended)
  securityMode: true
  
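  # ClusterIP keeps the dashboard internal to the cluster.
  # For external access in production, expose it through an Ingress or switch to LoadBalancer.
  service:
    type: ClusterIP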
  
  # Enable persistent storage for SQLite
  persistentVolume:
    enabled: true
    size: 8Gi
    storageClassName: standard
  
  resources:
    limits:
      cpu: 500m
      memory: 1024Mi
    requests:
      cpu: 50m
      memory: 256Mi

# DNS Server for DNSChaos
dnsServer:
  create: true
  resources:
    limits:
      cpu: 500m
      memory: 256Mi
    requests:
      cpu: 100m
      memory: 70Mi

# Timezone configuration
timezone: "UTC"

# Image registry (use your own registry if needed)
images:
  registry: ghcr.io
  tag: latest
Apply the configuration:
helm install chaos-mesh chaos-mesh/chaos-mesh \
  --namespace chaos-mesh \
  --values values-production.yaml

Method 2: Installation Script

The installation script provides automated setup with various options.

Basic Usage

curl -sSL https://mirrors.chaos-mesh.org/latest/install.sh | bash

Script Options

--local (string): Run a local Kubernetes cluster (supports: kind)
curl -sSL https://mirrors.chaos-mesh.org/latest/install.sh | bash -s -- --local kind

--version (string): Specify the Chaos Mesh version (default: latest)
curl -sSL https://mirrors.chaos-mesh.org/latest/install.sh | bash -s -- --version v2.6.0

--runtime (string): Container runtime, docker or containerd (default: docker)
curl -sSL https://mirrors.chaos-mesh.org/latest/install.sh | bash -s -- --runtime containerd

--namespace (string): Installation namespace (default: chaos-mesh)
curl -sSL https://mirrors.chaos-mesh.org/latest/install.sh | bash -s -- --namespace my-chaos

--k3s (boolean): Install in a k3s environment
curl -sSL https://mirrors.chaos-mesh.org/latest/install.sh | bash -s -- --k3s

--microk8s (boolean): Install in a microk8s environment
curl -sSL https://mirrors.chaos-mesh.org/latest/install.sh | bash -s -- --microk8s

Complete Example

# Install Chaos Mesh v2.6.0 with containerd in custom namespace
curl -sSL https://mirrors.chaos-mesh.org/latest/install.sh | bash -s -- \
  --version v2.6.0 \
  --runtime containerd \
  --namespace chaos-testing \
  --release-name my-chaos-mesh

Method 3: Manual Installation with Manifests

For environments without Helm:
# Step 1: Install CRDs
kubectl create namespace chaos-mesh
kubectl apply -f https://mirrors.chaos-mesh.org/latest/crd.yaml

# Step 2: Install Chaos Mesh components
kubectl apply -f https://mirrors.chaos-mesh.org/latest/chaos-mesh.yaml
Manual installation doesn’t provide easy configuration options. For production, use Helm or the install script.

Configuration Options

Resource Profiles for Chaos Daemon

Chaos Mesh supports three resource profiles to optimize for different environments:

Light (Default)

Resources: 100m CPU, 256Mi memory
Use case: Staging, test environments, cost optimization

Standard

Resources: 250m CPU, 512Mi memory
Use case: Balanced production workloads

Intensive

Resources: 500m CPU, 1Gi memory (with limits)
Use case: Heavy chaos testing, large-scale production

# Use standard profile
helm install chaos-mesh chaos-mesh/chaos-mesh \
  --namespace chaos-mesh \
  --set chaosDaemon.resourceProfile=standard

# Override specific resources from profile
helm install chaos-mesh chaos-mesh/chaos-mesh \
  --namespace chaos-mesh \
  --set chaosDaemon.resourceProfile=light \
  --set chaosDaemon.resources.requests.cpu=200m

Namespace Scoped Mode

Restrict Chaos Mesh to specific namespaces instead of cluster-wide:
clusterScoped: false
controllerManager:
  targetNamespace: chaos-testing
The same settings as Helm flags:
helm install chaos-mesh chaos-mesh/chaos-mesh \
  --namespace chaos-mesh \
  --set clusterScoped=false \
  --set controllerManager.targetNamespace=chaos-testing
Namespace-scoped mode is more secure for multi-tenant clusters.

Security Configuration

Enable Dashboard Security Mode

Security mode requires dashboard users to sign in with their own token instead of relying on the dashboard's service account:
dashboard:
  securityMode: true
Generate a token for access:
# Create a service account
kubectl create serviceaccount chaos-viewer -n chaos-mesh

# Bind appropriate role
kubectl create rolebinding chaos-viewer \
  --clusterrole=chaos-mesh-chaos-dashboard-target-namespace \
  --serviceaccount=chaos-mesh:chaos-viewer \
  -n chaos-mesh

# Get a token valid for 24 hours (kubectl create token requires kubectl 1.24 or later)
kubectl create token chaos-viewer -n chaos-mesh --duration=24h

Run Chaos Daemon Without Privileged Mode

chaosDaemon:
  privileged: false
  capabilities:
    add:
      - SYS_PTRACE
      - NET_ADMIN
      - NET_RAW
      - MKNOD
      - SYS_CHROOT
      - SYS_ADMIN
      - KILL
      - IPC_LOCK

Filter Namespaces

Only allow chaos injection in annotated namespaces:
controllerManager:
  enableFilterNamespace: true
Then annotate allowed namespaces:
kubectl annotate namespace demo chaos-mesh.org/inject=enabled
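
To review which namespaces currently allow injection, the annotation can be listed next to each namespace name. This is a minimal sketch: the function name `enabled_namespaces` is hypothetical, and the `kubectl` command in the comment assumes kubectl's JSONPath syntax for escaping dots in annotation keys with a backslash.

```shell
# Hypothetical helper: read "<namespace> <annotation value>" lines on stdin
# and print the namespaces where chaos injection is enabled.
#
# Assumed usage:
#   kubectl get namespaces -o jsonpath='{range .items[*]}{.metadata.name}{" "}{.metadata.annotations.chaos-mesh\.org/inject}{"\n"}{end}' \
#     | enabled_namespaces
enabled_namespaces() {
  # Namespaces without the annotation have an empty second field and are skipped.
  awk '$2 == "enabled" { print $1 }'
}
```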

High Availability Setup

controllerManager:
  # Run multiple replicas
  replicaCount: 3
  
  # Enable leader election
  leaderElection:
    enabled: true
    leaseDuration: 15s
    renewDeadline: 10s
    retryPeriod: 2s
  
  # Pod anti-affinity
  affinity:
    podAntiAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        - labelSelector:
            matchExpressions:
              - key: app.kubernetes.io/component
                operator: In
                values:
                  - controller-manager
          topologyKey: kubernetes.io/hostname

dnsServer:
  replicas: 3
  affinity:
    podAntiAffinity:
      preferredDuringSchedulingIgnoredDuringExecution:
        - weight: 100
          podAffinityTerm:
            labelSelector:
              matchExpressions:
                - key: app.kubernetes.io/component
                  operator: In
                  values:
                    - chaos-dns-server
            topologyKey: kubernetes.io/hostname
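
The three leader-election timings are not independent. The check below is a sketch that mirrors the validation in client-go's leader election package (the lease duration must exceed the renew deadline, which must exceed the retry period times a jitter factor of 1.2). That the chart passes these values through to client-go unchanged is an assumption, and `check_leader_election` is a name invented for this example. Values are given in whole seconds.

```shell
# Hypothetical check: validate leader-election timings (whole seconds).
# Rules follow client-go's leader election validation:
#   leaseDuration > renewDeadline > retryPeriod * 1.2
check_leader_election() (
  lease=$1; renew=$2; retry=$3
  if [ "$lease" -le "$renew" ]; then
    echo "leaseDuration must be greater than renewDeadline" >&2
    exit 1
  fi
  # Integer form of renew > retry * 1.2: compare renew * 10 with retry * 12
  if [ $((renew * 10)) -le $((retry * 12)) ]; then
    echo "renewDeadline must be greater than retryPeriod * 1.2" >&2
    exit 1
  fi
  echo "ok"
)
```

With the values used above, `check_leader_election 15 10 2` prints `ok`.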

Ingress Configuration

Expose the dashboard via Ingress:
dashboard:
  ingress:
    enabled: true
    ingressClassName: nginx
    annotations:
      cert-manager.io/cluster-issuer: letsencrypt-prod
    hosts:
      - name: chaos.example.com
        tls: true
        tlsSecret: chaos-dashboard-tls

Custom Image Registry

Use your own container registry:
images:
  registry: myregistry.example.com
  tag: v2.6.0

imagePullSecrets:
  - name: my-registry-secret

Verification

After installation, verify all components are running:
# Check pod status
kubectl get pods -n chaos-mesh

# Expected output:
NAME                                        READY   STATUS    RESTARTS   AGE
chaos-controller-manager-xxx-yyy            1/1     Running   0          2m
chaos-controller-manager-xxx-zzz            1/1     Running   0          2m
chaos-controller-manager-xxx-aaa            1/1     Running   0          2m
chaos-daemon-xxxxx                          1/1     Running   0          2m
chaos-daemon-yyyyy                          1/1     Running   0          2m
chaos-dashboard-xxxxx                       1/1     Running   0          2m
chaos-dns-server-xxxxx                      1/1     Running   0          2m
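
In larger clusters it is easier to list only the pods that need attention. The filter below is a small sketch, with a hypothetical function name, that reads the output of `kubectl get pods -n chaos-mesh --no-headers` and prints pods that are not in the Running state or whose READY counts do not match.

```shell
# Hypothetical helper: read `kubectl get pods --no-headers` output on stdin
# and print the names of pods that are not Running or not fully ready.
unhealthy_pods() {
  awk '{
    split($2, ready, "/")   # READY column, e.g. "1/1"
    if ($3 != "Running" || ready[1] != ready[2]) print $1
  }'
}
```

Typical use is `kubectl get pods -n chaos-mesh --no-headers | unhealthy_pods`; no output means every pod passed both checks.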
Verify CRDs are installed:
kubectl get crds | grep chaos-mesh.org
Expected CRDs include:
  • awschaos.chaos-mesh.org
  • azurechaos.chaos-mesh.org
  • blockchaos.chaos-mesh.org
  • dnschaos.chaos-mesh.org
  • gcpchaos.chaos-mesh.org
  • httpchaos.chaos-mesh.org
  • iochaos.chaos-mesh.org
  • jvmchaos.chaos-mesh.org
  • kernelchaos.chaos-mesh.org
  • networkchaos.chaos-mesh.org
  • physicalmachinechaos.chaos-mesh.org
  • podchaos.chaos-mesh.org
  • remoteclusters.chaos-mesh.org
  • schedules.chaos-mesh.org
  • stresschaos.chaos-mesh.org
  • timechaos.chaos-mesh.org
  • workflows.chaos-mesh.org
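
Comparing the list by eye is error-prone, so the check can be scripted. The sketch below takes the CRD names you expect as arguments, reads the output of `kubectl get crds -o name` on stdin, and prints any that are absent. The function name `missing_crds` is hypothetical, and which CRDs to require depends on your Chaos Mesh version.

```shell
# Hypothetical helper: given expected CRD names as arguments and the output of
# `kubectl get crds -o name` on stdin, print each expected CRD that is absent.
# Exits non-zero if anything is missing.
missing_crds() (
  installed=$(cat)
  status=0
  for crd in "$@"; do
    # -o name prints "customresourcedefinition.apiextensions.k8s.io/<name>"
    if ! printf '%s\n' "$installed" |
        grep -qxF "customresourcedefinition.apiextensions.k8s.io/$crd"; then
      echo "$crd"
      status=1
    fi
  done
  exit "$status"
)
```

A possible invocation is `kubectl get crds -o name | missing_crds podchaos.chaos-mesh.org networkchaos.chaos-mesh.org`, which prints nothing and succeeds when both CRDs are installed.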

Upgrading Chaos Mesh

# Update repository
helm repo update

# Upgrade to latest version
helm upgrade chaos-mesh chaos-mesh/chaos-mesh \
  --namespace chaos-mesh \
  --values values-production.yaml

# Upgrade to specific version
helm upgrade chaos-mesh chaos-mesh/chaos-mesh \
  --namespace chaos-mesh \
  --version v2.6.0 \
  --values values-production.yaml
Always test upgrades in a non-production environment first. Check the release notes for breaking changes.

Uninstallation

# Uninstall Chaos Mesh
helm uninstall chaos-mesh --namespace chaos-mesh

# Delete CRDs (optional)
kubectl delete crd $(kubectl get crd | grep chaos-mesh.org | awk '{print $1}')

# Delete namespace
kubectl delete namespace chaos-mesh
Deleting CRDs will also delete all chaos experiments. Ensure you have backups if needed.

Troubleshooting

Cause: Incorrect runtime or socket path configuration
Solution:
  1. Check your container runtime:
    kubectl get nodes -o wide
    
  2. Verify the socket path exists on nodes
  3. Upgrade the release with the correct runtime settings (--reuse-values keeps the values set at install time):
    helm upgrade chaos-mesh chaos-mesh/chaos-mesh \
      --namespace chaos-mesh \
      --reuse-values \
      --set chaosDaemon.runtime=containerd \
      --set chaosDaemon.socketPath=/run/containerd/containerd.sock
    
Cause: Webhook certificate issues or RBAC problems
Solution:
  1. Check controller logs:
    kubectl logs -n chaos-mesh -l app.kubernetes.io/component=controller-manager
    
  2. Verify webhook configuration:
    kubectl get validatingwebhookconfigurations
    kubectl get mutatingwebhookconfigurations
    
  3. Delete and recreate webhook certificates:
    kubectl delete secret chaos-mesh-webhook-certs -n chaos-mesh
    kubectl rollout restart deployment chaos-controller-manager -n chaos-mesh
    
Cause: Service not exposed or port-forward issues
Solution:
  1. Verify dashboard pod is running:
    kubectl get pods -n chaos-mesh -l app.kubernetes.io/component=chaos-dashboard
    
  2. Check service:
    kubectl get svc chaos-dashboard -n chaos-mesh
    
  3. Use correct port-forward command:
    kubectl port-forward -n chaos-mesh svc/chaos-dashboard 2333:2333
    
Possible causes:
  • Label selectors don’t match target pods
  • Namespace filtering is enabled
  • Target namespace doesn’t have proper annotations
  • Chaos daemon not running on target nodes
Debug steps:
# Verify label selectors match
kubectl get pods -n <namespace> --show-labels

# Check chaos daemon is on all nodes
kubectl get pods -n chaos-mesh -l app.kubernetes.io/component=chaos-daemon -o wide

# Describe the chaos experiment
kubectl describe <chaos-kind> <name> -n <namespace>

# Check if namespace filtering is enabled
helm get values chaos-mesh -n chaos-mesh
Cause: Insufficient RBAC permissions, or privileged mode disabled without the required capabilities
Solution:
  1. For RBAC issues, ensure service accounts have proper roles:
    kubectl get clusterrolebindings | grep chaos-mesh
    
  2. For capabilities issues, enable privileged mode (--reuse-values keeps the values set at install time):
    helm upgrade chaos-mesh chaos-mesh/chaos-mesh \
      --namespace chaos-mesh \
      --reuse-values \
      --set chaosDaemon.privileged=true
    

Next Steps

Run Your First Experiment

Follow the quick start guide to create your first chaos experiment

Security Best Practices

Learn about RBAC, security modes, and production hardening

Monitoring Integration

Set up Prometheus and Grafana for chaos experiment observability

Advanced Configuration

Explore workflows, schedules, and complex experiment orchestration