Chaos Mesh is an open-source, cloud-native Chaos Engineering platform that simulates many types of faults and can orchestrate them into complex fault scenarios.

What is Chaos Mesh?

Chaos Mesh lets you simulate the abnormalities that real systems encounter, in development, testing, and production environments. By intentionally introducing controlled failures, you can discover weaknesses in your system before they cause incidents in production.
Chaos Mesh is a Cloud Native Computing Foundation (CNCF) incubating project, which demonstrates its maturity and community support.

Key Features

Multiple Fault Types

Support for pod failures, network chaos, I/O faults, time skew, stress testing, and more

Web Dashboard

Intuitive UI for designing, managing, and monitoring chaos experiments

Workflow Orchestration

Define complex fault scenarios with serial and parallel execution

Kubernetes Native

Built on Kubernetes CRDs for seamless integration with your cluster

Architecture Components

Chaos Mesh consists of three main components:

Chaos Controller Manager

The core component responsible for scheduling and managing Chaos experiments. It contains several CRD Controllers:
  • Workflow Controller: Orchestrates complex chaos scenarios
  • Scheduler Controller: Manages scheduled chaos experiments
  • Fault Type Controllers: Handle specific chaos types (PodChaos, NetworkChaos, IOChaos, etc.)
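
As a rough illustration of what the Workflow Controller and the fault type controllers act on, here is a minimal sketch of a Workflow that runs two faults in series. The resource name, the namespace, and the `app: example-app` label are hypothetical placeholders, and the field names should be checked against the Chaos Mesh version you have installed.

```yaml
# Sketch only: names, namespace, and labels are illustrative assumptions.
apiVersion: chaos-mesh.org/v1alpha1
kind: Workflow
metadata:
  name: example-serial-workflow
  namespace: default
spec:
  entry: entry-step              # template the workflow starts from
  templates:
    - name: entry-step
      templateType: Serial       # children run one after another
      deadline: 240s
      children:
        - pod-kill-step
        - network-delay-step
    - name: pod-kill-step
      templateType: PodChaos     # handled by the PodChaos controller
      deadline: 30s
      podChaos:
        action: pod-kill
        mode: one                # pick one matching pod at random
        selector:
          namespaces: [default]
          labelSelectors:
            app: example-app
    - name: network-delay-step
      templateType: NetworkChaos # handled by the NetworkChaos controller
      deadline: 60s
      networkChaos:
        action: delay
        mode: all
        selector:
          namespaces: [default]
          labelSelectors:
            app: example-app
        delay:
          latency: '100ms'
          correlation: '0'
          jitter: '0ms'
```

Changing `templateType: Serial` to `Parallel` on the entry template would start both children at the same time instead.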

Chaos Daemon

Runs as a DaemonSet on each node with privileged permissions (configurable). The daemon:
  • Interferes with network devices, file systems, and kernels
  • Accesses target Pod namespaces to inject faults
  • Communicates with the Controller Manager via gRPC

Chaos Dashboard

A web-based UI that provides:
  • Visual experiment design and creation
  • Real-time experiment monitoring
  • Experiment history and analytics
  • RBAC and security controls

Supported Chaos Types

Chaos Mesh supports a comprehensive set of fault injection types:
Chaos Type            Description
PodChaos              Pod lifecycle faults (kill, failure, container kill)
NetworkChaos          Network faults (delay, loss, duplicate, corrupt, partition, bandwidth)
IOChaos               File system I/O faults (latency, errno, attribute override)
TimeChaos             Time skew simulation without affecting other containers
StressChaos           CPU and memory stress testing
DNSChaos              DNS resolution failures and errors
HTTPChaos             HTTP request/response manipulation
KernelChaos           Kernel-level fault injection
JVMChaos              JVM application fault injection
AWSChaos              AWS service fault simulation
GCPChaos              GCP service fault simulation
AzureChaos            Azure service fault simulation
PhysicalMachineChaos  Faults on physical or VM machines
BlockChaos            Block device I/O faults
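
Each chaos type in the table corresponds to a resource `kind`, and for types with several fault variants the `action` field selects one of them. The two manifests below are a minimal sketch of that mapping for NetworkChaos and TimeChaos; the resource names, namespace, and label values are assumptions made for illustration.

```yaml
# Sketch only: metadata and selector values are hypothetical.
apiVersion: chaos-mesh.org/v1alpha1
kind: NetworkChaos               # "NetworkChaos" row of the table
metadata:
  name: packet-loss-example
  namespace: default
spec:
  action: loss                   # one of the listed network faults
  mode: all
  selector:
    namespaces: [default]
    labelSelectors:
      app: example-app
  loss:
    loss: '25'                   # percentage of packets to drop
    correlation: '25'
  duration: '60s'
---
apiVersion: chaos-mesh.org/v1alpha1
kind: TimeChaos                  # "TimeChaos" row of the table
metadata:
  name: time-shift-example
  namespace: default
spec:
  mode: one
  selector:
    namespaces: [default]
    labelSelectors:
      app: example-app
  timeOffset: '-10m'             # shift the target's clock back ten minutes
  duration: '60s'
```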

How It Works

Chaos Mesh uses Kubernetes CustomResourceDefinitions (CRDs) to define chaos objects. A typical experiment proceeds as follows:
1. Define the Chaos Experiment

Create a YAML manifest specifying the chaos type, target selector, and fault parameters.

2. Apply to Kubernetes

Use kubectl apply to submit the chaos experiment to your cluster.

3. Controller Manages Lifecycle

The Chaos Controller Manager reconciles the new chaos resource and coordinates with Chaos Daemon.

4. Daemon Injects Fault

Chaos Daemon executes the fault injection on the target pods or nodes.

5. Monitor Results

Observe the experiment through the Dashboard or kubectl.
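
The steps above can be sketched with a single PodChaos manifest. The file name, resource name, namespace, and `app: example-app` label are hypothetical; the kubectl commands in the comments are standard kubectl usage, shown here as one possible way to walk through steps 2 and 5.

```yaml
# pod-failure-example.yaml (hypothetical file name)
#
# Step 2, submit:   kubectl apply -f pod-failure-example.yaml
# Step 5, inspect:  kubectl describe podchaos pod-failure-example -n default
# Clean up:         kubectl delete -f pod-failure-example.yaml
apiVersion: chaos-mesh.org/v1alpha1
kind: PodChaos                   # chaos type
metadata:
  name: pod-failure-example
  namespace: default
spec:
  action: pod-failure            # fault parameter: make the pod unavailable
  mode: one                      # affect a single matching pod
  duration: '30s'                # fault is reverted after 30 seconds
  selector:                      # target selector
    namespaces: [default]
    labelSelectors:
      app: example-app
```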

Use Cases

Development & Testing

  • Validate error handling and retry logic
  • Test service mesh resilience (circuit breakers, retries, timeouts)
  • Verify monitoring and alerting configurations

Production Readiness

  • Conduct game days and disaster recovery drills
  • Test autoscaling behavior under stress
  • Validate backup and recovery procedures
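
For the autoscaling case, one option is a StressChaos experiment that loads the CPU of the target pods long enough for an autoscaler to react. This is a minimal sketch; the target label, worker count, load level, and duration are assumptions you would tune to your own workload and scaling thresholds.

```yaml
# Sketch only: selector and stress values are illustrative assumptions.
apiVersion: chaos-mesh.org/v1alpha1
kind: StressChaos
metadata:
  name: cpu-stress-example
  namespace: default
spec:
  mode: all                      # stress every matching pod
  selector:
    namespaces: [default]
    labelSelectors:
      app: example-app
  stressors:
    cpu:
      workers: 2                 # number of stress workers per container
      load: 80                   # target CPU load per worker, in percent
  duration: '5m'
```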

Continuous Validation

  • Integrate chaos experiments into CI/CD pipelines
  • Automated regression testing for resilience
  • SLO/SLA validation under various failure scenarios
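
For recurring validation outside a pipeline run, a Schedule resource can create the same experiment on a cron cadence. The sketch below assumes a nightly pod kill against a hypothetical `app: example-app` workload; the schedule, history limit, and names are illustrative.

```yaml
# Sketch only: cadence, names, and labels are hypothetical.
apiVersion: chaos-mesh.org/v1alpha1
kind: Schedule
metadata:
  name: nightly-pod-kill
  namespace: default
spec:
  schedule: '0 2 * * *'          # cron expression: every day at 02:00
  historyLimit: 5                # keep the five most recent runs
  concurrencyPolicy: Forbid      # skip a run if the previous one is still active
  type: PodChaos
  podChaos:
    action: pod-kill
    mode: one
    selector:
      namespaces: [default]
      labelSelectors:
        app: example-app
```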

Security Considerations

Chaos Mesh requires privileged access to inject faults effectively, and by default Chaos Daemon runs with privileged permissions. In production environments, consider:
  • Using namespace-scoped mode instead of cluster-scoped
  • Enabling security mode on the Dashboard
  • Restricting chaos experiments to specific namespaces
  • Using RBAC to control who can create chaos experiments
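
Several of these options are typically set through Helm values at install time. The fragment below is a sketch of such an override file; the key names reflect the Chaos Mesh Helm chart as best understood here and should be verified against the values.yaml of your chart version, and `chaos-testing` is a placeholder namespace.

```yaml
# Sketch of a Helm values override; verify key names against your chart version.
clusterScoped: false             # namespace-scoped instead of cluster-scoped
controllerManager:
  targetNamespace: chaos-testing # namespace the controller is allowed to act on
  enableFilterNamespace: true    # only inject into explicitly opted-in namespaces
dashboard:
  securityMode: true             # require authentication in the Dashboard
```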
You can disable privileged mode and use specific Linux capabilities:
chaosDaemon:
  privileged: false
  capabilities:
    add:
      - SYS_PTRACE
      - NET_ADMIN
      - NET_RAW
      - MKNOD
      - SYS_CHROOT
      - SYS_ADMIN
      - KILL
      - IPC_LOCK
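
To illustrate the RBAC recommendation above, here is a minimal sketch of a namespaced Role and RoleBinding that let one service account manage chaos resources in a single namespace. The `staging` namespace and the `chaos-experimenter` and `chaos-runner` names are hypothetical; narrow the verbs and resources further if your policy requires it.

```yaml
# Sketch only: namespace, role, and subject names are hypothetical.
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: chaos-experimenter
  namespace: staging
rules:
  - apiGroups: ["chaos-mesh.org"]   # API group of the chaos CRDs
    resources: ["*"]
    verbs: ["get", "list", "watch", "create", "update", "patch", "delete"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: chaos-experimenter-binding
  namespace: staging
subjects:
  - kind: ServiceAccount
    name: chaos-runner
    namespace: staging
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: Role
  name: chaos-experimenter
```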

Community & Support

Chaos Mesh has an active community.

Next Steps

Quick Start

Install Chaos Mesh and run your first experiment in minutes

Installation Guide

Detailed installation instructions for production environments