Overview
Chaos Mesh exposes Prometheus metrics for monitoring chaos experiments, workflows, schedules, and system health. These metrics enable observability, alerting, and performance analysis of your chaos engineering practice.
Metrics Components
Chaos Mesh exposes metrics from three main components:
- Chaos Controller Manager - Experiment and orchestration metrics
- Chaos Dashboard - API and UI request metrics
- Chaos Daemon - Agent-level injection metrics
Controller Manager Metrics
The controller manager exposes metrics on port 10080 at the /metrics endpoint.
Experiment Metrics
Total number of chaos experiments and their current phases.
Labels:
- namespace: Experiment namespace
- kind: Chaos type (PodChaos, NetworkChaos, etc.)
- phase: Current phase (Running, Paused, Finished, Failed)
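To illustrate how these labels can be used, the following Python sketch parses Prometheus text-format output and sums experiment counts by phase. The metric name chaos_controller_manager_chaos_experiments and the sample data are assumptions, not taken from this page; confirm the actual name against your controller manager's /metrics output.

```python
import re
from collections import defaultdict

# Assumed metric name; verify it against your own /metrics output.
EXPERIMENT_METRIC = "chaos_controller_manager_chaos_experiments"

# name{label="value",...} value  (the label block is optional)
SAMPLE_RE = re.compile(r'^([a-zA-Z_:][a-zA-Z0-9_:]*)(?:\{(.*)\})?\s+(\S+)')
LABEL_RE = re.compile(r'([a-zA-Z_][a-zA-Z0-9_]*)="((?:[^"\\]|\\.)*)"')


def experiments_by_phase(exposition_text, metric=EXPERIMENT_METRIC):
    """Sum one gauge across namespace and kind, keyed by the phase label."""
    totals = defaultdict(float)
    for line in exposition_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and HELP/TYPE comments
        match = SAMPLE_RE.match(line)
        if match is None or match.group(1) != metric:
            continue
        labels = dict(LABEL_RE.findall(match.group(2) or ""))
        totals[labels.get("phase", "")] += float(match.group(3))
    return dict(totals)


# Hypothetical scrape output, for illustration only.
SAMPLE = """\
# TYPE chaos_controller_manager_chaos_experiments gauge
chaos_controller_manager_chaos_experiments{kind="PodChaos",namespace="default",phase="Running"} 2
chaos_controller_manager_chaos_experiments{kind="NetworkChaos",namespace="default",phase="Running"} 1
chaos_controller_manager_chaos_experiments{kind="PodChaos",namespace="staging",phase="Paused"} 1
"""

print(experiments_by_phase(SAMPLE))
```

Grouping by namespace or kind instead only requires changing the label used as the dictionary key.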
Workflow Metrics
Total number of workflows by namespace.
Labels:
- namespace: Workflow namespace
Schedule Metrics
Total number of active schedules by namespace.
Labels:
- namespace: Schedule namespace
Sidecar Injection Metrics
Total number of sidecar injections performed via the webhook.
Labels:
- namespace: Target pod namespace
- config: Injection configuration name
Total number of injection requests received.
Labels:
- namespace: Target namespace
- config: Configuration name
Template Metrics
Total number of injection templates configured.
Total number of configuration templates.
Labels:
- namespace: Template namespace
- template: Template name
Total number of active injection configs.
Labels:
- namespace: Config namespace
- template: Associated template
Error Metrics
Total number of template-not-found errors.
Labels:
- namespace: Request namespace
- template: Missing template name
Total number of template rendering failures.
Total number of configuration name duplication errors.
Labels:
- namespace: Config namespace
- config: Duplicate config name
Event Metrics
Total number of Kubernetes events emitted by the controller.
Labels:
- type: Event type (Normal, Warning)
- reason: Event reason code
- namespace: Event namespace
Dashboard Metrics
The dashboard exposes metrics on the same port as the API (default 2333) at the /metrics endpoint.
HTTP request latency histogram for dashboard API endpoints.
Labels:
- path: API endpoint path
- method: HTTP method (GET, POST, PUT, DELETE)
- status: HTTP status code
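Because this metric is a histogram, latency percentiles have to be estimated from cumulative bucket counts. The sketch below shows the linear-interpolation idea behind PromQL's histogram_quantile() in plain Python. The bucket boundaries and counts are hypothetical, and the function is a simplified illustration rather than a copy of the Prometheus implementation.

```python
import math


def estimate_quantile(q, buckets):
    """Estimate a quantile from cumulative (upper_bound, count) buckets.

    Interpolates linearly inside the bucket that contains the target rank.
    """
    if not 0 <= q <= 1:
        raise ValueError("q must be between 0 and 1")
    buckets = sorted(buckets)
    if not buckets or buckets[-1][0] != math.inf:
        raise ValueError("buckets must include the +Inf bucket")
    total = buckets[-1][1]
    if total == 0:
        return math.nan  # no observations recorded
    rank = q * total
    prev_bound, prev_count = 0.0, 0.0
    for bound, count in buckets:
        if count >= rank:
            if bound == math.inf:
                # The rank falls in the open-ended bucket: report the
                # highest finite bound instead of extrapolating.
                return prev_bound
            in_bucket = count - prev_count
            if in_bucket == 0:
                return bound
            fraction = (rank - prev_count) / in_bucket
            return prev_bound + (bound - prev_bound) * fraction
        prev_bound, prev_count = bound, count
    return prev_bound


# Hypothetical cumulative buckets: (le upper bound in seconds, count).
BUCKETS = [(0.1, 50), (0.5, 90), (1.0, 98), (math.inf, 100)]

print(estimate_quantile(0.5, BUCKETS))
print(estimate_quantile(0.95, BUCKETS))
print(estimate_quantile(0.99, BUCKETS))
```

The estimate is only as precise as the bucket layout: a quantile that lands in a wide bucket, or in the +Inf bucket, carries correspondingly more uncertainty.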
Scraping Configuration
ServiceMonitor (Prometheus Operator)
Prometheus Configuration
For standalone Prometheus without the operator, add a scrape job for each component's /metrics endpoint.
Example Queries
Experiment Monitoring
Workflow Monitoring
Schedule Monitoring
Dashboard Performance
Injection Monitoring
Grafana Dashboard
Create Grafana dashboards to visualize Chaos Mesh metrics.
Create Panels
Recommended panels:
- Active experiments by type (pie chart)
- Experiment timeline (time series)
- Workflow execution count (gauge)
- Schedule execution rate (graph)
- Dashboard API latency (heatmap)
- Error rate by type (graph)
Example Panel Queries
Alerting Rules
Define Prometheus alerting rules for chaos operations.
Best Practices
Set Appropriate Scrape Intervals
- Controller metrics: 30-60 seconds
- Dashboard metrics: 15-30 seconds
- Balance resolution against cost: shorter intervals give finer-grained data but increase sample volume and storage
Use Recording Rules
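Pre-compute frequently used queries with Prometheus recording rules, so that dashboards and alerts read a stored series instead of re-evaluating the full expression.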
Monitor Metric Cardinality
Track unique label combinations to prevent cardinality explosion, especially with namespace and kind labels.
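One way to track label combinations is to count the distinct label sets per metric in a single scrape. The following Python sketch does this for Prometheus text-format output; the metric names and sample data are hypothetical and exist only to show the counting.

```python
import re
from collections import defaultdict

# name{label="value",...} value  (the label block is optional)
SAMPLE_RE = re.compile(r'^([a-zA-Z_:][a-zA-Z0-9_:]*)(?:\{(.*)\})?\s+(\S+)')
LABEL_RE = re.compile(r'([a-zA-Z_][a-zA-Z0-9_]*)="((?:[^"\\]|\\.)*)"')


def series_per_metric(exposition_text):
    """Count distinct label sets (time series) for each metric name."""
    seen = defaultdict(set)
    for line in exposition_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and HELP/TYPE comments
        match = SAMPLE_RE.match(line)
        if match is None:
            continue
        label_set = frozenset(LABEL_RE.findall(match.group(2) or ""))
        seen[match.group(1)].add(label_set)
    return {name: len(label_sets) for name, label_sets in seen.items()}


# Hypothetical scrape output with made-up metric names.
SAMPLE = """\
example_experiments{kind="PodChaos",namespace="default",phase="Running"} 2
example_experiments{kind="PodChaos",namespace="default",phase="Paused"} 1
example_experiments{kind="NetworkChaos",namespace="staging",phase="Running"} 1
example_workflows{namespace="default"} 4
"""

# Print metrics with the most series first.
for name, count in sorted(series_per_metric(SAMPLE).items(), key=lambda kv: -kv[1]):
    print(name, count)
```

Running a check like this periodically makes it easier to notice when a new namespace or chaos kind multiplies the number of series.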
Correlate with Application Metrics
Combine Chaos Mesh metrics with application performance metrics to measure experiment impact.
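As a rough illustration of measuring impact, the sketch below compares an application metric inside and outside experiment windows. It assumes you have already exported two things: timestamped application samples (for example request latency) and the time windows during which experiments were running. All names and data here are hypothetical.

```python
from statistics import mean


def split_by_windows(samples, windows):
    """Split (timestamp, value) samples into during/outside experiment windows."""
    during, outside = [], []
    for timestamp, value in samples:
        if any(start <= timestamp < end for start, end in windows):
            during.append(value)
        else:
            outside.append(value)
    return during, outside


def impact_ratio(samples, windows):
    """Return mean(during) / mean(outside), or None if either side is empty."""
    during, outside = split_by_windows(samples, windows)
    if not during or not outside:
        return None
    baseline = mean(outside)
    if baseline == 0:
        return None  # a ratio against a zero baseline is undefined
    return mean(during) / baseline


# Hypothetical latency samples: (unix-style timestamp, latency in ms).
SAMPLES = [(0, 100), (60, 110), (120, 300), (180, 280), (240, 90)]
# Hypothetical window in which an experiment was in the Running phase.
WINDOWS = [(120, 240)]

print(impact_ratio(SAMPLES, WINDOWS))
```

A ratio well above 1 suggests the experiment degraded the application metric; a mean-based comparison is a simplification, and percentiles are often more informative for latency.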
Troubleshooting
Metrics Not Appearing
Check that the metrics endpoint responds, and verify that the ServiceMonitor exists and selects the Chaos Mesh services.
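A minimal Python sketch for the endpoint check follows. It assumes the controller manager's port 10080 has been forwarded to localhost (for example with kubectl port-forward); the URL and the helper name fetch_metric_names are assumptions for illustration. The demonstration at the bottom uses a stand-in opener so the example runs without a cluster.

```python
import io
import urllib.request

# Assumed URL: requires the metrics port to be reachable on localhost.
DEFAULT_URL = "http://localhost:10080/metrics"


def fetch_metric_names(url=DEFAULT_URL, opener=urllib.request.urlopen, timeout=5):
    """Return the metric names declared in '# TYPE' lines at the endpoint."""
    with opener(url, timeout=timeout) as response:
        body = response.read().decode("utf-8")
    names = set()
    for line in body.splitlines():
        parts = line.split()
        if len(parts) >= 4 and parts[0] == "#" and parts[1] == "TYPE":
            names.add(parts[2])
    return names


def fake_opener(url, timeout):
    """Stand-in for urlopen that returns canned, hypothetical scrape output."""
    return io.BytesIO(b"# TYPE example_metric gauge\nexample_metric 1\n")


# Offline demonstration; omit the opener argument to query a live endpoint.
print(sorted(fetch_metric_names(opener=fake_opener)))
```

If the live call raises a connection error, the endpoint is not reachable from where the script runs; if it returns an empty set, the endpoint responded but exposed no typed metrics.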
Missing Labels
Some metrics may be missing if no resources of that type exist. Create a test experiment to populate metrics.
High Cardinality
If you run into high-cardinality issues, consider:
- Reducing namespace diversity
- Aggregating away rarely used labels
- Using recording rules
Next Steps
- Dashboard: Visualize metrics through the Chaos Dashboard
- Workflows: Monitor complex workflow executions
- Scheduling: Track scheduled experiment metrics
- Status Checks: Validate experiments with health checks