Overview
StatusCheck provides automated health validation during chaos experiments. It lets you verify that your system remains functional or meets specific criteria while chaos is being injected, making experiments safer and more informative.
Use Cases
Experiment Validation
Verify application endpoints remain responsive during chaos injection
SLA Monitoring
Ensure service level objectives are maintained under fault conditions
Continuous Health Check
Monitor system health throughout workflow execution
Failure Detection
Automatically detect when experiments cause unacceptable degradation
StatusCheck Types
HTTP Status Check
Currently, Chaos Mesh supports HTTP-based status checks to validate endpoint availability and response codes.
Execution Modes
The execution mode defines how the status check executes.
- Synchronous: Exits immediately after the success or failure threshold is reached.
- Continuous: Continues checking until the duration expires or the failure threshold is exceeded.
Synchronous Mode
Best for validating a specific condition before proceeding.
Continuous Mode
Best for ongoing monitoring during experiments.
Configuration Parameters
Type of status check to perform. Currently only HTTP is supported.
Maximum duration for the status check execution.
Format: Duration string (e.g., "5m", "30s", "1h30m")
- Synchronous: Maximum time to wait for success threshold
- Continuous: Total monitoring duration
Timeout in seconds for each individual status check execution. Must be ≥ 1.
Interval in seconds between status check executions. Must be ≥ 1.
Minimum consecutive failures before the status check is considered failed. Must be ≥ 1. When this threshold is reached, the status check terminates with failure status.
Minimum consecutive successes before the status check is considered successful. Must be ≥ 1. Only applies to Synchronous mode. When this threshold is reached, the check terminates with success status.
Number of status check execution records to retain. Range: 1-1000. Controls memory usage and history depth.
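As a minimal sketch of how these parameters fit together, the two manifests below configure one check per execution mode. The field names `timeoutSeconds`, `intervalSeconds`, and `recordsHistoryLimit` appear elsewhere on this page. The remaining names (`mode`, `type`, `duration`, `failureThreshold`, `successThreshold`, and the `http` block) are assumptions based on the Chaos Mesh `chaos-mesh.org/v1alpha1` StatusCheck resource and should be verified against your installed CRD. Resource names, the namespace, and the URL are placeholders.

```yaml
# Synchronous: wait up to 5m until the endpoint passes 3 checks in a row.
apiVersion: chaos-mesh.org/v1alpha1
kind: StatusCheck
metadata:
  name: wait-until-ready        # hypothetical name
  namespace: default
spec:
  mode: Synchronous
  type: HTTP                    # only HTTP is currently supported
  duration: 5m                  # maximum time to wait for the success threshold
  timeoutSeconds: 2             # per-execution timeout, must be >= 1
  intervalSeconds: 5            # pause between executions, must be >= 1
  failureThreshold: 3           # consecutive failures that fail the check
  successThreshold: 3           # consecutive successes that pass the check
  recordsHistoryLimit: 50       # execution records to retain, 1-1000
  http:
    url: http://my-service:8080/health
    method: GET
    criteria:
      statusCode: "200"
---
# Continuous: monitor for the full 10m unless 5 checks in a row fail.
apiVersion: chaos-mesh.org/v1alpha1
kind: StatusCheck
metadata:
  name: monitor-during-chaos    # hypothetical name
  namespace: default
spec:
  mode: Continuous
  type: HTTP
  duration: 10m                 # total monitoring duration
  timeoutSeconds: 2
  intervalSeconds: 10
  failureThreshold: 5           # successThreshold is omitted: Synchronous only
  http:
    url: http://my-service:8080/health
    method: GET
    criteria:
      statusCode: "200"
```

The same `duration` field plays a different role in each manifest: an upper bound on waiting in Synchronous mode, and the total observation window in Continuous mode.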
HTTP Status Check Configuration
Full URL to check, including protocol and path. Examples:
- http://my-service:8080/health
- https://api.example.com/status
- http://10.0.0.1:3000/ready
HTTP method to use. Supported values: GET, POST
HTTP headers to include in the request.
Request body for POST requests.
Expected HTTP status code(s). Formats:
- Single code: "200"
- Range (inclusive): "200-299"
- Multiple codes: Use multiple checks or ranges
For example:
- "200": Exact match
- "200-204": Range including both endpoints
- "2xx": Not supported, use "200-299"
Status Check in Workflows
Status checks are most powerful when integrated into workflows.
When a status check is used in a workflow template, an abort setting determines whether to abort the entire workflow if the status check’s failure threshold is exceeded.
- true: Workflow is aborted on check failure
- false: Status check failure is recorded but workflow continues
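The following is a sketch of a workflow that runs a Continuous status check in parallel with a fault and aborts if the check fails. The field names `templateType`, `deadline`, `children`, `statusCheck`, and `abortWithStatusCheck`, as well as the embedded `podChaos` spec, are assumptions based on the Chaos Mesh `chaos-mesh.org/v1alpha1` Workflow resource and are not confirmed by this page. Template names, labels, and the URL are placeholders.

```yaml
apiVersion: chaos-mesh.org/v1alpha1
kind: Workflow
metadata:
  name: status-check-workflow       # hypothetical name
  namespace: default
spec:
  entry: entry
  templates:
    # Run the health check and the fault side by side.
    - name: entry
      templateType: Parallel
      deadline: 5m
      children:
        - health-check
        - pod-failure
    - name: health-check
      templateType: StatusCheck
      deadline: 5m
      abortWithStatusCheck: true    # assumed name of the abort setting
      statusCheck:
        mode: Continuous
        type: HTTP
        intervalSeconds: 5
        timeoutSeconds: 2
        failureThreshold: 3
        http:
          url: http://my-service:8080/health
          method: GET
          criteria:
            statusCode: "200"
    - name: pod-failure
      templateType: PodChaos
      deadline: 2m
      podChaos:
        action: pod-failure
        mode: one
        selector:
          namespaces:
            - default
          labelSelectors:
            app: my-service         # hypothetical label
```

Setting the abort flag to `false` in the same template would leave the fault running and only record the failed check.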
Status and Conditions
Current state of the status check.
Advanced Examples
POST Request with Headers
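A possible shape for a POST check is sketched below. It assumes `headers` maps each header name to a list of values and that `body` is a plain string; both assumptions come from the Chaos Mesh `v1alpha1` StatusCheck resource and should be verified against your CRD. The token, payload, and resource name are placeholders.

```yaml
apiVersion: chaos-mesh.org/v1alpha1
kind: StatusCheck
metadata:
  name: post-with-headers           # hypothetical name
spec:
  mode: Synchronous
  type: HTTP
  duration: 1m
  timeoutSeconds: 5
  intervalSeconds: 10
  http:
    url: https://api.example.com/status
    method: POST
    headers:                        # assumed: header name -> list of values
      Content-Type:
        - application/json
      Authorization:
        - "Bearer REPLACE_ME"       # placeholder credential
    body: '{"probe": "chaos"}'      # hypothetical payload
    criteria:
      statusCode: "200"
```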
Status Range Validation
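To accept any successful response rather than one exact code, the criteria can use an inclusive range. This fragment shows only the HTTP portion of a status check spec; the `http` and `criteria.statusCode` field names are assumed from the Chaos Mesh `v1alpha1` StatusCheck resource.

```yaml
# Fragment of a StatusCheck spec, not a complete manifest.
http:
  url: http://my-service:8080/health
  method: GET
  criteria:
    statusCode: "200-299"   # inclusive range; the "2xx" shorthand is not supported
```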
High-Frequency Monitoring
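A sketch of a short, high-frequency check follows. The values are illustrative, and the field names other than `timeoutSeconds`, `intervalSeconds`, and `recordsHistoryLimit` are assumed from the Chaos Mesh `v1alpha1` StatusCheck resource. The timeout is kept below the interval so executions do not overlap.

```yaml
apiVersion: chaos-mesh.org/v1alpha1
kind: StatusCheck
metadata:
  name: high-frequency-monitor      # hypothetical name
spec:
  mode: Continuous
  type: HTTP
  duration: 2m
  intervalSeconds: 2                # check every 2 seconds
  timeoutSeconds: 1                 # lower than the interval
  failureThreshold: 5
  recordsHistoryLimit: 100          # roughly 60 executions are expected in 2m
  http:
    url: http://my-service:8080/health
    method: GET
    criteria:
      statusCode: "200"
```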
Monitoring Status Checks
View Status Check Details
Check Conditions
Best Practices
Set Realistic Thresholds
Configure failure thresholds based on your application’s expected behavior:
- Transient failures: Higher threshold (e.g., 5-10)
- Critical endpoints: Lower threshold (e.g., 2-3)
Choose Appropriate Intervals
Balance between responsiveness and overhead:
- Critical checks: 1-5 seconds
- Standard checks: 10-30 seconds
- Long-running experiments: 30-60 seconds
Use Continuous Mode for Workflows
In parallel workflow executions, use Continuous mode to monitor health throughout the entire chaos injection period.
Configure Timeouts Carefully
Set timeoutSeconds lower than intervalSeconds to prevent overlapping checks.
Limit Record History
For long-running or high-frequency checks, adjust recordsHistoryLimit to manage memory usage.
Troubleshooting
Status Check Always Fails
Check that the URL is accessible.
Verify timeout settings:
- Ensure timeoutSeconds is sufficient for the response time
- Check network latency to the target service
Timeout Issues
Increase timeoutSeconds if legitimate responses take longer than the current limit.
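For instance, a slow but healthy endpoint might need a spec fragment like the one below. The numbers are illustrative only, and the interval is raised along with the timeout so that checks still do not overlap.

```yaml
# Fragment of a StatusCheck spec; values are illustrative.
timeoutSeconds: 10    # raised to cover slow but legitimate responses
intervalSeconds: 15   # kept above the timeout so executions do not overlap
```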
Status Code Mismatch
Check the actual response code returned by the endpoint.
Use ranges for flexibility, for example "200-299" instead of a single code.
Future Enhancements
Status checks may be extended to support:
- Response body validation
- Custom command execution
- Kubernetes resource checks
- Prometheus query validation
- gRPC health checks
Next Steps
Workflows
Integrate status checks into complex workflows
Monitoring
Track status check metrics with Prometheus
Dashboard
View status check execution through the UI
Scheduling
Combine status checks with scheduled experiments