GCPChaos allows you to simulate Google Cloud Platform infrastructure failures by manipulating Compute Engine instances and persistent disks through the GCP API.
Actions
GCPChaos supports the following actions:
- node-stop: Stop a Compute Engine instance
- node-reset: Reset (hard reboot) a Compute Engine instance
- disk-loss: Detach persistent disks from an instance
Spec Fields
- action: The GCP chaos action to perform. Options: node-stop, node-reset, disk-loss. Default: node-stop.
- project: GCP project ID where the resources are located.
- zone: GCP zone where the instance is located (e.g., us-central1-a, europe-west1-b).
- instance: Name of the Compute Engine instance to target.
- deviceNames: List of disk device names to detach. Required when action is disk-loss. Example: ["disk-1", "disk-2"]
- duration: Duration of the chaos action. For node-stop and disk-loss, resources are affected for this duration and then recovered. Not applicable to node-reset (oneshot action).
- secretName: Name of the Kubernetes secret containing GCP service account credentials. If not specified, the default GCP credential chain is used.
- remoteCluster: Name of the remote cluster where the chaos will be deployed.
GCP Credentials Setup
You need to provide GCP credentials to Chaos Mesh.
Create Service Account Credentials
- Create a service account in your GCP project
- Grant the service account the Compute Instance Admin (v1) role, or a custom role with these permissions:
  - compute.instances.stop
  - compute.instances.start
  - compute.instances.reset
  - compute.instances.get
  - compute.instances.attachDisk
  - compute.instances.detachDisk
- Create and download a JSON key file
Create Kubernetes Secret
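A minimal sketch of a Secret manifest holding the downloaded key. The secret name `cloud-key-secret`, the `chaos-mesh` namespace, and the `service_account` key with a base64-encoded value are assumptions; verify them against the Chaos Mesh version you run.

```yaml
apiVersion: v1
kind: Secret
metadata:
  name: cloud-key-secret     # hypothetical name; reference it via secretName
  namespace: chaos-mesh      # assumed: same namespace as the GCPChaos resource
type: Opaque
stringData:
  # Assumed key name and encoding: the base64-encoded contents of the JSON key file.
  service_account: "BASE64_ENCODED_JSON_KEY"
```

Keep the JSON key file out of version control, and apply the manifest from a location that is not shared.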
Alternative: Workload Identity (GKE)
If running on GKE, you can use Workload Identity:
- Enable Workload Identity on your cluster
- Create a service account with the necessary permissions
- Bind the Kubernetes service account to the GCP service account
- Don’t specify secretName in the GCPChaos spec
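The Kubernetes side of the binding step can be sketched as an annotated service account. The service account name `chaos-controller-manager`, the `chaos-mesh` namespace, and the GCP service account email are assumptions for illustration; only the `iam.gke.io/gcp-service-account` annotation key is the standard GKE one.

```yaml
apiVersion: v1
kind: ServiceAccount
metadata:
  name: chaos-controller-manager   # assumed: the account the Chaos Mesh controller runs as
  namespace: chaos-mesh            # assumed install namespace
  annotations:
    # Placeholder GCP service account email; replace with your own.
    iam.gke.io/gcp-service-account: chaos-mesh-gcp@my-project-id.iam.gserviceaccount.com
```

The annotation alone is not sufficient: the GCP service account must also grant the Workload Identity User role (roles/iam.workloadIdentityUser) to this Kubernetes service account.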
Examples
Stop Compute Engine Instance
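A minimal sketch of a node-stop experiment, assuming the `chaos-mesh.org/v1alpha1` API version and the spec field names `project`, `zone`, and `instance`. The resource name, namespace, project, zone, instance, and secret name are placeholders.

```yaml
apiVersion: chaos-mesh.org/v1alpha1
kind: GCPChaos
metadata:
  name: node-stop-example        # placeholder
  namespace: chaos-mesh          # assumed namespace
spec:
  action: node-stop
  secretName: cloud-key-secret   # placeholder; omit to use the default credential chain
  project: my-project-id         # placeholder project ID
  zone: us-central1-a
  instance: my-instance          # placeholder instance name
  duration: 5m                   # instance is started again after this period
```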
Reset Compute Engine Instance
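A minimal sketch of a node-reset experiment under the same assumptions (the `chaos-mesh.org/v1alpha1` API version, assumed spec field names, placeholder values). No duration is set because node-reset is a oneshot action.

```yaml
apiVersion: chaos-mesh.org/v1alpha1
kind: GCPChaos
metadata:
  name: node-reset-example       # placeholder
  namespace: chaos-mesh          # assumed namespace
spec:
  action: node-reset             # oneshot: no recovery step, duration not applicable
  secretName: cloud-key-secret   # placeholder; omit to use the default credential chain
  project: my-project-id         # placeholder project ID
  zone: us-central1-a
  instance: my-instance          # placeholder instance name
```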
Detach Persistent Disks
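A minimal sketch of a disk-loss experiment, assuming the `chaos-mesh.org/v1alpha1` API version and the spec field names `project`, `zone`, `instance`, and `deviceNames`. All values are placeholders; the listed device names should refer to data disks, not the boot disk.

```yaml
apiVersion: chaos-mesh.org/v1alpha1
kind: GCPChaos
metadata:
  name: disk-loss-example        # placeholder
  namespace: chaos-mesh          # assumed namespace
spec:
  action: disk-loss
  secretName: cloud-key-secret   # placeholder; omit to use the default credential chain
  project: my-project-id         # placeholder project ID
  zone: us-central1-a
  instance: my-instance          # placeholder instance name
  deviceNames:                   # required for disk-loss
    - disk-1
    - disk-2
  duration: 5m                   # disks are reattached after this period
```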
Implementation Details
GCPChaos uses the GCP Compute Engine API to:
- Authenticate using service account credentials or Workload Identity
- Call GCP APIs to manipulate resources:
  - node-stop: Calls the Stop API, then the Start API after the duration
  - node-reset: Calls the Reset API (oneshot)
  - disk-loss: Calls the DetachDisk API, stores attachment info in status, then calls the AttachDisk API after the duration
api/v1alpha1/gcpchaos_types.go:44-108
Status Tracking
For the disk-loss action, GCPChaos stores the original disk attachment information in the status so that the disks can be reattached when the experiment recovers.
api/v1alpha1/gcpchaos_types.go:110-121
Oneshot Behavior
The node-reset action is marked as a oneshot action, meaning:
- It executes once immediately
- No recovery action is performed
- The duration field is ignored
- The experiment completes after the reset command is sent
api/v1alpha1/gcpchaos_types.go:28
Important Notes
- Ensure your GCP credentials have appropriate permissions
- Be cautious when targeting production instances
- The instance must be in a state that allows the requested operation
- For disk-loss, ensure you’re not detaching the boot disk
- Test in non-production environments first
- GCP API rate limits apply