Quick tactic: how to make sure OPA doesn’t crash your cluster

Where this fits in K8s strategy

Stop unresponsive OPA webhooks from sending managed K8s services into a false loop of node replacements

Why it’s important

A constant loop of node replacement attempts can crash the cluster. We don’t want that, do we?

Problem

Open Policy Agent (OPA) is a useful admission controller for Kubernetes. A lot of nasty things can enter your cluster if OPA isn’t standing at the door as a bouncer.

But it needs to be configured properly to do the job. It can trigger all kinds of issues otherwise. After all, webhooks are a single point of failure.

This story highlights the problem faced in a multi-tenant GKE cluster that was using OPA.

Solution

Call me Captain Obvious, but you need to validate that you’ve configured OPA correctly. Here are some methods you can follow from trusted sources:
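To make "validate your configuration" a bit more concrete, here is a minimal sketch of a lint pass over one entry of a webhook configuration's `webhooks` list, written in Python against an already-parsed manifest. It is not taken from the sources above, and the story does not say which settings were at fault in the GKE cluster, so treat the three checks as assumptions about what is worth reviewing. `failurePolicy`, `timeoutSeconds`, and `namespaceSelector` are real Kubernetes fields; `lint_webhook`, `excludes_kube_system`, the sample webhook name, and the 5-second threshold are hypothetical and chosen only for illustration.

```python
from typing import Any

# Arbitrary illustrative threshold, not an official recommendation.
MAX_TIMEOUT_SECONDS = 5
# Label that Kubernetes sets automatically on every namespace (1.21+).
NAMESPACE_NAME_LABEL = "kubernetes.io/metadata.name"


def excludes_kube_system(selector: dict[str, Any]) -> bool:
    """Heuristic: True if a NotIn rule on the namespace name label lists kube-system.

    Other exclusion styles (for example a custom opt-out label) are not detected.
    """
    for expr in selector.get("matchExpressions", []):
        if (
            expr.get("key") == NAMESPACE_NAME_LABEL
            and expr.get("operator") == "NotIn"
            and "kube-system" in expr.get("values", [])
        ):
            return True
    return False


def lint_webhook(webhook: dict[str, Any]) -> list[str]:
    """Return review warnings for one entry of a configuration's `webhooks` list."""
    warnings: list[str] = []

    # In admissionregistration.k8s.io/v1 an unset failurePolicy means Fail.
    if webhook.get("failurePolicy", "Fail") == "Fail":
        warnings.append(
            "failurePolicy is Fail: matching requests are rejected "
            "while the webhook is unreachable"
        )

    # In v1 an unset timeoutSeconds means 10 seconds.
    if webhook.get("timeoutSeconds", 10) > MAX_TIMEOUT_SECONDS:
        warnings.append(
            f"timeoutSeconds is above {MAX_TIMEOUT_SECONDS}: "
            "a slow webhook stalls every matching API call"
        )

    if not excludes_kube_system(webhook.get("namespaceSelector") or {}):
        warnings.append("namespaceSelector does not exclude kube-system")

    return warnings


if __name__ == "__main__":
    # Hypothetical webhook entry that relies on the v1 defaults.
    sample = {"name": "validating-webhook.example.org"}
    for warning in lint_webhook(sample):
        print(f"{sample['name']}: {warning}")
```

Whether `Fail` or `Ignore` is the right failure policy is a trade-off between enforcing policy strictly and keeping the cluster available, so the warnings are prompts for review, not verdicts.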
