A firewall policy is a living document that only ever grows unless something forces it to shrink. Recertification is that forcing function: a recurring, structured review where every rule must re-earn its place. This note describes a methodology that works across vendors and orchestration platforms (I've run variations of it with Tufin and AlgoSec, but nothing here is tool-specific).
Why rule bases rot
Rules accumulate for good reasons in the moment — a project, an incident, a migration — and then outlive their justification. The person who knew why a rule existed moves on. A "temporary" allow becomes permanent because nobody is sure it's safe to remove. Over years, the policy becomes an archaeological record nobody can fully read, full of rules that are redundant, shadowed, overly broad, or simply dead.
The recertification cycle
The method is a loop, run on a schedule (quarterly is common; high-sensitivity zones more often):
1. Attribute every rule
Each rule needs an owner, a justification, and a review date. Going forward, this is enforced at change time: no new rule lands without all three. For the existing base, attribution is a one-time archaeology project, and it's worth doing: an unattributable rule is itself a finding.
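As a minimal sketch of what that attribution check could look like: the `RuleAttribution` record and its field names below are hypothetical, not taken from any orchestration product, and a real platform would store this metadata in its own schema.

```python
from __future__ import annotations

from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass
class RuleAttribution:
    """Hypothetical metadata record attached to one firewall rule."""
    rule_id: str
    owner: Optional[str] = None
    justification: Optional[str] = None
    review_date: Optional[date] = None

    def missing_fields(self) -> list[str]:
        """Return the names of the attribution fields that are empty."""
        missing = []
        if not (self.owner or "").strip():
            missing.append("owner")
        if not (self.justification or "").strip():
            missing.append("justification")
        if self.review_date is None:
            missing.append("review_date")
        return missing


def unattributed(rules: list[RuleAttribution]) -> dict[str, list[str]]:
    """Map rule_id to its missing fields, for rules that fail attribution."""
    report = {}
    for rule in rules:
        missing = rule.missing_fields()
        if missing:
            report[rule.rule_id] = missing
    return report
```

The same check can serve both purposes: as a gate at change time (reject the change if `missing_fields()` is non-empty) and as the findings report for the existing rule base.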
2. Pull usage data
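Instrument the policy so you know which rules actually match traffic. A rule that hasn't matched a packet in the entire review window is a prime decommission candidate, provided the window is long enough to catch infrequent but legitimate traffic (quarter-end jobs, disaster-recovery failover) and the hit counters weren't reset partway through. This single data point converts the scariest part of the job, deletion, from a guess into an evidence-based decision.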
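A rough sketch of that filter, assuming the usage data has already been exported as a mapping from rule ID to last-hit timestamp (`None` for a rule that has never matched). The function name and the 90-day default are illustrative assumptions, not any vendor's API.

```python
from __future__ import annotations

from datetime import datetime, timedelta
from typing import Optional


def decommission_candidates(
    last_hit: dict[str, Optional[datetime]],
    window_end: datetime,
    window_days: int = 90,  # assumed quarterly review window
) -> list[str]:
    """Return IDs of rules with no recorded hit inside the review window."""
    window_start = window_end - timedelta(days=window_days)
    return sorted(
        rule_id
        for rule_id, ts in last_hit.items()
        if ts is None or ts < window_start
    )
```

The output is a list of candidates, not a deletion list: each one still goes through owner review before anything is removed.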
3. Analyze for hygiene
Orchestration tooling flags the common hygiene problems automatically:
- Shadowed rules — never reached because an earlier rule matches first.
- Redundant rules — fully covered by another rule.
- Overly permissive rules — any-any or wide ranges that could be tightened.
- Expired rules — past their review date with no re-justification.
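To make these categories concrete, here is a simplified sketch of three of the checks. It assumes a toy rule model (IPv4 source and destination networks, one inclusive port range, a review date) and ignores protocol, zones, and rule action. The `Rule` class, the `min_prefix` threshold, and the function names are all hypothetical. Redundancy against later rules is left out because it also requires inspecting the rules in between.

```python
from __future__ import annotations

from dataclasses import dataclass
from datetime import date
from ipaddress import IPv4Network, ip_network


@dataclass(frozen=True)
class Rule:
    """Toy rule model; real policies carry protocol, zones, action, etc."""
    rule_id: str
    src: IPv4Network
    dst: IPv4Network
    ports: tuple[int, int]  # inclusive (low, high)
    review_date: date


def covers(a: Rule, b: Rule) -> bool:
    """True if every packet matched by b would also be matched by a."""
    return (
        b.src.subnet_of(a.src)
        and b.dst.subnet_of(a.dst)
        and a.ports[0] <= b.ports[0]
        and b.ports[1] <= a.ports[1]
    )


def hygiene_findings(
    rules: list[Rule],
    today: date,
    min_prefix: int = 8,  # assumed threshold for "too wide"
) -> dict[str, list[str]]:
    """Flag rules in evaluation order; returns rule_id -> list of flags."""
    findings: dict[str, list[str]] = {}
    for i, rule in enumerate(rules):
        flags = []
        # Shadowed: some earlier rule already matches everything this one does.
        if any(covers(earlier, rule) for earlier in rules[:i]):
            flags.append("shadowed")
        # Overly permissive: source or destination wider than the threshold.
        if rule.src.prefixlen < min_prefix or rule.dst.prefixlen < min_prefix:
            flags.append("overly_permissive")
        # Expired: review date has passed (re-justification would have moved it).
        if rule.review_date < today:
            flags.append("expired")
        if flags:
            findings[rule.rule_id] = flags
    return findings
```

Production analyzers work on full rule semantics and across devices; the point here is only that each category reduces to a mechanical test, which is why it can be automated.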
4. Route to owners for re-justification
Each flagged rule goes to its owner with a simple question: is this still needed, and why? Re-justification resets the review clock. Silence or "I don't know" routes the rule to decommissioning. The burden of proof sits with keeping the rule, not removing it — that's the inversion that makes the whole thing work.
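The routing logic is small enough to sketch directly. The names here (`OwnerResponse`, `route`, the 90-day interval) are illustrative assumptions, and so is the choice to treat a "yes" with no stated reason as unjustified. What matters is the default: anything short of an explicit, reasoned "yes" falls through to decommissioning.

```python
from __future__ import annotations

from dataclasses import dataclass
from datetime import date, timedelta
from enum import Enum
from typing import Optional


class Outcome(Enum):
    KEEP = "keep"
    DECOMMISSION = "decommission"


@dataclass
class OwnerResponse:
    still_needed: Optional[bool]  # None models "I don't know"
    reason: str = ""


def route(
    response: Optional[OwnerResponse],  # None models silence
    today: date,
    review_interval_days: int = 90,  # assumed quarterly cycle
) -> tuple[Outcome, Optional[date]]:
    """Return the outcome and, if the rule is kept, its new review date."""
    justified = (
        response is not None
        and response.still_needed is True
        and bool(response.reason.strip())
    )
    if justified:
        # Re-justification resets the review clock.
        return Outcome.KEEP, today + timedelta(days=review_interval_days)
    # Silence, "no", "I don't know", or a yes without a reason.
    return Outcome.DECOMMISSION, None
```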
5. Decommission safely and reversibly
Removal goes through the same change workflow as addition: designed, recorded, and reversible. A common safe pattern is to disable before delete — turn the rule off, wait a full business cycle, confirm nothing broke, then remove. If something does break, re-enabling is instant and the blast radius is known.
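Disable-before-delete can be modeled as a small state machine. This is a sketch under stated assumptions: the `RuleLifecycle` class is hypothetical, and the 30-day soak period stands in for whatever "a full business cycle" means in your environment.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from datetime import date, timedelta
from enum import Enum
from typing import Optional


class State(Enum):
    ENABLED = "enabled"
    DISABLED = "disabled"
    DELETED = "deleted"


@dataclass
class RuleLifecycle:
    rule_id: str
    state: State = State.ENABLED
    disabled_on: Optional[date] = None
    history: list[tuple[date, str]] = field(default_factory=list)

    def disable(self, today: date) -> None:
        """Turn the rule off and start the soak period."""
        if self.state is not State.ENABLED:
            raise ValueError("only an enabled rule can be disabled")
        self.state = State.DISABLED
        self.disabled_on = today
        self.history.append((today, "disabled"))

    def rollback(self, today: date) -> None:
        """Re-enable a disabled rule, e.g. because something broke."""
        if self.state is not State.DISABLED:
            raise ValueError("only a disabled rule can be re-enabled")
        self.state = State.ENABLED
        self.disabled_on = None
        self.history.append((today, "re-enabled"))

    def delete(self, today: date, soak_days: int = 30) -> None:
        """Remove the rule, but only after the soak period has elapsed."""
        if self.state is not State.DISABLED or self.disabled_on is None:
            raise ValueError("rule must be disabled before deletion")
        if today - self.disabled_on < timedelta(days=soak_days):
            raise ValueError("soak period has not elapsed")
        self.state = State.DELETED
        self.history.append((today, "deleted"))
```

Because there is no direct transition from enabled to deleted, every removal has a window in which `rollback` is available, and `history` supplies the record the change workflow needs.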
Why automate it
Done by hand, recertification is so tedious it gets skipped, which is how you got here. Orchestration platforms exist to make it sustainable: they hold the attribution metadata, collect the usage data, run the hygiene analysis, and drive the owner-review workflow. The human decisions stay human; the mechanical toil gets automated. That's the same principle I'd apply to any security automation — see the broader argument in policy orchestration.