Network Visibility at a Utility — singleventupset.com

These are field notes from leading security operations engineering at a large electric and gas utility. I'm deliberately keeping specifics generic, since environment detail doesn't belong on a public page, but the shape of the problem is common enough that the approach is worth writing down.

A utility environment is complex in two directions at once: technically, because it spans corporate IT, field systems, and the boundary toward operational technology; and organizationally, because change is slow, scrutinized, and regulated. You don't get to "move fast and break things." Visibility has to be built carefully, and it has to be defensible.

Visibility is a prerequisite, not a feature

You cannot defend what you cannot see, and you cannot see what your tools never receive. Most detection gaps I've chased back to root cause were not detection-logic problems — they were plumbing problems. The sensor was deployed, the rules were tuned, but the traffic it needed never reached it, or reached it duplicated, or reached it stripped of the very fields the rule keyed on.

So the first job is the unglamorous one: get the right copy of the right traffic to the right tool, reliably, without becoming a single point of failure for the production network you're trying to protect.

The packet-broker fabric

A packet broker (in my case, a Gigamon fabric) sits between the network's TAP/SPAN sources and the security tool rail. It aggregates, filters, de-duplicates, and load-balances traffic out to consumers — IDS/NDR, full-packet capture, DLP, performance monitoring. The value is decoupling: the network team manages the network, the tools team manages the tools, and the broker is the contract between them.
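De-duplication is the broker function that most often surprises people, so here is a minimal sketch of the idea in Python. It hashes each packet and drops any copy whose digest was already seen within a short window. This illustrates the technique under stated assumptions. It is not how Gigamon or any other vendor implements it, and the class name, the 50 ms default window, and the choice of SHA-256 are all hypothetical.

```python
import hashlib
from collections import OrderedDict


class Deduplicator:
    """Drop copies of a packet already seen within a short time window.

    Hypothetical sketch, not a vendor implementation. It hashes exactly the
    bytes it is given; a real deployment would first mask fields that differ
    between tap points (TTL, header checksum, VLAN tag) so that copies of the
    same packet produce the same digest.
    """

    def __init__(self, window_s: float = 0.05):
        self.window_s = window_s      # assumed dedupe window, in seconds
        self._seen = OrderedDict()    # digest -> time first seen, oldest first

    def _expire(self, now: float) -> None:
        # Evict digests older than the window. Insertion order equals time
        # order as long as callers pass non-decreasing timestamps.
        while self._seen:
            oldest_ts = next(iter(self._seen.values()))
            if now - oldest_ts <= self.window_s:
                break
            self._seen.popitem(last=False)

    def accept(self, packet: bytes, now: float) -> bool:
        """Return True if this packet should be forwarded to the tools."""
        self._expire(now)
        digest = hashlib.sha256(packet).digest()
        if digest in self._seen:
            return False              # duplicate copy from another tap point
        self._seen[digest] = now
        return True
```

A window that is too short lets late-arriving copies from a distant tap through. A window that is too long costs memory and risks swallowing legitimate retransmissions that happen to be byte-identical. The right value depends on the latency spread between your tap points.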

why a broker beats point-to-point SPAN

Switches support only a limited number of SPAN sessions, those sessions get reprioritized out from under you during incidents, and they silently drop packets under load. A broker gives you a stable, auditable map of which sources feed which tools, and it lets you add a new tool without renegotiating switch config across the estate. See TAP vs SPAN for the longer argument.
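An auditable map is easy to claim and easy to let rot. One way to keep it honest is to treat the source-to-tool wiring and every drop rule as data you can query. The sketch below is hypothetical: the `ToolMap` and `DropRule` names, the filter-expression strings, and the review field are assumptions, not a broker API. Its point is that a drop rule without a recorded reason and owner gets refused outright.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class DropRule:
    match: str       # hypothetical filter expression, e.g. "udp port 123"
    reason: str      # why this traffic is excluded from the tool
    owner: str       # who approved the resulting visibility gap
    review_by: str   # date by which the rule must be re-justified


@dataclass
class ToolMap:
    # source (TAP/SPAN name) -> tools it feeds
    feeds: dict[str, set[str]] = field(default_factory=dict)
    # tool -> documented drop rules applied before traffic reaches it
    drops: dict[str, list[DropRule]] = field(default_factory=dict)

    def connect(self, source: str, tool: str) -> None:
        self.feeds.setdefault(source, set()).add(tool)

    def add_drop(self, tool: str, rule: DropRule) -> None:
        # Refuse undocumented gaps: every drop needs a reason and an owner.
        if not rule.reason.strip() or not rule.owner.strip():
            raise ValueError("drop rules must record a reason and an owner")
        self.drops.setdefault(tool, []).append(rule)

    def sources_for(self, tool: str) -> list[str]:
        return sorted(s for s, tools in self.feeds.items() if tool in tools)

    def audit(self, tool: str) -> list[str]:
        """Answer 'what does this tool see, and what is kept from it?'"""
        lines = [f"{tool} receives: {', '.join(self.sources_for(tool)) or 'nothing'}"]
        for r in self.drops.get(tool, []):
            lines.append(f"  drops {r.match!r}: {r.reason} "
                         f"(owner {r.owner}, review by {r.review_by})")
        return lines
```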

Practical lessons from running one at scale:

De-duplicate early. In a meshed environment the same flow is tapped in multiple places. If you don't dedupe at the broker, you pay for it in tool license, storage, and false correlation.
Filter to relevance, not to convenience. It's tempting to drop "noisy" traffic to relieve an overloaded tool. Document every drop rule as if a future incident responder will curse you for the gap, because one will.
Treat the tool rail as capacity-planned infrastructure. Egress to a 10G tool port is a budget. Oversubscribe it and you get silent loss, which is the worst kind.
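Treating a tool port as a budget is mostly arithmetic. A minimal sketch, assuming you know each source's peak rate and roughly what fraction dedupe removes: sum the peaks, subtract the duplicates, and compare the result against the port's line rate with some headroom. The 80% ceiling is an assumed planning figure, not a vendor number. Summing peaks is deliberately pessimistic because sources rarely peak together.

```python
from dataclasses import dataclass


@dataclass
class PortBudget:
    port_gbps: float = 10.0   # tool port line rate
    headroom: float = 0.8     # assumed planning ceiling, not a vendor figure

    def offered_gbps(self, source_peaks_gbps: list[float], dedup_ratio: float) -> float:
        # Pessimistic: assumes every source hits its peak at the same moment.
        if not 0.0 <= dedup_ratio < 1.0:
            raise ValueError("dedup_ratio must be in [0, 1)")
        return sum(source_peaks_gbps) * (1.0 - dedup_ratio)

    def check(self, source_peaks_gbps: list[float], dedup_ratio: float) -> tuple[float, bool]:
        """Return (peak utilization as a fraction of line rate, within headroom?)."""
        util = self.offered_gbps(source_peaks_gbps, dedup_ratio) / self.port_gbps
        return util, util <= self.headroom
```

For example, three sources peaking at 4, 3.5, and 5 Gbps with 30% duplicates offer about 8.75 Gbps, or 87.5% of a 10G port. That is under line rate but over the assumed ceiling, which is exactly the situation where a real burst produces silent loss.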

Edge DDoS defense

At the network edge, volumetric and protocol-abuse attacks get handled before they reach anything stateful. An edge defense appliance (Arbor, in my case) does detection and mitigation at the boundary — the goal is to absorb or scrub the obvious volumetric stuff so that your firewalls, load balancers, and application tier never see it.

The engineering reality is that edge defense is mostly about baselines and pre-authorization. Mitigation that requires a human to approve a countermeasure at 3 a.m. during an active flood is mitigation that arrives too late. The work is done up front: establishing what normal looks like per service, and deciding in advance which countermeasures are safe to trigger automatically.
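The per-service baseline and the pre-authorization decision can be expressed together. A minimal sketch, assuming traffic history is available as bits-per-second samples per protected service: take the 95th percentile as "normal peak," alert at a multiple of it, and split the response into countermeasures approved for automation in advance and ones that still need a person. The multiplier, the percentile, and the countermeasure names are hypothetical, and none of this describes how Arbor models baselines internally.

```python
import statistics
from dataclasses import dataclass


@dataclass
class ServicePolicy:
    name: str
    history_bps: list[float]          # historical traffic samples for this service
    multiplier: float = 3.0           # assumed alert factor over the baseline
    auto_ok: tuple[str, ...] = ()     # countermeasures pre-approved for automation
    needs_human: tuple[str, ...] = () # countermeasures that still require sign-off

    def baseline(self) -> float:
        # 95th percentile of history: the last of 19 cut points for n=20.
        if len(self.history_bps) < 2:
            raise ValueError("need at least two samples to baseline")
        return statistics.quantiles(self.history_bps, n=20)[-1]

    def evaluate(self, current_bps: float) -> dict:
        threshold = self.baseline() * self.multiplier
        if current_bps <= threshold:
            return {"attack": False, "auto": [], "escalate": []}
        # Decided up front: what fires on its own, and what waits for a human.
        return {"attack": True,
                "auto": list(self.auto_ok),
                "escalate": list(self.needs_human)}
```

The design choice worth copying is that the automation decision lives in the policy, written in daylight, rather than being made by whoever is on call when the flood starts.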

SIEM / SOAR: where it all lands

Detections from the NDR tools, edge defense events, and infrastructure logs converge in a SIEM, and repeatable response is codified in SOAR playbooks. The temptation with SOAR is to automate everything; the discipline is to automate the boring, high-confidence, reversible actions and leave judgment calls to people. Enrichment, ticket creation, containment of a known-bad indicator — automate. "Take this production segment offline" — that stays human.
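The automate-versus-human line can be written down as an explicit routing rule instead of being left to each playbook author. A minimal sketch; the action kinds, the 0.9 confidence floor, and the `route` function are hypothetical and are not any SOAR product's API:

```python
from dataclasses import dataclass
from enum import Enum


class Route(Enum):
    AUTOMATE = "automate"
    HUMAN = "human"


@dataclass(frozen=True)
class ProposedAction:
    kind: str          # e.g. "enrich", "open_ticket", "block_indicator", "isolate_segment"
    confidence: float  # detection confidence, 0.0 to 1.0
    reversible: bool   # can the action be cleanly undone?


# Hypothetical policy values; tune to your own risk appetite.
BOOKKEEPING = {"enrich", "open_ticket"}   # read-only or administrative steps
CONTAINMENT = {"block_indicator"}         # reversible containment allowed to run alone
MIN_CONFIDENCE = 0.9


def route(action: ProposedAction) -> Route:
    # Boring steps are safe to automate regardless of confidence.
    if action.kind in BOOKKEEPING:
        return Route.AUTOMATE
    # Containment runs alone only when high-confidence AND reversible.
    if (action.kind in CONTAINMENT
            and action.reversible
            and action.confidence >= MIN_CONFIDENCE):
        return Route.AUTOMATE
    # Anything else, including unknown kinds, stays a judgment call.
    return Route.HUMAN
```

Isolating a segment routes to a human even at 0.99 confidence, because it is not in the automatable set. The boundary is encoded by what the action does, not by how sure the detection is.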

The foundation: accurate technical inventory

None of the above works on top of a bad inventory. If you don't know what's on the network, your baselines are fiction and your detections have blind spots by construction. A disproportionate share of the durable wins I've seen came from the least exciting work: reconciling the technical inventory, monitoring it for drift, and making "we know what's here" a continuously true statement rather than a quarterly audit fire drill.
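Drift monitoring reduces to a reconciliation between what the inventory claims and what the network actually shows. A minimal sketch, assuming inventory records and passively observed assets share a stable key (a MAC address here) and that each observation carries a last-seen time. The `Asset` fields and the 30-day staleness cutoff are assumptions.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class Asset:
    key: str       # stable identifier; MAC address assumed
    ip: str
    hostname: str


def reconcile(inventory: dict[str, Asset],
              observed: dict[str, tuple[Asset, datetime]],
              now: datetime,
              stale_after: timedelta = timedelta(days=30)) -> dict[str, list[str]]:
    """Compare the inventory of record against passively observed assets."""
    # On the wire but not in the inventory: the blind spots.
    unknown = sorted(k for k in observed if k not in inventory)
    # In the inventory but not seen recently: records that may be fiction.
    stale = sorted(k for k in inventory
                   if k not in observed or now - observed[k][1] > stale_after)
    # Present in both, but attributes (IP, hostname) disagree.
    changed = sorted(k for k in inventory.keys() & observed.keys()
                     if inventory[k] != observed[k][0])
    return {"unknown": unknown, "stale": stale, "changed": changed}
```

Run on a schedule, a non-empty result in any bucket is the drift signal. That is what turns "we know what's here" into something checked continuously rather than asserted once a quarter.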

takeaway

Tools get the headlines; plumbing and inventory win the incidents. Build the fabric, make the drops auditable, pre-authorize the safe mitigations, and keep the inventory honest. Everything downstream gets easier.