VCF 9.0 Best Practices: The Network View

As part of the Tech Field Day presentations around VMware Cloud Foundation 9.0, here’s some analysis and perspective on “VMware Best Practices for Adopting/Deploying VCF 9.0”. It complements my earlier “VCF 9.0: What’s New” post. As always, I applied a simple filter: I focus on what matters to a network operator who has to make this thing run on real links, real switches, and real people’s schedules. No webinar play-by-play here, just the questions I asked, the answers I heard, and the network-adjacent topics you should be aware of.

What I Learned

Latency Budgets: Are they Hard Requirements?

I asked for clarity on the round-trip numbers. The guidance is ≤100 ms RTT between a workload domain and its vCenter/hosts, and ≤500 ms RTT between an Ops Collector and the central Operations cluster. Those are operational limits, not hand-wavy suggestions. Also notable: the Ops Collector now handles log forwarding and can buffer metrics/logs locally during outages. Translation: collectors aren’t just passive data vacuums anymore; they’re in the data path.
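To make the two budgets concrete, here is a minimal Python sketch that compares measured RTTs against them. Only the 100 ms and 500 ms figures come from the guidance above; the dictionary keys, the `check_budget` helper, and the sample measurements are hypothetical and exist purely for illustration.

```python
# RTT budgets quoted in the guidance above, in milliseconds.
# The key names are made up for this sketch.
BUDGETS_MS = {
    "workload_domain_to_vcenter": 100.0,
    "collector_to_operations": 500.0,
}


def check_budget(kind: str, measured_rtt_ms: float) -> tuple[bool, float]:
    """Return (within_budget, headroom_ms) for one measured round trip."""
    headroom_ms = BUDGETS_MS[kind] - measured_rtt_ms
    # The guidance is "less than or equal", so zero headroom still passes.
    return headroom_ms >= 0, headroom_ms


if __name__ == "__main__":
    # Hypothetical measurements; replace with numbers from your own probes.
    samples = [
        ("workload_domain_to_vcenter", 85.0),
        ("collector_to_operations", 620.0),
    ]
    for kind, rtt_ms in samples:
        ok, headroom_ms = check_budget(kind, rtt_ms)
        status = "OK" if ok else "OVER BUDGET"
        print(f"{kind}: {rtt_ms:.0f} ms, headroom {headroom_ms:+.0f} ms, {status}")
```

Tracking headroom rather than a bare pass/fail is a deliberate choice: a path that passes with 5 ms to spare deserves a different conversation than one with 50 ms.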

On FIPS 140-3 Status

FIPS 140-3 is a foundational security certification for use in US Federal Government IT, and I asked if FIPS 140-3 has been achieved. The short answer: the new vCenter turns FIPS on by default during upgrades to VCF 9.0, but we didn’t get closure on the issue during the webinar. A quick search on the NIST Cryptographic Module Validation Program (CMVP) website shows that most VMware modules are FIPS 140-2 and 140-3 compliant, but keep an eye on this as 140-3 becomes the minimum requirement. Given the wide adoption of VMware products in IT infrastructure across many agencies, I would be very surprised if VMware didn’t proceed toward 140-3 on all relevant products.

Field Lessons Since VCF GA

I asked what customers are actually doing regarding wholesale upgrades. The theme: very few “upgrade-everything-in-place” stories; most are greenfield or greenfield + imports. Also, the misconception that you must buy new hardware and run vSAN for the management domain is now officially outdated: FC and NFS are supported for management in VCF 9.0.

On Topology, Stretch, and DR

I’ll be very direct here: I’ve never liked stretching a LAN over the WAN. The “L” in “LAN” is for LOCAL, and there are apps and systems that make the unspoken assumption that all the resources are very close together. Once the LAN is stretched, those latency assumptions are not always met, and I’ve seen that cause too many issues.

The reality today is that this is done routinely (nobody asked me – hmph), and you need to understand the impacts the underlay and associated latency characteristics will have on your apps. VCF 9.0 offers multiple design patterns, but your underlay decides what’s sane:

Site HA can run without NSX stretch if you provide L2 stretch in the underlay. If you prefer overlays, NSX can still be your elasticity layer; just be deliberate and intentional.
DR across regions is different: Operations replicates; Automation runs on a Kubernetes backend and has its own recovery model. Understand the knobs before you declare an “easy DR.”
Metro clustering for the management domain is now supported. Some pieces aren’t fully automated; be ready to assemble and integrate.

Operator note: Don’t conflate HA and DR. They’re different failure/recovery stories with different packet journeys.

Latency Budgets and Collector Placement

Those 100 ms/500 ms numbers will design your fleet for you. If you span oceans, place collectors local to the workloads and keep the central Operations plane where the RTT makes sense. Global visibility is great; global coupling is fragile.
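One way to reason about where the central Operations plane should live is to pick the site whose worst-case RTT to every collector site is smallest, then confirm that worst case fits the 500 ms budget. The sketch below does that under stated assumptions: the site names, the RTT figures, and the `pick_operations_site` helper are all hypothetical, and a real decision would also weigh failure domains, cost, and data residency.

```python
# Collector-to-Operations budget from the guidance above, in milliseconds.
COLLECTOR_TO_OPS_BUDGET_MS = 500.0


def pick_operations_site(
    rtt_ms: dict[tuple[str, str], float], sites: list[str]
) -> tuple[str, float]:
    """Return (site, worst_case_rtt_ms) minimizing the worst RTT to all sites."""

    def rtt(a: str, b: str) -> float:
        if a == b:
            return 0.0
        # RTT is treated as symmetric, so accept the pair in either order.
        return rtt_ms[(a, b)] if (a, b) in rtt_ms else rtt_ms[(b, a)]

    worst = {candidate: max(rtt(candidate, s) for s in sites) for candidate in sites}
    best = min(worst, key=worst.get)
    return best, worst[best]


if __name__ == "__main__":
    # Hypothetical inter-site RTTs in milliseconds.
    measured = {
        ("fra", "nyc"): 85.0,
        ("fra", "sin"): 160.0,
        ("nyc", "sin"): 230.0,
    }
    site, worst_ms = pick_operations_site(measured, ["fra", "nyc", "sin"])
    fits = worst_ms <= COLLECTOR_TO_OPS_BUDGET_MS
    print(f"Operations at {site}: worst-case RTT {worst_ms:.0f} ms, within budget: {fits}")
```

If no candidate site keeps the worst case under budget, that is the signal to consider a second fleet rather than a tighter coupling.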

Addressing, DNS, and IP Portability

Decide how you’ll move services across sites:

IP portability strategy: NSX, underlay L2, routed anycast VIPs—pick one and document it.
Day-0 placement for Ops/Automation: put them on separate VLANs/NSX segments if DR/tenancy is in your future. Renumbering later is… not fun. Never. Ever.
Expect different IP/DNS counts for simple vs HA builds. Automation uses an embedded LB (external optional). Operations can use an LB but doesn’t require one.

Logs and Telemetry are Now “Real” Traffic

Because the Ops Collector also forwards logs, plan ACLs, bandwidth, and QoS for sustained telemetry plus bursty catch-up after outages. If you’ve never sized links for observability traffic before, now’s the time to start.
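The catch-up burst is simple arithmetic, sketched here in Python. It assumes the collector buffers everything produced during the outage and replays it as fast as the telemetry allowance on the link permits, while new telemetry keeps arriving at the sustained rate. The `catch_up_seconds` helper and all the numbers are hypothetical; actual collector replay behavior may differ.

```python
def catch_up_seconds(
    sustained_mbps: float, outage_s: float, allowance_mbps: float
) -> float:
    """Seconds needed to drain the backlog buffered during an outage.

    sustained_mbps: steady telemetry rate produced at the site (Mbit/s)
    outage_s:       how long the link to Operations was down (seconds)
    allowance_mbps: bandwidth the QoS policy grants telemetry (Mbit/s)
    """
    if allowance_mbps <= sustained_mbps:
        # With no spare capacity the backlog never shrinks.
        raise ValueError("allowance must exceed the sustained telemetry rate")
    backlog_mbit = sustained_mbps * outage_s
    # Only the capacity above the sustained rate is available for replay.
    drain_mbps = allowance_mbps - sustained_mbps
    return backlog_mbit / drain_mbps


if __name__ == "__main__":
    # Hypothetical site: 20 Mbit/s steady, 30-minute outage, 50 Mbit/s allowance.
    seconds = catch_up_seconds(20.0, 30 * 60, 50.0)
    print(f"Catch-up takes about {seconds / 60:.0f} minutes")
```

The takeaway is that the allowance has to be sized above the steady-state rate: the closer the two are, the longer the site stays behind after every outage.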

This is not just a VCF 9.0 issue. There’s great power in the ability to generate lots of observability data, but it can also create a data lake challenge of your own just for network telemetry. This could bring Data Engineering back as an area of specialization within networking, and AI is already proving to be a useful tool for reducing telemetry and observability data. This will be a fascinating issue to watch closely over the next 2-3 years.

A Rollout Strategy I Can Defend to Ops

Pilot a fleet in one region on storage you already operate well (FC, NFS, or vSAN). Keep it small, but production-grade.
Treat identity, licensing, backups, cert rotation, and password policies as Day-0 work, not Day-2 debt. Planning ahead always pays off.
Import one low-risk workload domain and run the full lifecycle drill: patching, certs, password rotation, rollback.
Prove HA/DR with real failover/failback and measure the RTT budgets you said you’d meet.
Scale by adding collectors, growing domains, or standing up a second fleet if object/metric ceilings or latency say it’s time.

Bottom Line

VCF 9.0 gives operators something many of us value more than shiny features: increased design agility. You can start simple on the networks you already trust, layer in HA or DR when the business actually needs it, and place control-plane components where the latency budgets are real, not aspirational. If you come from the network side, VCF 9.0 doesn’t remove the hard parts, but it does make the good choices easier to live with.

The VMware Cloud Foundation 9.0 Showcase: Powering the Modern Private Cloud was presented by VMware in association with Techstrong and Tech Field Day. The videos will be posted to the Tech Field Day YouTube channel and on the website. You can learn more about VMware Cloud Foundation 9.0 on the VMware website.
