Wednesday, February 23, 2011

VMware HA and Cisco OTV

So we're moving from our existing 2 data-centers to 2 new ones in the next little while. The new ones will eventually be active/active, once we get whichever Data ONTAP 8.x release supports MetroCluster.

So in designing our new HA architecture, I had to take into account the fact that we're using HP blades and OTV from Cisco, which gives us a flat network and VMs that are portable across the 2 data-centers.

That seems like a great idea. We'll have Cisco 7000 10G switches connecting the ESX hosts in each location, and the 6509s will handle the OTV domain ownership, since they control the links between locations as well. Links are redundant, ESX clusters have capacity in both locations, and eventually we'll have MetroCluster for the filers, so storage will be available in both places.

HOWEVER, I was talking to our network team about the OTV VLANs today. It seems that in the event of a datacenter outage, an OTV domain owned by the switch in the failed datacenter would be unavailable everywhere, even on the 7000s in the opposite DC. So the HA response on the surviving ESX hosts would be to shut everything down, because their heartbeats would fail everywhere.

The solution is to add a second service console network for the whole cluster and have its OTV domain owned by the 6509 in the other DC, so that each DC's switch owns one of the two SC networks. That way, in the event of a DC failure, the surviving hosts can still communicate on the SC network their own DC owns. They won't see themselves as isolated, and they'll properly power on the VMs from the failed DC, which is what we want.
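To sanity-check that reasoning, here's a rough Python sketch of the failure case. It's a toy model, not anything VMware or Cisco provides: the host names, network names, and the isolated_hosts function are all made up for illustration. It assumes every host sits on every SC network, and that an SC network goes dead everywhere when the DC owning its OTV domain goes down (which is how our network team described it).

```python
def isolated_hosts(hosts, sc_networks, failed_dc):
    """Return the surviving hosts that have no working heartbeat network.

    hosts:       dict of host name -> DC it lives in (hypothetical names)
    sc_networks: dict of SC network name -> DC whose switch owns its OTV domain
    failed_dc:   the DC that has gone down

    Assumption: a network is unusable everywhere once its owning DC fails,
    and every host is attached to every SC network.
    """
    # Networks that are still up: those owned by a DC other than the failed one.
    usable = [net for net, owner in sc_networks.items() if owner != failed_dc]
    # Hosts that are physically still running.
    survivors = sorted(h for h, dc in hosts.items() if dc != failed_dc)
    # With no usable network, every survivor loses all heartbeats.
    return [] if usable else survivors


hosts = {"esx-a1": "dc1", "esx-a2": "dc1", "esx-b1": "dc2", "esx-b2": "dc2"}

# One SC network, owned by dc1.
single = {"sc-primary": "dc1"}

# Two SC networks, one owned by each DC.
dual = {"sc-primary": "dc1", "sc-secondary": "dc2"}
```

With the single network, isolated_hosts(hosts, single, "dc1") returns both dc2 hosts, which is the "HA shuts everything down" scenario. With the dual setup, losing either DC returns an empty list, so the survivors keep heartbeating.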

We also have to be sure we add hosts to the HA cluster in a specific order. The first 5 hosts added become the HA primaries (masters), so we need at least 1 in each location, preferably 2. That means adding 3 from one DC and then 2 from the other before any of the rest.
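The join order is easy to get wrong by hand, so here's a minimal Python sketch of how it could be worked out. The ha_join_order function and the host names are hypothetical, and it only produces a list; it doesn't talk to vCenter. It assumes, as above, that the first 5 hosts added are the ones that matter.

```python
def ha_join_order(site_a, site_b, primary_slots=5):
    """Return a host order whose first `primary_slots` entries span both sites.

    site_a, site_b: lists of host names in each DC (hypothetical names)
    primary_slots:  how many of the first-added hosts become HA primaries
                    (5, per the assumption stated in the text)
    """
    from_a = (primary_slots + 1) // 2  # 3 when primary_slots is 5
    from_b = primary_slots - from_a    # 2 when primary_slots is 5
    if len(site_a) < from_a or len(site_b) < from_b:
        raise ValueError("not enough hosts in one site to fill the primary slots")
    # First 3 from one DC, then 2 from the other, then everyone else.
    head = site_a[:from_a] + site_b[:from_b]
    tail = site_a[from_a:] + site_b[from_b:]
    return head + tail


order = ha_join_order(
    ["esx-a1", "esx-a2", "esx-a3", "esx-a4"],
    ["esx-b1", "esx-b2", "esx-b3"],
)
```

Here order starts with esx-a1, esx-a2, esx-a3, esx-b1, esx-b2, so the first 5 include 3 hosts from one DC and 2 from the other, and the remaining hosts follow in any order.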

2 comments:

CS said...

You wrote that a 6509 is going to have something to do with OTV.

Are you sure?

Unknown said...

I thought only Nexus 7K can participate in OTV??