So in designing our new HA architecture, I had to take into account the fact that we're using HP blades and OTV from Cisco, which gives us a flat network and VMs that are portable across the 2 datacenters.
That seems like a great idea. We'll have Cisco 7000 10G switches connecting the ESX hosts in each location, and the 6509s will handle the OTV domain ownership, since they also control the links between locations. Links are redundant, ESX clusters have capacity in both locations, and eventually we'll have MetroCluster for the filers, so storage will be available in both places.
HOWEVER, I was talking to our network team about the OTV VLANs today. It seems that in the event of a datacenter outage, an OTV domain owned by the switch in the failed datacenter would be unavailable everywhere, even on the 7000s in the opposite DC. So the HA response on the ESX hosts would be to shut everything down, because their heartbeats would fail everywhere.
The solution is to have a second service console (SC) network for the whole cluster, with its OTV domain owned by the 6509 in the other DC, i.e. not the one that owns the primary SC network's domain. That way, in the event of a DC failure, the surviving hosts could still communicate on whichever SC network is still up, so they wouldn't declare themselves isolated, and they would properly power on the VMs from the failed DC, which is what we want.
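To convince myself of the logic, here's a rough Python sketch of the failure case. It's only a toy model of what the network team described, not anything that talks to ESX or the switches; the host names, DC labels, and network names are all made up for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ScNetwork:
    name: str
    otv_owner_dc: str  # DC whose 6509 owns this network's OTV domain (assumed model)


@dataclass(frozen=True)
class Host:
    name: str
    dc: str


def usable_networks(networks, failed_dc):
    """SC networks whose OTV domain owner is still up after failed_dc goes down."""
    return [n for n in networks if n.otv_owner_dc != failed_dc]


def hosts_with_heartbeat(hosts, networks, failed_dc):
    """Surviving hosts that still have at least one working SC network."""
    if not usable_networks(networks, failed_dc):
        return []  # no heartbeat path anywhere: every host looks isolated
    return [h.name for h in hosts if h.dc != failed_dc]


# Hypothetical inventory: two hosts per DC.
hosts = [
    Host("esx-a1", "DC-A"), Host("esx-a2", "DC-A"),
    Host("esx-b1", "DC-B"), Host("esx-b2", "DC-B"),
]
single_sc = [ScNetwork("sc-primary", "DC-A")]
dual_sc = single_sc + [ScNetwork("sc-secondary", "DC-B")]

single_result = hosts_with_heartbeat(hosts, single_sc, "DC-A")
dual_result = hosts_with_heartbeat(hosts, dual_sc, "DC-A")
print(single_result)  # []
print(dual_result)    # ['esx-b1', 'esx-b2']
```

With one SC network owned by DC-A, losing DC-A leaves nobody with a heartbeat. With the second network owned by DC-B, the DC-B hosts keep talking to each other, and the same holds the other way around if DC-B is the one that fails.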
And we have to be sure we add hosts to the HA cluster in a specific order. The first 5 hosts added become the HA masters (the primary nodes), so we need at least 1 in each location, preferably 2. That's why adding 3 from one DC and then 2 from the other to the HA cluster is important.
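The add order is easy to get wrong by hand, so a small Python sketch for planning it might help. This only builds and checks a list; it doesn't call any VMware API, and the host names and the `ha_add_order` / `masters_per_dc` helpers are hypothetical.

```python
from collections import Counter

MASTER_SLOTS = 5  # the first 5 hosts added become HA masters, per the note above


def ha_add_order(dc_a_hosts, dc_b_hosts):
    """Order hosts so the first 5 added are 3 from DC A, then 2 from DC B."""
    if len(dc_a_hosts) < 3 or len(dc_b_hosts) < 2:
        raise ValueError("need at least 3 hosts in DC A and 2 in DC B")
    masters = dc_a_hosts[:3] + dc_b_hosts[:2]
    rest = dc_a_hosts[3:] + dc_b_hosts[2:]
    return masters + rest


def masters_per_dc(order, dc_of):
    """Count how many of the first 5 hosts in the order sit in each DC."""
    return Counter(dc_of[h] for h in order[:MASTER_SLOTS])


# Hypothetical inventory.
dc_a = ["esx-a1", "esx-a2", "esx-a3", "esx-a4"]
dc_b = ["esx-b1", "esx-b2", "esx-b3"]
dc_of = {h: "DC-A" for h in dc_a} | {h: "DC-B" for h in dc_b}

order = ha_add_order(dc_a, dc_b)
counts = masters_per_dc(order, dc_of)
print(order)
print(dict(counts))  # {'DC-A': 3, 'DC-B': 2}
```

The check at the end is the useful part: whatever order you end up with, count the first 5 per DC before adding anything to the cluster.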