One of the major new features in Exchange 2010 is the Database Availability Group (DAG). This replaces high availability options from previous versions of Exchange such as SCR and CCR – it essentially works by replicating multiple copies of the same Exchange databases across multiple Exchange servers.
Exchange 2010 is supported on hardware virtualisation platforms provided the conditions in this TechNet article are met. Whilst looking at options for a possible Exchange 2010 deployment for a user base in the hundreds (it obviously made sense to consider virtualising the mailbox server role), I stumbled across this blog post, which suggested that whilst Exchange 2010 itself was supported as a virtual machine, a DAG was not supported if its members were part of a virtualised cluster.
Given that most people looking to deploy an Exchange 2010 DAG virtually may well already have an existing cluster, and do not wish to purchase standalone virtual hosts just for this solution, it seemed disappointing to read that Microsoft had taken this stance. I contacted MS support to clarify exactly what was and wasn't supported.
Whilst going back and forth with the support engineer I read this TechNet Magazine article, which suggested that the above stance might have changed recently and that a DAG would in fact be supported within a virtualised cluster, provided that all virtualisation HA features were disabled for the Exchange 2010 DAG VMs. This seemed to reflect the quote below from the virtualisation support article:
“DAGs are supported in hardware virtualization environments provided that the ….. clustered root servers have been configured to never failover or automatically move mailbox servers that are members of a DAG to another root server.”
Eventually they confirmed that this was correct: it would be supported to deploy a DAG in a virtualised cluster with these features turned off (HA and DRS in VMware ESX), but there was a strong emphasis that this would not be recommended. Exactly why it was not recommended was difficult to ascertain; the impression I got from the call was:
- As Aidan speculates in his blog post, it hasn't been tested enough to be recommended yet. There was also a lot of emphasis on the call on large-scale deployments, and not much demand yet for testing smaller deployment scenarios.
- Fixing VMs to particular hosts means the hosts themselves become a management burden, and it was suggested that this could carry a high cost.
Whilst it does increase the level of management, purchasing modern physical hardware to run an Exchange 2010 DAG could mean that the servers are not highly utilised in smaller deployments – consequently the hardware cost of doing so can be high compared to the cost of a VM.
The main reason to consider virtualising Exchange in this deployment was hardware cost, not the HA features virtualisation could bring – Exchange itself will provide application-level HA.
The upshot of this post is essentially that an Exchange 2010 DAG in a virtualised cluster is supported by Microsoft provided the HA features are turned off, but not recommended by them. Consequently you can take that information into your design process and consider if it makes sense for your deployment.
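To make "HA features turned off" concrete, the per-VM overrides can be set with VMware PowerCLI rather than clicking through vCenter. This is only a sketch: the vCenter address and the VM names MBX01 and MBX02 are hypothetical placeholders, and you should confirm the cmdlet parameters against your PowerCLI version.

```powershell
# Sketch (assumes VMware PowerCLI and hypothetical names):
# disable per-VM DRS automation and HA restart priority so vSphere
# never migrates or automatically restarts the DAG member VMs.
Connect-VIServer -Server vcenter.example.local   # hypothetical vCenter

foreach ($vmName in 'MBX01', 'MBX02') {
    # DrsAutomationLevel Disabled: DRS will not place or migrate this VM.
    # HARestartPriority Disabled: HA will not restart this VM on host failure.
    Get-VM -Name $vmName |
        Set-VM -DrsAutomationLevel Disabled -HARestartPriority Disabled -Confirm:$false
}
```

The per-VM override approach keeps HA and DRS available for the rest of the cluster while exempting only the DAG members, which is usually preferable to disabling the features cluster-wide.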
If running VMware virtualisation, you must also consider that prior to vSphere 4.0 U1, running MSCS clusters within HA/DRS clusters was not supported; this changed with the release of U1. Since an Exchange 2010 DAG relies on Windows Failover Clustering, you must be on at least U1 to be supported – and, again, HA/DRS must be disabled for the VMs in question. Virtual Kenneth has some very useful information about this in a blog post.
It seems enough customers put significant pressure on Microsoft that they have changed their stance on this issue. As of the 16th May post on the Exchange Team blog, HA and DRS are now supported for Exchange 2010 DAG clusters, provided you are running Exchange 2010 SP1. Be careful though: as Matt Liebowitz helpfully points out, you need to consider your cluster heartbeat timeouts to allow for the brief network interruption caused by a vMotion.
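For reference, the heartbeat settings in question are properties of the Windows failover cluster underneath the DAG, and can be viewed and changed with cluster.exe. A sketch, assuming a cluster named DAG01 (hypothetical) – the values shown are illustrative of loosening the defaults, not a recommendation; test against the vMotion times you actually observe.

```shell
REM List current cluster properties, including SameSubnetDelay (ms between
REM heartbeats) and SameSubnetThreshold (missed heartbeats before a node
REM is considered down).
cluster /cluster:DAG01 /prop

REM Illustrative values: heartbeat every 2000 ms, tolerate 10 missed
REM heartbeats, so a brief vMotion stun does not evict the DAG member.
cluster /cluster:DAG01 /prop SameSubnetDelay=2000
cluster /cluster:DAG01 /prop SameSubnetThreshold=10
```

If the DAG members sit in different subnets, the corresponding CrossSubnetDelay and CrossSubnetThreshold properties apply instead.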
He also posts other helpful information on this issue here.