Biggest Disappointment About vSphere 5.5 New Feature AppHA (Application High Availability) by Veeam
Just read the following in the latest Veeam Community Forums Digest and it’s quite interesting.
In fact, I use a much simpler method in Windows environments: I simply set the relevant services to restart automatically on failure. It has worked perfectly so far, with no hassle at all.
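For readers who want to script that setting rather than click through the Recovery tab of each service, here is a minimal sketch that builds the equivalent `sc.exe failure` command. The function names, the 60-second restart delay, the one-day reset period and the `Spooler` example service are my own illustrative assumptions, not anything prescribed by the digest below. Applying the command needs an elevated prompt on a Windows machine.

```python
import subprocess
import sys


def build_recovery_command(service_name, restart_delay_ms=60000, reset_period_s=86400):
    """Build an sc.exe command that restarts a service after each of its
    first, second and subsequent failures.

    The delay and reset period are illustrative defaults (assumptions).
    """
    # sc.exe expects the actions as action/delay pairs joined by slashes.
    actions = "/".join(["restart/{}".format(restart_delay_ms)] * 3)
    # sc.exe wants "reset=" and "actions=" as separate tokens from their values.
    return [
        "sc.exe", "failure", service_name,
        "reset=", str(reset_period_s),
        "actions=", actions,
    ]


def apply_recovery(service_name):
    """Apply the recovery settings (Windows only, needs admin rights)."""
    if sys.platform != "win32":
        raise RuntimeError("sc.exe is only available on Windows")
    subprocess.run(build_recovery_command(service_name), check=True)


if __name__ == "__main__":
    # "Spooler" (Print Spooler) is used purely as an example service name.
    print(" ".join(build_recovery_command("Spooler")))
```

Calling `apply_recovery("Spooler")` on a Windows host would apply the setting; the `__main__` section only prints the command so it can be reviewed first.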
You may remember that after sorting through all of the vSphere 5.5 features a few weeks ago, I was most excited about vSphere AppHA (Application High Availability). Well, I have to admit that, based on some hands-on experience, it has turned into my biggest disappointment.
The theory behind this feature sounded excellent: in addition to the vSphere HA (high availability) that VMware has provided for a few years now (VM monitoring, with automatic VM restart after VM or host failure), the same would now be possible at the application level (application monitoring, with automatic restart of services and/or the VM in case of application failure). And because this would be built right into the platform, it was going to be transparent and easy to use… or so I thought, based on years of watching VMware dish out incredible functionality that was always integrated, intuitive and “just worked”.
I assumed VMware would simply “enlighten” VMware Tools with the ability to detect known applications and monitor key metrics, and also make this framework extensible for custom applications (similar to pre-freeze / post-thaw scripts for application-specific snapshot logic). When an application failure is detected, VMware Tools would throw events into vCenter and first attempt “local” recovery by restarting services, and if that did not help, message vCenter to restart the VM. This architecture would make AppHA work out of the box for every VM (including newly added ones), with zero hassle for admins: huge value that EVERY user would immediately benefit from.
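The escalation flow imagined above (report the failure, try a local service restart, and only then ask for a VM restart) can be sketched in a few lines. This is purely an illustration of the idea and not how VMware Tools or AppHA actually work: the `recover` function, its parameters and the event strings are all hypothetical, and the health check and restart actions are passed in as plain callables.

```python
import time


def recover(is_healthy, restart_service, restart_vm, report_event,
            max_service_restarts=2, settle_seconds=0.0):
    """Hypothetical in-guest recovery loop.

    Returns the action that ended the run:
    "none", "service_restart" or "vm_restart".
    """
    if is_healthy():
        return "none"

    # Step 1: surface the failure to the management layer.
    report_event("application failure detected")

    # Step 2: attempt "local" recovery by restarting the service.
    for attempt in range(1, max_service_restarts + 1):
        restart_service()
        time.sleep(settle_seconds)  # give the service time to come back
        if is_healthy():
            report_event("recovered after service restart %d" % attempt)
            return "service_restart"

    # Step 3: local recovery did not help, so escalate to a VM restart.
    report_event("local recovery failed; requesting VM restart")
    restart_vm()
    return "vm_restart"
```

Keeping the checks and actions as injected callables is just a design choice for the sketch: it lets the same loop be exercised with stubs, and it mirrors the "extensible for custom applications" idea, where each application would supply its own health check.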
Well, it appears that I assumed too much. In reality, the feature comes with incredible complexity, and is based on a legacy architecture I would not expect a leading virtualization vendor to release in 2013. First, this feature is not built into the platform, but rather completely “glued” on top of it. Before you can even start using this feature, you will need to deploy two separate appliances… yes, one was not enough! The first is the Hyperic appliance (a recent VMware acquisition), which is a Microsoft SCOM-like tool with an ugly web interface (carrying maybe 10% of SCOM functionality) and an identical architecture (thus bringing 100% of SCOM complexity along). The second is the actual VMware AppHA appliance, which seems to orchestrate “stuff” between the Hyperic server and vCenter Server.
And the “best” part? AppHA requires that you deploy special monitoring agents in every VM, so welcome back to the agent management fun we’ve made great strides to avoid (having to remember to install, upgrade, and babysit yet another agent in your VMs). Even worse, you will also need to ensure that every VM is accessible to the Hyperic server over the network! Direct network connectivity to a VM from core infrastructure servers? What’s up with that? I thought cloud was all about complete isolation. In other words, just think about all the things you like about agent-free Veeam solutions, remember how you struggled with agent-based solutions before, and apply all that to vSphere AppHA. I totally expected they would simply reuse VMware Tools, because it is the necessary evil we have to live with… but unfortunately, this is not the case.
This is probably the first time ever that VMware has delivered a feature that sounds good on paper but has a horrible implementation in reality. It feels very much like a “buy and glue on top” approach, rather than an “innovate and build” one. Are we seeing a change in VMware’s approach to R&D? I honestly hope this was more of an exception than a rule, but it is still worrying and very annoying for me, a hardened VMware fan. I will definitely be looking for the VMware folks behind AppHA at VMworld Europe next month to discuss this and understand what’s going on with this feature.