Solved

VMware vCenter Enhanced Linked Mode (ELM) Backups


Userlevel 1
Badge +5

We discovered (the hard way, after doing a restore) that backups taken of vCenters in ELM contain inconsistent data and will likely break ELM replication if they are used to conduct a restore. We worked with VMware technical support to fix our vCenters’ ELM. It was a painful process.

See https://kb.vmware.com/s/article/85662?lang=en_us, which describes some limitations of backing up vCenters in ELM. It is not an exhaustive list.

Has anyone else encountered these limitations, and is there a workflow we could use to conduct an offline backup? This would have to happen during a scheduled outage, but it would be helpful to have some clean baseline backups in Commvault as a safety measure.

 

I have opened a Commvault technical support case. I think we need Commvault to request support from VMware to develop an API or fix their broken ELM backup.

 

The other community thread on this topic has an “answer” which is wrong. I can’t update that thread, so I am starting a new thread. For reference: 

 


Best answer by George 4 August 2022, 22:50


13 replies

Userlevel 6
Badge +15

Good afternoon. I looked into this feature in the past, but because it would require cascading API calls through all of the vCenters, they opted not to implement it. The concern was that the API call cascade would tax the vCenters and cause unwanted performance issues on the vCenter side.

Please update this thread if you learn differently from Customer Support.

Userlevel 6
Badge +12

Are you talking about backing up the vCenter VM itself?

If we are talking about vCenter itself, then unless it's protected frequently (which I would think adds unnecessary strain), I'm not sure which failure scenario it would really help in.

Userlevel 1
Badge +5

Why did we need a backup? In our case we were backing up the VCSA (if that is what you mean by the “vCenter VM itself”?) and restoring it to a different host (new hardware) to recover from hardware/network failure.

Not sure what unwanted performance issues Commvault might have been concerned about, but having any uncorrupted backup, even 24 hours old, is better than rebuilding a vCenter from an ISO image. I’d take the performance hit at 3AM once a day for that security.

VMware’s dereliction of duty (a KB indicating that ELM makes backups essentially impossible) is quite inexcusable.

Userlevel 6
Badge +12


Gotcha, thanks for the details. That makes sense for a catastrophic failure event without HA.

I don't have a solution, and automating/orchestrating the power-down/power-up carries its own set of risks and complexities (speaking from practice with PowerCLI). I'll give this some further thought and reply if I come up with anything.

Userlevel 1
Badge +5

Response from Commvault technical support (escalation to development):

This is a limitation of VMware and specific to vCenter servers. There are no Commvault plans to develop around this limitation.

Commvault support provided the VMware native protection documentation pages for vCenter servers:  

Of course, the above do not address the VMware backup limitations when ELM is in use, which are described by 

So frustrating!

I also have a case open with VMware, asking them to describe a workaround for us. It seems the only way to handle this is a manual backup. A PowerCLI script that uses ESXi host-only operations might at least make the manual operation repeatable and scalable, but I just don't have the time at the moment to debug a script like that.

Hoping the community has already addressed this.

Otherwise we could consider abandoning ELM, which would mean duplicated effort to maintain configurations on multiple vCenters.
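
For anyone who wants to script the manual, offline approach described above, here is a minimal Python sketch of the ordering logic only. It is not a working backup tool. It assumes that every vCenter in the ELM group should be shut down before any of them is backed up, so that all backups reflect the same point in time. That is my assumption and not something confirmed in this thread by the KB or by either vendor's support. The `Step` class, the `offline_backup_plan` function, and the host names are hypothetical; the actual shutdown, backup, and power-on actions would still have to be implemented, for example with PowerCLI against the ESXi hosts.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Step:
    """One action in the outage-window plan (hypothetical structure)."""
    action: str  # one of: "shutdown", "verify_off", "backup", "power_on"
    node: str    # vCenter (VCSA) name


def offline_backup_plan(elm_nodes):
    """Return the ordered steps for an offline backup of one ELM group.

    Assumption: no node is backed up until ALL nodes are confirmed
    powered off, so the backups share a single point in time.
    """
    # Drop duplicate names while preserving the given order.
    nodes = list(dict.fromkeys(elm_nodes))
    if not nodes:
        raise ValueError("no vCenter nodes given")

    plan = []
    for action in ("shutdown", "verify_off", "backup", "power_on"):
        # Finish each phase on every node before starting the next phase.
        plan.extend(Step(action, node) for node in nodes)
    return plan


if __name__ == "__main__":
    # Hypothetical host names, for illustration only.
    group = ["vcsa-a.example.com", "vcsa-b.example.com"]
    for number, step in enumerate(offline_backup_plan(group), start=1):
        print(f"{number:2d}. {step.action:<10} {step.node}")
```

Building the plan separately from executing it keeps the sequence reviewable before the scheduled outage, and the same list can drive whatever tooling performs the real operations.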

 

 

 

 

 

Userlevel 1
Badge +5

I have created a similar question in the VMware communities forums: communities.vmware.com/

Userlevel 5
Badge +16

After reading that KB this kind of sucks.

Userlevel 7
Badge +19

Yes, but it kind of makes sense: as long as no change or maintenance operations are occurring on any of the connected vCenters, you most likely will not run into issues. In addition, what if you combine both a snapshot and a file dump?

That way, if you really need to go back, you have the possibility to restore the VM and then restore the file dump. If it's really that sensitive, then I think VMware will have to come up with a solution by introducing some form of quiescing. Do mind that a lot of VMware customers are transitioning to the VCF framework, which entails the use of ELM between the vCenters of the management domain and the connected workload domain(s), so this might raise the priority.

Userlevel 1
Badge +5

I’m providing our disaster recovery vCenter restore documentation here, stripped of proprietary information and generalized.

Userlevel 7
Badge +19

@George thanks for sharing, much appreciated! What was the event that led you to having to restore vCenter to a previous point in time?

Userlevel 1
Badge +5


A technical disaster: lost networking, leading to a corrupted datastore on the host containing vCenter.

Userlevel 7
Badge +19

While doing maintenance activities? 

Userlevel 1
Badge +5


A network hardware failure during a migration (after many weeks of preparation, in the early non-production stage) to a new vSphere platform and an upgrade of that platform. The details are unimportant; suffice it to say we needed to restore one vCenter from backup. Since we do not have a VCSA with HA (no third site and quorum), this was the only option.
