Solved

Issues with sporadic failed VM Backups via hotadd


Userlevel 3
Badge +7

Hi,

We’re running into a strange issue with hotadd VM backups.

For some VMs we see the errors “Failed to download config file” and “Failed to open disk”. The failures are sporadic and hit different VMs each time, never the same ones over and over.

We have two SCSI controllers defined on the proxy, and each controller has a “dummy”/“actual” disk as its first disk (as advised by VMware: https://kb.vmware.com/s/article/2075069).

Has anyone ever seen this issue? Any idea what might be causing it?


Best answer by Damian Andre 18 July 2023, 14:34


10 replies

Userlevel 6
Badge +14

Hi @Jeremy ,

Can you confirm that the port assignment is correct?

Can you check our Documentation:

https://documentation.commvault.com/2022e/expert/32048_hotadd_transport_for_vmware.html

Best Regards,

Sebastien Merluzzi

Userlevel 3
Badge +7

Hi Sebastien,

Thank you for your answer!

I had a look at the VMX file, but it looks fine.

...

scsi0.pciSlotNumber = "160"

scsi1.pciSlotNumber = "256"

Might we be running into a limit here? Do we need an additional SCSI controller?

Although I’m not sure that would explain the errors we’re seeing.
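To check how close a proxy is to the per-controller limit, one option is to count the disk entries per SCSI controller in its VMX file. The sketch below is illustrative only: it assumes disks appear as `scsiX:Y.fileName` keys, and the sample VMX text and the function name `disks_per_controller` are made up for the example, not taken from the proxy in this thread.

```python
import re
from collections import Counter

# Matches disk entries such as: scsi0:1.fileName = "proxy_1.vmdk"
# (assumed key layout; controller-level keys like scsi0.pciSlotNumber do not match)
DISK_KEY = re.compile(r'^scsi(\d+):(\d+)\.fileName\s*=\s*"(.*)"$', re.IGNORECASE)


def disks_per_controller(vmx_text: str) -> dict:
    """Return {controller_number: attached_disk_count} for a VMX file's text."""
    counts = Counter()
    for line in vmx_text.splitlines():
        match = DISK_KEY.match(line.strip())
        if match:
            counts[int(match.group(1))] += 1
    return dict(counts)


# Hypothetical VMX excerpt, for illustration only.
sample_vmx = """
scsi0.pciSlotNumber = "160"
scsi1.pciSlotNumber = "256"
scsi0:0.fileName = "proxy.vmdk"
scsi0:1.fileName = "proxy_data.vmdk"
scsi1:0.fileName = "dummy.vmdk"
"""

print(disks_per_controller(sample_vmx))
```

Running this against the VMX while backups are active would show how many hot-added disks each controller is holding at that moment.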

Userlevel 4
Badge +11

Hello @Jeremy 

A single SCSI controller can have up to 15 disks attached, with at least one SCSI device node reservation for the virtual machine OS. If you run concurrent backup jobs that include more than 15 disks, you might need to add SCSI controllers to the VSA proxy that is responsible for hot adding disks. 

If the current controllers are not enough for your environment, please add more and keep monitoring.
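As a rough sizing aid, the arithmetic above can be sketched as follows. This is a minimal estimate, assuming 15 device nodes per controller (as stated above) and one reserved node per controller, for example the OS disk or a dummy disk; the function name and the reservation default are assumptions for illustration, not Commvault guidance.

```python
import math

SLOTS_PER_CONTROLLER = 15  # up to 15 disks per SCSI controller, per the post above


def controllers_needed(concurrent_disks: int, reserved_per_controller: int = 1) -> int:
    """Estimate SCSI controllers needed on the proxy for a number of hot-added disks.

    reserved_per_controller is an assumption: nodes already taken on each
    controller (OS disk, dummy disk) that hot-added disks cannot use.
    """
    if concurrent_disks < 0:
        raise ValueError("concurrent_disks must not be negative")
    usable = SLOTS_PER_CONTROLLER - reserved_per_controller
    if usable <= 0:
        raise ValueError("no usable device nodes left per controller")
    # The proxy always needs at least one controller for its own disks.
    return max(1, math.ceil(concurrent_disks / usable))


print(controllers_needed(28))  # 28 disks fit on 2 controllers with 14 usable nodes each
print(controllers_needed(29))  # one more disk requires a third controller
```

Keep in mind that a virtual machine supports only a limited number of SCSI controllers, so a very high concurrent disk count may call for an additional proxy rather than more controllers.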

 

Best,

Rajiv Singal

Userlevel 6
Badge +14

@Jeremy ,

Yes, you can add a SCSI controller and confirm whether that helps.

Userlevel 7
Badge +19

@Jeremy I assume it worked without problems before, right? Did you make any changes to the environment? Which Commvault version are you running? Which delivery model are you using for the storage towards the hypervisors?

Userlevel 3
Badge +7

Hi @Onno van den Berg 

Yes, the entire environment was refreshed, so we have new media agents, libraries and VSA proxies. Unfortunately, I’m not able to say whether the issue appeared before the hardware refresh. Either way, we’ll have to find a way to solve this. We might engage Commvault Support. To me the configuration of the proxy looks fine; I implemented the best practices as stated on the documentation website. We’re running 11.28, by the way. We’re using NFS datastores, so we might try to implement NAS transport mode instead of hotadd, but that’s something for the near future.

Userlevel 6
Badge +14

@Jeremy ,

You can follow the Testing HotAdd steps in the doc I sent you and if they work then sure, log a case with Support and we will check.

Userlevel 7
Badge +23

In general I’d not recommend HotAdd for NFS datastores if you can avoid it. Sometimes VMware struggles with locking and you can end up with unresponsive VMs.

Userlevel 3
Badge +7

Hi @Damian Andre 

Thanks for the info. The VMware article does seem related, but it talks about unresponsiveness during snapshot creation and removal, and we’re not having issues with those two actions. Our issue occurs specifically during the backup copy process.

For the record, we have a support ticket open for this issue.

Userlevel 3
Badge +7

Hello,

I wanted to provide an update on this topic. Both Commvault Support and VMware pointed to the article shared by @Damian Andre as the root cause of the issues. They both advised against using a single VSA proxy for several hosts when using NFSv3 datastores. They proposed the following solutions:

  • Configure a VSA Proxy per host
  • Use NBD
  • Use NFSv4 to mount the datastores to the hosts
