Solved

VM proxy hot add mode - logs of the VSA backup process

  • 22 November 2021
  • 10 replies
  • 1494 views


Hi all,

 

We are still facing the issue of VSA HotAdd mode backups failing with the error "Virtual disks needs consolidation".

We know that during the backup the virtual disks are attached to the VM proxy and a snapshot is taken, but the virtual disks then stay attached to the backup proxy.

My question is where to find the logs of the backup process on the VM proxy. I think these logs might show why the virtual disks are still attached to the backup proxy VM. I have checked the job log, but it contained no such information. Is there a log that describes what is done on the VM proxy during the backup job?

 

Cheers!


Best answer by Mike Struening RETIRED 25 January 2022, 22:04


10 replies


Hi @drPhil ,

 

Is the VSA Proxy or vCenter being backed up or snapshotted during the backup? This can sometimes cause issues.
I’d always recommend protecting the VSA Proxy and vCenter at a different time from the usual VM backups to avoid issues.

Also ensure that Automount (Windows) or lvm2-lvmetad (Linux) is disabled on the VSA proxies.

 

The vsbkp.log file (Virtual Server Backup) in the Log Files directory (on VSA Proxies) will cover the VM protection Job.

 

Let us know how you get on.

 

Best Regards,

Michael


Hi @MichaelCapon 

thanks for your contribution!

 

We have now disabled backup of the proxies themselves. The Automount feature hasn't been checked yet.

 

In the log file (vsbkp.log) I found the following errors:

 

CheckVMInfoError() - VM [vm_name] Error removing virtual machine snapshot. Please check Virtual Machine snapshot tree for any stale or missing snapshot chain.

VSBkpWorker::UnmountVM() - Leaving Cleanup string [VSIDA -j 543656 -a 2:110 -vmHost "vcenter_name" -vmName "vm_name" -vmGuid "guid_name" -vmCleanup "snapshot-xxxx" ] with JobManager as the unmount did not clear the cleanup string

RemoveDiskFromVM() - Failed to delete virtual disk [[datastore_name] xxx/vm_name-00000x.vmdk] from VM [proxy_name] with error - [Invalid configuration for device '0'.]

 

Are these error messages somehow useful for further investigation?
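
For anyone who wants to check their own vsbkp.log for the same three messages, here is a minimal Python sketch. It only does substring matching on the message fragments quoted above; the labels, the sample lines (abbreviated from the post) and the idea of passing the log path on the command line are assumptions made for illustration, not anything Commvault ships.

```python
import sys
from collections import Counter

# Message fragments copied from the errors quoted in this thread.
# The labels on the left are made up for this sketch.
PATTERNS = {
    "snapshot_removal_failed": "Error removing virtual machine snapshot",
    "cleanup_string_left": "Leaving Cleanup string",
    "disk_detach_failed": "Failed to delete virtual disk",
}


def classify_lines(lines):
    """Return (label, line) pairs for every line containing a known fragment."""
    hits = []
    for line in lines:
        for label, fragment in PATTERNS.items():
            if fragment in line:
                hits.append((label, line.rstrip("\n")))
                break  # one label per line is enough here
    return hits


def summarize(lines):
    """Count how often each failure type appears."""
    return Counter(label for label, _ in classify_lines(lines))


# Abbreviated sample lines based on the post, plus one unrelated line.
SAMPLE = [
    "CheckVMInfoError() - VM [vm_name] Error removing virtual machine snapshot.",
    "VSBkpWorker::UnmountVM() - Leaving Cleanup string [VSIDA -j 543656] with JobManager",
    "RemoveDiskFromVM() - Failed to delete virtual disk [[datastore_name] xxx/vm_name-00000x.vmdk]",
    "Some unrelated informational line",
]

if __name__ == "__main__":
    if len(sys.argv) > 1:
        # Path to a copy of vsbkp.log, supplied by the user.
        with open(sys.argv[1], errors="replace") as fh:
            source = fh.readlines()
    else:
        source = SAMPLE
    for label, count in summarize(source).items():
        print(f"{label}: {count}")
```

A disk detach failure appearing together with a snapshot removal failure for the same job is the combination discussed in the following replies.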


Hello drPhil,

Take the VSA proxy VM out of the regular subclient content so that it is not protecting itself, disable automount, remove the old backed-up VMs' disks from the VSA proxy VM, then run disk consolidation on the backed-up VMs, and disable DRS on the VSA proxy VMs.

Use multiple VM proxies, sized to the workload, and distribute the VMs across them. Add multiple SCSI controllers to each VM proxy as required to protect VMs with multiple disks. Set the number of streams as required on the VM group (subclient) and stagger jobs to avoid HotAdd lock situations. vsbkp.log and the individual VM status on the job will give details of the failures.

 

Below are the BOL docs related to this question; please check them for the cleanup steps and best-practice implementation.

Hotadd mode:

https://documentation.commvault.com/11.25/expert/32048_hotadd_transport_for_vmware.html

Hotadd data flow:

https://documentation.commvault.com/11.25/expert/135014_data_flow_for_hotadd_transport_mode.html

Best practices: Disable DRS for VSA Proxies on VMs

https://documentation.commvault.com/11.25/expert/32572_best_practices_for_virtual_server_agent_with_vmware.html

snapshot cleanup process:

https://documentation.commvault.com/11.24/expert/113810_snapshot_cleanup.html

disk consolidation KB:

https://kb.commvault.com/article/62743

 

Regards

Gopinath


Hi @drPhil ,

Thanks for the update.
The combination of “failed to remove snapshot” and “failed to delete vdisk from proxy” sounds like the VSA had a Snapshot present on it when we tried to remove the disk.

I’d suggest the following steps:
1. Remove Snapshots of VSA Proxies
2. Remove any Guest Disks attached to the VSA Proxy from other VMs.
3. Ensure VMs are consolidated properly
4. Move the VSA Proxies into a separate Subclient (and schedule for a different time to VM backups)
5. Ensure Automount is disabled.

Note: If using VMware 6.5 or higher, Paravirtual SCSI Adapters should be used on VSA Proxies.

Once the above has been validated, let us know the success of your VM Backups.
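
Step 2 can be checked by eye in the vSphere client, but with several proxies a small helper may be handy. The Python sketch below works on a plain list of datastore paths (the `[datastore] folder/file.vmdk` form seen in the log line earlier) and flags disks that do not live in the proxy's own folder. How the list is obtained is left open, the function names are made up for this sketch, and it assumes all of the proxy's own disks sit in one folder, which may not hold in every environment.

```python
import re

# "[datastore] folder/file.vmdk", the form used in the log line quoted earlier.
DS_PATH = re.compile(
    r"^\[(?P<datastore>[^\]]+)\]\s+(?P<folder>[^/]+)/(?P<file>.+\.vmdk)$"
)

# Snapshot delta disks are normally named <vm>-000001.vmdk, -000002.vmdk, ...
DELTA_DISK = re.compile(r"-\d{6}\.vmdk$")


def parse_datastore_path(path):
    """Split a datastore path into (datastore, folder, file)."""
    m = DS_PATH.match(path.strip())
    if m is None:
        raise ValueError(f"not a datastore path: {path!r}")
    return m.group("datastore"), m.group("folder"), m.group("file")


def foreign_disks(proxy_folder, attached_paths):
    """Disks attached to the proxy whose folder is not the proxy's own.

    Assumption: every disk that belongs to the proxy lives in proxy_folder.
    """
    return [
        p for p in attached_paths
        if parse_datastore_path(p)[1] != proxy_folder
    ]


def is_snapshot_delta(path):
    """True if the file name looks like a snapshot delta disk."""
    return DELTA_DISK.search(path.strip()) is not None


if __name__ == "__main__":
    # Hypothetical inventory of disks currently attached to one proxy.
    attached = [
        "[ds1] proxy01/proxy01.vmdk",
        "[ds1] proxy01/proxy01_1.vmdk",
        "[ds2] app-vm/app-vm-000002.vmdk",
    ]
    for disk in foreign_disks("proxy01", attached):
        kind = "snapshot delta" if is_snapshot_delta(disk) else "base disk"
        print(f"left over on proxy: {disk} ({kind})")
```

Anything it reports outside a running backup window is a candidate for manual removal from the proxy before consolidating the source VM.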

 

Best Regards,

Michael


Hi @Gopinath and @MichaelCapon ,

thank you for your guidance!

Checked: no VSA Proxy snapshot, no guest disks attached to the VSA Proxy now, VM consolidation done, the VSA Proxy itself is not being backed up, Automount disabled, Paravirtual SCSI adapters in use, DRS not set up on the VSA Proxy.

I do not know what staggering jobs does. There is one subclient, which includes approximately 50 VMs. There are 4 VSA Proxies, and once one VM is backed up, the next one is backed up after it. This looks good.
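
On the staggering question: in general it means offsetting start times so that not every VM hits the proxies with a hot-add request at the same moment. The Python sketch below only illustrates that idea with plain batching. The batch size, gap and start time are hypothetical, and how such a plan would be turned into subclients or schedules in Commvault is not covered in this thread.

```python
from datetime import datetime, timedelta


def stagger(vm_names, batch_size, gap_minutes, first_start):
    """Split VMs into batches and give each batch a later start time.

    Returns a list of (start_time, [vm, ...]) tuples.
    """
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    schedule = []
    for i in range(0, len(vm_names), batch_size):
        batch_index = i // batch_size
        start = first_start + timedelta(minutes=gap_minutes * batch_index)
        schedule.append((start, vm_names[i:i + batch_size]))
    return schedule


if __name__ == "__main__":
    # 50 VMs as in the post; every other number here is made up.
    vms = [f"vm{n:02d}" for n in range(1, 51)]
    plan = stagger(vms, batch_size=20, gap_minutes=30,
                   first_start=datetime(2021, 11, 22, 22, 0))
    for start, batch in plan:
        print(start.strftime("%H:%M"), len(batch), "VMs")
```

With these numbers the 50 VMs end up in three groups starting 30 minutes apart, instead of one group competing for the four proxies at once.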

 

However, if we do not find the cause of the failing backup, it can reoccur. That is what I am afraid of.

 

 


@drPhil , at this point, it might be best to open a support case and get an answer.

Once you do, please share the case number here so I can track it.

Thanks!!


Thanks again for the case number!


Hi !

Not sure it’s the case for you, but I get this warning message from VSA backups of VMs that have too many VMDKs.

I noticed this message in the logs of all backups of VMs that had more than 20 disks (VMDKs) configured.

My guess is that an incremental with CBT enabled cannot calculate the delta for some disks and throws this message. But it’s only a guess on my part :wink:


Hi @MichaelCapon, @Gopinath, @Mike Struening and @Laurent. Thank you so much for your inputs.

To sum up the thread, I have written down some points that could help with this kind of issue. However, the root cause has not been found.

 

What we have changed

  • Commvault settings (nRemoveSnapshotRetryAttempts and nRemoveSnapshotRetryIntervalSecs; see the related thread "Virtual disks needs consolidation - failing VM backup")
  • Removed the remnants of the old backup software
  • Removed the VSA proxies from the backup set so that they are not backed up at the same time as the other VMs

 

What we have checked

  • Remove Snapshots of VSA Proxies
  • Remove any Guest Disks attached to the VSA Proxy from other VMs.
  • Ensure VMs are consolidated properly
  • Move the VSA Proxies into a separate Subclient (and schedule for a different time to VM backups)
  • Ensure Automount is disabled.
  • Checked logs (vsbkp.log, VIXDISKLIB.log file)

 

Maybe in the future it would also be worth opening a ticket with VMware support.

 

Have a great one!

 

 

 


Sharing case resolution:

From the CV logs we do not detect any errors with the snapshot removal. Please ensure the AV exclusions are in place, as missing exclusions can cause issues with snapshot removals and hotadd backups.

https://documentation.commvault.com/11.24/expert/8665_recommended_antivirus_exclusions_for_windows.html



5960 1f8c 12/05 22:52:11 969364 CVMWareInfo::_RemoveVMSnapshot() - Removing Snapshot [snapshot-72377] of VM [<vm name>] Guid [5039922a-c165-5158-9a26-a1086632f4ac]

5960 1f8c 12/05 22:52:22 969364 CVMWareInfo::_RemoveVMSnapshot() - Successfully removed Snapshot of VM [<vm2 name>] Guid [5039922a-c165-5158-9a26-a1086632f4ac] duration [00:00:11] size [0]
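
Lines like the two above can be pulled out of vsbkp.log automatically to see how long each snapshot removal took. The Python sketch below is a minimal example: the field layout (process ID, thread ID, date, time, job ID, function, message) is inferred from these two sample lines only, and the function names are made up for illustration.

```python
import re

# Layout inferred from the two sample lines:
# <pid> <thread> <MM/DD> <HH:MM:SS> <job id> <function> - <message>
LINE = re.compile(
    r"^(?P<pid>\d+)\s+(?P<tid>[0-9a-fA-F]+)\s+(?P<date>\d{2}/\d{2})\s+"
    r"(?P<time>\d{2}:\d{2}:\d{2})\s+(?P<job>\d+)\s+(?P<func>\S+)\s+-\s+(?P<msg>.*)$"
)
DURATION = re.compile(r"duration \[(\d{2}):(\d{2}):(\d{2})\]")


def parse_line(line):
    """Return the fields of one log line as a dict, or None if it does not match."""
    m = LINE.match(line.strip())
    return m.groupdict() if m else None


def removal_durations(lines):
    """Return (job_id, seconds) for each successful snapshot removal."""
    results = []
    for line in lines:
        rec = parse_line(line)
        if rec is None or "Successfully removed Snapshot" not in rec["msg"]:
            continue
        d = DURATION.search(rec["msg"])
        if d:
            hours, minutes, seconds = (int(x) for x in d.groups())
            results.append((rec["job"], hours * 3600 + minutes * 60 + seconds))
    return results


SAMPLE = [
    "5960 1f8c 12/05 22:52:11 969364 CVMWareInfo::_RemoveVMSnapshot() - "
    "Removing Snapshot [snapshot-72377] of VM [<vm name>] "
    "Guid [5039922a-c165-5158-9a26-a1086632f4ac]",
    "5960 1f8c 12/05 22:52:22 969364 CVMWareInfo::_RemoveVMSnapshot() - "
    "Successfully removed Snapshot of VM [<vm2 name>] "
    "Guid [5039922a-c165-5158-9a26-a1086632f4ac] duration [00:00:11] size [0]",
]

if __name__ == "__main__":
    print(removal_durations(SAMPLE))  # [('969364', 11)]
```

A "Removing Snapshot" line with no matching "Successfully removed" line for the same job would be the case worth a closer look.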


The antivirus on the VSA proxy, not on the guest VMs, is what causes issues with hotadd. The AV logs rarely show a block event or that it scanned the CV process. Typically, when it scans the CV process it will allow it, but since it opened a handle to our process it can cause the process to hang, resulting in disks that are not cleaned up properly. This can result in orphan disks, since the process was hung during the mount or unmount operation. Based on your latest logs, I do not see the old snapshots still attached to the VSA proxy, and backups and consolidations are all working. At this point, I recommend monitoring the backups until Friday to ensure everything continues to work.


The keys increase the number of snapshot removal commands and the frequency at which we send them. If there is already an issue with snapshot cleanup due to hotadd or orphan snapshots, and it is not cleaned up manually before the backups, these keys will not help.
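
To make that last point concrete, here is a generic retry loop in Python. It is only a conceptual sketch and not Commvault's implementation: `attempts` and `interval_secs` stand in for what nRemoveSnapshotRetryAttempts and nRemoveSnapshotRetryIntervalSecs presumably control, and `remove_once` is a hypothetical callback that returns True when the removal succeeds.

```python
import time


def remove_with_retries(remove_once, attempts, interval_secs, sleep=time.sleep):
    """Call remove_once() up to `attempts` times, pausing between tries.

    Returns the attempt number that succeeded, or None if all failed.
    """
    for attempt in range(1, attempts + 1):
        if remove_once():
            return attempt
        if attempt < attempts:
            sleep(interval_secs)  # wait before the next try
    return None


if __name__ == "__main__":
    no_wait = lambda secs: None  # skip real sleeping in this demo

    # Transient problem: the third try succeeds, so retries help.
    outcomes = iter([False, False, True])
    print(remove_with_retries(lambda: next(outcomes), 5, 30, sleep=no_wait))

    # Persistent problem (for example a disk still attached to the proxy):
    # every try fails, no matter how many retries are configured.
    print(remove_with_retries(lambda: False, 5, 30, sleep=no_wait))
```

More attempts only help when the failure is transient; a leftover disk or orphan snapshot fails every attempt until it is cleaned up by hand.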
