We are still facing an issue with failing VSA HotAdd mode backups, which end with the error "Virtual disks need consolidation".
We know that during the backup the virtual disks are bound (hot-added) to the VSA proxy VM and a snapshot is taken, but afterwards the virtual disks stay attached to the backup proxy.
My question is: where can I find logs of the backup process on the VM proxy? I expect these logs to document why the virtual disks remain attached to the backup proxy VM. I have checked the log of the job, but it contained no such information. Is there any log that describes what happens on the VM proxy during the backup job?
Cheers!
Best answer by Mike Struening RETIRED
Sharing case resolution:
From the CV logs we do not detect any errors with the snapshot removal. Please ensure the AV exclusions are in place, as missing exclusions can cause issues with snapshot removals and hotadd backups.
5960 1f8c 12/05 22:52:11 969364 CVMWareInfo::_RemoveVMSnapshot() - Removing Snapshot [snapshot-72377] of VM [<vm name>] Guid [5039922a-c165-5158-9a26-a1086632f4ac]
5960 1f8c 12/05 22:52:22 969364 CVMWareInfo::_RemoveVMSnapshot() - Successfully removed Snapshot of VM [<vm2 name>] Guid [5039922a-c165-5158-9a26-a1086632f4ac] duration [00:00:11] size [0]
The antivirus on the VSA proxy will cause issues with hotadd, not the antivirus on the guest VMs. The AV logs rarely show a block event or that the CV process was scanned. Typically, when the AV scans the CV process it will allow it, but because it opened a handle to our process it can cause the process to hang, resulting in disks that were not cleaned up properly. This can leave orphaned disks behind if the process hangs during the mount or unmount operation. Based on your latest logs, I do not see any old snapshots still attached to the VSA proxy, and backups/consolidations are all working. At this point, I recommend monitoring the backups until Friday to ensure everything continues to work.
The registry keys will increase the number of snapshot removal commands and the frequency at which we send them. If there is already an issue with snapshot cleanup due to hotadd or orphaned snapshots, and it is not cleaned up manually before the backups run, these keys will not help.
Is the VSA Proxy or vCenter being backed up or snapshotted during the backup? This can sometimes cause issues. I’d always recommend protecting the VSA Proxy and vCenter at different times from the usual VM backups to avoid issues.
Also ensure that Automount (Windows) or lvm2-lvmetad (Linux) is disabled on the VSA proxies.
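For reference, on a Windows proxy automount can be turned off with a diskpart script like the fragment below (the file name is just an example; `automount scrub` additionally removes stale mount-point registry entries). On a Linux proxy, the lvm2-lvmetad service would be disabled via systemctl and lvm.conf instead.

```
rem disable-automount.txt -- apply with: diskpart /s disable-automount.txt
automount disable
automount scrub
```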
The vsbkp.log file (Virtual Server Backup) in the Log Files directory (on VSA Proxies) will cover the VM protection Job.
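As a quick way to pull one job's trail out of vsbkp.log, a small filter like the sketch below can help. The marker strings are taken from log excerpts in this thread; the job ID used in the comment is just an example, and the log would be read from the proxy's Log Files directory.

```python
import re

# Method-name markers seen in vsbkp.log for mount/unmount and snapshot
# handling (taken from log excerpts in this thread).
MARKERS = ("MountVM", "UnmountVM", "_RemoveVMSnapshot", "RemoveDiskFromVM")

def job_events(log_text, job_id):
    """Return vsbkp.log lines mentioning the job ID or a mount/snapshot marker."""
    tokens = [re.escape(str(job_id))] + [re.escape(m) for m in MARKERS]
    pattern = re.compile("|".join(tokens))
    return [line for line in log_text.splitlines() if pattern.search(line)]

# Example: job_events(open("vsbkp.log").read(), 543656)
```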
Now we have disabled backup of the proxies themselves. The Automount feature hasn't been checked yet.
In the log files (vsbkp.log) I have found the following errors:
CheckVMInfoError() - VM [vm_name] Error removing virtual machine snapshot. Please check Virtual Machine snapshot tree for any stale or missing snapshot chain.
VSBkpWorker::UnmountVM() - Leaving Cleanup string [VSIDA -j 543656 -a 2:110 -vmHost "vcenter_name" -vmName "vm_name" -vmGuid "guid_name" -vmCleanup "snapshot-xxxx" ] with JobManager as the unmount did not clear the cleanup string
RemoveDiskFromVM() - Failed to delete virtual disk [[datastore_name] xxx/vm_name-00000x.vmdk] from VM [proxy_name] with error - [Invalid configuration for device '0'.]
Are these error messages somehow useful for further investigation?
Take the VSA proxy VM out of the regular subclient content so it does not protect itself, disable automount, remove old backed-up VMs' disks from the VSA proxy VM, then run disk consolidation on the backed-up VMs, and disable DRS for the VSA proxy VMs.
Have multiple VM proxies, depending on workload, to distribute VMs accordingly. Add multiple SCSI controllers on each VM proxy as required to protect multiple VM disks. Use the number of streams required on the VM groups (subclients) and stagger jobs to avoid hotadd lock situations. vsbkp.log and the individual VM status on the job will give details of the failures.
Below are the BOL docs related to this ask; please check them for cleanup steps and best-practice implementation.
Thanks for the update. The combination of “failed to remove snapshot” and “failed to delete vdisk from proxy” sounds like the VSA had a Snapshot present on it when we tried to remove the disk.
I’d suggest the following steps:
1. Remove snapshots of the VSA Proxies.
2. Remove any guest disks attached to the VSA Proxy from other VMs.
3. Ensure VMs are consolidated properly.
4. Move the VSA Proxies into a separate subclient (and schedule them at a different time to the VM backups).
5. Ensure Automount is disabled.
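To verify the consolidation step across the inventory, the vSphere API exposes a consolidationNeeded flag on each VM's runtime. Below is a minimal sketch, assuming pyvmomi and vCenter access (host name and credentials in the comment are placeholders); the filtering itself is a pure function, so it works on any objects with the same shape.

```python
def vms_needing_consolidation(vms):
    """Return names of VMs whose runtime reports consolidationNeeded.

    Works on any objects exposing .name and .runtime.consolidationNeeded,
    e.g. pyvmomi vim.VirtualMachine objects.
    """
    return [vm.name for vm in vms
            if getattr(vm.runtime, "consolidationNeeded", False)]

# With pyvmomi (connection details are placeholders):
#   from pyVim.connect import SmartConnect
#   from pyVmomi import vim
#   si = SmartConnect(host="vcenter_name", user="user", pwd="password")
#   view = si.content.viewManager.CreateContainerView(
#       si.content.rootFolder, [vim.VirtualMachine], True)
#   for name in vms_needing_consolidation(view.view):
#       print(name)  # candidates for ConsolidateVMDisks_Task()
```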
Note: If using VMware 6.5 or higher, Paravirtual SCSI Adapters should be used on VSA Proxies.
Once the above has been validated, let us know the success of your VM Backups.
Checked:
- no VSA Proxy snapshots
- no guest disks attached to the VSA Proxy now
- VM consolidation done
- VSA Proxy not being backed up itself
- Automount disabled
- Paravirtual SCSI adapters in use
- no DRS set up on the VSA Proxy
I do not know what job staggering does. There is one subclient, which includes approximately 50 VMs. There are 4 VSA proxies, and as soon as one VM is backed up, the next one is backed up after it. This looks good.
However, if we do not find the cause of the failing backups, the issue can reoccur. That is what I am afraid of.
The VSA proxies were removed from the backup set so that they are not backed up at the same time as the other VMs.
What we have checked:
- Removed snapshots of the VSA Proxies
- Removed any guest disks attached to the VSA Proxy from other VMs
- Ensured VMs are consolidated properly
- Moved the VSA Proxies into a separate subclient (scheduled at a different time to the VM backups)
- Ensured Automount is disabled
Checked logs (vsbkp.log, VIXDISKLIB.log file)
Maybe in the future it would be worth opening a ticket with VMware support as well.