Hello community,

Sometimes, while backing up 7-8 TB VMs, I find errors like this one in the logs, especially when the jobs take a very long time:

Network timeout. DDB Engine did not respond within the timeout period. DDB: MI_GDP_LOCAL_VMs_6, DDB Id: 6. Source: MI-MEDIA, Process: vsbkp

This example is from Jan 28 at 5:31:13 AM.

At the same time, the Events tab of the Job Details shows the following event:

Failed to remove the backup snapshot for virtual machine [MI-SERVER], the snapshot may need to be manually removed. [The operation has timed out.]

I think it is reasonable to assume they are related, correct?

In cases like this, should I extend the timeout for snapshot deletion?

Thank you!

Gaetano

It seems likely, especially if you are not seeing the same error on other backups.

7-8 TB VMs are very large. As soon as the snapshot is in place, the base disks become read-only and all changes start accumulating in a child (delta) disk. When the backup is complete (probably many, many hours later), the snapshot removal task causes VMware to merge all the changes from the snapshot back into the base disk.

This is a very I/O-intensive operation, and for a VM this large, or one with a high change rate, disk consolidation can take a long time. Even though the snapshot removal times out, VMware should still be processing it in the background, and the next backup job for that VM will check that it has completed before taking a new snapshot.
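To illustrate the point above, here is a minimal Python sketch of a generic poll-with-timeout loop. The names `wait_for_snapshot_removal` and `simulate` and all of their parameters are hypothetical, not Commvault or VMware APIs, and the 30- and 45-minute timeouts are assumed values; only the roughly 35-minute removal time comes from this thread. It shows that hitting the timeout just ends the caller's wait and says nothing about whether the consolidation on the hypervisor side eventually finishes.

```python
import time
from typing import Callable


def wait_for_snapshot_removal(
    is_done: Callable[[], bool],
    timeout_s: float,
    poll_interval_s: float = 30.0,
    sleep: Callable[[float], None] = time.sleep,
    now: Callable[[], float] = time.monotonic,
) -> bool:
    """Hypothetical poller: wait until is_done() is True or timeout_s elapses.

    Returns True if the removal finished in time, False on timeout.
    A False result does not cancel the hypervisor-side merge; it only
    means the caller stopped waiting for it.
    """
    deadline = now() + timeout_s
    while True:
        if is_done():
            return True
        if now() >= deadline:
            return False
        sleep(poll_interval_s)


def simulate(consolidation_s: float, timeout_s: float, poll_interval_s: float = 30.0) -> bool:
    """Run the poller against a fake clock where removal takes consolidation_s seconds."""
    clock = {"t": 0.0}

    def fake_sleep(seconds: float) -> None:
        # Advance simulated time instead of really sleeping.
        clock["t"] += seconds

    return wait_for_snapshot_removal(
        is_done=lambda: clock["t"] >= consolidation_s,
        timeout_s=timeout_s,
        poll_interval_s=poll_interval_s,
        sleep=fake_sleep,
        now=lambda: clock["t"],
    )


# A ~35-minute merge against two assumed timeout values.
print(simulate(consolidation_s=35 * 60, timeout_s=30 * 60))  # False
print(simulate(consolidation_s=35 * 60, timeout_s=45 * 60))  # True
```

With a merge of about 35 minutes, a 30-minute wait reports a timeout even though the merge completes a few minutes later, while a 45-minute wait sees it finish.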

The backup should show as Completed with Errors rather than Failed, correct?


It was actually marked as Completed, but it took a very long time.

The Attempts tab shows a new attempt about 20 minutes later.

On the vSphere side, the snapshot removal appears to have completed in about 35 minutes, finishing at 05:35:58, only 4 min 45 s after the timeout was logged.
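As a quick check of that gap, Python's standard `datetime` module can subtract the two times from this thread: the DDB timeout logged at 5:31:13 and the snapshot removal finishing at 05:35:58. Both events are on Jan 28, so the date is left out.

```python
from datetime import datetime, timedelta

# Times from this thread; both events happened on Jan 28, so only the clock time matters.
timeout_logged = datetime.strptime("05:31:13", "%H:%M:%S")
removal_done = datetime.strptime("05:35:58", "%H:%M:%S")

gap = removal_done - timeout_logged
print(gap)  # 0:04:45
```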

