Solved

What does “Unable/failed to quiesce the guest file system during snapshot creation” mean, and how do I resolve it on a VMware backup?

  • 30 December 2020
  • 6 replies
  • 13014 views


Looking for details: I'm seeing errors during VMware backups, specifically during snapshot creation. How can I address this?


Best answer by jgeorges 30 December 2020, 23:57




Hey @Vsicherman 

Getting Application Consistent snapshots is very important for the recoverability of files and application data on a VM. During a backup, Commvault instructs VMware to take a VSS-enabled snapshot, which will in turn tell the file system and any applications to write any in-memory changes to disk with a temporary I/O pause. Once the VSS snapshot is in place, VMware takes the virtual machine snapshot, which ensures that all the data is captured in the most consistent form.

Virtual Server backups may "Complete with Errors" when one or more virtual machine guests report a quiescing error during a backup. Even though the VSS-quiesced snapshot is failing, the data is still being protected, as a Crash Consistent backup instead. A restore from this VM has a slight possibility of being inconsistent for in-use files. As an example, recovering this system would not be dissimilar to starting the VM after an ungraceful power outage or a hardware failure. If the restored data includes, for example, a SQL database's files, the result could be a dirty SQL database that SQL would have to repair, possibly losing some transactions that were in flight during the snapshot.

You can view the VMs that failed to quiesce from the Backup History for the subclient -> right-click the job that Completed with Errors -> View Job Details -> Virtual Machine Status. Sort the list of VMs by Status; you should see Partial Success with the failure reason "Unable to quiesce".

When Commvault creates a snapshot, it does so through the VMware API. You can therefore test the ability to quiesce the virtual machine by taking a snapshot through VMware with the quiesce box checked (but don’t snapshot the VM memory). If you can take a quiesced snapshot through VMware, Commvault should be able to do the same. If a quiesced snapshot can't be taken on the machine(s) displaying the warning, you should see this reflected in the VMware console, which is why Commvault displays the warning. If you are unable to take a quiesced snapshot, here are some things you can try:

  1. Verify that VMware Tools are up to date on the problematic VMs. If they are not up to date, update them, run a new backup, and see if this resolves the issue. If it still fails, move on to step 2.

  2. Review the Windows Application and System event logs on the guest for Volume Shadow Copy service errors, or something similar.

  • Additionally, you can identify the specific failed VSS writer by issuing the command vssadmin list writers from an elevated command prompt. This command lists all of the VSS writers on the system and their current status. ALL listed writers should show Stable under State and No error under Last Error. If you see errors or a timeout status, a reboot should resolve the issue, although it may only be temporary if there is another underlying cause.
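If you have many guests to check, the writer review described above can be scripted. The following is only a minimal sketch: it assumes the English-language output layout of vssadmin list writers (a "Writer name:" line followed by indented "State:" and "Last error:" lines), and the function name find_unhealthy_writers and the SAMPLE text are made up for illustration, not taken from a real system.

```python
import re

# Illustrative sample of `vssadmin list writers` output (not from a real system).
SAMPLE = """\
vssadmin 1.1 - Volume Shadow Copy Service administrative command-line tool

Writer name: 'System Writer'
   Writer Id: {e8132975-6f93-4464-a53e-1050253ae220}
   State: [1] Stable
   Last error: No error

Writer name: 'SqlServerWriter'
   Writer Id: {a65faa63-5ea8-4ebc-9dbd-a0c4db26912a}
   State: [8] Failed
   Last error: Non-retryable error
"""


def find_unhealthy_writers(vssadmin_output):
    """Return (name, state, last_error) for every writer that is not
    'Stable' with 'No error'. Assumes English-language output."""
    unhealthy = []
    # Each writer block starts with a "Writer name:" line; drop the banner.
    blocks = re.split(r"^[ \t]*Writer name:", vssadmin_output,
                      flags=re.MULTILINE)[1:]
    for block in blocks:
        lines = block.splitlines()
        if not lines:
            continue
        name = lines[0].strip().strip("'")
        state = last_error = ""
        for line in lines[1:]:
            key, _, value = line.partition(":")
            key = key.strip().lower()
            if key == "state":
                # "[1] Stable" -> "Stable"
                state = re.sub(r"^\[\d+\]\s*", "", value.strip())
            elif key == "last error":
                last_error = value.strip()
        if state != "Stable" or last_error != "No error":
            unhealthy.append((name, state, last_error))
    return unhealthy


for writer in find_unhealthy_writers(SAMPLE):
    print(writer)
```

On a real guest you would feed the function the text captured from an elevated prompt (for example, saved to a file or collected with subprocess) instead of SAMPLE.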

Additionally, the load or I/O on the ESX host and datastore plays a crucial role in a successful snapshot. Multiple concurrent snapshots can time out and generate these failures. You can try to reduce the number of jobs running at once and control parallel snapshots using readers.

If you are unable to resolve the issue with these steps, here are some other measures to help troubleshoot: 

  • Can a quiesced snapshot be generated in VMware outside of Commvault? 

  • Are there errors being generated in the 'events and tasks' of the VMware console during the backups? 

  • Are the machines up to date with VMware tools? Are there VSS errors occurring in the machines' event viewer logs? 

  • Is this a particularly large VM? 

  • Can you test this machine in a separate subclient to see if the issue occurs when it runs by itself?

  • Do these machines have databases (SQL, Exchange, etc.) that need to be quiesced? A standard file system can simply use crash consistent.
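The first check in the list above (a quiesced snapshot outside of Commvault) can also be requested from a script rather than the vSphere client. This is a rough sketch only: it assumes you already hold a pyVmomi vim.VirtualMachine object (connecting to vCenter and finding the VM are left out), and the helper name request_quiesced_snapshot and the snapshot name are hypothetical.

```python
def request_quiesced_snapshot(vm, name="quiesce-test"):
    """Ask vSphere for a quiesced snapshot without memory and return the task.

    `vm` is assumed to be a pyVmomi vim.VirtualMachine obtained elsewhere;
    this helper only issues the snapshot request.
    """
    return vm.CreateSnapshot_Task(
        name=name,
        description="Manual quiesce test outside of Commvault",
        memory=False,   # do not snapshot the VM memory
        quiesce=True,   # ask VMware Tools to quiesce the guest file system
    )
```

If the returned task ends in an error, the quiesce problem lies on the VMware/guest side rather than in Commvault. Remember to delete the test snapshot afterwards.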

-Cheers


I am a bit confused by what the Microsoft website says about consistent backups. I thought application consistent and file system consistent were the same, as both use VSS, and that non-VSS backups were crash consistent:

  • Application consistent
    • The snapshot captures the virtual machine as a whole. It uses VSS writers to capture the content of the machine memory and any pending I/O operations.
    • For Linux machines, you'll need to write custom pre or post scripts per app to capture the application state.
    • You can get complete consistency for the virtual machine and all running applications.
  • File system consistent
    • If VSS fails on Windows, or the pre and post scripts fail on Linux, Azure Backup will still create a file-system-consistent snapshot.
    • During a recovery, no corruption occurs within the machine. But installed applications need to do their own cleanup during startup to become consistent.
  • Crash consistent
    • This level of consistency typically occurs if the virtual machine is shut down at the time of the backup.
    • No I/O operations or memory contents are captured during this type of backup. This method doesn't guarantee data consistency for the OS or app.

Application vs. file system consistent: consider a Windows OS hosting a SQL Server.

With file system consistent, your VM and OS will be properly backed up, but SQL would most likely have to be fixed/recovered by a DBA, as the SQL instance(s) would surely be in Suspect mode after a VM restore of this kind. But if you performed SQL dumps to disk before the VM backup, those dumps would be OK to recover from. You need a DBA to recover = more manipulation and a longer time to recover.

Whereas if you set application consistent, you’ll need to provide credentials to get inside the VM and ask SQL Server to quiesce its I/O, to be sure that the upcoming snapshot is consistent from a SQL point of view. No need for a DBA to recover = less manipulation and a shorter time to recover.

There are pros and cons. I honestly prefer not to use application consistent in most of my environments. We don’t have such a requirement, and the whole process relies on multiple components that are mostly out of my scope, like vmtools being up to date, SQL credentials, and proper rights, just to mention a few.


Question: would the “Unable/failed to quiesce the guest file system during snapshot creation” error result in 1 file failing to back up?


In my experience, quiescing depends on the vmtools being at the latest level, the VM itself not being too busy, and the ESX host it’s on having plenty of resources. Often this just isn’t the case and the backup fails.

I tend to use crash consistent, as I filter out data/sql drives and use agent-based for those anyway.


Disable “VMware Snapshot Provider” on the Windows server, so it can use the native Windows "Volume Shadow Copy" service instead. “VMware Snapshot Provider” is buggy, slow to perform, and crashes frequently.
