File Level restore taking longer time from VM backup

Question

Hi team,

We recently encountered an issue while restoring files from a VM backup to the original server's C: drive during a major application outage.

The affected server is an application server that lost critical files required for the application to function. To recover the application, we initiated a file-level restore from the VM backup. However, the restore to the original server took significantly longer than expected (approximately 9 hours), which impacted the overall recovery time.

As a workaround, we restored the required files to the Backup Media Agent and then manually transferred them to the affected VM. This approach allowed us to recover the application sooner than waiting for the direct restore to complete.

To help us improve our recovery process, could you please recommend the best approach to mitigate similar situations in the future? Specifically, we would like guidance on the following:

1. Best practices for performing file-level restores from VM backups during critical incidents.
2. Whether there are any configuration changes or optimizations that can improve restore performance to the original VM.
3. Whether deploying a File System agent for critical application servers, in addition to VM backups, would provide faster recovery for file-level restores.
4. Any other recommendations to reduce Recovery Time Objective (RTO) during major outages.

We would appreciate your guidance to help us improve our disaster recovery process and minimize recovery time during future incidents.

DigitalDump · Answer

I had the same experience. Here is what I learned.

I ran my notes through a few AI tools and had Arlie clean them up.

Why File System Agent Restores Are Faster Than VM Guest File Restores

1. File System Agent (FS Agent) Restores:

Multi-Stream Support:
When restoring files using the Commvault File System Agent, the restore process can use multiple parallel data streams. This allows Commvault to transfer data concurrently and increases throughput, provided the destination server and storage can handle the load.
Direct Data Path:
Data is read directly from backup storage and sent to the client using several independent pipelines. You can control the number of streams in the restore options, which helps optimize performance for large restores.

2. VM Guest File Restores (GFLR):

Pseudo-Mount Mechanism:
When restoring files from a VM backup (agentless restore), Commvault uses a "Live Browse" or pseudo-mount mechanism. The MediaAgent mounts the VM disk image (VMDK/VHDX), parses the file system, and extracts files for restore.
Single-Stream Bottleneck:
This process typically uses only a single data stream, regardless of the stream count set in the restore options. The virtual disk mount layer cannot efficiently support multiple concurrent readers, so large restores are much slower than FS Agent restores.

3. Why Your Restore Was Slow:

When you restored files directly from the VM backup to the original server, Commvault had to mount the VM disk, parse the file system, and transfer files over a single stream. This is inherently slower, especially when the restore involves a large amount of data or a large number of small files.
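
To put rough numbers on the single-stream effect described above, here is a back-of-the-envelope sketch in Python. It is an idealized model, not a description of how Commvault schedules streams: it assumes throughput scales linearly with the stream count and ignores per-file overhead, disk mount time, and network or storage limits. The function name and the sample figures (500 GB, 20 MB/s per stream) are hypothetical and only there to show the arithmetic.

```python
def estimate_restore_hours(total_gb: float,
                           mb_per_sec_per_stream: float,
                           streams: int = 1) -> float:
    """Idealized restore time in hours.

    Assumption: every stream sustains the same throughput and streams
    scale linearly. Real restores will be slower than this estimate.
    """
    if total_gb < 0 or mb_per_sec_per_stream <= 0 or streams < 1:
        raise ValueError("size must be >= 0, throughput > 0, streams >= 1")
    total_mb = total_gb * 1024
    seconds = total_mb / (mb_per_sec_per_stream * streams)
    return seconds / 3600


if __name__ == "__main__":
    # Hypothetical figures: 500 GB restored at 20 MB/s per stream.
    print(f"1 stream:  {estimate_restore_hours(500, 20, 1):.1f} h")  # 7.1 h
    print(f"4 streams: {estimate_restore_hours(500, 20, 4):.1f} h")  # 1.8 h
```

Plugging in your own measured throughput from a test restore gives a quick sanity check on whether a direct guest file restore can meet your RTO at all.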

4. Workaround and Best Practices:

Staging Restores:
For large or urgent restores, restoring files to a MediaAgent or staging server and then copying them to the target VM (using tools like Robocopy with multi-threading) is often faster.
Deploy FS Agent on Critical Servers:
For critical application servers with strict Recovery Time Objectives (RTOs) or large file sets, install the Commvault File System Agent. This enables multi-stream, high-performance restores and is the recommended approach for minimizing downtime.
Optimize Infrastructure:
Ensure the MediaAgent handling the restore has fast local storage for its index cache and job results. Use HotAdd or SAN transport modes for VM-level operations when possible, as NBD mode is much slower.
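
The staging approach mentioned above (restore to the MediaAgent, then copy with multi-threaded Robocopy) could be scripted roughly as follows. This is a minimal sketch, not a Commvault feature: the helper names, the thread count, the retry settings, and the example paths are all assumptions for illustration. The Robocopy switches used (/E, /MT, /R, /W, /NP) are standard, but check them against your Windows version before relying on this during an outage.

```python
import subprocess
from typing import List


def build_robocopy_command(source: str, destination: str,
                           threads: int = 16) -> List[str]:
    """Build a multi-threaded Robocopy command line (hypothetical helper)."""
    if not 1 <= threads <= 128:  # Robocopy accepts /MT values from 1 to 128
        raise ValueError("threads must be between 1 and 128")
    return [
        "robocopy", source, destination,
        "/E",               # copy subdirectories, including empty ones
        f"/MT:{threads}",   # multi-threaded copy
        "/R:2",             # retry a failed file twice
        "/W:5",             # wait 5 seconds between retries
        "/NP",              # no per-file progress output
    ]


def copy_staged_files(source: str, destination: str, threads: int = 16) -> bool:
    """Run Robocopy (Windows only); exit codes below 8 mean no copy failures."""
    result = subprocess.run(build_robocopy_command(source, destination, threads),
                            check=False)
    return result.returncode < 8


if __name__ == "__main__":
    # Hypothetical paths: staging folder on the MediaAgent, admin share on the VM.
    print(build_robocopy_command(r"D:\Staging\AppFiles",
                                 r"\\appserver01\C$\App", threads=32))
```

Keeping the command builder separate from the function that runs it makes it easy to review the exact command line before running it against a production server.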

5. Additional Recommendations:

Test Restores Regularly:
Periodically test your restore process to identify bottlenecks before a real outage.
Consider Live Sync/Replication:
For mission-critical servers, consider Commvault Live Sync (VM replication) to maintain a ready-to-boot replica VM, reducing RTO to minutes.
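
For the restore testing recommended above, one simple way to track drills is to time each one and compare it against the RTO. The sketch below is a generic timing harness, not part of any Commvault tooling; `DrillResult`, `run_drill`, and the `restore_step` callable are hypothetical names, and what the callable actually does (triggering a restore, copying staged files) is left to you.

```python
import time
from dataclasses import dataclass
from typing import Callable


@dataclass
class DrillResult:
    name: str
    elapsed_seconds: float
    rto_seconds: float

    @property
    def within_rto(self) -> bool:
        """True when the drill finished inside the recovery time objective."""
        return self.elapsed_seconds <= self.rto_seconds


def run_drill(name: str, restore_step: Callable[[], None],
              rto_seconds: float) -> DrillResult:
    """Time one restore drill; restore_step is whatever performs the restore."""
    start = time.monotonic()
    restore_step()
    return DrillResult(name, time.monotonic() - start, rto_seconds)


if __name__ == "__main__":
    # Placeholder step; replace with a call that performs a real test restore.
    result = run_drill("app-server file restore", lambda: time.sleep(0.1),
                       rto_seconds=4 * 3600)
    print(result.name, f"{result.elapsed_seconds:.1f}s",
          "OK" if result.within_rto else "EXCEEDS RTO")
```

Recording these results over time shows whether restore duration is drifting toward the RTO as data volumes grow.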

Summary:
For large or critical restores, use the File System Agent on important servers to enable fast, multi-stream restores. For VM guest file restores, expect slower performance due to architectural limitations. Use staging restores and infrastructure optimization as needed to minimize recovery time.
