Hi team,
We recently encountered an issue while restoring files from a VM backup to the original server's C: drive during a major application outage.
The affected server is an application server that lost critical files required for the application to function. To recover the application, we initiated a file-level restore from the VM backup. However, the restore to the original server took significantly longer than expected (approximately 9 hours), which impacted the overall recovery time.
As a workaround, we restored the required files to the Backup Media Agent and then manually transferred them to the affected VM. This approach allowed us to recover the application sooner than waiting for the direct restore to complete.
To help us improve our recovery process, could you please recommend the best approach to mitigate similar situations in the future? Specifically, we would like guidance on the following:
- Best practices for performing file-level restores from VM backups during critical incidents.
- Whether there are any configuration changes or optimizations that can improve restore performance to the original VM.
- Whether deploying a File System agent for critical application servers, in addition to VM backups, would provide faster recovery for file-level restores.
- Any other recommendations to reduce recovery time and help us meet our Recovery Time Objective (RTO) during major outages.
We would appreciate your guidance so that we can minimize recovery time during future incidents.
