Question

Slow restore throughput

  • 31 August 2023
  • 3 replies
  • 789 views

Badge +2

Hi team

I am creating and testing some DR procedures, including VMware VM restores, and I have run some restore performance tests. My Commvault version is 11.28. I use HotAdd transport mode (the most efficient transport in my environment because of thin-provisioned disks). The MediaAgent and VM proxy servers run Windows Server 2019, and the restore speed of a single large VM seems to be below expectations.

It is about 990 GB/hr (275 MB/s, 2.2 Gbps).
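For reference, those figures are the same number in different units:

275 MB/s x 3600 s = ~990,000 MB/hr = ~990 GB/hr
275 MB/s x 8 bits = 2,200 Mbps = ~2.2 Gbps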

What is the bottleneck? CPU or memory is not the limit, I believe, but I noticed that the data transfer speed between the MediaAgent and the proxy server during the restore does not exceed 2.2 Gbps (240 MB/s) over a 10 Gbps network. That does not seem like enough to me.

I also checked, outside of Commvault, what I can expect from my hardware. A robocopy network transfer test of a single, relatively large 50 GB file (a VMDK) from the MediaAgent to the proxy shows a LAN transfer rate of no less than 7 Gbps (870 MB/s).
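For the record, the test was a plain single-file copy along these lines (the share name, paths and switches are only examples, not my exact command):

robocopy \\MEDIAAGENT\TestShare D:\RestoreTest large50GB.vmdk /J /NP

(/J uses unbuffered I/O, which robocopy itself recommends for very large files; /NP just suppresses the progress output.)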

Three times better than in the Commvault process? Is this expected behaviour? Why? Can I tune this?

I have tried the nNumPipelineBuffers additional setting, the 'Optimize for concurrent LAN backups' option, and the streams and network agents options, but without effect. Deduplication and compression also have only a minor influence.

Any advice would be appreciated.


3 replies

Userlevel 7
Badge +23

Have you tried NBD transport directly from the MediaAgent rather than going via a HotAdd proxy on the host?

Does performance scale with the number of VMs? In a DR scenario you will likely be restoring multiple VMs at once, which could saturate the throughput in a way a single-VM restore cannot.

There is a tool you can use to test performance from the proxy to the datastore - see 4:48 in the video below:

 

Badge +2

Hi team

Thanks @Damian Andre for the very useful testvminfo…


Regarding DR - yes, of course, performance scales very well as the number of VMs restored simultaneously increases. But in DR, a few machines are usually recreated quickly at the beginning, and then we wait for the last one - the biggest one and usually the most important. In the end the restore runs with 1 VM and 1 stream.
It is this last step that I would like to shorten.

My scalability tests:
1 VM restore - 275 MB/s
2 VM restores at the same time - 680 MB/s

testvminfo tests (-restorevm and -writedisk options to create the VM):
NBD - 184 MB/s
HotAdd - 580 MB/s

This is similar to the real Commvault recovery tests, where the NBD transport is much slower than HotAdd. That is no surprise - as I remember, it is probably due to the limitations of the ESXi management interface (which NBD uses), while HotAdd goes through the VM network interfaces of the ESXi server.

I also did a test of restoring the large VMDK file to a local folder on the MediaAgent server. The point of that test was to check whether the read from the Commvault library is the limit.
It showed about 450 MB/s, so the library read is not what limits a single-VM restore.

From these tests, the bottleneck, for me, is the LAN transfer for 1 data stream - 2.2 Gbps.

And to repeat - the Windows robocopy.exe test of a LAN file transfer from the MediaAgent to the proxy shows no less than 7 Gbps (870 MB/s).
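If it helps, I can also confirm the raw single-stream TCP throughput with something like iperf3 (the hostname below is a placeholder):

iperf3 -s                      (on the proxy)
iperf3 -c PROXY01 -t 30 -P 1   (from the MediaAgent, 1 stream)
iperf3 -c PROXY01 -t 30 -P 4   (from the MediaAgent, 4 parallel streams)

but the robocopy result already suggests the wire can do far more than 2.2 Gbps on a single stream.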


Can it be improved?

Badge +2


From what source/library do you run the restore? Because it is a known issue that a single stream is slow, and you cannot increase the performance. We have the same issue - no help from support, just the information that it is by design...
