Solved

Aux Copy performance


Userlevel 4
Badge +15

Hello, 

We would like to tier out the data that is stored on the disk library to a Huawei Object Storage. I created a secondary copy and configured an aux copy schedule. The problem is that the disk library's disk space is running low because the job is not as fast as I was hoping.
The amount of data for the copy job can be up to 10 TB.
Is there a solution to speed up the aux copy job? The Media Agents have 2x 10 Gbit cards.

Regards

Thomas


Best answer by Mike Struening RETIRED 5 May 2021, 18:06


11 replies

Badge +2

Can someone please let me know how an aux copy job copies backup jobs?

I can see that our aux copy has been running for more than 10 days, but I am still seeing very old backup jobs in the partially copied list.

How does the aux copy pick up backup jobs for copying?

Ideally, older jobs should be copied first.

Userlevel 7
Badge +23

@thomas.S, thought you’d find this interesting:

 

Userlevel 7
Badge +23

Hello @Mike Struening,

Thank you for the analysis. In this case, are there any points that I could check on the media agents before opening a case with our network team?
I am thinking of settings that could be checked on the media agents.

 

I don't think there is anything Commvault configuration-wise that would cause such slow network performance. You could try toggling the aux copy mode between network-optimized and disk-optimized and see if it makes any difference. You only have to suspend the copy, change the setting, and resume to test.

The rest will come down to benchmarking the system to help isolate the bottleneck. Performance is always tricky, as it could be the local OS, the network cards, the switches/routers in the way, the destination device… you get the picture. So you have to perform some tests to narrow down the problem.
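As a first data point, a raw point-to-point TCP test between the source and destination Media Agents separates the network leg from the read/write legs. Below is a minimal Python sketch, assuming Python is available on both MAs and that the chosen test port (5001 here, purely an example) is open between them; a single-threaded sender will not saturate a 10 Gbit link, so treat the result as a ballpark rather than a line-rate figure (a dedicated tool such as iperf is better for that).

import argparse
import socket
import time

PORT = 5001            # hypothetical test port; pick one your firewall allows
CHUNK = 1024 * 1024    # 1 MiB per send/receive
TOTAL_MB = 2048        # amount of data the client pushes

def run_server() -> None:
    # Receiver side: run this on the destination Media Agent.
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as srv:
        srv.bind(("0.0.0.0", PORT))
        srv.listen(1)
        conn, addr = srv.accept()
        with conn:
            received, start = 0, time.time()
            while True:
                data = conn.recv(CHUNK)
                if not data:
                    break
                received += len(data)
            elapsed = time.time() - start
            mib = received / 2**20
            print(f"Received {mib:.0f} MiB from {addr[0]} in {elapsed:.1f}s "
                  f"= {mib / elapsed:.1f} MiB/s")

def run_client(host: str) -> None:
    # Sender side: run this on the source Media Agent, pointing at the receiver.
    payload = b"\x00" * CHUNK
    start = time.time()
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as cli:
        cli.connect((host, PORT))
        for _ in range(TOTAL_MB):
            cli.sendall(payload)
    elapsed = time.time() - start
    print(f"Sent {TOTAL_MB} MiB in {elapsed:.1f}s = {TOTAL_MB / elapsed:.1f} MiB/s")

if __name__ == "__main__":
    parser = argparse.ArgumentParser(description="Point-to-point TCP throughput check")
    parser.add_argument("--server", action="store_true", help="run as the receiver")
    parser.add_argument("--client", metavar="HOST", help="send data to HOST")
    args = parser.parse_args()
    run_server() if args.server else run_client(args.client)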

You could try fiddling with TCP offload options and chimney settings, check the teaming mode, ensure drivers are up to date, and try disabling one network card to see if that helps. It sounds like receiving data is fine, so it could be this particular network segment, or something odd with the network teaming. Depending on your load balancing mode, most non-switch-assisted modes can only load balance transmits (round-robin between adapters), so you could try disabling teaming or one of the NICs to see if that is contributing to the slowness.
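If it helps to compare the two MAs side by side, something like this small sketch just dumps the relevant Windows settings (TCP global/chimney state, adapters, teaming) in one go. It is only a convenience wrapper around built-in Windows commands; Get-NetLbfoTeam exists only where native Windows NIC teaming is configured, so an error from that line simply means there is no team to report on.

import subprocess

# Commands to run on each Media Agent; all are standard Windows tools.
COMMANDS = [
    ["netsh", "int", "tcp", "show", "global"],                      # offload/chimney/RSS state
    ["powershell", "-Command", "Get-NetAdapter | Format-Table -AutoSize"],
    ["powershell", "-Command", "Get-NetLbfoTeam | Format-List"],    # teaming mode, if a team exists
]

for cmd in COMMANDS:
    print("=" * 15, " ".join(cmd), "=" * 15)
    result = subprocess.run(cmd, capture_output=True, text=True)
    print(result.stdout or result.stderr)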

To try to isolate routing/network issues, you could configure a network share (SMB) somewhere and copy some data there as a performance test, either through Windows or a test copy job. We also have the Cloud Test Tool, which can upload data to your Huawei object storage, and you could measure performance from these Media Agents vs. other systems or network segments to help:

https://documentation.commvault.com/commvault/v11/article?p=9234.htm
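For the SMB test, timing a large sequential write to the share already gives a usable MiB/s figure. A rough sketch, with a placeholder UNC path you would swap for your own test share:

import os
import time

DEST = r"\\fileserver\perftest\auxcopy_test.bin"   # placeholder UNC path - point it at your test share
SIZE_MB = 1024                                      # size of the dummy file to write
CHUNK = b"\x00" * (1024 * 1024)                     # 1 MiB writes

start = time.time()
with open(DEST, "wb") as f:
    for _ in range(SIZE_MB):
        f.write(CHUNK)
    f.flush()
    os.fsync(f.fileno())                            # make sure the data actually left the client
elapsed = time.time() - start
os.remove(DEST)                                     # clean up the test file
print(f"Wrote {SIZE_MB} MiB in {elapsed:.1f}s = {SIZE_MB / elapsed:.1f} MiB/s "
      f"(~{SIZE_MB * 8 / 1024 / elapsed:.2f} Gbit/s)")

If this write test is fast but the aux copy is still slow, the bottleneck is more likely on the object-storage path than on the general network.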

 

Userlevel 7
Badge +23

Unless you have any throttling in place, not likely. My initial concern was whether you were somehow sending over the main network, though you addressed that earlier.

Let me know what they find!!

Userlevel 4
Badge +15

Hello @Mike Struening,

Thank you for the analysis. In this case, are there any points that I could check on the media agents before opening a case with our network team?
I am thinking of settings that could be checked on the media agents.

 

Userlevel 7
Badge +23

Thanks, @thomas.S!

I checked a few of the stream counters and it looks like the network is the cause.

If you check the column for ‘Time(seconds)’, that is the time the stream/pipe had to wait for data. In some cases, we’re waiting a minute or two.

The one below has some high wait times, though there are several pipes per MA.

 

3996  6720  05/05 15:03:02 2996475 

|*5852487*|*Perf*|2996475| =======================================================================================

|*5852487*|*Perf*|2996475| Job-ID: 2996475            [Pipe-ID: 5852487]            [App-Type: 0]            [Data-Type: 1]

|*5852487*|*Perf*|2996475| Stream Source:   cvmapapp01

|*5852487*|*Perf*|2996475| Network medium:   SDT

|*5852487*|*Perf*|2996475| Head duration (Local):  [05,May,21 15:01:01  ~  05,May,21 15:03:02] 00:02:01 (121)

|*5852487*|*Perf*|2996475| Tail duration (Local):  [05,May,21 15:01:01  ~  05,May,21 15:03:02] 00:02:01 (121)

|*5852487*|*Perf*|2996475| -----------------------------------------------------------------------------------------------------

|*5852487*|*Perf*|2996475|     Perf-Counter                                  Time(seconds)              Size

|*5852487*|*Perf*|2996475| -----------------------------------------------------------------------------------------------------

|*5852487*|*Perf*|2996475| 

|*5852487*|*Perf*|2996475| Replicator DashCopy

|*5852487*|*Perf*|2996475|  |_Buffer allocation............................        81                            [Samples - 21079] [Avg - 0.003843]

|*5852487*|*Perf*|2996475|  |_Media Open...................................         6                            [Samples - 15] [Avg - 0.400000]

|*5852487*|*Perf*|2996475|  |_Chunk Recv...................................         5                            [Samples - 3] [Avg - 1.666667]

|*5852487*|*Perf*|2996475|  |_Reader.......................................         7                1110032163  [1.03 GB] [531.67 GBPH]

|*5852487*|*Perf*|2996475| 

|*5852487*|*Perf*|2996475| Reader Pipeline Modules[Client]

|*5852487*|*Perf*|2996475|  |_CVA Wait to received data from reader........       119                          

|*5852487*|*Perf*|2996475|  |_CVA Buffer allocation........................         -                          

|*5852487*|*Perf*|2996475|  |_SDT: Receive Data............................         7                1111164840  [1.03 GB]  [Samples - 21113] [Avg - 0.000332] [532.21 GBPH]

|*5852487*|*Perf*|2996475|  |_SDT-Head: CRC32 update.......................         1                1111107304  [1.03 GB]  [Samples - 21112] [Avg - 0.000000]

|*5852487*|*Perf*|2996475|  |_SDT-Head: Network transfer...................        93                1111107304  [1.03 GB]  [Samples - 21112] [Avg - 0.004405] [40.06 GBPH]

|*5852487*|*Perf*|2996475| 

|*5852487*|*Perf*|2996475| Writer Pipeline Modules[MediaAgent]

|*5852487*|*Perf*|2996475|  |_SDT-Tail: Wait to receive data from source....       120                1111164840  [1.03 GB]  [Samples - 21113] [Avg - 0.005684] [31.05 GBPH]

|*5852487*|*Perf*|2996475|  |_SDT-Tail: Writer Tasks.......................        28                1111107304  [1.03 GB]  [Samples - 21112] [Avg - 0.001326] [133.05 GBPH]

|*5852487*|*Perf*|2996475|    |_DSBackup: Media Write......................         8                1110192223  [1.03 GB] [465.28 GBPH]

|*5852487*|*Perf*|2996475| 

|*5852487*|*Perf*|2996475| ----------------------------------------------------------------------------------------------------
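For anyone who wants to scan a longer excerpt the same way, here is a quick sketch that relies only on the layout visible above (counter names prefixed with |_ and padded with dots, followed by the Time(seconds) column) and prints any counter that waited longer than a threshold. Against the block above it would flag the buffer allocation, CVA wait, network transfer, and SDT-Tail wait counters, which is consistent with the network being the slow leg.

import re
import sys

# Matches lines like:  |_SDT-Head: Network transfer...........   93   1111107304 ...
LINE_RE = re.compile(r"\|_(?P<name>[^.]+?)\.{2,}\s+(?P<secs>\d+|-)")
THRESHOLD = 60  # seconds of wait considered "high" for a ~2 minute sample window

def flag_slow_counters(log_text: str, threshold: int = THRESHOLD) -> None:
    for match in LINE_RE.finditer(log_text):
        name = match.group("name").strip()
        secs = match.group("secs")
        if secs != "-" and int(secs) >= threshold:
            print(f"HIGH WAIT: {name:<50} {secs:>6} s")

if __name__ == "__main__":
    # Usage:  python flag_waits.py < cvperfmgr_excerpt.txt
    flag_slow_counters(sys.stdin.read())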

Userlevel 4
Badge +15


I have collected the logs for the Aux Copy job. I only left in the information related to the job ID.
Since these jobs are not that big, I hope you can already read something out of them. I had to deactivate the big jobs first, because otherwise I run into problems with the space on the disk library.

Thomas

Userlevel 7
Badge +23

@thomas.S, check CVperfmgr.log on the destination MA for performance metrics. This will indicate where to focus.
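If that log gets large, a small sketch like the one below pulls out just the lines for one job. The default path and the pipe-delimited job tag are assumptions based on a typical v11 Windows install and the excerpt format shown elsewhere in this thread, so adjust both to match your environment.

import sys

# Assumed default log location - change it to wherever your Media Agent keeps its logs.
LOG_PATH = r"C:\Program Files\Commvault\ContentStore\Log Files\CVPerfMgr.log"

def extract_job_lines(job_id: str, path: str = LOG_PATH) -> None:
    with open(path, "r", errors="replace") as log:
        for line in log:
            # Match either the pipe-delimited job tag or the "Job-ID:" header line.
            if f"|{job_id}|" in line or f"Job-ID: {job_id}" in line:
                print(line.rstrip())

if __name__ == "__main__":
    extract_job_lines(sys.argv[1] if len(sys.argv) > 1 else "2996475")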

Userlevel 7
Badge +23

Sounds good. I’ll add in some people who can advise where we can find the performance counters as well.

Userlevel 4
Badge +15

Hello @Mike Struening
 

From my point of view, the problem is currently the throughput.
The job currently runs every 3 hours and during the day mainly copies the database logs to the object storage. Overnight, the data from the VSA backup is added. That adds up to a few TB.
Tomorrow I can provide the log that shows the performance data.
I am sure that it uses the LAN, because the object storage is only reachable via LAN and nothing in this direction is zoned to the Media Agents via FC.

Regards

Thomas

Userlevel 7
Badge +23

@thomas.S, is the actual throughput the issue, or is the amount of initial data the problem?

Starting with the latter: what is the intended retention on the Aux Copy, and how far back do the To Be Copied jobs go? The reason I ask is that it’s entirely possible the Aux Copy is grabbing data it will want to age off once the whole thing completes.

If it’s a performance issue, then we’d need to see some log files and stats to determine whether the issue is the read speed, the network/transfer, or the write speed. Noting the 2x 10 Gbit cards, are you certain the job is using this interface?
