Hello,we are running DB2 backup AIX LPAR. We have noted that CPU load is high“In virtualized environments (e.g.,LPAR, WPAR, Solaris Zones etc.,) where dedicated CPUs are not allocated, backup jobs may result in high CPU usage on production servers. The following measures can be taken to optimize the CPU usage”Best Practices - AIX File System (commvault.com) The steps were covered and additional registry keys to limit CPU load and core usage were set in place. Switching deduplication to MediaAgent didn’t provided sufficient change, almost no CPU consumption change was in place. Only jobs without deduplication consumes less CPU, but we cannot use them. At the end we can limit CPU load to 50-60% from up to 90%, but it impacts the backup duration few times. Playing with streams is not an option too since additional CPU consumption occurs. Since we are talking about Log backup mostly, Intellisnap won’t help much:Can I perform an IntelliSnap backup for log files?No. During an IntelliSnap backup, log files are not moved to the snapshot copy even if you select the Backup Log Files (...)Frequently Asked Questions for DB2 IntelliSnap (commvault.com) Is there a way to dig further and change that CPU consumption, especially if it comes to DB2 backups other than IntelliSnap?

LPAR AIX, DB2 high CPU consumption

+2

Adding to @Lukas_S . For the same set of backups below are the additional settings used, and commvault registry settings are not providing the expected results on the Client with reduction in CPU. Please share feedback and if this is to do with any issues within product or any other alternate suggestion to perform.

Server Compute:

Entitled Capacity : 3.75
Online Virtual CPUs : 5
Online Memory : 131072 MB
Entitled Capacity of Pool : 1505

Commvault Parameters , CPU utilization and backup run time:

sSDTHeadMaxCPUUsage - 25%

Commvault Parameters: Streams-8, buffers-8, buffer size-4096,parallelism-8 : CPU utilization 90% avg, backup runtime: 1.5 hours

Commvault Parameters: Streams-5, buffers-5, buffer size-2048,parallelism-5 : CPU utilization 75% avg

backup runtime: 6 hours

Commvault Parameters , CPU utilization and backup run time:

sSDTHeadMaxCPUUsage - 25%

dnicevalue - 15

process_priority - 1

Commvault Parameters: Streams-8, buffers-8, buffer size-4096,parallelism-8 : CPU utilization 85% avg; Backup runtime: 2.20 minutes

Userlevel 7

+23

If I recall correctly, logs rarely deduplicate well but compress well with standard compression.

That being said, compression still comes at a CPU cost. If these clients were installed prior to FR20 then they could be using GZIP compression, which sometimes can use more CPU.

You can try change the compression using this setting (nCOMPRESSIONSCHEME) and set it to 1 to force LZO compression. Note that this will change the signature in the DDB for this client resulting in a re-baseline and higher storage consumption on the first backup after making the change.

I recommend trying it out on a test machine if you can first, or perhaps just a single client at first to see if it makes a difference.

In either case, limiting CPU performance will limit backup performance. We aren’t using those CPU cycles just for fun :)

In V11 SP4 (yes, half a decade ago). We switched to LZO compression for VSA for new VSA clients. Here was the CPU performance difference at the time (which was tested in a virtual environment using HotAdd):

CPU % - Left: GZIP (old), Right: LZO (new)

Algorithm	Streams	vCPU	Application Size (GB)	GB/hr	Avg CPU
GZIP (Old)	4	4	127.73	639	80.2%
LZO (New)	4	4	127.73	1630 (+155%)	62.8%

L

Userlevel 1

+5

Lukas_S
Author
Byte
17 replies
2 years ago
16 February 2022

Hello,

thank you both for your answers.

The Client is running on FR20, so it’s supposed to be running with LZO algorithm, but still it’ll be worth to check. Maybe compression, not deduplication is the clue. Because MediaAgent side dedup, still didn’t gave any expected results.

At the end, this seems to be the real issue, because performance peaks shown when backup software was changed:

“In virtualized environments (e.g.,LPAR, WPAR, Solaris Zones etc.,) where dedicated CPUs are not allocated, backup jobs may result in high CPU usage on production servers. The following measures can be taken to optimize the CPU usage”

Is there a way we can test other sets of commands being used during backup? Are devs working on new approach / engine regarding to LPAR? Whole virtualization is based on the using not dedicated cores, but we don’t see such CPU peaks during VMware or Hyper-V backups.

regards,

Łukasz.

Userlevel 7

+23

Hello,

thank you both for your answers.

The Client is running on FR20, so it’s supposed to be running with LZO algorithm, but still it’ll be worth to check. Maybe compression, not deduplication is the clue. Because MediaAgent side dedup, still didn’t gave any expected results.

Not so much if its been upgraded to FR20, it depends if it was installed prior to FR20. The reason we don't automatically switch algorithms for old clients is the impact on deduplication with a new baseline. Changing the compression algorithm with create new blocks.

Setting dedupe on media agent won’t engage if you have client-side compression on, as data is compressed before deduplication. So in that case you’d have to set both on Media Agent side to take effect.

I can’t help much on the LPAR side since I’m not a Linux guru, but I’ll see if I can flag somebody who knows that.

L

Userlevel 1

+5

Lukas_S
Author
Byte
17 replies
2 years ago
16 February 2022

Hello,

thank you both for your answers.

The Client is running on FR20, so it’s supposed to be running with LZO algorithm, but still it’ll be worth to check. Maybe compression, not deduplication is the clue. Because MediaAgent side dedup, still didn’t gave any expected results.

Not so much if its been upgraded to FR20, it depends if it was installed prior to FR20. The reason we don't automatically switch algorithms for old clients is the impact on deduplication with a new baseline. Changing the compression algorithm with create new blocks.

Setting dedupe on media agent won’t engage if you have client-side compression on, as data is compressed before deduplication. So in that case you’d have to set both on Media Agent side to take effect.

I can’t help much on the LPAR side since I’m not a Linux guru, but I’ll see if I can flag somebody who knows that.

Hello Damian,

actually agent was installed on FR20, so the LZO algorithm is in place.

Correct, so we’d try going without compression at all with MA side dedup in place.

Thank you.

L

Userlevel 1

+5

Lukas_S
Author
Byte
17 replies
2 years ago
19 February 2022

Tests with compression disabled did not give the expected results.

I’m wondering is there a more advanced way of tuning CPU consumption on LPAR?

“In virtualized environments (e.g.,LPAR, WPAR, Solaris Zones etc.,) where dedicated CPUs are not allocated, backup jobs may result in high CPU usage on production servers. The following measures can be taken to optimize the CPU usage”

regards,

Łukasz

Userlevel 7

+23

@Lukas_S , I’m going to see if we can get some of our support folks to chime in, though a support case might end up being best here.

L

Userlevel 1

+5

Lukas_S
Author
Byte
17 replies
2 years ago
24 February 2022

@Lukas_S , I’m going to see if we can get some of our support folks to chime in, though a support case might end up being best here.

Yes, you are right. At the end CPU consumption issue is more complex / low level.

Thank you all for your time.

Userlevel 7

+23

I wasn’t able to get anything helpful quickly. Create a support case and share the incident number here so I can track it accordingly :nerd:

Userlevel 7

+23

@Lukas_S , were you able to create a support case to track this one down?

If so, please share the case number with me.

L

Userlevel 1

+5

Lukas_S
Author
Byte
17 replies
2 years ago
23 March 2022

@Lukas_S , were you able to create a support case to track this one down?

If so, please share the case number with me.

Hello Mike, yes, but it will be long lasting case.

regards,

Łukasz.

Userlevel 7

+23

I’m not going anywhere 😂

I’ll keep an eye on the case you pm’d me.

Userlevel 7

+23

Sharing the case resolution:

Experiencing high CPU in comparison to TSM backups for DB2.

Provided detailed analysis on what's using CPU methods to reduce this.

sSDTHeadMaxCPUUsage although set to 25% not having the desired effect:

DB2SBT log:
8454448 1 04/05 06:46:01 ####### SDT max. CPU thread count is [10] based on reg. value [25%], Procsr count [40]
8454448 1 04/05 06:46:01 ####### SdtBase::InitWrkPool: Initializing SDT head thread pool
8454448 1 04/05 06:46:01 ####### Max head thread count set to 40. CPU # = 40
8454448 1 04/05 06:46:01 ####### Threads per connection set to 20
8454448 1 04/05 06:46:01 ####### Initial max. threads set to 40

Logs indicate that we see 40 CPU's however the machine actually only has 5 virtual CPU's (lpar). We are performing calculation based on 40 CPU's meaning 24% of 40 = 10 and so 10 CPU threads used rather than expected 25% of 5 (1 or 2 rounded up).

This should be amended to specify threads rather than % of CPU - during session for testing we changed this to 2.

Noticed that deduplication is happening on client but the understanding was that this was happening on media agent:

DB2SBT log:
8126636 1 04/04 21:00:56 3472782 CPipelayer::InitiatePipeline signatureType [CV_SIGNATURE_SHA_512], signatureWhere [CV_CLIENTSIDE_DEDUP]

This is because (by default) storage policy has setting enabled to perform deduplication on clients.

That setting overrides the subclient setting to perform on Media agent as per the note in the subclient properties.

For testing purposes, disabled deduplication for the subclient to test - this will mimic deduplication not happening on the client.

The solution here could be to create a new storage policy and for clients where deduplication must happen on the media agent have the 'Enable Deduplication on Clients' setting disabled.

Encryption enabled for agent side (client). This will consume CPU cycles as well. For testing, changed this setting to 'Media Only'.

There are a couple of other factors:

Disabling Checksum (CRC) checking at the client side - Network CRC helps us detect corruption caused during network transfer but on some systems / processors this can consume alot of CPU cycles.

This can only be disabled at media agent level, see https://documentation.commvault.com/v11/expert/qscript/setMediaAgentProperty.html

It is possible to disable at client level however this would require an escalation to our development team to confirm CRC checking is actually the cause of high CPU usage.

Resource Control Groups - See https://documentation.commvault.com/commvault/v11_sp20/article?p=4954.htm another method to control / throttle CPU usage..
Target CPU usage of 40 - 50 % during their backups has now been achieved.

L

Userlevel 1

+5

Lukas_S
Author
Byte
17 replies
1 year ago
22 April 2022
Answer