Solved

Oracle Compression savings

  • 3 November 2021
  • 21 replies
  • 794 views

Userlevel 3
Badge +7

Hi,

I have a question about compression.

We have some Oracle databases for which we create an online full backup every night. When we check the data transferred from the host to the media agent, we see a much higher number than what is actually written to disk.

EDIT:

This is an example:

 

Deduplication seems to be working fine, and so does compression. I know deduplication happens on the host/client side.

Does compression happen on the host or on the media agent? If it’s on the media agent, that would explain the high transferred-data number, as the data would have to be read from the host before being compressed on the media agent.

I looked through Books Online, but couldn’t find a definitive answer. If anyone knows, I would appreciate it.

Jeremy


Best answer by Mike Struening RETIRED 17 February 2022, 18:01


21 replies

Userlevel 7
Badge +23

Compression happens on the client (except for Aux Copies).

How does the data transferred compare to the Application size? Data written will be wildly different, since too many variables affect it.

Userlevel 3
Badge +7

Hi @Mike Struening,

Thanks for your answer.

I asked because the customer is seeing, according to them, an “Excessive amount of traffic” for databases that are deduplicated and compressed. So I’m trying to figure out where all that traffic is coming from when 90% of data is deduplicated on the client.
Any idea?

Userlevel 7
Badge +23

My pleasure, @Jeremy !

To confirm, you are seeing more data transferred over the network than what the job claims was sent over the network?

Looking at your screenshot, the App size is 162.56 GB, compressed down to 12.56 GB (at 88%) and 12.68 GB sent over the network.

That lines up, so it sounds like your customer is seeing more network sent than we are reporting?
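
As a quick sanity check on those numbers (a rough sketch using the rounded figures from the screenshot):

    # What fraction of the Application size does the reported network transfer represent?
    app_size_gb = 162.56     # Application size from the job summary
    network_gb = 12.68       # data reported as sent over the network

    print(f"{network_gb / app_size_gb:.1%} of the application size went over the wire")
    # ~7.8%, i.e. the job claims roughly 92% of the data never left the client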

Userlevel 3
Badge +7

Hi @Mike Struening 

That’s right. The customer is seeing more data transferred over the network than what the job claims was sent in the Job details.

In the above screenshot, you can see an average throughput of about 40 GB/hr, which results in the job taking over 4 hours to complete to write (and transfer) only 12 GB of data. (FYI, this “low” throughput is due to a throttle.)

When we remove the throttle, the average throughput increases to about 200-250 GB/hr. The data transferred (reported by Commvault) and data written stay about the same, 12 GB, but the job duration drops to about 40 minutes. So it looks like much more data is being sent over the network, since increasing the throughput decreases the duration of the job. We’re wondering what exactly is going over the network, other than the 12 GB that Commvault reports.
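
Here’s the rough back-of-the-envelope math behind that suspicion (assuming the reported throughput reflects the volume actually moved per hour, which is exactly what I’m unsure about):

    # Implied data volume from throughput x duration (figures quoted above).
    throttled = 40 * 4              # ~40 GB/hr for a bit over 4 hours -> ~160 GB
    unthrottled = 225 * (40 / 60)   # ~200-250 GB/hr for about 40 min  -> ~150 GB

    print(throttled, round(unthrottled))
    # Both land near the ~160 GB application size, not the 12 GB reported as transferred.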

Userlevel 6
Badge +14

Hi @Jeremy ,

 

As Mike said, the screenshot suggests less data was transferred over the network.

In the Subclient Properties “Storage Device” tab, what options are set for compression?

  • On Client
  • On MediaAgent
  • Use Storage Policy Settings

I’d also suggest checking the Storage Policy configuration for compression as well (the way the effective setting resolves is sketched below).
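
A hypothetical sketch of how the effective location resolves (the option names mirror the “Storage Device” tab, but this is not Commvault code, just the decision logic):

    # Hypothetical illustration of how the effective compression location resolves;
    # option names mirror the GUI, not Commvault internals.
    def effective_compression_location(subclient_setting: str,
                                       storage_policy_setting: str) -> str:
        if subclient_setting == "Use Storage Policy Settings":
            # Fall through to whatever the storage policy copy defines.
            return storage_policy_setting
        return subclient_setting   # "On Client" or "On MediaAgent"

    print(effective_compression_location("Use Storage Policy Settings", "On Client"))
    # -> "On Client"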

 

Best Regards,

Michael

Userlevel 3
Badge +7

Hi @MichaelCapon ,

The option for software compression on the subclient is set to “Use Storage Policy Setting”.

 

 

As this subclient is using a deduplicated storage policy copy, I checked the Software Compression settings on the Deduplication Engine under Storage Resources (as per https://documentation.commvault.com/11.24/expert/109054_modifying_software_compression_on_deduplicated_storage_policy_copy.html).

 

The option Enable Software Compression with Deduplication is enabled:

 

That’s why I wanted to make sure that compression occurs on the client instead of the media agent. With both deduplication and compression happening on the client, I’m wondering what else is being transferred over the network.

Would it be possible that the “Read” part of the throughput is causing more network traffic? I’m just guessing at this point as I’m not sure where to look.

Jeremy

Userlevel 7
Badge +15

@Jeremy 

Because databases themselves are resource-hungry, we often see the dedupe or compression workload offloaded to the MA. During a backup, the CPU usage for these tasks can spike and adversely affect the database or the application that depends on it. Offloading these tasks to the MA is a good way to alleviate that resource pressure.

So it would be worth double-checking in the configuration whether the dedupe or compression workload for this client has been moved to the MA, which would explain the network traffic.

Thanks,

Stuart

 

Userlevel 3
Badge +7

Hi @Stuart Painter 

I appreciate your feedback. From the logs, it looks like dedup and compression are happening on the client, but I could be wrong.

However, going through the logs, I found something interesting in CVPerfMgr. This is from the latest Online Full from last night. As usual, the same amount of Data Transferred is mentioned in the Job details:

 

But in the logs for this job, I find this:

From this I gather that between 22:02:13 and 23:52:57, a total of 152.61 GB was transferred (“Read” in this case). Or am I reading this all wrong?
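
Here’s the rough arithmetic I’m doing on that window (assuming the “Read” counter really is the volume moved between those two timestamps):

    from datetime import datetime

    # Arithmetic on the CVPerfMgr window quoted above (assuming "Read" is the
    # volume moved between the two timestamps).
    start = datetime.strptime("22:02:13", "%H:%M:%S")
    end = datetime.strptime("23:52:57", "%H:%M:%S")
    hours = (end - start).total_seconds() / 3600    # ~1.85 h

    read_gb = 152.61
    print(f"~{read_gb / hours:.0f} GB/hr read over {hours:.2f} h")   # ~83 GB/hr

    # 152.61 GB is close to the full application size, far more than the ~12 GB
    # the job summary reports as transferred.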

Jeremy

Userlevel 7
Badge +23

That definitely looks like the full data load was sent over, so I suspect @Stuart Painter is correct that the MA is handling the actual load.

I’ll confer with one of my Oracle experts here to confirm.

Userlevel 3
Badge +7

Hi @Mike Struening 

Thanks. I’ll wait for your feedback from your Oracle experts.

It seems like strange behaviour. I understand offloading the workload to the media agent if the databases are too resource-hungry, but it would be helpful if the software were more transparent about that. From the job details, it looks like less than 10% of the data was transferred, when in reality it’s much more.

Jeremy

Userlevel 3
Badge +7

Hi,

Just checking if the Oracle experts had some more information about this behaviour.

Thanks,

Jeremy

Userlevel 7
Badge +23

I’ll follow up.  My main expert was out of the office, so checking with some others.

Edit: Mahender replied below, he’s an outstanding resource.

Userlevel 3
Badge +5

Hi @Jeremy,


It appears compression is set on the client, because that is our default for the “Use Storage Policy Settings” option. You can also confirm this by checking the compression setting under the storage policy properties.

  • Based on the screenshot, it appears that of the total 92% savings, 88% came from compression and the remaining 4% from deduplication.
  • As Michael noted, if compression is set on the client, we shouldn’t see that much data transferred over the network.
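
As a rough illustration of that split, using the application size from the earlier screenshot (rounded figures, so the numbers won’t match the job summary exactly):

    # Rough illustration of the savings split (rounded figures from the screenshots).
    app_size_gb = 162.56

    after_compression = app_size_gb * (1 - 0.88)   # 88% compression savings -> ~19.5 GB
    after_total = app_size_gb * (1 - 0.92)         # 92% combined savings    -> ~13.0 GB

    print(f"After compression: ~{after_compression:.1f} GB")
    print(f"After dedup too:   ~{after_total:.1f} GB")

    # With client-side compression, the stream should already be ~20 GB or less
    # before it leaves the host, which is why a ~150 GB transfer looks wrong.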

 

At this point, it would be better to open a support case so we can review this issue further internally.

 

 

--
Mahender

Userlevel 3
Badge +7

Hi @Mahender Reddy,

Thanks for your input! As you suggested, I’ll open up a case at Commvault to address this issue. If anyone wants to follow up on this case, let me know and I’ll PM you the case number.

Jeremy

Userlevel 7
Badge +23

@Jeremy , I checked the case and it looks like it was closed with a recommendation:

Asked customer to cross check with Unix native tools like "sar"

Were you able to resolve this on your side?

Userlevel 3
Badge +7

@Mike Struening 

Unfortunately, we have not. I’m still waiting on more feedback from our customer.

I’ll update the post once I know more.

Userlevel 7
Badge +23

Appreciate the update.  I’ll keep an eye out!

Userlevel 7
Badge +23

Sharing the resolution:

It turns out the media agent option "Optimize for concurrent LAN backups" was disabled, and that caused all kinds of problems. Enabling it seems to have solved the issue.

Badge +4

I have a similar question about the detailed picture of my transaction log backup job: since compression is 73% and savings is 72%, does that mean my dedupe savings is negative 1%? Should I be bypassing deduplication for these types of jobs?
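
Roughly how I’m reading those numbers (a sketch that assumes both percentages are measured against the same application size, which I haven’t confirmed):

    # Reading the 73% compression vs 72% overall savings figures (hypothetical
    # 100 GB application size, since the real size isn't shown here).
    app_gb = 100.0
    after_compression = app_gb * (1 - 0.73)   # 27 GB after compression alone
    data_written = app_gb * (1 - 0.72)        # 28 GB written after dedup as well

    print(data_written - after_compression)   # ~1 GB *more* written than compressed
    # i.e. deduplication appears to add overhead rather than savings for this job.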

Badge +4

Is there a way to see the deduplication savings detail while the job is running? It seems we are hurting ourselves by trying to deduplicate something that likely cannot be deduplicated. Does compression include any part of the deduplication savings?

Userlevel 6
Badge +15

@Vitas Good afternoon. Commvault’s deduplication is “content-aware”, meaning it knows what type of data is being protected in an operation, so it can exclude any data type that historically has little to no savings when deduplicated. Transaction logs are not deduplicated because each one is unique, so there are no savings to be had.

 

For the compression part, this may help. For object-based agents we compress, hash (deduplication signature creation), then encrypt. For database agents we hash first and then compress; for transaction logs we just compress and then encrypt, with no hashing, since deduplication is skipped for this content type.
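
A compact way to picture that ordering (a hypothetical sketch of the processing order described above, not Commvault internals):

    # Hypothetical sketch of the per-agent processing order described above;
    # stage names are illustrative only.
    PIPELINES = {
        "object-based agent": ["compress", "hash (dedup signature)", "encrypt"],
        "database agent":     ["hash (dedup signature)", "compress", "encrypt"],
        "transaction logs":   ["compress", "encrypt"],   # no hashing: dedup skipped
    }

    for agent, stages in PIPELINES.items():
        print(f"{agent}: {' -> '.join(stages)}")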
 

 
