Question

The difference in the number of unique blocks in deduplication databases

  • 28 September 2021
  • 15 replies
  • 159 views

Userlevel 3
Badge +8

Hello Commvault Community, 

 

Today I come with a question about the Commvault deduplication mechanism.

 

We noticed that two deduplication database engines have nearly identical values but differ in one parameter - unique blocks.

(engine1.png)

(engine2.png)

The difference between these engines is close to 1 billion unique blocks, while the other values are almost identical. Where could this difference come from? Is there an explainable reason for such a gap given the rest of the parameters?

 

DASH Copy is enabled between the two deduplication database engines that are managed by different Media Agents.

Below are examples from two other DDB engines where the situation looks correct - the DASH Copy mechanism is also enabled there.

(otherengine1.png)
(otherengine2.png)

I would appreciate help understanding what could cause such a difference in the number of unique blocks between DDB engines.
---

Another question: in the case of this deduplication database, is there any way we can reduce the disk space used? Currently there is 17% free space left. DDB Compacting and Garbage Collection are enabled, and it was suggested to add partitions or extra storage space. Is there some way to reclaim space, or is adding storage the only option? Sealing is not an option due to the size of the DDB.

(ddb1.png)
(ddb2.png)
(ddb3.png)

 

Thank you for your help.

Regards,
Kamil


15 replies

Userlevel 7
Badge +23

That’s very interesting.  Size looks the same, prunable records, etc.

My initial thought is that you have more unique records on the Aux Copy because we don’t dedupe against concurrent streams, meaning if you are sending multiple streams at the same time, we won’t dedupe those streams against each other (at first).

Now, once they get written, subsequent streams will dedupe against the already written items and it eventually evens out from a space used perspective; however, you’ll still have an increased number of unique blocks (until they fully age off).

A difference of the size you are seeing is entirely possible.

It’s also possible that these DDBs are partitioned and the Aux Copy partition was down for a prolonged period, creating new primary records. In time, things should even out, though as above, that will all depend on retention.
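
If it helps to picture the concurrent-stream effect, here is a rough toy model (my own illustration, not actual Commvault code): each stream only dedupes against signatures that are already committed, so two streams carrying the same block at the same moment both land as new primary records.

```python
# Toy model of concurrent-stream ingest (an illustration only, not Commvault code).
# A signature is only deduped against blocks committed before its batch started,
# so identical blocks arriving on parallel streams are both counted as unique.

def ingest(batches):
    """Each batch is a list of (stream_id, signature) pairs written concurrently."""
    committed = set()   # signatures already recorded in the "DDB"
    primary = 0         # unique (primary) records created
    for batch in batches:
        new_in_batch = []
        for _, signature in batch:
            if signature not in committed:
                new_in_batch.append(signature)
                primary += 1
        committed.update(new_in_batch)
    return primary

# Two streams sending the same four blocks at the same time:
concurrent = [[(1, "A"), (2, "A")], [(1, "B"), (2, "B")],
              [(1, "C"), (2, "C")], [(1, "D"), (2, "D")]]
# The same data sent one block at a time:
serial = [[(1, s)] for s in "ABCD"] + [[(2, s)] for s in "ABCD"]

print(ingest(concurrent))  # 8 primary records
print(ingest(serial))      # 4 primary records
```

Subsequent copies of those blocks reference whichever primary record they hit first, which is why the space usage converges even though the extra primary records linger until they age off.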

 

Userlevel 4
Badge +7

Hi @Kamil 

This one’s piqued my interest, as a difference in Unique Blocks SHOULD come with a discrepancy in Data Written (as we have more unique blocks).
This is usually what we’d see where DASH Copies run with many streams: as Mike mentioned, we may send two identical signatures on different streams, so we end up writing both at the destination, creating the discrepancy.


Yours are essentially identical in every way EXCEPT unique blocks.
If you have some logs, there is a log line within SIDBEngine.log which will show us the Unique Block count. I’d like to match this up, as it will eliminate any GUI mismatch issue.
Should look something like:

10280 3490  06/04 19:47:23 ### 3-0-4-0  LogCtrs          6002  [0][     Total]  Primary [3155589359]-[131676182690390]-[0]-[0]-[0]-[0],
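
If you want to pull that counter out of the log programmatically, here is a minimal sketch (assuming the "LogCtrs ... Primary [count]-..." format shown above; the file name and path are placeholders):

```python
import re

# Matches the "LogCtrs ... [..][ Total]  Primary [count]-..." line shown above.
# Assumption: the first bracketed number after "Primary" is the unique block count.
PRIMARY_RE = re.compile(r"LogCtrs\s+\d+\s+\[\d+\]\[\s*Total\]\s+Primary\s+\[(\d+)\]")

def latest_primary_count(log_path):
    """Return the most recent Primary (unique block) count found in the log."""
    count = None
    with open(log_path, errors="ignore") as log:
        for line in log:
            match = PRIMARY_RE.search(line)
            if match:
                count = int(match.group(1))
    return count

print(latest_primary_count("SIDBEngine.log"))
```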

 

With regard to the space question: if you are referring to the disk space for the drive hosting the Dedupe Database itself, then adding a partition on another disk will eventually balance out the two partitions; however, the larger partition will only start to shrink once job references age out. It doesn’t balance out immediately, so if your retention is 30 days, the two partitions will look ‘similar’ (but not identical) after about 60 days.

DDB Compaction will help shrink the DDB partition, though the largest impact comes from compacting the secondary records. This will take the longest but recover the most space, definitely worth the investment if you can afford the downtime.

If we are talking about the target storage where your data is written, then adding a partition will increase the footprint by approximately 100 TB (based on the ~200 TB from the screenshots) until the 60-day mark, when we can start to reclaim the references from the original partition.

Garbage Collection will help with reclaiming space from the target storage. It does not consolidate or compact anything, but it will improve pruning efficiency, which should help performance.

 

Hopefully this makes sense!


Cheers,
Jase

 

Userlevel 3
Badge +8

Thank you @Mike Struening and @jgeorges for your detailed answers.

 

So what can we do to make the number of blocks comparable / the same? Currently, the number of blocks varies significantly between DDBs.

 

Bielsko CVMA1> 2 536 242 896
Kety CVMA1> 1 763 738 506

 

The difference in efficiency of about 30% is a bit much for such a stable environment.

I still have a question about the structure of the DDB. What are "Secondary Blocks" for? We have almost five times more blocks of this type than "Unique Blocks".

 

@Mike, regarding the DDB partitions you mentioned: I believe the client has a single partition in both cases.

 

Regards,
Kamil

Userlevel 7
Badge +23

I’ll answer the second question first :grin:

The Secondary Records are the number of references to each of the Primary Records:

  • Primary Records - Actual unique blocks
  • Secondary Records - How many Job References exist per Primary Record
  • Zero Ref - Primary Records with 0 Secondary Records (these entries get sent to be deleted)

It makes perfect sense to have more Secondary Refs (you have to).
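
A toy way to picture the relationship (my own simplification, not the real DDB schema): every primary record carries a count of job references, and once that count drops to zero the record becomes a Zero Ref entry eligible for pruning.

```python
# Toy model of Primary / Secondary / Zero Ref counters (a simplification only).
from collections import Counter

job_blocks = {
    "job1": ["A", "B", "X"],
    "job2": ["A", "B", "C"],
    "job3": ["A", "C", "D"],
}

primary = set()        # unique blocks actually written
secondary = Counter()  # job references per primary record

for blocks in job_blocks.values():
    for block in blocks:
        primary.add(block)
        secondary[block] += 1

print(len(primary), sum(secondary.values()))   # 5 primary records, 9 secondary refs

# Aging off job1 removes its references; anything left at zero is a Zero Ref record.
for block in job_blocks.pop("job1"):
    secondary[block] -= 1
print([b for b in primary if secondary[b] == 0])  # ['X'] -> eligible for pruning
```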

Now, regarding the Unique/Primary discrepancy: in time, they should even out, assuming it’s the concurrent streams issue.  The more records that get written, the more likely they will be referenced, though there will always be a delta.

If you want to be 100% sure, I would suggest opening a support case and having someone deep dive into the records.  If you do, share the case number here so I can track it!

Userlevel 4
Badge +7

@Kamil  I had a thought last night but couldn’t drag myself out of bed to respond here.

 

Can you share a screenshot of the block size set for each DDB?
https://documentation.commvault.com/11.24/expert/12471_modifying_properties_of_global_deduplication_policy.html

As Mike mentioned, the more unique blocks, the higher the primary record count. And with a smaller block size (64 KB instead of 128 KB) we’ll see many more unique blocks.

 

Mike’s explanation with regard to stream count would usually come with duplicate unique chunks written, and we often see a discrepancy in size at rest (which is what actually affects deduplication savings: Physical Size vs. Application Size), but your savings are very nearly identical.

So block size may explain the difference between unique counts.
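
For a sense of scale, here is a back-of-envelope sketch (the data figure is hypothetical, not taken from the screenshots) showing how halving the block size roughly doubles the unique block count for the same amount of unique data:

```python
# Back-of-envelope: unique blocks needed for a given amount of unique data
# at two block sizes. The 300 TB figure is hypothetical, for illustration only.
unique_data_tb = 300
tb = 1024 ** 4  # bytes per TB

for block_kb in (128, 64):
    blocks = unique_data_tb * tb // (block_kb * 1024)
    print(f"{block_kb} KB blocks -> ~{blocks / 1e9:.2f} billion unique blocks")
```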

 

Cheers,

Jase

 

 

Userlevel 3
Badge +8

Thank you for further information on this matter.

 

Below are the screenshots you asked for; both are configured with a 128 KB block size. (screenblock1-2.png)

 

Thanks,
Kamil

Userlevel 7
Badge +23

That’s interesting for sure.  At this point, I’d raise a support case, unless @jgeorges has any more input.

Userlevel 4
Badge +7

@Kamil @Mike Struening 

 

That’s me exhausted of all ideas.
@Kamil if you IM me your CCID I can look to get a support case raised and have someone reach out to assist.

 

Cheers,

Jase

Userlevel 7
Badge +23

Hi @Kamil !  Can you confirm this incident was created for this thread?

211014-168

Thanks!

Userlevel 3
Badge +8

Hello @Mike Struening 

 

No, the incident number you provided is for a different problem.

 

I will keep in mind your recommendation to open an escalated case with CV support; I am waiting for the client to confirm exactly which questions I should ask to help clarify the analysis of the problem.

 

Once I have the information and open the case with CV support, I will give you the incident number.

 

Thanks,
Kamil

Userlevel 7
Badge +23

Ok, I’ll await your update!

Userlevel 7
Badge +23

Hi @Kamil, hope all is well!  Any word from the customer?

Thanks!

Userlevel 3
Badge +8

Hi @Mike Struening ,

 

Forgive me for not updating; I haven't done it yet. As soon as I have a free moment, I will deal with escalating this thread and let you know.

 

Regards,
Kamil

Userlevel 7
Badge +23

Thanks for the update.  No need to apologize at all!

Userlevel 7
Badge +23

Hi @Kamil !  Following up to see if you had a chance to work on this issue.

Thanks!
