Hello Commvault Community,
Today I come with a question about the Commvault deduplication mechanism.
We noticed that there are two deduplication database (DDB) engines with identical values but differing in one parameter - unique blocks.
The difference between these engines is close to 1 billion unique blocks, where other values are almost identical to each other. Where could this difference come from? Is there any explainable reason why there is such a difference considering the rest of the parameters?
DASH Copy is enabled between the two deduplication database engines that are managed by different Media Agents.
Below I am sending examples from the other two DDB engines where the situation looks correct - the DASH Copy mechanism is also enabled.
I am asking for help in understanding what may cause such differences in the number of unique blocks between DDB engines.
Another question is whether, for this deduplication database, we can reduce the disk space used in any way. Currently, there is 17% of free space left. DDB Compaction and Garbage Collection are enabled, and the suggestion was to add partitions or extra storage space. Is there some way to reduce the space used, or can we only add storage space? Sealing is not an option due to the size of the DDB.
Thank you for your help.
Best answer by Kamil
This one’s piqued my interest, as a difference in Unique Blocks SHOULD come with a discrepancy in Data Written (as we have more unique blocks).
This is usually what we’d see where DASH Copies are run with many streams: as Mike mentioned, we may be sending two identical signatures on different streams, so we end up writing both at the destination, creating the discrepancy.
Yours are essentially identical in every way EXCEPT unique blocks.
If you have some logs, there is a log line within SIDBEngine.log which will show us the Unique Block count. I’d like to match this up, as this will eliminate any GUI mismatch issue.
Should look something like:
With regard to the space question, if you refer to the disk space for the drive hosting the Dedupe Database itself, then adding a partition on another disk will eventually balance out the two partitions; however, the larger partition will only start to shrink once job references age out. It doesn’t balance out immediately, so if your retention is 30 days, the two partitions will look ‘similar’ (but not identical) after about 60 days.
DDB Compaction will help shrink the DDB partition, though the largest impact comes from compacting the secondary records. This will take the longest but recover the most space, definitely worth the investment if you can afford the downtime.
If you are talking about the target storage where your data is being written, then adding a partition will increase the footprint by approx. 100 TB (based on the ~200 TB from the screenshots) until the 60-day mark, when we can start to reclaim the references from the original partition.
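The two estimates above (about 60 days and about 100 TB) can be reproduced with a back-of-the-envelope sketch. This is not a Commvault sizing formula: the function name is hypothetical, and the even split of new signatures across partitions is an assumption made only to show where the round numbers could come from.

```python
def partition_estimate(retention_days, stored_tb, partitions_after=2):
    """Toy estimate only; not an official sizing formula."""
    # Assumption: new signatures are spread evenly across partitions, so the
    # share that lands on a new partition cannot dedupe against existing data.
    extra_tb = stored_tb * (partitions_after - 1) / partitions_after
    # From the post: partitions look similar after about two retention cycles.
    days_until_similar = 2 * retention_days
    return extra_tb, days_until_similar


extra_tb, days = partition_estimate(retention_days=30, stored_tb=200)
print(f"~{extra_tb:.0f} TB extra until about day {days}")
```

With the figures from this thread (30-day retention, roughly 200 TB stored) this gives about 100 TB of temporary extra footprint and about 60 days.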
Garbage Collection will help with reclaiming space from the target storage. It does not consolidate or compact things, but it will improve pruning efficiency and should have a positive impact on performance.
Hopefully this makes sense!
Can you share a screenshot of the block size set for each DDB?
As Mike mentioned, the more unique blocks, the higher the primary record count. And with a lower block size (64 KB vs 128 KB) we’ll see many more unique blocks.
Mike’s explanation with regard to stream count would usually come with duplicate unique chunks being written, and often we see a discrepancy in size at rest (this is what actually affects deduplication savings, Physical Size vs Application Size), but your savings are very nearly identical.
So block size may explain the difference between unique counts.
That’s very interesting. Size looks the same, prunable records, etc.
My initial thought is that you have more unique records on the Aux Copy because we don’t dedupe against concurrent streams, meaning if you are sending multiple streams at the same time, we won’t dedupe those streams against each other (at first).
Now, once they get written, subsequent streams will dedupe against the already written items and it eventually evens out from a space used perspective; however, you’ll still have an increased number of unique blocks (until they fully age off).
An increase of what you are seeing is entirely possible.
It’s also possible that these DDBs are partitioned, and the Aux partition was down for a prolonged period, creating new primary records. In time, things should even out, though like the above, that will all depend on retention.
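The concurrent-stream effect described above can be illustrated with a toy model. This is only a sketch, not how the DDB is implemented: it assumes that streams run in "waves", and that every lookup in a wave sees the signature store as it was before the wave started. The function names are hypothetical.

```python
import hashlib


def signature(block: bytes) -> str:
    return hashlib.sha256(block).hexdigest()


def unique_records(blocks, concurrent_streams):
    """Count unique blocks written when streams in one wave cannot see each other."""
    ddb = set()   # signatures already committed at the destination
    records = 0
    for start in range(0, len(blocks), concurrent_streams):
        wave = blocks[start:start + concurrent_streams]
        # Every stream in the wave checks the store as it was before the wave.
        misses = [signature(b) for b in wave if signature(b) not in ddb]
        records += len(misses)   # each miss is written as a new unique block
        ddb.update(misses)
    return records


blocks = [b"A", b"A", b"B", b"B", b"A", b"B"]
print(unique_records(blocks, 1))  # 2
print(unique_records(blocks, 2))  # 4
```

The same six blocks produce two unique records with one stream and four with two concurrent streams, and later waves dedupe normally against what was already written.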
Thank you @Mike Struening and @jgeorges for your detailed answers.
So what can we do to make the number of blocks comparable / the same? Currently, the number of blocks varies significantly between DDBs.
Bielsko CVMA1> 2 536 242 896
Kety CVMA1> 1 763 738 506
The difference in efficiency of about 30% is a bit much for such a stabilized environment.
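For reference, the gap between the two counts quoted above works out as follows. The variable names are just labels for the two figures in this post.

```python
bielsko = 2_536_242_896   # unique blocks, Bielsko CVMA1
kety = 1_763_738_506      # unique blocks, Kety CVMA1

delta = bielsko - kety
print(delta)                             # 772504390
print(round(100 * delta / bielsko, 1))   # 30.5 (relative to Bielsko)
print(round(100 * delta / kety, 1))      # 43.8 (relative to Kety)
```

So "about 30%" holds when the difference is measured against the larger count.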
I still have a question about the structure of the DDB. What are "Secondary Blocks" for? We have almost five times more blocks of this type than "Unique Blocks".
@Mike as you wrote about DDB Partitions, I think the Client has one partition in both cases.
I’ll answer the second question first
The secondary Records are the number of references to each of the Primary Records:
It makes perfect sense to have more Secondary Refs (you have to).
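As a rough illustration of why references outnumber unique blocks, here is a minimal sketch of a signature table. It is not the real DDB layout: it assumes one primary record per distinct signature and one secondary record per reference (including the first write), and the job contents are made up.

```python
from collections import Counter
import hashlib


def ddb_counts(blocks):
    refs = Counter(hashlib.sha256(b).hexdigest() for b in blocks)
    primary = len(refs)              # one record per unique signature
    secondary = sum(refs.values())   # one record per reference to a signature
    return primary, secondary


# Three backups of mostly unchanged data (hypothetical block contents).
jobs = [
    [b"os", b"db", b"logs-1"],
    [b"os", b"db", b"logs-2"],
    [b"os", b"db", b"logs-3"],
]
primary, secondary = ddb_counts([blk for job in jobs for blk in job])
print(primary, secondary)  # 5 9
```

Every retained job adds references for blocks it shares with earlier jobs, so the secondary count grows faster than the primary count.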
Now, regarding the Unique/Primary discrepancy: in time, they should even out, assuming it’s the combined streams issue. The more records you get written, the more likely they will be referenced, though there will always be a delta.
If you want to be 100% sure, I would suggest opening a support case and having someone deep dive into the records. If you do, share the case number here so I can track it!
I got an answer that cleared my doubts.
I think this news will teach us all an interesting fact about Commvault deduplication.
The reason for this is that in the primary copy the signature is generated first for each 128 KB data block, and the block is then compressed; in the secondary copy the data is compressed first and the signature is generated afterwards. So multiple unique data blocks of the primary copy become a single unique data block in the secondary, resulting in fewer signatures in the secondary. If you look at the size of the unique blocks in the primary, it is very close to what we see in the secondary.
I tried to find an environment with a similar situation, but unfortunately I don’t have access to one at the moment. Maybe you, Mike, or someone else from the Commvault Community has one and can verify whether it really is so? :)
Thanks for help,
I had a look at the incident and will clarify for you to ensure there’s no confusion.
Firstly, since early version 10 days, all backups are performed in this order ‘Compression > Deduplication > Encryption’.
However, in more recent years we found that with Database backups, as compression can cause very high rates of change in the dataset, we get better results by performing deduplication first (Deduplication > Compression > Encryption).
When we perform a DASH Copy, we assume again that we are doing Compression > Deduplication > Encryption, and so when reading from Primary (regardless of iDA), we remove encryption and then perform signature generation.
With that, for this to explain your findings, you’d need to check that you’re doing a good amount of Database backups. If so, then this certainly explains it, as the other key areas match up (size on disk and number of jobs).
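To see how the order of operations alone can change the signature count while the stored size stays similar, here is a toy model. It is one possible reading of the explanation above, not Commvault's actual pipeline: it assumes fixed 128 KB blocks, SHA-256 signatures, zlib compression, and that in the compress-first case the compressed stream is what gets cut into blocks.

```python
import hashlib
import random
import zlib

BLOCK = 128 * 1024  # 128 KB, the block size configured on both DDBs here


def count_signatures(stream: bytes) -> int:
    """Split a stream into fixed-size blocks and count distinct signatures."""
    blocks = (stream[i:i + BLOCK] for i in range(0, len(stream), BLOCK))
    return len({hashlib.sha256(b).digest() for b in blocks})


# Deterministic, compressible test data: 1 MiB drawn from a 16-symbol alphabet.
rng = random.Random(0)
raw = bytes(rng.choices(range(16), k=8 * BLOCK))

dedupe_first = count_signatures(raw)                   # signature, then compress
compress_first = count_signatures(zlib.compress(raw))  # compress, then signature

print(dedupe_first, compress_first)
```

In this model the compress-first order yields fewer signatures for the same data, because each signature covers 128 KB of already compressed data, which matches the direction of the difference discussed in this thread.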
If you still want to try and remove those discrepancies, you can look to make the two copies perform compression/deduplication in the same order:
Note, before you make these changes, you should understand that there will be NEW Signatures being generated resulting in more data being written down.
You need to ensure the destination copy has the space to allow this new data to be written down and for the DDB to grow.
New data being written will balance off and be reclaimed after the existing jobs meet retention.
That’s interesting for sure. At this point, I’d raise a support case, unless
@jgeorges has any more input.
That’s me exhausted of all ideas.
@Kamil if you IM me your CCID, I can look to get a support case raised and have someone reach out to assist.
@Kamil ! Can you confirm this incident was created for this thread?
Thanks for the update. No need to apologize at all!
@Kamil , gentle follow up on this one.
Let me know if you were able to find a solution!
Thank you for further information on this matter.
Below I am sending the screenshots you asked for; both are configured with a 128 KB block size. (screenblock1-2.png)
Don’t be mad, we’re all super busy these days!
I’ll track the case on my end
@Kamil , hope all is well! Any word from the customer?
@Kamil ! Following up to see if you had a chance to work on this issue.
Ok, I’ll await your update!
@Mike Struening ,
Forgive me for not updating, I haven't done it yet. As soon as I have a free moment, I will deal with the escalation of this thread and let you know.
I am angry that I neglected it so much, but I had many more important matters that left the topic on the sidelines.
I have created a support application - Incident 211207-324.
I will inform you when I find out something, thanks for your understanding;)
In this case, was the primary or the secondary showing more Unique Blocks? I presume this is a one-way DASH Copy, not two active sites cross-DASHing?
Just trying to make sure I know which side is likely to be bigger, as we have some customers with large SQL DBs using TDE. We turn off CV encryption as the data is already encrypted, so I guess it’s just the difference between compress-then-dedupe and dedupe-then-compress.
Out of curiosity, is there a BP KB for how to handle SQL with TDE/compression etc.?
Normally if either is bigger, it’s the Aux Copy. This is because as we send simultaneous streams, we are not deduplicating those against each other, only the next set of streams against what is already written.
Let me know if that clarifies,
@Karl Langston !
No, the incident number you provided is for a different problem.
I have in mind your recommendation to create an escalated case with CV support. I am waiting for the client's answer on exactly which questions I should ask to clarify the analysis of the problem.
When I get the information and create the case with CV support, I will give you the incident number.