Solved

Data Written vs Size on disk (HyperScale)

  • 16 February 2021
  • 7 replies
  • 1825 views

Badge +2


Size on disk 55,74 TB, but data written is 24,77 TB

Hi folks,

I’ve been trying to figure this out for a few hours and I still haven’t found anything wrong in the Storage Policy, DiskLib, or Media Agent properties... The backup jobs are also fine. I counted 10800 jobs manually, just to be sure the size is correct: 24,77 TB of data is written. But how can it be that the size on disk is 55,74 TB?

Has anyone had the same situation?


Best answer by gjack 2 March 2021, 22:40


7 replies

Userlevel 7
Badge +23

Hey @clariion , thanks for the question!

My initial thought is that this is not factoring in baseline files written for the Deduplicated jobs (that have since aged, but are in use by new jobs).

Few quick questions:

  1. What is your current Feature Release level?  There have been some issues addressed so I want to be sure you are on a newer Maintenance Release (and a supported Feature Release)
  2. Have you run a space reclamation lately?  Instructions are here.  This very well might be the issue and you might have some space coming your way.

Let me know about the above 2 items (and if it’s been a while for an update or space rec, address those 2 and let me know)!

Userlevel 7
Badge +23

Agree with Mike, I think that is likely what you are seeing.

What happens is that Job 1 runs and writes 10 GB of data. Job 2 runs and writes 1 GB of data. At some point, Job 1 meets retention and gets pruned (deleted). Now Job 2, although it only 'wrote' 1 GB of data, still references some of the data from Job 1 thanks to the magic of deduplication.

So from the job view it still looks like we have 1 GB of data written, but in reality there is more data on disk that is linked to other jobs. The deduplication database tracks the true data written: all the data that needs to be retained because it is still being referenced.
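To make that concrete, here is a minimal toy model of reference-counted deduplication. The block sizes and job contents are made up for illustration and are not how Commvault's DDB actually stores things:

```python
# Toy model of deduplicated storage: a block stays on disk as long as
# at least one unexpired job still references it.
from collections import Counter

disk_refs = Counter()   # block id -> number of jobs referencing it
BLOCK_GB = 1            # pretend every block is 1 GB

def run_job(blocks):
    """Back up a job; only blocks not already on disk count as 'data written'."""
    written = sum(BLOCK_GB for b in blocks if disk_refs[b] == 0)
    disk_refs.update(blocks)
    return written

def prune_job(blocks):
    """Age a job; blocks still referenced by newer jobs stay on disk."""
    disk_refs.subtract(blocks)
    for b in [b for b, n in disk_refs.items() if n <= 0]:
        del disk_refs[b]

job1 = list(range(10))   # Job 1: 10 unique blocks -> reports 10 GB written
job2 = [0, 100]          # Job 2: re-uses one of Job 1's blocks, adds one new block

print(run_job(job1))     # 10
print(run_job(job2))     # 1
prune_job(job1)          # Job 1 meets retention and is pruned

print(len(disk_refs) * BLOCK_GB)   # 2 -> size on disk, vs 1 GB reported by the surviving job
```

The surviving job still only reports 1 GB of data written, yet 2 GB must stay on disk because one of Job 1's blocks is still referenced.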

Badge +2

Thank you guys @Mike Struening & @Damian Andre!

I will inform you ASAP about findings.

Badge +2


Hi Mike,

 

  1. We have v11 SP19.43 in use.
  2. Unfortunately there is no right-click option on the DDB to run reclamation, so I have to run it via command, but the command is not the same thing.
    In the CommCell Console, you use the Reclamation Level slider to select the level of reclamation; the numbers on the slider indicate the percentage of unused data blocks that can be defragmented.

    Via command I have to set a parameter, and that parameter does not indicate the percentage of unused data blocks that can be defragmented.

Example command:  qoperation execscript -sn DDBParam -si set -si 86 -si MaxNumOfAFsInSecondaryFile -si 4

paramName: MaxNumOfAFsInSecondaryFile. Use this value to reclaim space on the DDB disk.

paramValue: The number of archive files in each secondary file of the DDB. Range: 4 to 256; the value must be a power of 2. Default: 16.
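Purely as an illustration (plain Python, not a Commvault utility), a proposed value can be sanity-checked against the documented constraints before running that command:

```python
def valid_max_afs(value: int) -> bool:
    """Check a candidate MaxNumOfAFsInSecondaryFile against the documented
    constraints: range 4 to 256, and the value must be a power of 2."""
    return 4 <= value <= 256 and (value & (value - 1)) == 0

print(valid_max_afs(4))    # True  (value used in the example command above)
print(valid_max_afs(16))   # True  (documented default)
print(valid_max_afs(100))  # False (in range, but not a power of 2)
```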

Userlevel 7
Badge +23


Hold on there, those are not the same things. Space reclamation from the UI happens from the data verification option on the DDB. Here is the documentation: https://documentation.commvault.com/commvault/v11_sp19/article?p=12569.htm

You want the “Reclaim idle space on Mount Paths” option on the verification window. Note this should happen periodically and automatically for HyperScale, although I think it only kicks in after a certain amount of free space is used.

This command you quoted here regarding “MaxNumOfAFsInSecondaryFile” is to free up space on the storage location hosting the DDB, rather than the storage pool/library.

Userlevel 3
Badge +8

On the topic….

I am running Space Reclamation for a HyperScale 1.5 with basically the same situation: mount paths filling up and no recent Space Reclamation automatically triggered or run by the System Created schedule (!?).

I am wondering about the option to select “Clean Orphan Data”. What is the difference if I select this? What is the difference between Orphan Data and Data ready to be Pruned?  Do I free up more space if I select Clean Orphan Data? 

Appreciate your feedback,

/Patrik 

Userlevel 2
Badge +4

Hello All,

With deduplication, this view cannot be used to compare data written against data on disk.

 

As stated above, this only counts the data written by current jobs and does not factor in aged baseline data.
If you look at https://documentation.commvault.com/commvault/v11_sp20/article?p=9385.htm it states:

Data Written

Net data written by backup jobs. With deduplication, this is the total data written by current valid jobs after deduplication savings.

Can you view the properties of the DDB (Deduplication Database) and check data written vs application size? https://documentation.commvault.com/commvault/v11_sp20/article?p=12538.htm
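As a rough worked example with the figures from this thread (a sketch only; these two numbers alone do not tell you how much of the gap is still-referenced baseline data versus space simply awaiting pruning and reclamation):

```python
# Figures reported earlier in this thread, in TB
data_written_current_jobs = 24.77   # sum of "Data Written" across current valid jobs
size_on_disk = 55.74                # physical space consumed on the mount paths

# The gap is data that no current job reports as "written": baseline blocks
# originally written by since-aged jobs but still referenced, plus any space
# waiting to be pruned or reclaimed.
gap = size_on_disk - data_written_current_jobs
print(f"Gap not attributed to current jobs: {gap:.2f} TB")   # 30.97 TB
```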

 

As for Hyperscale, yes space reclamation will need to be run, but this should run automatically once per month once a certain space threshold (50%) is hit.

 

Thank you,

Reply