Solved

Clean Orphan Data, what is it?

  • 24 February 2022
  • 2 replies
  • 1499 views

What is the Clean Orphan Data phase when running DDB Space Reclaim?

Best answer by Damian Andre 25 February 2022, 05:23

Hi @DaBackups , and welcome!

That phase is when we look for blocks that are no longer referenced so they can be removed, often as part of a Data Verification job.

Reclaim idle space on Mount Paths

  • The Validate dedup data phase of the data verification job runs a quick verification on the deduplication database. For more information, see Quick Verification of the Deduplication Database (listed above in this page).

    The Data Verification jobs for DDB space reclamation are controlled by the Automatically submit Space Reclamation DDB Verification job when free space on Library is below this percent option in Media Management Configuration. For more information on this option, see Media Management Configuration: Service Configuration.

  • The Orphan Chunk Listing phase marks as orphans the blocks and chunks that are not referenced by any data block. This phase uses a single stream. If the data verification job is suspended during this phase, the listing phase starts over when the job is restarted.

  • The Defragment data phase processes the files that can be defragmented (identified in the first phase) and the orphan chunks (identified in the Orphan Chunk Listing phase) by deleting the invalid or orphan data blocks, thereby reclaiming the unused space.

    Notes:

  • The Job Details dialog box displays the Estimated Completion time per phase when you run a DDB verification with Reclaim idle space on Mount Paths option.

  • The Job Details dialog box displays the Percent Complete per phase when you run a DDB verification with Reclaim idle space on Mount Paths option. Both phases count for 50% each.

Use this space reclamation option for disk mount paths that do not support sparse files.

Running a data verification job on the DDB with this option enabled defragments the data files that were identified (during the quick verification phase) as containing unused space. The valid data blocks are retained. The invalid data blocks that are no longer referenced by any backup job are deleted, thereby reclaiming the unused storage space.

Reclamation Level: Use this slider to select the level of reclamation to be done. On the slider the numbers indicate the percentage of unused data blocks that can be defragmented.

  • 1 is equal to 80% (Least aggressive reclamation, low I/O on the disk)

  • 2 is equal to 60%

  • 3 is equal to 40%

  • 4 is equal to 20% (Most aggressive reclamation, higher I/O on the disk)

    For example: By default the slider is set at 3. This indicates that the data files that have 40% or more of invalid data blocks (unused space) and 60% or less of valid data blocks will be selected for defragmentation to reclaim the unused space.

    However, if you set the slider to 4, then the data files that have 20% or more of invalid data blocks (unused space) and 80% or less of valid data blocks will be selected for defragmentation. This results in very high I/O on the disk, even for files where only 20% of the space is unused.
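To make the slider levels described above concrete, here is a minimal Python sketch of the selection rule. It is only an illustration of the documented thresholds, not Commvault code: the names RECLAMATION_THRESHOLDS, should_defragment and select_files, and the sample file names and percentages, are all made up for the example.

```python
# Slider level -> minimum percentage of invalid (unused) blocks a data file
# must have to be selected. Values come from the documentation quoted above;
# the names here are hypothetical.
RECLAMATION_THRESHOLDS = {1: 80, 2: 60, 3: 40, 4: 20}


def should_defragment(invalid_percent: float, level: int = 3) -> bool:
    """Return True if a file with this share of invalid blocks is selected."""
    return invalid_percent >= RECLAMATION_THRESHOLDS[level]


def select_files(files: dict[str, float], level: int = 3) -> list[str]:
    """Return the names of the files selected at the given slider level."""
    return sorted(
        name for name, pct in files.items() if should_defragment(pct, level)
    )


# Made-up sample data: file name -> percentage of invalid blocks.
sample_files = {"file_a": 85.0, "file_b": 45.0, "file_c": 25.0, "file_d": 10.0}

print(select_files(sample_files, level=1))  # ['file_a']
print(select_files(sample_files, level=3))  # ['file_a', 'file_b']
print(select_files(sample_files, level=4))  # ['file_a', 'file_b', 'file_c']
```

Moving the slider from 1 to 4 lowers the threshold, so more files qualify and more data has to be read and rewritten, which is where the extra disk I/O comes from.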

Full and Incremental Data Verification Job

https://documentation.commvault.com/11.26/expert/100399_data_verification_of_deduplicated_data.html

To explain in a little more detail, @DaBackups:

In some real edge-case scenarios, it's possible that some data was no longer needed or referenced by the DDB, but pruning of that data failed for some reason. It could be that the Media Agent rebooted or services cycled at an inopportune time, or that the request was never properly processed by the Media Agent. In some much older versions of the software, a bug may also have caused pruning to be abandoned, leaving data ‘orphaned’ on the disk that was no longer needed.

This option was added to go find data that is actually no longer valid and can be safely pruned. It's a separate operation because we scan the disk for the files that are no longer referenced by the DDB, rather than scanning the DDB to find the files to defrag, if that makes any sense.
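If it helps to picture the difference in direction, here is a rough Python sketch of the "scan the disk, then check against the DDB" idea. It is purely conceptual and not how the product is implemented: find_orphans and the chunk names are invented for the example, and the real process works on the mount path contents and DDB records rather than on simple sets of names.

```python
def find_orphans(files_on_disk: set[str], referenced_by_ddb: set[str]) -> list[str]:
    """Return the files found on disk that the DDB no longer references."""
    # Start from what is physically on disk and subtract what is still
    # referenced; whatever remains is a candidate orphan.
    return sorted(files_on_disk - referenced_by_ddb)


# Made-up sample data.
on_disk = {"CHUNK_1", "CHUNK_2", "CHUNK_3", "CHUNK_4"}
referenced = {"CHUNK_1", "CHUNK_3"}

print(find_orphans(on_disk, referenced))  # ['CHUNK_2', 'CHUNK_4']
```

A scan that starts from the DDB would never see CHUNK_2 or CHUNK_4 in this example, because the DDB has no record pointing at them, which is why the disk has to be the starting point.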

There should be no need to run this frequently, but every so often it may catch some data that has been orphaned. I think it's quite a rarity these days, but it can’t hurt to run it periodically.