DDB corrupted on Windows server and sealed DDBs


Badge +2

Hello,

 

I would like to request your help, as I’m quite new and have only basic knowledge of backup systems.

After reading the Commvault documentation available online and trying to troubleshoot, I narrowed the issues down to three:

 

  1. Sealed DDBs are not aging out.
  2. When I try to run a verification of existing jobs on disk and the deduplication database, it says the DDB is corrupted.
  3. Our disks are full and all backups have stopped.

When I run the Data Retention Forecast and Compliance report, it gives “BASIC CYCLE” as the reason why jobs are not aged out. This dedup policy is set to age out jobs after 1 cycle, so I guess that if I run a full backup, the previous one will be aged out... except I don’t have any available disk space.

 

Also I was unable to find a DDB backup as it seems there was never one to begin with.

 

Should I reconstruct a new one from the ground up?

 

How can I reduce the size of the sealed DDBs? They are quite old.

Best answer by Mike Struening 19 April 2021, 16:49

@Francisco_Vasconcelos , the answer @Anthony.Hodges gave was very accurate (as usual)!

Before Deduplication, Data aging would free space pretty much instantly.  With dedupe, there are several pieces involved before blocks qualify for deletion.

Check the MediaAgent log files for SIDBPrune (I don’t remember if SIDBPhysicalDeletes.log existed in v10).

The pruning will likely be very slow (compared to non dedupe).

You can generally ignore the v10 ‘space to clear’ reporting as it was based on non-dedupe expectations.

The .PRUNABLE suffix generally means that something in that folder will be deleted, if not the entire folder.  I would also suspend any jobs that are running to this library, as running jobs will hog the DDB-related resources (things are FAR BETTER in v11).

Lastly, I would consider upgrading to v11 if possible.  SO MUCH BETTER!

Userlevel 6
Badge +14

@Francisco_Vasconcelos , welcome!  You are in the right place to ask!

If I may, I’d like to rewind a bit and see what the overall issue is that you are experiencing.

Before we look into removing jobs/corrupt stores, I want to ask if you have automatic construction turned on, or have attempted a recon?  It may be failing because the disk is full (and there’s no room for the process).

https://documentation.commvault.com/commvault/v11_sp20/article?p=12503.htm

If you don’t already have a backup, then you’ll have to proceed below, but if you can get a recon to work, you’ll be much better off.

Assuming you can’t get a recon to work, you mention corrupt/sealed DDBs as well as BASIC CYCLES as part of the report.

It sounds like (and please correct me if I am missing anything) you are out of space on your Disk Library, and you have a corrupt store (likely the DDB that tracks the jobs using this same library).  Checking the jobs for that store/storage policy, you are seeing several jobs showing BASIC CYCLES as their reason for not pruning.

Is this all correct?  I noticed you mentioned the size of the DDB, though I want to be sure we are differentiating between the actual Deduplication Database vs. the amount of disk space the library files are using.  I’m going to assume you are concerned with the latter, though please correct me if I am mistaken.

If so, then the most likely issue right now is that you do indeed have a corrupt store and those jobs are the ones holding onto ALL of the space.  Corrupt stores can’t micro prune (meaning, delete individual referenced blocks) so nothing is deleted until ALL of the jobs logically age, then the whole store drops off in a macro prune.
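A minimal sketch of the micro vs. macro pruning difference described above, written as a toy Python model. The `Store` class, the job names and the block IDs are all hypothetical and only illustrate the idea; this is not how Commvault stores its data internally.

```python
from dataclasses import dataclass, field


@dataclass
class Store:
    """Toy dedup store (hypothetical model, not Commvault's data structures)."""
    corrupt: bool
    jobs: dict                      # job id -> set of block ids it references
    disk: set = field(default_factory=set)

    def __post_init__(self):
        # Every block referenced by any job occupies space on disk once.
        for blocks in self.jobs.values():
            self.disk |= blocks

    def age_job(self, job_id):
        """Logically age one job and return the blocks freed on disk."""
        self.jobs.pop(job_id)
        if self.corrupt:
            # Macro pruning only: nothing is freed until no job remains.
            freed = set(self.disk) if not self.jobs else set()
        else:
            # Micro pruning: free blocks no remaining job references.
            still_referenced = set().union(*self.jobs.values())
            freed = self.disk - still_referenced
        self.disk -= freed
        return freed


jobs = {"full1": {1, 2, 3}, "full2": {2, 3, 4}}
healthy = Store(corrupt=False, jobs=dict(jobs))
broken = Store(corrupt=True, jobs=dict(jobs))

print(healthy.age_job("full1"))  # only block 1 is unreferenced
print(broken.age_job("full1"))   # nothing freed yet
print(broken.age_job("full2"))   # last job gone, whole store drops
```

In this model the healthy store gives back space as soon as a block loses its last reference, while the corrupt store holds everything until its final job ages.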

Without checking which jobs are in the store and confirming the details, I can’t say this is the case for sure, but it sounds like you’ve looked into it and came to this conclusion.  The Data Retention Forecast and Compliance report will show which Store ID the jobs belong to, so you should be able to determine if that is the same Store ID.

If you can’t run new jobs to prune off those cycles, you can lower your cycle count (even to 0) temporarily to get them to age off.  If you have an Aux copy, then the job will potentially still exist there.  Of course, if the job is older than your expected retention, then logically you wouldn’t have expected it to be there anyway.

What is your retention set to in days?  What about your cycle count?

 

Userlevel 6
Badge +14

Here’s a quick thing to check.  Try this and let’s see what happens.  It’s far safer to troubleshoot this than to delete jobs which is honestly the last thing we should do.

Of course, if you can add space to that volume, that will likely help this process run.

https://documentation.commvault.com/commvault/v11_sp20/article?p=59039.htm

Marking the Deduplication Database for Recovery

You might need to mark the partition of the DDB for recovery in the following situations:

  • When the drive hosting the DDB is lost or the DDB files are missing and the DDB is recovered by the DDB Move operation.

Procedure

  1. From the CommCell Browser, expand Storage Resources > Deduplication Engines > storage_policy > deduplication_database.
  2. Right-click the appropriate partition, and then click Mark for Recovery.

    A message appears that asks you to mark this partition for recovery.

  3. Click Yes.

    A message appears that tells you that the partition is successfully set for recovery.

  4. Click OK.
Badge +2

@Mike Struening Thanks for the answers! :grinning:

 

I’ve marked the DDB for recovery and no job was launched (maybe because there is no disk space left). After checking the Event Viewer I noticed this message: “Deduplication store [SIDB_sitename] has been sealed. Sealed Reason: Store is Force Sealed” with event code [32:370].

I only found out it is corrupted because, when I run a data verification on the DDB store, the job doesn’t run and gives the reason: “Error Code: [62:2483]
Description: Unable to run Data Verification job because One or more partitions of the deduplication store is corrupted.”

I have 3 sealed DDBs (status: offline) and 1 active DDB (status: ready). All of them have corrupted partitions: when I try to run a data verification job on any of them, the [62:2483] error mentioned above occurs.

 

In the active DDB it shows this:

Total application size: 280.2 GB

Total data size on disk: 91.97 GB
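For reference, the two figures above translate into a deduplication ratio and a space saving as follows. This is plain arithmetic on the numbers quoted for the active DDB; the variable names are just for illustration.

```python
# Figures quoted above for the active DDB.
application_size_gb = 280.2
size_on_disk_gb = 91.97

# Ratio of logical (application) data to physical data on disk.
dedupe_ratio = application_size_gb / size_on_disk_gb

# Fraction of the application size that did not need to be stored.
savings = 1 - size_on_disk_gb / application_size_gb

print(f"dedupe ratio ~ {dedupe_ratio:.2f}:1")
print(f"space savings ~ {savings:.1%}")
```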

 

This dedup policy has 6 days of retention and 1 cycle, and no extended retention rules applied.

 

I searched for the DDB prune process, and it seems I must wait for data aging to age out the jobs from the DDB naturally, but that will take at least 6 days OR another full backup to increase the cycle count, and in the meantime no backups are running. Am I reading this situation right?

Could I set the retention policy to 0 cycles so it gets pruned, and then go back to 1 cycle?

 

When you say a recon, do you mean a reconstruction? Is that possible if I have no disk space left on the disk library?

 

Userlevel 6
Badge +14

At this point, I don’t think a recon will work for you, and if your retention is that low and the jobs are showing only BASIC CYCLES, then they should have pruned by now anyway.

A nice trick I use is to set the cycles to 0 (this works for any retention change you are considering), then run the Data Retention Forecast and Compliance report.  If you like what you see as far as which jobs will age and which will remain, then go ahead and kick off data aging.

If you DON’T like it, then change it back.  Just be sure there’s no chance Data Aging can run in between the change and change back!
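To see why dropping the cycle count changes what ages, here is a rough sketch, assuming the basic rule that a job ages only when both the days and the cycles criteria are met. The function name, the example dates and the "count newer fulls" way of measuring cycles are simplifying assumptions for illustration, not Commvault's actual implementation.

```python
from datetime import date


def eligible_for_aging(full_dates, today, retention_days, retention_cycles):
    """Return the full backups a simplified days-AND-cycles rule would age.

    Assumption: one cycle is counted per newer full backup.
    """
    ordered = sorted(full_dates)
    eligible = []
    for i, backup_date in enumerate(ordered):
        newer_fulls = len(ordered) - i - 1
        days_met = (today - backup_date).days > retention_days
        cycles_met = newer_fulls >= retention_cycles
        if days_met and cycles_met:
            eligible.append(backup_date)
    return eligible


# Hypothetical situation: a single old full, and no room to run a new one.
today = date(2021, 4, 19)
fulls = [date(2021, 4, 1)]

print(eligible_for_aging(fulls, today, retention_days=6, retention_cycles=1))
print(eligible_for_aging(fulls, today, retention_days=6, retention_cycles=0))
```

With 1 cycle the only full is held because no newer full exists; with 0 cycles the days criterion alone decides, which is the what-if the report lets you preview before Data Aging runs.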

In your case, that’s probably your best path forward.

Now, AFTER that is resolved, I would definitely take a look at your space and your expected data usage.  6 days and 1 cycle is not a lot of retention at all, and you filled up with one corrupt store.  

That’s a dangerously close place to be and we don’t want a next time!

Badge +2

Thanks again for the quick answer @Mike Struening 

 

But if I set the cycle retention to 0, won’t I lose full backups that are over 6 days old?

I think they are stored on tape by the auxiliary copy job, so I can probably restore if needed...

Userlevel 6
Badge +14

That’s likely correct.  You can confirm what will age by running the DRFC report right after you make the change (before running Data Aging).

If you have a synchronous Aux copy, you should be good (the jobs won’t age if they haven’t been copied yet).

Badge +2

There is something I don’t quite understand… I’ve set the retention rule to 6 days and 0 cycles and ran the DRFC. It said it would clear a big chunk of space, but it didn’t. Why was that? I tried to run data aging as well, and nothing changed…

 

Will the disk space only be freed after a full backup? I think that makes no sense, since retention is now 0 cycles.

Userlevel 4
Badge +9

The process of freeing up disk space lags behind Data Aging because Commvault uses a distributed approach to identify deduplicated data to be physically pruned.  The job/object-level Data Aging is done on the CommServe SQL Server DB, and the CommServe will communicate to the MediaAgent SIDBEngine regarding the status of aged deduplicated jobs/objects.  Blocks held in the Primary (Single Instance) table that have been aged get picked up by the SIDBPrune process, which moves them into the Zero Reference table.  SIDBPhysicalDeletes is the process that is used to physically delete the storage blocks. 
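The stages above can be pictured with a small Python sketch. It is a toy model under stated assumptions: the class, its method names and the reference-counting detail are hypothetical and only mirror the order of the steps (logical aging, move to zero reference, physical delete), not Commvault's real tables or processes.

```python
from collections import Counter


class DedupStoreModel:
    """Toy model of the aging -> pruning -> physical delete pipeline."""

    def __init__(self):
        self.primary = Counter()  # block id -> number of jobs referencing it
        self.zero_ref = set()     # blocks waiting for physical deletion
        self.disk = set()         # blocks currently occupying disk space

    def write_job(self, blocks):
        for block in blocks:
            self.primary[block] += 1
            self.disk.add(block)

    def age_job(self, blocks):
        # Logical aging only drops references; disk usage is unchanged.
        for block in blocks:
            self.primary[block] -= 1

    def prune(self):
        # Move blocks with no remaining references to the zero-ref set.
        for block in [b for b, refs in self.primary.items() if refs <= 0]:
            del self.primary[block]
            self.zero_ref.add(block)

    def physical_delete(self):
        # Only this step actually frees space on disk.
        freed = self.zero_ref & self.disk
        self.disk -= freed
        self.zero_ref.clear()
        return freed


model = DedupStoreModel()
model.write_job({1, 2})
model.write_job({2, 3})
model.age_job({1, 2})
print(sorted(model.disk))       # still all three blocks after aging
model.prune()
print(model.physical_delete())  # only now is block 1 freed
```

The point of the model is the lag: between `age_job` and `physical_delete` the report can already say space will be cleared while the disk is still full.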

 

If you want to know more, this is a superb resource. https://commvaultondemand.atlassian.net/wiki/spaces/ODLL/pages/1479215216/Aging+and+Pruning+Process

 

 

Badge +2

Hello @Anthony.Hodges, thanks for sharing that helpful page, but I just need to clear disk space and I’ve noticed that I am using V10, not V11, hence the option to reclaim idle space or list the IDs of jobs in the DDB isn’t available. I feel Commvault is an awesome system but very complex!

After reading the linked resource, I noticed this:

Why is My Data Not Aging

  • Mixed retention on tape
  • Failing Jobs
  • Unscheduled Backups
  • Auxiliary copies not running or not completing
  • Deconfigured Clients

 

I just need to make sure these conditions are met and the physical pruning of aged data will occur, right?

 

I feel the jobs are failing because there is no free space, and I can’t prune because the jobs are not completing… It seems like a vicious cycle, even after setting the policy’s number of retention cycles to 0.

Badge +2

Does the .prunable suffix on a CVMAGNETIC folder indicate that the directory can be removed?
Is there a standard Commvault process to remove these?

I’ve not seen any mention of this file extension in Commvault’s documentation.
 

 

Userlevel 4
Badge +9

@Francisco_Vasconcelos  One thing to note about the pruning process is that Commvault relies on there being sufficient quiet time on the chunk data files to be able to micro prune (punch “holes in files”) the files that hold the deduplicated data.  If Commvault didn’t use chunks, there would be a massive number of deduplicated blocks residing on the file system. Whilst that would make for very simple data pruning in cases such as yours, it would end up being a huge drain on MediaAgent compute.  This is because, whilst file systems such as NTFS/ext4 can efficiently handle numerous files, the APIs/programs that rely on iterating through the file system tables really struggle.  Simplistically, I would say that physics has led Commvault towards tuning that favours writes, reads and a moderate pruning schedule over aggressive data pruning.

 

That said, if in the time since you adjusted retention you are not seeing any data reclaimed on the storage libraries, and your Data Retention Forecast report shows that you should be reclaiming space, then (if you are using Commvault Simpana and not just a DDB that is V10) the log files to use for troubleshooting are SIDBEngine, SIDBPrune and CVMA.  The CVMA log will inform you of the progress of physical deletes.
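If it helps, here is a small Python sketch for pulling the last few lines out of those logs in one go. The prefixes are simply the log names mentioned in this thread and may differ on your version, and the log directory is something you have to supply yourself; nothing here is a Commvault tool.

```python
from collections import deque
from pathlib import Path

# Log name prefixes taken from this thread; adjust if yours are named differently.
LOG_PREFIXES = ("SIDBEngine", "SIDBPrune", "SIDBPhysicalDelete", "CVMA")


def tail_matching(log_dir, prefixes=LOG_PREFIXES, last_n=20):
    """Return {file name: last N lines} for *.log files starting with a prefix."""
    results = {}
    for path in sorted(Path(log_dir).glob("*.log")):
        if path.name.startswith(tuple(prefixes)):
            # errors="replace" keeps odd bytes in a log from aborting the read.
            with path.open(errors="replace") as handle:
                results[path.name] = list(deque(handle, maxlen=last_n))
    return results
```

Call it as `tail_matching(r"<your MediaAgent log folder>")` and print the returned lines for each file to check whether pruning or physical deletes are making progress.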
