Solved

Dedup DB reconstruction job failed

  • 30 November 2021
  • 8 replies
  • 1787 views

Badge +1

Getting the following when CommVault attempts to reconstruct the DDB

 

  • User: Administrator
  • Job ID: 353464
  • Status: Failed
  • Storage Policy Name: BTR_Global_Dedupe
  • Copy Name: BTR_Global_Dedupe_Primary
  • Start Time: Sun Nov 28 17:33:46 2021
  • End Time: Tue Nov 30 03:45:22 2021
  • Error Code: [62:2035]
  • Failure Reason: One or more partitions of the active DDB for the storage policy copy is not available to use.

 

Is there any way around this error so I can get backups going again?  It has tried a couple of times for multiple days.

icon

Best answer by Mike Struening RETIRED 15 March 2022, 22:01

View original

8 replies

Userlevel 7
Badge +23

Hi @Dedup DB reconstruction failur !  (if you want me to change your user name, just send me a private message with the name you want).

You COULD seal the store, though that will result in a new baseline being created (and a lot more space used up).

What we should do first is look at the DDBRecovery.log on the Media Agent to see why the job is failing and see if we can address that instead.  Likely a much easier fix overall.

Userlevel 7
Badge +23

Hi @Luke Clark , following up to see if you ever got this resolved (or a case created).

Thanks!

Badge +1

Thanks Mike, I will hit you up in a PM to get my name adjusted.  Appreciate that.  I took a look at the logs and this is what I see for the failure last night.  It is reprocessing again but I expect it to fail again with the same error.  I checked and verified I can reach all the partitions:

 

4944  86c   11/30 03:32:26 353464 Reader [3] <Copy/Stream> Source <15/3>: Reporting SUCCESS to AuxcpyMgr, Err [0/0]. Chnk [6989717], bytes copied [4293963918]
4944  86c   11/30 03:32:26 353464 +++ AuxCopy Thread Params +++
4944  86c   11/30 03:32:26 353464  Storage Pol. [BTR_Global_Dedupe] ID [13]
4944  86c   11/30 03:32:26 353464  Src Copy     [1. Primary] ID [15] Is Dedup Copy [1]
4944  86c   11/30 03:32:26 353464  Src Stream   [3] MediaGrpId [856]
4944  86c   11/30 03:32:26 353464  Src DrvPoolId [1] Type [10001]
4944  86c   11/30 03:32:26 353464 +++ Src Chnk Info +++
4944  86c   11/30 03:32:26 353464  Src ChnkId [6989719], VolId [287104]
4944  86c   11/30 03:32:26 353464  CCId [2], File Mrkr No. [1929]
4944  86c   11/30 03:32:26 353464  Chnk Size Phy. [2548587826], Logical [2769008028]
4944  86c   11/30 03:32:26 353464  No. of ArchFiles [1]
4944  86c   11/30 03:32:26 353464  isSDM [0] isNASIndex [0]
4944  86c   11/30 03:32:26 353464 +++ Arch File Info +++
4944  86c   11/30 03:32:26 353464 1. AFileId [663022]
4944  86c   11/30 03:32:26 353464    Bkp JobId [349658], ClntId [0], iDATypeId [0], AppId [438]
4944  86c   11/30 03:32:26 353464    Log. Size [2769008028], Off. [4923352907951]
4944  86c   11/30 03:32:26 353464    Phy Size [2548587826], Off. [4139429171089]
4944  86c   11/30 03:32:26 353464    Chnk No. [965], Last Chnk No. [965]
4944  86c   11/30 03:32:26 353464    Enc. [0] SI [1] CV SI [1] CV SI WEB [0] Flg [23] XFlg [0]
4944  86c   11/30 03:32:26 353464      Enc. Type [0], Key Len [0]
4944  86c   11/30 03:32:26 353464      ReadStartoff - Phy. [4139429171089], Log. [4923352907951]
4944  86c   11/30 03:32:26 353464 %% Use Regular AUX COPY %%
4944  86c   11/30 03:32:26 353464 1193605-1161067 [DM_BASE    ] CloseChunkSpecific: [Total Physical bytes read - 0] 
4944  86c   11/30 03:32:26 353464 1193605-1161067 [MEDIAFS    ] CloseSpecific: Delayed Flush Queue Size [0], Busy Queue [0], Avg Flush Wait [0.000000] Sec(s), Samples [0]
4944  86c   11/30 03:32:26 353464 [DM_CHUNK   ] Got new Chunk Info. ChunkId [6989719], CommcellId [2], CopyId[15], VolumeId [287104], FileNumber [1929], NumberOfArchFiles [1]
4944  86c   11/30 03:32:26 353464 1193605-1161067 [DM_BASE    ] Opening the Chunk =6989719, ArchFileId = 663022, FileMarker=1929, ArchFilePhysSizeInChunk=2548587826 VolId=287104
4944  86c   11/30 03:32:26 353464 1193605-1161067 [MEDIAFS    ] ChunkMetaDataIndex File [C:\CV_Mounts\BTR_CV_DISKLIB_VG01_VOL01\CV_Backups\CV_MAGNETIC\V_287104\CHUNK_6989719\CHUNK_META_DATA_6989719.idx] Version [3]
4944  86c   11/30 03:32:26 353464 1193605-1161067 [MEDIAFS    ] SingleInstancer_FS Opened [C:\CV_Mounts\BTR_CV_DISKLIB_VG01_VOL01\CV_Backups\CV_MAGNETIC\V_287104\CHUNK_6989719\CHUNK_META_DATA_6989719] file
4944  86c   11/30 03:32:26 353464 Successfully opened the archive files on the media...going to read data.
4944  86c   11/30 03:32:26 353464 Entering [CvAuxCopy::SyncDedupDB] 
4944  86c   11/30 03:32:26 353464 1193605-1161067 [DM_BASE    ] Initialized DDB Link Reader [C:\CV_Mounts\BTR_CV_DISKLIB_VG01_VOL01\CV_Backups\CV_MAGNETIC\V_287104\CHUNK_6989719\CHUNK_META_DATA_6989719].
4944  86c   11/30 03:32:27 353464 1193605-1161067 [DM_BASE    ] DataMoverBase::GetDedupDBReconstructorLinks(): 0 bytes read from meta data file. EOF.
4944  86c   11/30 03:32:27 353464 Reader [3] <Copy/Stream> Source <15/3>: Reporting SUCCESS to AuxcpyMgr, Err [0/0]. Chnk [6989719], bytes copied [2548587826]
4944  86c   11/30 03:32:27 353464 Reader [3] <Copy/Stream> Source <15/3>: Going to wait on the queue semaphore.
4944  1e88  11/30 03:32:27 353464 Received FREE STREAM Request for readerId [3]
4944  1e88  11/30 03:32:27 353464 Reader [3] <Copy/Stream> Source <15/3>: Going to stop the auxcopy thread... 
4944  86c   11/30 03:32:27 353464 Reader [3] <Copy/Stream> Source <15/3>: AUXCOPY_REQUEST_TYPE_END_STREAM found in the command queue.
4944  86c   11/30 03:32:27 353464 Going to close the archive files on the media
4944  86c   11/30 03:32:27 353464 1193605-1161067 [DM_BASE    ] Going to close chunk for Aux Copy
4944  86c   11/30 03:32:27 353464 1193605-1161067 [DM_BASE    ] CloseChunkSpecific: [Total Physical bytes read - 0] 
4944  86c   11/30 03:32:27 353464 1193605-1161067 [MEDIAFS    ] CloseSpecific: Delayed Flush Queue Size [0], Busy Queue [0], Avg Flush Wait [0.000000] Sec(s), Samples [0]
4944  86c   11/30 03:32:27 353464 Going to close the archive file copies
4944  86c   11/30 03:32:27 353464 Going to bring down the auxcopy pipeline
4944  86c   11/30 03:32:27 353464 stat- ID [SI Link Insert Times], Job Id [353464], Samples [46133093], Time [101796.424675] Sec(s), Average [0.002207] Sec/Sample
4944  86c   11/30 03:32:27 353464 stat- ID [SI Link Read Times], Job Id [353464], Samples [46133093], Time [192.472686] Sec(s), Average [0.000004] Sec/Sample
4944  86c   11/30 03:32:27 353464 stat- ID [Media Read Speed], Job Id [353464], Bytes [14024460272], Time [192.452641] Sec(s), Average Speed [69.496410] MB/Sec
4944  86c   11/30 03:32:27 353464 stat- ID [Next chunk recv times], Job Id [353464], Samples [189], Time [10.307392] Sec(s), Average [0.054536] Sec/Sample
4944  86c   11/30 03:32:27 353464 stat- ID [Media Open Times], Job Id [353464], Samples [1861], Time [384.107960] Sec(s), Average [0.206399] Sec/Sample
4944  1e88  11/30 03:32:30 353464 Reader [3] <Copy/Stream> Source <15/3>: Reporting FREE_STREAM to AuxcpyMgr, Err [0/0]. Chnk [6989719], bytes copied [0]
4944  1e88  11/30 03:32:30 353464 Going to close the media handle of the source media 
4944  1e88  11/30 03:32:30 353464 1193605-1161067 [DM_BASE    ] Going to close chunk for Aux Copy
4944  1e88  11/30 03:32:30 353464 1193605-1161067 [DM_BASE    ] CloseChunkSpecific: [Total Physical bytes read - 0] 
4944  1e88  11/30 03:32:30 353464 1193605-1161067 [MEDIAFS    ] CloseSpecific: Delayed Flush Queue Size [0], Busy Queue [0], Avg Flush Wait [0.000000] Sec(s), Samples [0]
4944  1e88  11/30 03:32:30 353464 1193605-1161067 [MEDIAFS    ] Free Flush Queue Size [0]
4944  1e88  11/30 03:32:30 353464 1193605-1161067 [MEDIAFS    ] Data file cache queue size [1]
4944  1e88  11/30 03:32:30 353464 1193605-1161067 [DM_BASE    ] Releasing the volume and drive reservation on the media manager
4944  1e0c  11/30 03:32:39 353464 1193605-# [DM_BASE    ] ReadAhead: Received exit command.
4944  1e0c  11/30 03:32:39 353464 1193605-# [DM_BASE    ] ReadAhead: LookAhead thread exiting.
4944  1e88  11/30 03:32:39 353464 1193605-# [DM_BASE    ] Destroyed DataMoverBase for MediaGroupId=[856]
4944  1e88  11/30 03:32:39 353464 Received FREE STREAM Request for readerId [3]
4944  1e88  11/30 03:32:39 353464 Received DDB SYNC STATUS request

Userlevel 7
Badge +23

@Luke Clark , following up on this one.

There’s a thread that covers converting Transactional DDBs which might be the issue here:

 

Badge +1

As a side question to this, will it hurt to temporarily pause deduplication so we get backups?  We only have 10 servers still actively backed up in here.  Most are now in Azure.  We currently have 6 partitions each with around 28TB free on them so I am not too worried about space.

Userlevel 7
Badge +23

Agreed, nothing stands out here.  Next log to check then is SIDBEngline.log.  You should also be able to check the status of each Partition via the GUI (refresh first).  I also wonder if you have a transactional DDB (which can’t be reconstructed and needs to be converted),  If it is, SIDBEgnline.log will show MemDB as true.

Regarding your question about running jobs sans dedupe, you certainly can.  They’ll be large, but you need backups, so definitely do what you need to do.

Badge +1

All 6 partitions show as ready (I attached a pic).  The SIDBEngine.Log only appears to give me the last hour.  I went through all the ones I had (4:21PM ET is the first I see).  None of them showed the word true but perhaps I missed something.  I inherited this from someone that is no longer with us so I am trying to learn as I go.

Userlevel 7
Badge +23

This community is a great place to be if you’re new to the product.  Lots of great resources and peers willing to share!

Appreciate the log file.  It’s extremely long and only covers a short time period, so it’s possible that either the error is in a previous SUDBEngline.log (_1, _2, etc.) or it’s already rolled over.

At this point, I would suggest opening an incident as I do suspect you have a Transactional DDB (I do see some MemDB references, though I’m not 100% on the application to your situation without further inspection) and we can determine more through checking the database entries and more.

Could you open a support case and share the number with me so I can track and follow up accordingly?

Reply