Skip to main content
Solved

Dedup DB reconstruction job failed

  • 30 November 2021
  • 8 replies
  • 2014 views

Getting the following when CommVault attempts to reconstruct the DDB

 

  • User: Administrator
  • Job ID: 353464
  • Status: Failed
  • Storage Policy Name: BTR_Global_Dedupe
  • Copy Name: BTR_Global_Dedupe_Primary
  • Start Time: Sun Nov 28 17:33:46 2021
  • End Time: Tue Nov 30 03:45:22 2021
  • Error Code: 62:2035]
  • Failure Reason: One or more partitions of the active DDB for the storage policy copy is not available to use.

 

Is there any way around this error so I can get backups going again?  It has tried a couple of times for multiple days.

Hi @Dedup DB reconstruction failur !  (if you want me to change your user name, just send me a private message with the name you want).

You COULD seal the store, though that will result in a new baseline being created (and a lot more space used up).

What we should do first is look at the DDBRecovery.log on the Media Agent to see why the job is failing and see if we can address that instead.  Likely a much easier fix overall.


Thanks Mike, I will hit you up in a PM to get my name adjusted.  Appreciate that.  I took a look at the logs and this is what I see for the failure last night.  It is reprocessing again but I expect it to fail again with the same error.  I checked and verified I can reach all the partitions:

 

4944  86c   11/30 03:32:26 353464 Reader R3] <Copy/Stream> Source <15/3>: Reporting SUCCESS to AuxcpyMgr, Err r0/0]. Chnk .6989717], bytes copied c4293963918]
4944  86c   11/30 03:32:26 353464 +++ AuxCopy Thread Params +++
4944  86c   11/30 03:32:26 353464  Storage Pol. tBTR_Global_Dedupe] ID _13]
4944  86c   11/30 03:32:26 353464  Src Copy     Â1. Primary] ID 15] Is Dedup Copy 51]
4944  86c   11/30 03:32:26 353464  Src Stream   33] MediaGrpId a856]
4944  86c   11/30 03:32:26 353464  Src DrvPoolId 1] Type  10001]
4944  86c   11/30 03:32:26 353464 +++ Src Chnk Info +++
4944  86c   11/30 03:32:26 353464  Src ChnkId 06989719], VolId  287104]
4944  86c   11/30 03:32:26 353464  CCId Â2], File Mrkr No. 51929]
4944  86c   11/30 03:32:26 353464  Chnk Size Phy. 02548587826], Logical n2769008028]
4944  86c   11/30 03:32:26 353464  No. of ArchFiles 31]
4944  86c   11/30 03:32:26 353464  isSDM 40] isNASIndex 00]
4944  86c   11/30 03:32:26 353464 +++ Arch File Info +++
4944  86c   11/30 03:32:26 353464 1. AFileId 4663022]
4944  86c   11/30 03:32:26 353464    Bkp JobId 4349658], ClntId 0], iDATypeId 40], AppId J438]
4944  86c   11/30 03:32:26 353464    Log. Size /2769008028], Off. 34923352907951]
4944  86c   11/30 03:32:26 353464    Phy Size ]2548587826], Off.  4139429171089]
4944  86c   11/30 03:32:26 353464    Chnk No. 1965], Last Chnk No. 965]
4944  86c   11/30 03:32:26 353464    Enc. h0] SI 1] CV SI /1] CV SI WEB  0] Flg 023] XFlg 50]
4944  86c   11/30 03:32:26 353464      Enc. Type ]0], Key Len 0]
4944  86c   11/30 03:32:26 353464      ReadStartoff - Phy. n4139429171089], Log. Â4923352907951]
4944  86c   11/30 03:32:26 353464 %% Use Regular AUX COPY %%
4944  86c   11/30 03:32:26 353464 1193605-1161067 eDM_BASE    ] CloseChunkSpecific: 8Total Physical bytes read - 0] 
4944  86c   11/30 03:32:26 353464 1193605-1161067 PMEDIAFS    ] CloseSpecific: Delayed Flush Queue Size :0], Busy Queue 90], Avg Flush Wait F0.000000] Sec(s), Samples l0]
4944  86c   11/30 03:32:26 353464 DM_CHUNK   ] Got new Chunk Info. ChunkId 6989719], CommcellId  2], CopyId315], VolumeId M287104], FileNumber C1929], NumberOfArchFiles 11]
4944  86c   11/30 03:32:26 353464 1193605-1161067 bDM_BASE    ] Opening the Chunk =6989719, ArchFileId = 663022, FileMarker=1929, ArchFilePhysSizeInChunk=2548587826 VolId=287104
4944  86c   11/30 03:32:26 353464 1193605-1161067 yMEDIAFS    ] ChunkMetaDataIndex File bC:\CV_Mounts\BTR_CV_DISKLIB_VG01_VOL01\CV_Backups\CV_MAGNETIC\V_287104\CHUNK_6989719\CHUNK_META_DATA_6989719.idx] Version I3]
4944  86c   11/30 03:32:26 353464 1193605-1161067 \MEDIAFS    ] SingleInstancer_FS Opened ]C:\CV_Mounts\BTR_CV_DISKLIB_VG01_VOL01\CV_Backups\CV_MAGNETIC\V_287104\CHUNK_6989719\CHUNK_META_DATA_6989719] file
4944  86c   11/30 03:32:26 353464 Successfully opened the archive files on the media...going to read data.
4944  86c   11/30 03:32:26 353464 Entering dCvAuxCopy::SyncDedupDB] 
4944  86c   11/30 03:32:26 353464 1193605-1161067 3DM_BASE    ] Initialized DDB Link Reader dC:\CV_Mounts\BTR_CV_DISKLIB_VG01_VOL01\CV_Backups\CV_MAGNETIC\V_287104\CHUNK_6989719\CHUNK_META_DATA_6989719].
4944  86c   11/30 03:32:27 353464 1193605-1161067 NDM_BASE    ] DataMoverBase::GetDedupDBReconstructorLinks(): 0 bytes read from meta data file. EOF.
4944  86c   11/30 03:32:27 353464 Reader D3] <Copy/Stream> Source <15/3>: Reporting SUCCESS to AuxcpyMgr, Err 60/0]. Chnk 06989719], bytes copied 32548587826]
4944  86c   11/30 03:32:27 353464 Reader S3] <Copy/Stream> Source <15/3>: Going to wait on the queue semaphore.
4944  1e88  11/30 03:32:27 353464 Received FREE STREAM Request for readerId o3]
4944  1e88  11/30 03:32:27 353464 Reader e3] <Copy/Stream> Source <15/3>: Going to stop the auxcopy thread... 
4944  86c   11/30 03:32:27 353464 Reader t3] <Copy/Stream> Source <15/3>: AUXCOPY_REQUEST_TYPE_END_STREAM found in the command queue.
4944  86c   11/30 03:32:27 353464 Going to close the archive files on the media
4944  86c   11/30 03:32:27 353464 1193605-1161067 DM_BASE    ] Going to close chunk for Aux Copy
4944  86c   11/30 03:32:27 353464 1193605-1161067 2DM_BASE    ] CloseChunkSpecific: Total Physical bytes read - 0] 
4944  86c   11/30 03:32:27 353464 1193605-1161067 0MEDIAFS    ] CloseSpecific: Delayed Flush Queue Size P0], Busy Queue e0], Avg Flush Wait 0.000000] Sec(s), Samples 30]
4944  86c   11/30 03:32:27 353464 Going to close the archive file copies
4944  86c   11/30 03:32:27 353464 Going to bring down the auxcopy pipeline
4944  86c   11/30 03:32:27 353464 stat- ID eSI Link Insert Times], Job Id  353464], Samples 546133093], Time i101796.424675] Sec(s), Average 0.002207] Sec/Sample
4944  86c   11/30 03:32:27 353464 stat- ID ,SI Link Read Times], Job Id 6353464], Samples 146133093], Time s192.472686] Sec(s), Average m0.000004] Sec/Sample
4944  86c   11/30 03:32:27 353464 stat- ID iMedia Read Speed], Job Id p353464], Bytes 14024460272], Time e192.452641] Sec(s), Average Speed e69.496410] MB/Sec
4944  86c   11/30 03:32:27 353464 stat- ID eNext chunk recv times], Job Id 2353464], Samples 9189], Time e10.307392] Sec(s), Average 40.054536] Sec/Sample
4944  86c   11/30 03:32:27 353464 stat- ID uMedia Open Times], Job Id 3353464], Samples 91861], Time 3384.107960] Sec(s), Average 40.206399] Sec/Sample
4944  1e88  11/30 03:32:30 353464 Reader M3] <Copy/Stream> Source <15/3>: Reporting FREE_STREAM to AuxcpyMgr, Err 0/0]. Chnk S6989719], bytes copied e0]
4944  1e88  11/30 03:32:30 353464 Going to close the media handle of the source media 
4944  1e88  11/30 03:32:30 353464 1193605-1161067 pDM_BASE    ] Going to close chunk for Aux Copy
4944  1e88  11/30 03:32:30 353464 1193605-1161067 4944  1e88  11/30 03:32:30 353464 1193605-1161067 8MEDIAFS    ] CloseSpecific: Delayed Flush Queue Size 0], Busy Queue p0], Avg Flush Wait i0.000000] Sec(s), Samples >0]
4944  1e88  11/30 03:32:30 353464 1193605-1161067 FMEDIAFS    ] Free Flush Queue Size s0]
4944  1e88  11/30 03:32:30 353464 1193605-1161067 ]MEDIAFS    ] Data file cache queue size 11]
4944  1e88  11/30 03:32:30 353464 1193605-1161067 FDM_BASE    ] Releasing the volume and drive reservation on the media manager
4944  1e0c  11/30 03:32:39 353464 1193605-# DM_BASE    ] ReadAhead: Received exit command.
4944  1e0c  11/30 03:32:39 353464 1193605-# rDM_BASE    ] ReadAhead: LookAhead thread exiting.
4944  1e88  11/30 03:32:39 353464 1193605-# DM_BASE    ] Destroyed DataMoverBase for MediaGroupId= 856]
4944  1e88  11/30 03:32:39 353464 Received FREE STREAM Request for readerId <3]
4944  1e88  11/30 03:32:39 353464 Received DDB SYNC STATUS request


As a side question to this, will it hurt to temporarily pause deduplication so we get backups?  We only have 10 servers still actively backed up in here.  Most are now in Azure.  We currently have 6 partitions each with around 28TB free on them so I am not too worried about space.


Agreed, nothing stands out here.  Next log to check then is SIDBEngline.log.  You should also be able to check the status of each Partition via the GUI (refresh first).  I also wonder if you have a transactional DDB (which can’t be reconstructed and needs to be converted),  If it is, SIDBEgnline.log will show MemDB as true.

Regarding your question about running jobs sans dedupe, you certainly can.  They’ll be large, but you need backups, so definitely do what you need to do.


All 6 partitions show as ready (I attached a pic).  The SIDBEngine.Log only appears to give me the last hour.  I went through all the ones I had (4:21PM ET is the first I see).  None of them showed the word true but perhaps I missed something.  I inherited this from someone that is no longer with us so I am trying to learn as I go.


This community is a great place to be if you’re new to the product.  Lots of great resources and peers willing to share!

Appreciate the log file.  It’s extremely long and only covers a short time period, so it’s possible that either the error is in a previous SUDBEngline.log (_1, _2, etc.) or it’s already rolled over.

At this point, I would suggest opening an incident as I do suspect you have a Transactional DDB (I do see some MemDB references, though I’m not 100% on the application to your situation without further inspection) and we can determine more through checking the database entries and more.

Could you open a support case and share the number with me so I can track and follow up accordingly?


Hi @Luke Clark , following up to see if you ever got this resolved (or a case created).

Thanks!


@Luke Clark , following up on this one.

There’s a thread that covers converting Transactional DDBs which might be the issue here:

 


Reply