Solved

Failing DDB Backup on a Linux-based MA due to thin pool/volume threshold

  • 8 January 2021

Hi Team, 

My DDB Backup operations are failing with the following error message: Snap creation failed on volumes holding DDB paths. 

A quick review of the job logs points to some sort of free space threshold being reached. What could this mean? 

Regards

Winston 

Best answer by Jon Vengust 8 January 2021, 02:28

3 replies

This issue can be hard to identify at face value, because the error appears generic when it is reported to our Job Manager service (the error you see within the Commcell Console).

 

This error is occurring at the MediaAgent level and as such, we'll need access for further review.

 

Evidence/Logs

 

When troubleshooting this issue, we need to review clBackupParent.log on the MA to pinpoint the root cause. Most likely, we'll see log entries like the ones below at snapshot creation:

 

/opt/commvault/Log_Files/clBackupParent.log (default log location unless specified otherwise)

 

4151 1037 12/15 09:57:04 <JOB_ID> CvProcess::system() - lvcreate --snapshot --name DDBSnap_<SNAPSHOT_ID> <DDB_DEVICE_PATH>

4151 1037 12/15 09:57:04 <JOB_ID> CvProcess::system() - Command completed with rc=5

4151 1037 12/15 09:57:04 <JOB_ID> DDBBackupManager::SnapVolumes(387) - SnapAndMount of [/dev/mapper/nvmevg-ddb] failed. Error [Failed to create snapshot of device <DDB_DEVICE_PATH> with name DDBSnap_<SNAPSHOT_ID>:   Cannot create new thin volume, free space in thin pool nvmevg/ddb_pool reached threshold.].
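To check quickly whether a given MA is hitting this condition, the log can be searched for the threshold message. The following is a minimal sketch; the function name find_threshold_errors is hypothetical, and the default path assumes the log location quoted above.

```shell
#!/bin/sh
# Print (with line numbers) the log lines that report the thin pool
# threshold error. The default path is the log location quoted in this
# post; pass another path if your logs live elsewhere.
find_threshold_errors() {
    log=${1:-/opt/commvault/Log_Files/clBackupParent.log}
    grep -n 'free space in thin pool .* reached threshold' "$log"
}
```

Calling find_threshold_errors with no argument searches the default log. No output and a non-zero exit status mean the message was not found, in which case the snapshot failure likely has a different cause.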

 

Resolution

 

Start with Option #1 below. If that alone does not resolve the problem, proceed with Option #2:

 

1) Run the fstrim /ws/ddb command on this MA (substitute the path where your DDB volume is mounted). fstrim discards blocks that the file system no longer uses, which is particularly effective for SSDs and thin-provisioned storage. The aim is to free enough space in the thin pool to drop back under the threshold mentioned in the error.

 

You can compare the state before and after running this command by reviewing the Data% column in the lvs output for the volume group. If Data% is below 50% (explained in Option #2), attempt another DDB Backup and review the outcome. If not, proceed with Option #2 below.
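The before/after comparison can be scripted roughly as follows. This is only a sketch: the function names are hypothetical, the pool name nvmevg/ddb_pool is taken from the log excerpt, /ws/ddb is the example mount point from this post, and it needs to run as root on the MA.

```shell
#!/bin/sh
# Print the Data% value of a thin pool given as VG/LV, e.g. "61.32".
pool_data_percent() {
    lvs --noheadings -o data_percent "$1" | tr -d ' '
}

# Succeed when usage $1 is strictly below threshold $2 (decimals allowed).
below_threshold() {
    awk -v used="$1" -v limit="$2" 'BEGIN { exit !(used + 0 < limit + 0) }'
}

# Trim the DDB mount point and report pool usage before and after.
trim_and_report() {
    vg_pool=$1
    mount_point=$2
    limit=$3
    before=$(pool_data_percent "$vg_pool")
    fstrim -v "$mount_point"
    after=$(pool_data_percent "$vg_pool")
    echo "Data% before: $before, after: $after"
    if below_threshold "$after" "$limit"; then
        echo "Below ${limit}%: retry the DDB Backup"
    else
        echo "Still at or above ${limit}%: consider Option #2"
    fi
}
```

For the setup in this thread the call would be trim_and_report nvmevg/ddb_pool /ws/ddb 50. Adjust all three arguments to match your own pool, mount point and configured threshold.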

 

2) Edit the /etc/lvm/lvm.conf file (with vim, nano, etc.) and change the thin_pool_autoextend_threshold value from 50 (default value) to 80 (or 90 in some cases; 100 is not recommended, as that value disables automatic extension and lets the pool fill up completely):

 

Before

 

thin_pool_autoextend_threshold = 50

 

After

 

thin_pool_autoextend_threshold = 80

 

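With this change, the thin pool is extended automatically by the value of thin_pool_autoextend_percent once it reaches 80% capacity instead of 50%. LVM also refuses to create new thin volumes, snapshots included, while pool usage is above this threshold, so raising it leaves more headroom before snapshot creation fails. Once the change is in place, re-run the DDB Backup operation and report your findings.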
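If you would rather script the edit than open an editor, something along these lines should work. It is a sketch under assumptions: both function names are hypothetical, and it only changes a line where the setting is already present and uncommented.

```shell
#!/bin/sh
# Show the active (uncommented) thin pool autoextend settings.
show_autoextend_settings() {
    conf=${1:-/etc/lvm/lvm.conf}
    grep -E '^[[:space:]]*thin_pool_autoextend_(threshold|percent)[[:space:]]*=' "$conf"
}

# Keep a copy of the file as <file>.bak, then rewrite the threshold value.
# Commented-out lines are left untouched.
set_autoextend_threshold() {
    conf=$1
    value=$2
    cp -p "$conf" "$conf.bak" || return 1
    sed -E "s/^([[:space:]]*thin_pool_autoextend_threshold[[:space:]]*=[[:space:]]*)[0-9]+/\1${value}/" \
        "$conf.bak" > "$conf"
}
```

Run set_autoextend_threshold /etc/lvm/lvm.conf 80 as root, then show_autoextend_settings to confirm the new value. If show_autoextend_settings prints nothing, the setting is commented out or missing and has to be added by hand in the activation section.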

 

If the issue persists, please contact Commvault Support and raise a ticket for a deeper investigation.

Userlevel 4
Badge +11

This is pretty accurate. I used the same solution to repair my MA DDB. I have also found more issues with this happening on release 7 (CentOS/Red Hat) than on previous versions. It seems that the stop/start function that runs during patches is not stopping the DDB properly. I have started to put a small script on my Linux MAs that runs 20 minutes before the updates on Tuesdays. So far it's been pretty good.

I meant to say that because these are not stopping properly, it was causing some break/fix and corruption issues with the DDB, keeping it from being backed up. I left that part out.
