
Chunk errors on the Restoration

  • August 1, 2025

We frequently encounter missing data chunks, and this is affecting our ability to restore multiple VMs. Initially, we suspected an issue with the cloud storage provider and asked the support team to investigate. However, further analysis showed that Commvault had issued a delete command, which removed the data from cloud storage.

As a result, multiple storage policies have been affected, and we are currently unable to restore the data for several virtual machines.

Analysis:

According to the DataCore logs:

  • The SFILE delete request was received at 08:08:49
  • The Compact File delete request was received at 08:07:55

Commvault is expected to send a delete request for the Compact File only after a successful remote copy of the Compact File to recreate the SFILE. If this process is followed correctly, the SFILE should have already been deleted before the Compact File is removed.

However, the timestamps show the reverse order: the SFILE delete request arrived 54 seconds after the Compact File delete request. In addition, the DataCore logs do not show two delete requests for the SFILE. This raises concerns about the deletion order and whether Commvault is handling it correctly. The issue remains unresolved.
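The ordering check above can be expressed as a minimal Python sketch. It uses only the two times quoted from the DataCore logs and assumes both fall on the same day; the function names are illustrative and not part of any Commvault or DataCore tool.

```python
from datetime import datetime

# Times as quoted from the DataCore logs; both are assumed to fall on the same day.
SFILE_DELETE = "08:08:49"
COMPACT_DELETE = "08:07:55"


def parse(ts: str) -> datetime:
    """Parse an HH:MM:SS log time (strptime fills in a placeholder date)."""
    return datetime.strptime(ts, "%H:%M:%S")


def order_is_expected(sfile_delete: str, compact_delete: str) -> bool:
    """Expected order: the SFILE delete arrives before the Compact File delete."""
    return parse(sfile_delete) < parse(compact_delete)


def gap_seconds(earlier: str, later: str) -> float:
    """Seconds elapsed between two log times on the same day."""
    return (parse(later) - parse(earlier)).total_seconds()


if __name__ == "__main__":
    print("order as expected:", order_is_expected(SFILE_DELETE, COMPACT_DELETE))
    print("SFILE delete lag (s):", gap_seconds(COMPACT_DELETE, SFILE_DELETE))
```

With the logged values this prints `order as expected: False` and a lag of `54.0` seconds.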

 

Questions: 

We are seeking clarification on how Copy-by-Range and Remote Copy requests are processed.

Based on the documentation and on feedback from trainers and support engineers, we understand that these steps are critical to space reclamation. Specifically:

  • During these copy processes, is the data temporarily downloaded and then uploaded back to the cloud?
  • Is this the reason a Media Agent is required in the cloud?

This question may not be directly related to the current issue, but we want to understand the underlying process better.


  • Vaulter
  • August 21, 2025

Hi @Tiruveedi Venkata Naresh,

Below is the sequence of events during the space reclamation process:

- Create a compact2 file on the storage and copy the valid ranges/data from the source SFILE into SFILE.compact2.
- Delete the original SFILE.
- Issue a remote file copy from the compact2 file to the original file name, which recreates the SFILE.
- Delete the compact2 file.
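The four steps can be replayed against a toy in-memory store to make the ordering concrete. This is a minimal sketch, not Commvault code: the `Store` dictionary stands in for a cloud bucket, `reclaim` and the object name `SFILE_0001` are hypothetical, the ".compact2" suffix is taken from the reply, and the "remote copy" is modelled as a plain assignment. The reply does not say whether real copies pass through the Media Agent.

```python
from typing import Dict, List, Tuple

# Hypothetical in-memory stand-in for a cloud bucket: object name -> contents.
Store = Dict[str, bytes]


def reclaim(store: Store, sfile: str, valid_ranges: List[Tuple[int, int]]) -> None:
    """Replay the four reclamation steps on `store`.

    `valid_ranges` is a list of (offset, length) pairs that are still referenced.
    """
    compact = sfile + ".compact2"  # suffix taken from the reply; exact naming assumed
    source = store[sfile]
    # 1. Copy-by-range: write only the valid ranges of the SFILE into the compact file.
    store[compact] = b"".join(source[off:off + length] for off, length in valid_ranges)
    # 2. Delete the original SFILE.
    del store[sfile]
    # 3. Remote copy: copy the compact file back under the original name.
    store[sfile] = store[compact]
    # 4. Delete the compact file.
    del store[compact]


if __name__ == "__main__":
    bucket: Store = {"SFILE_0001": b"AAAAxxxxBBBB"}  # "xxxx" is unreferenced data
    reclaim(bucket, "SFILE_0001", [(0, 4), (8, 4)])
    print(bucket)  # {'SFILE_0001': b'AAAABBBB'}
```

Between steps 2 and 3 the SFILE briefly does not exist, and after step 3 it exists again under the same name, which is why a stale delete for that name is dangerous.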

During this DDB space reclamation process, Commvault received a curl error, "No response from storage in 30 seconds", while deleting the original SFILE. The delete was retried and completed successfully, so the process continued. However, the load balancer processed the initial, timed-out request later, after the remote copy had recreated the SFILE, and that late request deleted the recreated SFILE.
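The failure mode can be replayed against the same kind of toy store. This hypothetical simulation assumes the timed-out request is simply held in a queue and applied after the last step; it does not model Commvault's or the load balancer's actual behavior, and `replay_with_late_delete` is an illustrative name.

```python
from typing import Callable, Dict, List

Store = Dict[str, bytes]  # hypothetical in-memory stand-in for the cloud bucket


def replay_with_late_delete(store: Store, sfile: str) -> List[str]:
    """Replay the reclamation steps while the first, timed-out delete of
    `sfile` stays queued and is only applied after the last step."""
    log: List[str] = []
    compact = sfile + ".compact2"

    store[compact] = store[sfile]  # step 1 (range filtering omitted)
    log.append("create compact2")

    # Step 2: the first delete times out on the client but is still queued.
    queued: List[Callable[[], object]] = [lambda: store.pop(sfile, None)]
    log.append("delete SFILE: timed out, request still queued")
    store.pop(sfile, None)  # the retry succeeds
    log.append("delete SFILE: retry succeeded")

    store[sfile] = store[compact]  # step 3: remote copy recreates the SFILE
    log.append("remote copy compact2 -> SFILE")
    del store[compact]  # step 4
    log.append("delete compact2")

    for request in queued:  # the load balancer finally processes the old request
        request()
    log.append("queued delete applied: recreated SFILE removed")
    return log


if __name__ == "__main__":
    bucket: Store = {"SFILE_0001": b"AAAABBBB"}
    for line in replay_with_late_delete(bucket, "SFILE_0001"):
        print(line)
    print("remaining objects:", bucket)  # remaining objects: {}
```

The run ends with neither the SFILE nor the compact2 file present, which matches the missing-chunk symptom and the late SFILE delete seen in the DataCore logs.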

To address this, we are releasing a fix in versions 11.32.114 and 11.36.71. This fix will ensure that we do not retry delete requests for defrag operations in the event of a timeout.
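The reply describes the fix only by its behavior. A minimal sketch of such a retry policy follows, assuming a hypothetical `delete` callable and `StorageTimeout` exception; it is not Commvault code. The reply does not say what the job does after a defrag delete fails, so here the error simply propagates to the caller.

```python
from typing import Callable


class StorageTimeout(Exception):
    """Hypothetical error raised when storage does not answer in time."""


def delete_with_policy(
    delete: Callable[[str], None], path: str, is_defrag: bool, max_retries: int = 3
) -> None:
    """Delete `path`, retrying timeouts except for defrag deletes.

    A defrag delete that times out is not retried; the error propagates
    instead, so no second request is sent while the first may still be in flight.
    """
    attempts = 0
    while True:
        try:
            delete(path)
            return
        except StorageTimeout:
            attempts += 1
            if is_defrag or attempts > max_retries:
                raise


if __name__ == "__main__":
    calls = []

    def always_times_out(path: str) -> None:
        calls.append(path)
        raise StorageTimeout(path)

    for defrag in (True, False):
        calls.clear()
        try:
            delete_with_policy(always_times_out, "SFILE_0001", is_defrag=defrag)
        except StorageTimeout:
            pass
        print(f"is_defrag={defrag}: {len(calls)} request(s) sent")
```

This prints one request for the defrag case and four (one attempt plus three retries) for the non-defrag case.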

I also want to highlight that, for customers using public cloud infrastructure, a Media Agent located in that same cloud completes this operation faster than an on-premises Media Agent.

Regards,

Wasim