Question

Hyperscale Appliance disk data rebuild after failure

  • April 21, 2024
  • 6 replies
  • 219 views


Hi all,

I'd like to understand how data is reconstructed on a disk after it has been replaced. I'm aware that HyperScale uses erasure coding on the back end, but I'm not sure whether there is any document that describes the process.


Emils
  • Vaulter
  • April 23, 2024

The process differs slightly depending on whether you are running HyperScale 1.5 (HS1.5) or HyperScale X (HSX).

HyperScale 1.5 uses a Gluster file system that keeps track of all the parts on a disk that have failed. Only when the disk is replaced does it rebuild these parts on the new disk, using the available parts from other drives (erasure coding).

In HyperScale X, the process is slightly different. As soon as the system detects a failure, a Storage Pool Migration (SPM) starts immediately, regardless of whether the disk has been replaced. This process rebuilds the parts that were on the failed disk and places them in alternate locations within the cluster to maintain resiliency. When the disk is replaced, a rebalance process starts to fill up the newly replaced disk.
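To make "rebuilds the parts using available parts from other drives" a bit more concrete, here is a toy Python sketch. It assumes a single XOR parity fragment, which is much simpler than whatever erasure coding scheme HyperScale actually uses, and the function names are made up for illustration. It only shows the general idea: a lost fragment can be recomputed from the surviving fragments plus parity.

```python
from functools import reduce


def xor_bytes(a: bytes, b: bytes) -> bytes:
    """XOR two equal-length byte strings."""
    return bytes(x ^ y for x, y in zip(a, b))


def make_parity(fragments: list) -> bytes:
    """Compute one parity fragment as the XOR of all data fragments."""
    return reduce(xor_bytes, fragments)


def rebuild_lost_fragment(surviving: list, parity: bytes) -> bytes:
    """Recompute the single missing fragment from the survivors and parity."""
    return reduce(xor_bytes, surviving, parity)


# Three data fragments, imagined to live on three different disks.
data = [b"AAAA", b"BBBB", b"CCCC"]
parity = make_parity(data)

# Pretend the disk holding fragment 1 has failed.
lost_index = 1
surviving = [frag for i, frag in enumerate(data) if i != lost_index]

rebuilt = rebuild_lost_fragment(surviving, parity)
print(rebuilt == data[lost_index])
```

In this toy model the rebuilt fragment can be written anywhere that has free space, which is roughly the difference described above: HS1.5 waits for the replacement disk, while HSX places the rebuilt parts elsewhere in the cluster first.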

 


  • Author
  • Vaulter
  • April 23, 2024

Hi Emils,

Thank you for the response. I have three follow-up questions:

  1. Is the rebuilt data kept on the same logical volume as the disk that failed?
  2. What is the rebalance process, and how exactly does it work?
  3. Will I be able to run backups during this process?

 


Emils
  • Vaulter
  • April 23, 2024
  1. The rebuilt data is recalculated using erasure coding. I recommend reading up on how erasure coding works.
  2. The rebalance runs hourly on the cluster. It checks disk utilization and tries to even out data placement based on disk usage.
  3. Backups will not be affected during this process.
    1. Review HSX resiliency for more details.
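As a rough mental model for point 2, a utilization-based rebalance can be sketched as below. This is not Commvault's algorithm; the disk names, the unit of "one chunk", and the tolerance parameter are all hypothetical. It simply moves one unit at a time from the fullest disk to the emptiest one until usage is roughly even, which is why a freshly replaced (empty) disk fills up over time.

```python
def rebalance(disks: dict, tolerance: int = 1) -> list:
    """Even out usage by moving one unit at a time from fullest to emptiest.

    `disks` maps a disk name to the number of chunks it holds and is
    updated in place. Returns the list of (source, target) moves made.
    """
    if tolerance < 1:
        # With whole chunks, a spread of 1 cannot be reduced any further.
        raise ValueError("tolerance must be at least 1")

    moves = []
    while True:
        fullest = max(disks, key=disks.get)
        emptiest = min(disks, key=disks.get)
        if disks[fullest] - disks[emptiest] <= tolerance:
            return moves
        disks[fullest] -= 1
        disks[emptiest] += 1
        moves.append((fullest, emptiest))


# disk3 has just been replaced, so it starts out empty.
usage = {"disk1": 8, "disk2": 7, "disk3": 0}
moves = rebalance(usage)
print(usage)
print(len(moves), "chunks moved")
```

A real system would also weigh capacity, resiliency placement rules, and I/O load; the sketch ignores all of that.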

  • Bit
  • November 8, 2025

@Emils For HyperScale X: if a read request goes to a failed disk and the SPM process has not yet completed, will the read fail? HSX does not recalculate the requested block in memory in real time using the EC code, right? I ran into a DDB verification failure right after a disk failed, but the next day DDB verification succeeded, so I suspect SPM was reconstructing the data in the meantime. Is there any official documentation on this? If so, do you know how to monitor the SPM process? I don't want DDB verification to fail again, because that would force a full backup.


  • Bit
  • November 10, 2025

Hi @Leo2025,

In Commvault docs it’s usually described under HyperScale X / Storage Pool / Disk Recovery / Self-healing / SPM sections — it’s not always written in the exact words “your DDB verification may fail if you run it while a disk is rebuilding”, but the mechanism they describe (SPM automatically rebuilding/rebalancing after a disk failure) is what you just witnessed.

When opening a support case, they’ll usually point you to:

HyperScale X Administration / Managing disks in a storage pool

Disk failure and automatic recovery / SPM process
…or chapters along those lines.

---

"How to monitor SPM progress?"

That’s the useful part 👍. You’ve got a few options:

1. Command Center / Storage / HyperScale (or Storage Pool view): check that the pool is Healthy and that no disk is in a degraded/rebuild state. Don’t launch DDB verification while it’s “rebuilding” or “degraded”.


2. Alerts / Events in CommCell: HSX raises events when a disk goes bad and when recovery completes — subscribe to those so you know when it’s safe to run verification.


3. Node-level logs: on the HSX node, spm.log (and sometimes ssm.log) will show the progress of the rebuild/reallocation. That’s what support looks at to tell you “SPM is still reconstructing”.


4. Rule of thumb: after a disk replacement/failure, wait until the storage pool is back to green before scheduling DDB verification jobs.

If you want to be extra safe, you can even reschedule DDB verification so it doesn’t run right in the middle of a rebuild — because like you said, a failed verification can lead to unwanted behavior (reverify / DDB goes read-only / risk of rebaseline).


  • Bit
  • November 11, 2025

Thanks. I found the following steps useful:

  1. Run the showallspmid command to list the SPMs that are in pending status.
  2. Run spmstatus to check the status of that SPM; it reports a completion percentage.

If there is no pending SPM, I think we can run normal backups and so on. If one is pending, we need to wait, or log a ticket with CV if it stays in pending status for a long time.
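If anyone wants to script the "wait until nothing is pending" check before kicking off DDB verification, a minimal polling loop could look like the sketch below. It deliberately does not call or parse showallspmid or spmstatus, since their output format is not shown in this thread. Instead, get_pending is a hypothetical callable you would write yourself to return the IDs of pending SPMs; the function name, poll interval, and retry count are also just assumptions.

```python
import time
from typing import Callable, List


def wait_until_no_pending_spm(
    get_pending: Callable[[], List[str]],
    poll_seconds: float = 300.0,
    max_polls: int = 12,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Poll until get_pending() returns an empty list.

    Returns True once nothing is pending, or False if SPMs are still
    pending after max_polls checks (time to open a support ticket).
    """
    for attempt in range(max_polls):
        if not get_pending():
            return True
        if attempt < max_polls - 1:
            sleep(poll_seconds)
    return False


# Demo with a fake status source: pending twice, then clear.
fake_results = iter([["spm-1"], ["spm-1"], []])
safe_to_verify = wait_until_no_pending_spm(
    lambda: next(fake_results),
    poll_seconds=0,
    sleep=lambda seconds: None,  # skip real waiting in the demo
)
print("safe to run DDB verification:", safe_to_verify)
```

The sleep function is injectable only so the loop can be tried out without actually waiting; in real use you would leave the default.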