Hi @Ken_H
Do keep an eye on the utilization of the disk. Once its utilization matches that of the other disks, it should show as Online.
Also, you can check the pending heals for the disk by running gluster v heal $(gluster v list) info.
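For example, something like this refreshes just the per-brick pending counts every minute (a rough sketch, assuming the node hosts a single volume and watch is available):
# Show only the Brick and entry-count lines, refreshed every 60 seconds
watch -n 60 'gluster v heal $(gluster v list) info | grep -E "^(Brick|Number of entries)"'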
Regards,
I’ve seen replacement drives appear as Online even while they were 1.3 TB smaller than the other drives on the server, so matching space used against the other drives doesn’t seem to be a reliable way to estimate when the rebuild will complete.
Running “gluster v heal HyperScale info” lists every single entry (segment? file?) that needs to be healed, and on a newly replaced drive these run into the hundreds of thousands. I tried to filter them out using:
gluster v heal HyperScale info | grep -v gfid | grep -v Folder_
And this gives:
Brick inf-srvp110sds.apacorp.net:/ws/disk1/ws_brick
Status: Connected
Number of entries: 0
Brick inf-srvp111sds.apacorp.net:/ws/disk1/ws_brick
Status: Connected
Number of entries: 0
Brick inf-srvp112sds.apacorp.net:/ws/disk1/ws_brick
Status: Connected
Number of entries: 0
Brick inf-srvp110sds.apacorp.net:/ws/disk2/ws_brick
Status: Connected
Number of entries: 1
Brick inf-srvp111sds.apacorp.net:/ws/disk2/ws_brick
Status: Connected
Number of entries: 371113
Brick inf-srvp112sds.apacorp.net:/ws/disk2/ws_brick
Status: Connected
Number of entries: 357442
Unfortunately, the output never completed even after being left to run for 20 hours. Long story short, there doesn’t appear to be a way to monitor the heal process.
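The closest thing to a count-only view I’ve found is the statistics variant, which is supposed to report just the per-brick totals without enumerating each entry (assuming the installed release supports it; I haven’t verified that it completes any faster on a brick this far behind):
gluster v heal HyperScale statistics heal-count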
Ken
Hello @Ken_H,
How did you solve the situation?
There does not appear to be an answer to this problem. If you have multiple drives each reporting multiple bad sectors, the only option is to replace one of the drives and then wait days for it to rebuild. Depending on when the rebuild completes, it could be quite a while before you notice and get the next failing drive replaced.
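If it helps, one crude mitigation is to poll the heal backlog and alert when it drains, so at least the rebuild finishing doesn’t go unnoticed; a rough sketch, assuming statistics heal-count is supported on your release and that local mail delivery works (the address below is a placeholder):
# Sum the per-brick pending-heal counts hourly; mail once when the backlog hits zero
while true; do
  pending=$(gluster v heal HyperScale statistics heal-count \
            | awk -F: '/Number of entries/ {sum += $2} END {print sum + 0}')
  if [ "$pending" -eq 0 ]; then
    echo "HyperScale heal backlog drained" | mail -s "Gluster heal complete" admin@apacorp.net
    break
  fi
  sleep 3600
done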
@Pavan Bedadala can you shed some light on this post?