Solved

Hyperscale X Ref. Architecture - Failed to validate cluster


Userlevel 4
Badge +11

So this is deployment 3 of 3 this week, and the first two went flawlessly. I pointed out some notes and documentation errors to the Commvault staff as well. However, this third one is giving me grief. I am receiving this error: "Failed to validate cluster please check the cvmanager.log for details."

OK, so note that all three nodes show green checks for status, so those are fine.

The logs: this is where it gets weird. I have checked /opt/commvault/logs/cvmanager.log on all three nodes and it is blank. There's nothing in there. I even tried to tailf the log file during the install and it just sits there, so nothing is being written to the logs.

Now what? Well, I burned about 4 hours trying to solve this, so I just blew them away, reimaged all 3 nodes, and started over. Here I am: same thing with fresh installs.

I checked NTP, and date/time is all good to go as well.

What else should I be checking? The network is all working: I can hit all IPs, and SSH, iSCSI, etc. are good to go.

 

Ready? Go!

 


Best answer by Mike Struening RETIRED 25 October 2022, 22:27


11 replies

Userlevel 4
Badge +11
Forgot to add this image as well. 

 

Userlevel 7
Badge +17

What type of network config are you using for data protection and storage pool? Bonds, vlan, combo of both?

Is the config based on DNS or IP?

What protocol are you using for data protection and storage pool? IPv4 or IPv6 for both or for instance IPv4 for storage pool and IPv6 for data protection?

Userlevel 4
Badge +11

Bonds, the same as the other two deployments.

IP and DNS based config, all static.

IPv4 for both. The storage pool is non-routed IPv4.

Userlevel 4
Badge +11

Also, I just tried running hsxsetup from a different node. Same result.

Userlevel 7
Badge +17

Hmm, OK, that's odd. It should work.

I would personally contact support to figure out what's happening in the deployment. I had similar issues with HyperScale 1.5, which needed manual intervention by support. Not saying it's the same, but it can get complicated to troubleshoot, especially from a remote perspective.

Userlevel 3
Badge +5

@Matthew M. Magbee As we discussed over the email thread, support escalation is the best way to resolve your install issue. I have given engineering the heads-up.

Userlevel 7
Badge +23

Hey @Matthew M. Magbee 😎

Can you share the case number once created so I can track it?

Thanks!

Userlevel 4
Badge +11

I have not created it yet. I'm ruling all items out first: I have replaced cables, moved IPs, and moved ports on the 9k. Validating the cluster now.

Userlevel 4
Badge +11

Still didn't work. Here is the ticket info: 221018-635

Userlevel 4
Badge +11

Update: While working with Commvault, we dug through the logs and found this little gem:

DEFAULT - The number of SAS drives is not same across nodes. One or more drives could be unmounted or have a hardware issue
 

I checked and, sure enough, one of the nodes is missing a drive :o
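
For anyone who hits the same message, a rough way to spot the odd node out before re-running validation is to compare disk counts across the nodes. The sketch below is only an illustration, not how the Commvault validator does its check: it assumes you have captured the output of `lsblk -d -n -o NAME,TYPE` from each node yourself (for example over SSH), it counts every whole disk rather than only SAS drives, and the node names and the helper names `count_disks` and `find_outlier_nodes` are made up for the example.

```python
from collections import Counter


def count_disks(lsblk_output: str) -> int:
    """Count whole-disk devices in `lsblk -d -n -o NAME,TYPE` output.

    Assumption: each line looks like "sda disk"; any other device
    type (rom, loop, ...) is ignored.
    """
    count = 0
    for line in lsblk_output.splitlines():
        fields = line.split()
        if len(fields) == 2 and fields[1] == "disk":
            count += 1
    return count


def find_outlier_nodes(outputs: dict[str, str]) -> dict[str, int]:
    """Return the nodes whose disk count differs from the most common count.

    `outputs` maps a node name (hypothetical here) to the lsblk text
    captured on that node.
    """
    counts = {node: count_disks(text) for node, text in outputs.items()}
    if not counts:
        return {}
    # Treat the most common count as the expected one.
    expected, _ = Counter(counts.values()).most_common(1)[0]
    return {node: n for node, n in counts.items() if n != expected}


if __name__ == "__main__":
    # Hypothetical captured output; replace with real text from each node.
    sample = {
        "node1": "sda disk\nsdb disk\nsdc disk\n",
        "node2": "sda disk\nsdb disk\nsdc disk\n",
        "node3": "sda disk\nsdb disk\n",
    }
    for node, n in find_outlier_nodes(sample).items():
        print(f"{node} reports {n} disks, which differs from the other nodes")
```

A node that shows up here is worth a physical check (reseat or replace the drive) before spending more time on network or log troubleshooting.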

Userlevel 7
Badge +23

Appreciate the case share, @Matthew M. Magbee !

Sharing the resolution in the case closure for posterity:

Checked and reseated drive - no change
Drive replaced by vendor and node re-imaged
Everything fine
