Question

HyperScale X: handling jobs while a node is down

  • 20 January 2024


Team, how do the HyperScale X MediaAgents handle jobs when a failure occurs?

 

I performed a test in which a job was running through MediaAgent 2 on HyperScale node 2, and I shut down that node.

1> The running job stayed in the Running state and made no progress.

 

2> I started a new job and it completed successfully.

I then started node 2 again, but the original job still made no progress and eventually failed. Why was it not reassigned to another MediaAgent, and why is the job handled by only one MediaAgent?

I can see that restore jobs used all 3 MediaAgents, but the backup used only one. Do we have any control over this?

 

3> After the nodes came back up, a restore job ran from node 2 to node 2. May I know what was actually being restored?


1 reply


Hi @Ajal,

 

Thanks for raising this matter. Can you please confirm whether the MediaAgent in question was shut down gracefully, as per the procedure outlined in the following article? Stopping and Starting a HyperScale X Appliance Node

 

It is not recommended to shut down these MediaAgents (MAs) ungracefully for the sake of failover testing, as it may lead to data integrity issues.

 

Regarding your queries:

 

  1. Yes, a backup operation using an MA that experiences an outage will stall. The stalled attempt eventually leads to a re-attempt via another MA, which is how the cluster remains resilient.
  2. You started an entirely new job, and therefore a new attempt, and it completed successfully as expected. You then specified the downed MA for the next job, which reported the failure, again as expected. To ensure a separate resource is utilised for the storage policy copy, review the following documentation article to confirm whether the respective setting has been configured: Copy Properties - Data Path Configuration
  3. The restore operation may have been spawned as part of the automated DDB reconstruction (which would indicate DDB corruption if you performed an ungraceful shutdown with a DDB table open), or potentially for an index restore. We'd need clarification on exactly which operation was launched to investigate further.
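
To make points 1 and 2 concrete, here is a purely conceptual Python sketch of re-attempting a write through the next available data path. It is not Commvault's implementation or API; `MediaAgent`, `write_chunk`, and `run_backup` are hypothetical names invented for illustration, and real timeouts and stall detection are reduced to a simple online/offline flag.

```python
from dataclasses import dataclass


@dataclass
class MediaAgent:
    """Hypothetical stand-in for a MediaAgent in a data path list."""
    name: str
    online: bool = True


class DataPathUnavailable(Exception):
    """Raised when the selected MediaAgent cannot service the write."""


def write_chunk(ma: MediaAgent, chunk: str) -> str:
    # Assumption: an outage is modelled as an immediate error rather than a stall.
    if not ma.online:
        raise DataPathUnavailable(ma.name)
    return f"{chunk} via {ma.name}"


def run_backup(chunks, data_paths, preferred=0):
    """Write each chunk, moving to the next data path when the current MA is down."""
    results = []
    current = preferred
    for chunk in chunks:
        attempts = 0
        while True:
            try:
                results.append(write_chunk(data_paths[current], chunk))
                break
            except DataPathUnavailable:
                attempts += 1
                if attempts >= len(data_paths):
                    # Every configured data path was tried and none responded.
                    raise RuntimeError("no data path available")
                current = (current + 1) % len(data_paths)
    return results


if __name__ == "__main__":
    paths = [MediaAgent("MA1"), MediaAgent("MA2", online=False), MediaAgent("MA3")]
    print(run_backup(["chunk-1", "chunk-2"], paths, preferred=1))
```

In this toy model, a job that has only the downed MA in its data path list has nowhere to fail over to and ends in an error, which is why the data path configuration on the storage policy copy is worth checking.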

 

With this in mind, we do not recommend that you perform such tests again. Feel free to speak with your Accounts team and/or Customer Support regarding any queries or concerns pertaining to HyperScale X resiliency.
