Question

Setting MediaAgents to Maintenance caused VM Backup to fail

  • 12 December 2023

Hi,
Today I observed that putting all of the MediaAgents for a specific site into maintenance mode caused scheduled backups to fail with: no access nodes online.
I use the MediaAgents as access nodes for the VMware installation at that site, and expected all jobs to go into a waiting or queued state instead of failing.

Is this the expected behavior? 

I did not deactivate backup activity on any client resource.

Currently running on V11.32.28

rgds
Klaus


4 replies

@johanningk 

I’d expect running jobs to go to waiting or pending, and new jobs to fail to start.

BOL states: 
 

You can put a MediaAgent into maintenance mode so that it does not run backup, restore, or auxiliary copy operations.
 

When you put a MediaAgent into maintenance mode, if a job is running on the MediaAgent, the job goes into a waiting state until another MediaAgent runs it.
 

https://documentation.commvault.com/2023e/essential/putting_mediaagent_into_maintenance_mode.html


Do you recall the behaviour being different prior to 11.32?


Regards,

Chris



 

Hi @Chris Hollis,

To my current understanding, only disabling backup activity on any of

  • CommCell
  • Client Computer Group
  • Client
  • Agent
  • Instance
  • Subclient

will cause backups to fail (depending, of course, on the Job Controller settings).
If MediaAgents are set to maintenance, tape/cloud/disk libraries are offline, or the maximum number of streams is reached for a MediaAgent, client, or storage policy copy, the jobs are set to a queued or waiting state.

Now I have set the MediaAgents to maintenance, which to my understanding should fill the job queue rather than cause new jobs to fail at start. Although backup activity is still enabled for all client resources, I observe that VM jobs are failing to start.

If the MediaAgent had been put into an offline state, failing the jobs might be a plausible reaction, but setting it to maintenance should have minimal impact on normal job operations.

I’ll try to verify this with a V11.28 CommCell, but I’m not sure how quickly I can do this ...

Regards,

Klaus

@johanningk 

I did some testing. In FR28, after putting the MA into maintenance mode, I scheduled jobs to run and got this result (File System backup):
 


Same test in FR32:

 


So it looks like FR32 now gives us a JPR stating that maintenance mode is enabled and how to rectify it.

I did another test for a VM backup where the VSA access node is also the MediaAgent:

 

It’s definitely not a license issue as disabling maintenance mode allows it to run…
 

The final test is with the VSA backup using a different MediaAgent:

 


So, in conclusion, I think the job failure you are seeing is due to the VSA access node and the MA being on the same machine.

One of the initial checks done by the backup must be to validate whether the access node is online; if it’s not, the job fails there and then rather than being set to ‘waiting’.
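
That suspected start-up check can be pictured with a minimal Python sketch. This is purely a hypothetical model of the behaviour described in this thread, not Commvault code: `AccessNode`, `start_vsa_job`, and the status strings are made-up names for illustration, and it assumes an access node counts as offline whenever the MediaAgent on the same host is in maintenance mode.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class AccessNode:
    """Hypothetical stand-in for a VSA access node."""
    name: str
    maintenance: bool  # True if the MediaAgent on this host is in maintenance mode


def start_vsa_job(access_nodes: List[AccessNode]) -> str:
    """Model of the suspected check: fail at once if no access node is usable."""
    # Assumption: a node in maintenance mode is treated as offline.
    online = [node for node in access_nodes if not node.maintenance]
    if not online:
        # The job is failed here instead of being queued or set to waiting.
        return "FAILED: no access nodes online"
    return "STARTED on " + online[0].name


# All access nodes are MediaAgents in maintenance mode: the job fails to start.
print(start_vsa_job([AccessNode("ma1", True), AccessNode("ma2", True)]))

# At least one access node is not in maintenance mode: the job can start.
print(start_vsa_job([AccessNode("ma1", True), AccessNode("proxy1", False)]))
```

Under that assumption, keeping at least one access node that is not a MediaAgent in maintenance mode would let the job get past this check.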

I can’t say this is unexpected, as it’s technically correct. However, if you want to alter this behaviour, we would have to raise a support case and escalate it officially for development feedback/suggestions.

Let me know.

Regards,

Chris 

Hello @johanningk,

 

When we do intrusive, CommCell-wide maintenance, we enable the option “Queue Scheduled Jobs” under Control Panel, in Job Management.

 

This ensures we do not miss scheduled jobs, but can do the maintenance we need and then resume operations with hardly any impact to recovery point creation.

 

Hope this helps.


Regards,

Mike
