IntelliSnap Backup Copy kills Virtual Disk Service on Server 2016 Media Agent
Hi
Been plagued with this problem for a while, and Support has not been able to crack it yet either. I have 4 Media Agents. 2 of them use CBT with the crash-consistent backup option for IntelliSnap and work fine. The other 2 also use CBT but with the application-consistent (quiesced) backup option for IntelliSnap, and these Media Agents will sometimes freeze up the Virtual Disk Service, after which the backup copy fails for the remainder of the backup copies. I cannot open the Process Manager on the MAs when this happens and end up having to reboot the MA to get things back on track. Not a lot of info, but I'm chucking this one out there to see if anyone has had a similar experience.
#1 - Create a new subclient, or exclude all VMs except one, for testing, and make sure VMware Tools is up to date on that VM.
#2 - Increase the debug level for vsbkp and VixDiskLib on the Media Agent to around 3 (File Size 10 MB, File Versions 5). You can revert this to the default once the test is complete.
#3 - Start a backup of the one VM that you have seen fail before.
#4 - Use GxTail to open both logs and filter by “Commvault Failures and Successes”.
When you are using application-consistent backups, I'd make sure that VMware Tools is a stable version and not a problematic one like 11269. Post the results here.
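If GxTail isn't handy, a rough equivalent of the step #4 filter can be run with a short script. This is a minimal Python sketch, assuming the logs are plain text and that the lines of interest contain the words "fail" or "success"; the keyword pattern, function names and script name are hypothetical and are not part of Commvault's tooling, and GxTail's own filter may match different strings.

```python
import re
import sys
from typing import Iterable, List

# Hypothetical keyword pattern (assumption): matches "fail", "failed", "Failures",
# "success", "Successes", etc., case-insensitively. Not GxTail's actual filter.
PATTERN = re.compile(r"fail|success", re.IGNORECASE)


def filter_log_lines(lines: Iterable[str], pattern: "re.Pattern[str]" = PATTERN) -> List[str]:
    """Return the lines that mention a failure or a success, without trailing newlines."""
    return [line.rstrip("\n") for line in lines if pattern.search(line)]


def filter_log_file(path: str) -> List[str]:
    """Read a plain-text log file and return its matching lines."""
    with open(path, encoding="utf-8", errors="replace") as fh:
        return filter_log_lines(fh)


if __name__ == "__main__":
    # Pass the paths of the vsbkp and VixDiskLib logs on the command line.
    for log_path in sys.argv[1:]:
        for hit in filter_log_file(log_path):
            print(f"{log_path}: {hit}")
```

For example, `python filter_logs.py <vsbkp log> <VixDiskLib log>` prints each matching line prefixed with the file it came from, which makes it easier to line up failures in the two logs against the time the VDS froze.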
Hi Neil,
Have you checked that Automount is disabled and the SAN policy is OfflineShared on the affected Media Agents? Is there any AV on the Media Agents that could be scanning the attached disks or interfering with the CV Processes?
I’d also check the vsbkp, VixDiskLib and Event logs whilst the job is running, or just before the issue occurs, to get an idea of what is happening.
Best Regards,
Michael
Appreciate the response. All MAs are over-specced physical machines. No VMs in the CV hardware.
Replies to the recommendations:
It could be a VMware Tools issue on some of the machines, as some are out of date, although the issue does not happen all the time.
Will try this. The CV engineer was on the phone and set this up. Will review in the morning.
The snapshot of the datastore finishes. When the backup copy runs for a random datastore on the schedule, sometimes the VMs just get stuck in waiting and none of the backup copies will run after that. I have to properly reboot the MA to get the backup copies to run and finish off the IntelliSnap workflow.
I'll go through the logs in the morning (if we have failures) and report back.
Thank you for your reply.
Cheers
Neil
@MichaelCapon Automount is disabled and the SAN policy is OfflineShared.
No AV
Tracking the logs.
Cheers
Neil
@Neil Cooper What storage array are you using? Can you share the Windows Event Logs from when the error happens?
@dude Dell Compellent SAN.
@s3narasi
Have you tried using a separate proxy (even just as a test) for the backup copy? I currently use separate proxies for the snap and the backup copy to avoid issues with application-consistent snapshots.
That Ben is a good egg! Keep us posted on your results :-)
So setting the proxies at the instance level did not work on the second night. The troubled MA was allocated most of the load, froze up the VDS, and the backup copies failed. Back to square one.
Thanks
Hi
I have read all the replies. Thank you. I was in the DC yesterday and did not reply, but I will do my best once I have had a chance to try some more things with Ben from CV support. Spreading out the schedules has helped for now. I'm seeing errors about multiple LUNs causing issues during the snapshot mount on the Compellent. The cleanup tool that was mentioned, which needs to be installed on the MA, might be important here, as we are still on V11 SP16. I can see the dead LUNs in VMware, and when I do a storage rescan they go away:
OS mount failed : VMWare Mount snapshot failed e0xEC02ECC3:{VMwareSnapOSUtil::MountSnap(1841)/MM.60611-Error mounting the snapshot LUNs because there are multiple copies present.}] (MM.60611)]
Hey @Neil Cooper ! I see you have a case opened for this, so I’ll get some top brains on it for you. I also want to make sure we circle back and share the solution for posterity.
If anyone in the community has ideas, please do share for Neil!