Solved

Intellisnap Backup Copy kills Virtual Disk Service on Server 2016 Media Agent


Userlevel 2
Badge +4

Hi

I have been plagued with this problem for a while, and Support has not been able to crack it yet either. I have 4 Media Agents. 2 are using CBT with the crash-consistent backup option for IntelliSnap and work fine. The other 2 Media Agents also use CBT, but with the application-consistent (quiesced) backup option for IntelliSnap, and these Media Agents will sometimes freeze up the Virtual Disk Service, after which the remaining backup copies fail. I cannot open the process manager on the MAs when this happens and end up having to reboot the MA to get things back on track. Not a lot of info, but I am putting this one out there to see if anyone has had a similar experience.

 


Best answer by s3narasi 12 March 2021, 06:48


39 replies

Userlevel 2
Badge +4

Yeah, it's tough to narrow down, and as much as I like CV Support, the techs chalked it up to Windows or the SAN and tapped out. Also, regarding the version, my problem was worse on V11. Right now I only see the issue once a week or every 2 weeks, versus waking up to a sea of failed backup copies, so I'm living with it at the moment. If you figure it out, let me know.

 

Userlevel 2
Badge +4

It was some kind of rhetorical question ;-)

Ah, that is different. Hitachi does not have that many options ;-(

I am on the same level by the way ..11.22.27 (coincidence?)
 

 

I did some cleanup of hidden disks and rebooted the MA several times, but no luck there.

The VDS notices were reduced, though, but are not gone.

It seems that I have to take this to Hitachi/CV support.

Userlevel 2
Badge +4

@HenkR let's not get off track here. My hope as a CV customer is to be able to manage everything as a service so I don't have to spend so much time looking after it.

Here is the screenshot. Clear out all the options. My snap / backup copies went from 30 minutes each on an incremental to 5 minutes each. I also set the proxy at the subclient level to manually balance the snapshot workload:

 

 

Userlevel 2
Badge +4

@dude : true CV Cloud can be a target library (but source?)

@Neil Cooper : where do you migrate your source data? (from dell to »?)

What do you mean by lowering the detail required in the subclient ??

 

Badge +15

I wonder why you think CV Cloud would solve the problem. It still requires a media agent as far as I know.

Userlevel 2
Badge +4

@HenkR 

Hi

I ended up updating the CommServe and all the MAs to 11.22.27. I also lowered the detail required for IntelliSnap under the subclient properties. Only 2 of my 4 MAs have the issue, so I reboot those at least once a week. This made my situation with VMware and the deprecated Dell SAN 90% better, but I still experience the VDS freezing on the Windows side, and the Commvault services freezing up as well. I have been through a few calls with CV support, but GxTail revealed nothing. So until we move to CV cloud it will be my problem. Someone also mentioned a Microsoft script that you can download and schedule daily to clean up any leftover SAN / registry connections.

Neil

 

Userlevel 2
Badge +4

Is this topic solved or in progress? It does not seem to be solved.

Anyway, I have a similar situation with Hyper-V mounts from a Hitachi SAN system with IntelliSnap, where after a while the snap pool filled up. After releasing snapshots, the backup copy cannot proceed (OS mount failed, and VDS errors on the MA server, W2019). The HORCM version is up to date.

VDS error is 1@02000018 in eventvwr

Mounting manually does not work either, though not for every snapshot, which is very weird. The same error IDs appear here, and deleting all incrementals and starting a new full (primary snapshots) does not help either.
CV log : Error 60509 unable to find OS device corresponding to snap/clones
Failed to map snaps status 60500 : Snap Engine Error 0xEC02EC54 … MM.60500.
Failed to mount  volume unable to find OS devices etc.

Any idea here?

Userlevel 7
Badge +23

No problem, @Neil Cooper !!!  Thanks!!

Userlevel 2
Badge +4

Sorry @Mike Struening 

 

I was late marking correct answer but I did it tonight!

 

Cheers

 

Neil

Userlevel 7
Badge +23

@Neil Cooper , are you now resolved on this issue as per your most recent update?  Want to be sure we don’t have any dangling threads :sunglasses:

Userlevel 7
Badge +23

@Neil Cooper , following up on this thread.  I can see in your case notes that you had some green backup activity and were investigating an Unknown Device issue in Device Manager, and also that removing the detailed snap option has cut your snap and backup copy times down to literally minutes.

Is this issue now resolved?  If so, I’d like to set one of the many helpful replies as the Best Answer.

Thanks!!

Userlevel 7
Badge +23

That’s awesome, @Neil Cooper !  Once you feel safe in the issue being resolved, feel free to mark one of the many helpful replies as the Best Answer!

Userlevel 2
Badge +4

Sorry for the delay. IntelliSnap has been working great since changing a few options on the datastore backups subclient and manually setting the proxy. I still think it's a throughput issue, as the scheduling has to be perfect with no overlap or the MA's VDS will freeze up. I also run the Windows snap cleanup that @s3narasi recommended on the MAs themselves as a scheduled task. I do still have to update the CV environment to the latest HPK as well, so maybe that will clear up any remaining issues. Thanks for everyone’s help in getting us on the correct track.

Cheers

Neil
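For anyone wanting to set up the same kind of scheduled cleanup, here is a minimal sketch that only builds a `schtasks` command line for a daily task. The task name, the install path of the cleanup tool, and the start time are all hypothetical, and any switches the tool needs should be taken from its own help text rather than from this example.

```python
def build_schtasks_command(task_name, program, start_time="06:00"):
    """Build (but do not run) a schtasks command that creates a daily task.

    task_name, program and start_time are placeholders chosen for this sketch.
    """
    return [
        "schtasks", "/Create",
        "/TN", task_name,      # name shown in Task Scheduler
        "/TR", program,        # program the task runs
        "/SC", "DAILY",        # run once a day
        "/ST", start_time,     # 24-hour HH:MM start time
        "/RU", "SYSTEM",       # run under the local SYSTEM account
        "/F",                  # overwrite an existing task with the same name
    ]


# Hypothetical task name and path; adjust to wherever the tool was unpacked.
command = build_schtasks_command(
    "MA phantom device cleanup",
    r"C:\Tools\DevNodeClean.exe",
)
print(" ".join(command))
```

The list can be handed to `subprocess.run` from an elevated prompt on the Media Agent. Picking a start time well away from the snap and backup copy windows seems sensible, given how sensitive the VDS has been to load.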

Userlevel 2
Badge +4

@dude Just providing an update with some thoughts / info. I did see all your replies, thank you. I’m not going to install software on the MAs at this point, as I’m not sure it's required. I do believe we are following best practices up to this point. I will monitor the jobs all week and report in.

Thanks

 

Badge +15

I'm quite unsure what your thought process is here, or even what your questions are. I previously shared some links that point to best practices as well as software for the Media Agents. Did you have a chance to look at those?

Userlevel 2
Badge +4

Update

 

So I have gone back to manually setting the proxy at the subclient level, spread equally among the datastore IntelliSnap backups. CV has suggested we remove the snapshot option to collect file details for the snapshot copy. This has changed our snapshot timings from close to an hour (incremental) down to mere minutes. I’m not running any Compellent software on the MAs, but I did run the Microsoft tool to remove old Compellent registry entries. We are running CV V11 SP16 and still using the Dell Compellent (deprecated) option as the SAN snap vendor for the datastore backups. Maybe this is part of the issue? The MAs and the CommServe are also behind on HPKs, so updating might solve some issues too. It seems like a throughput issue on 2 of the 4 MAs, because the only issue I experienced all weekend was when 2 schedules overlapped. I spread them out by an hour and it was all green again last night for the snap and backup copies. At most we should have 3 datastores (20 VMs each) running. Please see the greyed-out options for Compellent:
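Since overlapping schedules were the trigger here, a quick way to sanity-check a planned schedule is to compare the expected run windows pairwise. The sketch below is only an illustration: the job names, start times, and durations are made up, and it assumes all jobs start on the same evening and does not handle start times after midnight.

```python
from datetime import datetime, timedelta
from itertools import combinations


def find_overlaps(jobs):
    """Return pairs of job names whose run windows overlap.

    jobs: list of (name, start as "HH:MM", expected duration in minutes).
    """
    day = datetime(2021, 1, 1)  # arbitrary reference date
    windows = []
    for name, start, minutes in jobs:
        hour, minute = map(int, start.split(":"))
        begin = day + timedelta(hours=hour, minutes=minute)
        windows.append((name, begin, begin + timedelta(minutes=minutes)))
    # Two windows overlap when each one starts before the other ends.
    return [
        (a[0], b[0])
        for a, b in combinations(windows, 2)
        if a[1] < b[2] and b[1] < a[2]
    ]


# Hypothetical schedule, not taken from the real environment.
jobs = [
    ("datastore-A", "20:00", 45),
    ("datastore-B", "20:30", 45),
    ("datastore-C", "21:30", 45),
]
print(find_overlaps(jobs))  # [('datastore-A', 'datastore-B')]
```

Moving `datastore-B` to 21:00 would not be enough in this made-up example, because it would then run into `datastore-C`; the durations need to be realistic worst-case values for the check to mean anything.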

 

 

Userlevel 2
Badge +4

@s3narasi 

 

Userlevel 2
Badge +4

@s3narasi Output from the DevNodeClean

 

Badge +15

Hi, to me the fact that you see multiple LUNs and dead paths has a lot to do with the MPIO software and the Dell Compellent Software I mentioned above. Check it out and let us know when you have a chance. Enjoy your weekend. 

Userlevel 2
Badge +4

Hi

I have read all the replies. Thank you. I was in the DC yesterday and did not reply, but I will do my best once I have had a chance to try some more things with Ben from CV support. Currently, spreading out the schedules has helped. I’m seeing an error about multiple LUNs causing issues during the snapshot mount on the Compellent. The cleanup tool mentioned that needs to be installed on the MA might be important here, as we are still on V11 SP16. I can see the dead LUNs in VMware, and when I do a rescan of storage they go away:

OS mount failed : [VMWare Mount snapshot failed [0xEC02ECC3:{VMwareSnapOSUtil::MountSnap(1841)/MM.60611-Error mounting the snapshot LUNs because there are multiple copies present.}] (MM.60611)]
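When comparing failures across nights, it can help to pull the hex snap engine code and the `MM.` code out of each error line and tally them. This is a rough sketch that assumes the codes always look like the ones in the message above; it is not based on any documented Commvault log format.

```python
import re
from collections import Counter

HEX_CODE = re.compile(r"0x[0-9A-Fa-f]+")
MM_CODE = re.compile(r"MM\.\d+")


def extract_codes(line):
    """Return (hex codes, MM codes) found in one error line, de-duplicated."""
    hex_codes = sorted(set(HEX_CODE.findall(line)))
    mm_codes = sorted(set(MM_CODE.findall(line)))
    return hex_codes, mm_codes


def tally_mm_codes(lines):
    """Count how many lines mention each MM code."""
    counts = Counter()
    for line in lines:
        _, mm_codes = extract_codes(line)
        counts.update(mm_codes)
    return counts


sample = (
    "OS mount failed : [VMWare Mount snapshot failed "
    "[0xEC02ECC3:{VMwareSnapOSUtil::MountSnap(1841)/MM.60611-Error mounting "
    "the snapshot LUNs because there are multiple copies present.}] (MM.60611)]"
)
print(extract_codes(sample))  # (['0xEC02ECC3'], ['MM.60611'])
```

Feeding `tally_mm_codes` the failure reasons from a week of jobs would show whether the multiple-copies error dominates or is mixed with other mount failures.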

Badge +15

A few other things.

Dell Compellent is deprecated from the Commvault software in V11 SP12. Source: https://documentation.commvault.com/commvault/v11_sp16/article?p=33107.htm

It does require Data Instant Replay licensing as stated above: Source: https://documentation.commvault.com/commvault/v11_sp16/article?p=33107.htm

The Commvault docs say that it should not require any additional Dell Compellent software, as per the System Requirements.

There is an old document from Dell, CommVault Simpana 10 Best Practices for the Dell Compellent Storage Center, though, and the articles in my previous post do mention that the Compellent Replay Manager Service is required on Hyper-V. That raises the question of whether you have Dell Compellent Replay Manager, or any other software, installed that may be interfering with the way IntelliSnap takes the snaps and mounts them on your Media Agent.

In any case, if you do have it installed, check the version, as some older versions may not integrate well (or at all) with Commvault. From what I could find, version 8.0.1 seems to be the latest.

 

If you do have Replay Manager, make sure you are using the latest version, as it fixes issues with the VSS Provider. If you have the latest version and it still does not work, try removing it completely and retrying the operation.

 

 

Badge

@Neil Cooper We are talking about VMware backups, I think, and you are seeing the VDS hang when it is under load. How many datastores are being presented when the load is heavy?

Are any Dell Compellent tools installed on the proxy server? Can we also confirm whether there are a lot of leftover phantom or hidden devices? To find them, use the link below:

https://support.microsoft.com/en-us/topic/device-manager-does-not-display-devices-that-are-not-connected-e7148232-40ae-bb07-0077-88f2e859b53f

 

Or you can use the tool below to show and clean up the phantom devices (devnodeclean /n):

https://www.microsoft.com/en-us/download/details.aspx?id=42286 
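If you want to capture the tool's output on each Media Agent for comparison, a small wrapper along these lines could work. It is only a sketch: the install path is hypothetical, the `/n` switch is simply the one quoted above, and the tool's own help text should be checked for what each switch actually does before running it.

```python
import subprocess


def devnodeclean_command(exe_path, switch="/n"):
    """Build the command line; the default switch is the one quoted in the post."""
    return [exe_path, switch]


def run_devnodeclean(exe_path, switch="/n", runner=subprocess.run):
    """Run the tool and return (exit code, captured text output).

    runner is injectable so the wrapper can be exercised without the tool present.
    """
    result = runner(
        devnodeclean_command(exe_path, switch),
        capture_output=True,
        text=True,
    )
    return result.returncode, result.stdout


# Example call (hypothetical path; run from an elevated prompt on the MA):
# code, output = run_devnodeclean(r"C:\Tools\DevNodeClean.exe")
# print(code)
# print(output)
```

Saving the output before and after a reboot or a storage rescan would show whether the phantom device count keeps growing between cleanups.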

Badge +15

@Neil Cooper 

As I dig through the internet researching this issue, I do see some other very similar cases. What I'm about to share may or may not be connected to your issue, so take all of this with a grain of salt.

From reading this document it is my understanding that Dell requires its own software integration for application consistent snapshot backups. 

Page 52

“If additional server protection is desired by capturing application-consistent VSS-integrated Replays of Hyper-V guest VMs, please refer to the Dell Compellent Replay Manager 6 Users Guide. Dell Compellent Replay Manager 6 is able to leverage Microsoft VSS to take application-consistent (IO is paused) Replays of Hyper-V guests, Exchange servers, and SQL servers.” 

And then I found this thread that refers to Dell Compellent, that led me to this blog post that says;

Page 3 and 4

“Backup environment set up we need to configure the Compellent SAN LUNs and the Replay Manager software to make sure the hardware VSS provider is used and the transportable snapshots can be presented to the Off host backup proxy server.”

“You need to install the Compellent Replay Manager Service on - - - - - - backup proxy server. Do note that you need a license for this software, which is needed later on when you configure this service on the hosts to interact with the Compellent SAN.”

 

So again, I'm not an expert in Dell Compellent, but after reading some of these docs I do wonder whether you have the software configured with VSS on your Media Agents. Would you be able to review the links and confirm here?

 

Thank you

Userlevel 2
Badge +4

So setting the proxies at the instance level did not work on the second night. The troubled MA was allocated most of the load and froze up the VDS and the backup copies failed. Back to square one.

Thanks

Userlevel 7
Badge +23

That Ben is a good egg!  Keep us posted on your results :-)
