Solved

Intellisnap Backup Copy kills Virtual Disk Service on Server 2016 Media Agent




Userlevel 2
Badge +4

Morning

Setting the proxy at the client level instead of the subclient level seems to have load balanced the backup copy much better. A “sea of green” for the Intellisnaps last night. It's only one night, but I will monitor and report back. Ben from CV support is really digging into this issue for me; we went through a lot yesterday.

More to follow….

Userlevel 2
Badge +4

@dude 

Log Name:      System
Source:        Virtual Disk Service
Date:          3/9/2021 2:37:38 PM
Event ID:      1
Task Category: None
Level:         Error
Keywords:      Classic
User:          N/A
Computer:      kil-cvlt-8
Description:
Unexpected failure. Error code: 2@02000018
Event Xml:
<Event xmlns="http://schemas.microsoft.com/win/2004/08/events/event">
  <System>
    <Provider Name="Virtual Disk Service" />
    <EventID Qualifiers="49664">1</EventID>
    <Level>2</Level>
    <Task>0</Task>
    <Keywords>0x80000000000000</Keywords>
    <TimeCreated SystemTime="2021-03-09T18:37:38.525490600Z" />
    <EventRecordID>730396</EventRecordID>
    <Channel>System</Channel>
    <Computer></Computer>
    <Security />
  </System>
  <EventData>
    <Data>2@02000018</Data>
  </EventData>
</Event>
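If you need to pull the same fields out of several of these events (for example, to send them to support), a minimal sketch using only Python's standard-library `xml.etree.ElementTree` can summarize one record. The function name `summarize_event` is hypothetical, and the sample is a trimmed copy of the XML above. Note that `Level` 2 is the numeric form of the "Error" level shown in the text view.

```python
import xml.etree.ElementTree as ET

# Namespace declared on the <Event> root element of Event Viewer XML.
NS = {"e": "http://schemas.microsoft.com/win/2004/08/events/event"}

def summarize_event(xml_text: str) -> dict:
    """Extract provider, event ID, level, UTC time and data from one event record."""
    root = ET.fromstring(xml_text)
    system = root.find("e:System", NS)
    return {
        "provider": system.find("e:Provider", NS).get("Name"),
        "event_id": int(system.find("e:EventID", NS).text),
        "level": int(system.find("e:Level", NS).text),  # 2 == Error
        "time_created_utc": system.find("e:TimeCreated", NS).get("SystemTime"),
        # EventData can hold several <Data> elements; keep them in order.
        "data": [d.text for d in root.findall("e:EventData/e:Data", NS)],
    }

# Trimmed copy of the event posted above.
SAMPLE = """<Event xmlns="http://schemas.microsoft.com/win/2004/08/events/event">
  <System>
    <Provider Name="Virtual Disk Service" />
    <EventID Qualifiers="49664">1</EventID>
    <Level>2</Level>
    <TimeCreated SystemTime="2021-03-09T18:37:38.525490600Z" />
  </System>
  <EventData>
    <Data>2@02000018</Data>
  </EventData>
</Event>"""

if __name__ == "__main__":
    print(summarize_event(SAMPLE))
```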

Userlevel 2
Badge +4

@Matthew M. Magbee The proxy was hard-set at the subclient level. I removed it and will let it balance the load tonight to see how it goes.

Thank you

Neil

Userlevel 4
Badge +11

Hi

I've been plagued with this problem for a while, and support has not been able to crack it yet either. I have 4 Media Agents. 2 of them use CBT and crash-consistent backup options for Intellisnap and work fine. The other 2 also use CBT, but with application-consistent (quiesced) backup options for Intellisnap, and these Media Agents will sometimes freeze up the Virtual Disk Service; the backup copy then fails for all of the remaining backup copies. When this happens I cannot open Process Manager on the MAs, and I end up having to reboot the MA to get things back on track. Not a lot of info, but I'm chucking this one out there to see if anyone has had a similar experience.

 

Have you tried using a separate proxy (even just as a test) for the backup copy? I currently use separate proxies for the snap and the backup copy to avoid any issues with application-consistent backups.

 

https://documentation.commvault.com/commvault/v11/article?p=62414.htm

Badge +15

@Neil Cooper Can you please share what the error says? Open one of the error messages and share the details from that error log with us: Event ID, description, etc. Thanks

Userlevel 2
Badge +4

@dude Dell Compellent SAN.

 

Badge +15

@Neil Cooper What storage array are you using? Can you share the Windows Event Logs from when the error happens? 

Userlevel 2
Badge +4

@MichaelCapon automount disabled and SAN Policy = Offline Shared

 

No AV

 

Tracking the logs.

 

Cheers

 

Neil

Userlevel 2
Badge +4

Appreciate the response. All MAs are over-specced physical machines; there are no VMs in the CV hardware.

Replies to your recommendations:

  1. It could be a VMware Tools issue on some of the machines, as some are out of date, although this issue does not happen all the time.
  2. Will try this. The CV engineer was on the phone and set this up. Will review in the am.
  3. The snapshot of the datastore finishes. When the backup copy runs for a random datastore on the schedule, sometimes the VMs just get stuck in waiting and none of the backup copies will run after that. I have to properly reboot the MA to get the backup copies to run and finish off the Intellisnap workflow.
  4. Going through the logs in the am (if we have failures), and I will report back.

Thank you for your reply.

Cheers

Neil

 

Userlevel 6
Badge +14

Hi Neil,

Have you checked that Automount is disabled and the SAN policy is OfflineShared on the affected Media Agents?
Is there any AV on the Media Agents that could be scanning the attached disks or interfering with the CV Processes?
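To check the first point on several Media Agents, the diskpart commands `automount` and `san`, run without arguments, only report the current setting. The sketch below is hypothetical: the function names are made up, it assumes Windows and an elevated prompt, and the output wording it matches ("Automatic mounting of new volumes disabled.", "SAN Policy  : Offline Shared") is an assumption to verify against what your MA actually prints.

```python
import os
import re
import subprocess
import tempfile
from typing import Optional

def parse_diskpart_settings(output: str) -> dict:
    """Extract automount and SAN policy state from diskpart text output.

    The matched wording is an assumption based on typical diskpart output.
    """
    automount: Optional[bool] = None
    m = re.search(r"automatic mounting of new volumes (enabled|disabled)", output, re.IGNORECASE)
    if m:
        automount = m.group(1).lower() == "enabled"
    san_policy: Optional[str] = None
    m = re.search(r"SAN Policy\s*:\s*(.+)", output, re.IGNORECASE)
    if m:
        san_policy = m.group(1).strip()  # strip() also drops a trailing \r
    return {"automount_enabled": automount, "san_policy": san_policy}

def settings_ok(settings: dict) -> bool:
    """True when automount is off and the SAN policy is Offline Shared."""
    policy = (settings["san_policy"] or "").replace(" ", "").lower()
    return settings["automount_enabled"] is False and policy == "offlineshared"

def query_diskpart() -> str:
    """Run the read-only 'automount' and 'san' commands through a diskpart script."""
    with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as script:
        script.write("automount\nsan\n")
        path = script.name
    try:
        result = subprocess.run(["diskpart", "/s", path], capture_output=True, text=True, check=True)
        return result.stdout
    finally:
        os.remove(path)

if __name__ == "__main__" and os.name == "nt":
    current = parse_diskpart_settings(query_diskpart())
    print(current, "OK" if settings_ok(current) else "CHECK THIS MA")
```

Parsing is kept separate from running diskpart so the matching logic can be checked on any machine before pointing it at a production MA.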

I’d also check the vsbkp, VixDiskLib, and Event logs whilst the job is running, or just before the issue occurs, to get an idea of what is happening.

Best Regards,

Michael

Badge +15

Hi @Neil Cooper, I have a few questions. Are these Media Agents physical or virtual? Would you be able to test with one VM running a stable VMware Tools version? Please make sure you are not using version 11269; see the issue reported here: https://docs.vmware.com/en/VMware-Tools/11.0/rn/VMware-Tools-1105-Release-Notes.html

My recommendation:

#1 - Create a new subclient, or exclude all VMs except one for testing, and make sure that VM has up-to-date VMware Tools.

#2 - Increase the debug level for vsbkp and VixDiskLib on the Media Agent (for example, debug level 3, file size 10 MB, and 5 file versions). You can revert these to the defaults once the test is complete.

#3 - Start a backup of the one VM that you have seen fail before.

#4 - Use GxTail to open both logs and filter by “Commvault Failures and Successes”.
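If GxTail isn't handy on the MA, a rough script can pull the interesting lines out of both logs. This is a minimal sketch, not a copy of the GxTail filter: the keyword list is a hypothetical stand-in, and the file names in the usage comment are only examples of the logs mentioned above.

```python
import re
import sys
from pathlib import Path
from typing import Iterable, Iterator, Tuple

# Hypothetical keyword filter, a rough stand-in for GxTail's highlight filter.
KEYWORDS = re.compile(r"\b(fail\w*|error\w*|success\w*|timed out|timeout)\b", re.IGNORECASE)

def filter_lines(lines: Iterable[str]) -> Iterator[Tuple[int, str]]:
    """Yield (line number, text) for every line that contains one of the keywords."""
    for number, line in enumerate(lines, start=1):
        if KEYWORDS.search(line):
            yield number, line.rstrip("\r\n")

def scan_logs(paths: Iterable[str]) -> None:
    """Print matching lines from each log, prefixed with file name and line number."""
    for path in paths:
        with open(path, encoding="utf-8", errors="replace") as log:
            for number, text in filter_lines(log):
                print(f"{Path(path).name}:{number}: {text}")

if __name__ == "__main__":
    # Example: python filter_logs.py vsbkp.log VixDiskLib.log
    scan_logs(sys.argv[1:])
```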

 

When you are using application-consistent backups, I'd make sure that VMware Tools is on a stable version and not a problematic one like 11269. Post the results here.
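With many VMs, one way to pick candidates for the single-VM test is to screen an inventory export for the version called out above. This is a hypothetical sketch: the inventory dict (VM name to the VMware Tools version number reported by vCenter, or None when nothing is reported) is assumed to come from whatever export you already use, and the VM names and other version numbers are illustrative only.

```python
from typing import Dict, List, Optional

# Version called out in this thread as problematic; extend as needed.
KNOWN_BAD_TOOLS_VERSIONS = {11269}

def flag_tools_versions(inventory: Dict[str, Optional[int]]) -> Dict[str, List[str]]:
    """Split VMs into those on a known-bad Tools version and those reporting none."""
    known_bad: List[str] = []
    unknown: List[str] = []
    for vm_name, version in sorted(inventory.items()):
        if version is None:
            unknown.append(vm_name)
        elif version in KNOWN_BAD_TOOLS_VERSIONS:
            known_bad.append(vm_name)
    return {"known_bad": known_bad, "unknown": unknown}

if __name__ == "__main__":
    # Hypothetical inventory with illustrative names and version numbers.
    sample = {"app-01": 11269, "db-01": None, "web-01": 11333}
    print(flag_tools_versions(sample))
    # -> {'known_bad': ['app-01'], 'unknown': ['db-01']}
```

VMs in the "unknown" bucket are worth a look too, since quiescing depends on VMware Tools running in the guest.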

Userlevel 7
Badge +23

Appreciate that!  I already reached out to the Leadership team that owns your case and said we need a top mind here :nerd:

Userlevel 2
Badge +4

Thanks. CV is top-notch WRT support. I’m trying again to see if they can assist, but there is not much to go on in the CV logs, which is why I’m reaching out here as well.

 

Cheers

 

 

Userlevel 7
Badge +23

Hey @Neil Cooper !  I see you have a case opened for this, so I’ll get some top brains on it for you.  I also want to make sure we circle back and share the solution for posterity.

If anyone in the community has ideas, please do share for Neil!
