Solved

VMware HotAdd transport mode is not working very well

  • 8 October 2021
  • 13 replies
  • 2734 views

Userlevel 2
Badge +8

Hi,

 

We are having a problem backing up our VMware environment with Netbackup in HotAdd transport mode: if the proxy server backs up more than 4 VMs at the same time, many of the VMs fail with the error message below.

Error opening the snapshot disks using given transport mode: hotadd Status 23

 

In other words, this proxy server can back up no more than 4 VMs at the same time.

 

OK, I admit that this is a problem in our Netbackup environment, but I think I may get some help by posting the question here in the Commvault forum, because:

 

  • Firstly, we are testing HotAdd transport mode in the Netbackup environment because we intend to deploy the same technology in our other backup environment, Commvault, in the coming months.

 

  • Also, Netbackup support currently seems to be at their wits’ end (which is a bit disappointing). On the other hand, the issue seems more connected to the VMware environment and the configuration of the proxy server than to the backup software itself, so the problem may well be familiar to all the major backup vendors.

 

  • Last but not least, to be honest, you guys on this forum respond much more quickly and informatively.

 

So, if you guys are still happy for me to continue, here are the details of our environment:

  • Number of datacenters: 1
  • Datacenter vSphere version: 6.7
  • ESXi version of the host running the proxy server: 6.7
  • Datastore hosting the proxy server and other VMs: VMFS 6
  • Total number of VMs to back up: 560

 

Configuration of the VMware backup proxy server:

  • OS: Windows Server 2019
  • Memory: 32 GB
  • CPU: 16 cores
  • Local disks: C: (100 GB, 68 GB free) and D: (40 GB, 35 GB free)
  • Size of the datastore hosting the proxy server: 4 TB, 950 GB free
  • Number of SCSI controllers: 4
  • SCSI controller type: VMware Paravirtual (PVSCSI)
  • VDDK version (embedded in Netbackup): 6.7.3
  • PCI device: 1 PCI passthrough device, zoned to a tape library
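
As a rough sanity check on whether four PVSCSI controllers could simply run out of slots for HotAdd-attached disks, the arithmetic can be sketched as below. The per-controller figure is an assumption (15 usable targets is the classic limit, with one unit reserved for the controller; PVSCSI on newer virtual hardware versions supports more, so check the vSphere Configuration Maximums for your version), and `proxy_own_disks=2` assumes C: and D: each sit on their own virtual disk. Both function names are hypothetical.

```python
def free_hotadd_slots(controllers: int, per_controller: int, proxy_own_disks: int) -> int:
    """Slots left for HotAdd-attached VMDKs after the proxy's own disks.

    per_controller is an assumed figure; check the vSphere Configuration
    Maximums for your controller type and virtual hardware version.
    """
    total = controllers * per_controller
    return max(total - proxy_own_disks, 0)


def max_concurrent_vms(free_slots: int, avg_disks_per_vm: float) -> int:
    """Rough upper bound on concurrent HotAdd backups if slots were the only limit."""
    if avg_disks_per_vm <= 0:
        raise ValueError("avg_disks_per_vm must be positive")
    return int(free_slots // avg_disks_per_vm)


if __name__ == "__main__":
    # Hypothetical inputs: 4 PVSCSI controllers, 15 usable targets each,
    # 2 disks (C: and D:) already used by the proxy itself.
    slots = free_hotadd_slots(controllers=4, per_controller=15, proxy_own_disks=2)
    print(slots)
    print(max_concurrent_vms(slots, avg_disks_per_vm=3.0))
```

Under these assumptions there are 58 free slots, comfortably above the 15 to 35 attached disks reported later in the thread, which would point away from plain slot exhaustion; that reading only holds if the assumed per-controller figure is right.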

To summarise, this VMware proxy server is also a Netbackup media server: it reads data directly from the HotAdd-mounted VMDKs and writes it directly to the FC-connected tape library.

 

My observations:

  • During the backup, even with the number of concurrent VM backups set to 16, CPU and memory are hardly stressed.
  • However, watching Disk Management, with more than 4 concurrent VM backups the proxy constantly rescans the SCSI controllers to mount and unmount the VMs’ VMDKs, which freezes the proxy server. In fact, the more VMs are backed up at the same time, the busier the SCSI controllers become, which makes me wonder whether this is the root cause of the problem.

 

Has anyone seen a similar issue in a Commvault environment with a Commvault proxy server set up this way?

 

Many thanks,

Kelvin

Best answer by Mike Struening RETIRED 11 October 2021, 23:04

13 replies

Userlevel 6
Badge +14

Hi @Kelvin ,

 

At the time of the issue, how many VM disks do you see attached to the proxy?

My suspicion is that VDS (the Windows Virtual Disk Service) is hanging because of the number of GetVolumeInfo API calls.
- You may need to increase the VDDK/VxMS logging level to get more information on what’s happening around the time of the issue.

 

Have you made sure that automount is disabled and scrubbed on the proxy?
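
The automount check is normally done with the Windows `diskpart` tool (`automount disable`, `automount scrub`). A minimal Python wrapper is sketched below, assuming a Windows proxy, an elevated session, and diskpart's `/s` script option; the helper names and the temporary script file name are hypothetical.

```python
import subprocess
import tempfile
from pathlib import Path

# "automount disable" stops Windows from automatically mounting new volumes
# added to the system; "automount scrub" removes drive letters, mount points
# and registry settings for volumes that are no longer present.
AUTOMOUNT_COMMANDS = ("automount disable", "automount scrub")


def build_diskpart_script(commands=AUTOMOUNT_COMMANDS) -> str:
    """Return diskpart script text, one command per line."""
    return "\n".join(commands) + "\n"


def run_diskpart(script_text: str) -> str:
    """Run `diskpart /s <file>` (Windows only, requires an elevated prompt)."""
    with tempfile.TemporaryDirectory() as tmp:
        path = Path(tmp) / "automount.txt"  # hypothetical file name
        path.write_text(script_text)
        result = subprocess.run(
            ["diskpart", "/s", str(path)],
            capture_output=True,
            text=True,
            check=True,
        )
    return result.stdout
```

Keeping script generation separate from execution makes it easy to review the exact commands before running them on a production proxy.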

Are VMware Tools and all drivers up to date on the proxy?

For the VMs that fail with “Error opening the snapshot disks”, I’d also suggest checking the datastore for leftover snapshot delta disks (make sure the VM is properly consolidated).
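
Leftover snapshot disks are usually recognisable by their file names: the descriptor gets a six-digit suffix (for example `vm-000001.vmdk`), and the data lives in `vm-000001-delta.vmdk` (vmfsSparse) or `vm-000001-sesparse.vmdk` (SEsparse, the default on VMFS 6). A minimal sketch that flags such names is shown below, assuming you already have a list of file names from the datastore (for example exported from the datastore browser); the function name is hypothetical.

```python
import re

# Snapshot descriptors end in "-NNNNNN.vmdk"; their data files end in
# "-NNNNNN-delta.vmdk" (vmfsSparse) or "-NNNNNN-sesparse.vmdk" (SEsparse).
DELTA_RE = re.compile(r"-\d{6}(?:-delta|-sesparse)?\.vmdk$", re.IGNORECASE)


def find_delta_disks(file_names):
    """Return the file names that look like snapshot delta disks."""
    return [name for name in file_names if DELTA_RE.search(name)]


if __name__ == "__main__":
    listing = [
        "web01.vmdk",
        "web01-flat.vmdk",
        "web01-000001.vmdk",
        "web01-000001-sesparse.vmdk",
        "web01.vmx",
    ]
    print(find_delta_disks(listing))
```

A hit does not prove the VM needs consolidation (a snapshot may legitimately exist during a running backup), but delta files that persist between jobs are worth investigating.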

 

 

Best Regards,

Michael

Userlevel 6
Badge +13

Also, the vCenter and/or proxy aren’t being included in the backup, are they? If you snapshot those as part of the job, you could see all kinds of odd problems.

Userlevel 2
Badge +8

@MichaelCapon 

With a concurrent backup of 4 VMs, I could see no more than 15 disks mounted on the proxy server.

With a concurrent backup of 8 VMs, I could see between 15 and 25 disks mounted on the proxy server.

With a concurrent backup of 16 VMs, I could see between 20 and 35 disks mounted on the proxy server.
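
To put these counts on a common scale, the upper end of each reported range can be divided by the number of concurrent VM backups. The sketch below does only that arithmetic; the function name is hypothetical and the inputs are the figures quoted above.

```python
def disks_per_vm(concurrent_vms: int, attached_disks: int) -> float:
    """Average number of HotAdd-attached disks per concurrent VM backup."""
    if concurrent_vms <= 0:
        raise ValueError("concurrent_vms must be positive")
    return attached_disks / concurrent_vms


# Upper ends of the ranges reported in this post (disks seen on the proxy).
OBSERVED_MAX = {4: 15, 8: 25, 16: 35}

if __name__ == "__main__":
    for vms, disks in sorted(OBSERVED_MAX.items()):
        ratio = disks_per_vm(vms, disks)
        print(f"{vms:>2} concurrent VMs: up to {disks} disks, {ratio:.2f} per VM")
```

The per-VM ratio drops as concurrency rises, which might mean that not every job has its disks attached at the same moment; that is a guess from three data points, not a diagnosis.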

Yes, automount is disabled and scrubbed. I didn’t do it at the beginning, only later on, and it didn’t make any difference.

VMware Tools is fairly recent: version 11.2.5 (released last December).

 

Windows 2019 on the proxy is fully up to date.

 

As for snapshots, we checked: it’s not that the disk snapshots couldn’t be consolidated before the backup. The snapshots were all taken OK; it was afterwards that the I/O reading from the snapshot disks failed.

 

In other words, it comes back to the original point: the proxy server can’t seem to handle as few as 8 VMs being backed up simultaneously. Out of 500 VMs, fewer than 10% got through, and the remaining ~450 VMs all failed with either a connection timeout error or the “opening the snapshot disks” error...

 

By the way, because we cap this at 8 VMs at the vCenter level, the vCenter never allows more than 8 VMs to be backed up at the same time, which also limits the number of simultaneous snapshots to 8.

 

Cheers,

Kelvin

Userlevel 2
Badge +8

@Aplynx

 

The vSphere server is included in the backup, but I’m not sure the backup ever got that far before “Error opening the snapshot disks” appeared all over the screen, because the last backup of this vSphere server was last Sunday...

 

Cheers,

Kelvin

Userlevel 6
Badge +13

For Commvault, we recommend backing up the vCenter by itself at another time.

https://documentation.commvault.com/commvault/v11_sp20/article?p=125593.htm

Both the vCenter and the proxies should be excluded from the regular VM backup.
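
In practice the exclusion is a content filter in the backup product’s configuration, not code; the rule itself is simple enough to sketch. The VM names below are hypothetical, and the case-insensitive match is an assumption about how names might be written in an inventory list.

```python
def vms_to_back_up(all_vms, vcenter_vms, proxy_vms):
    """Drop the vCenter and proxy VMs from a backup selection (case-insensitive)."""
    excluded = {name.lower() for name in (*vcenter_vms, *proxy_vms)}
    return [vm for vm in all_vms if vm.lower() not in excluded]


if __name__ == "__main__":
    inventory = ["app01", "VCSA01", "bkproxy01", "db01"]  # hypothetical names
    print(vms_to_back_up(inventory, vcenter_vms=["vcsa01"], proxy_vms=["BKPROXY01"]))
```

The vCenter would then go into its own job scheduled at a different time, as the linked documentation suggests.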

Userlevel 2
Badge +8

@Aplynx 

Point taken, but I don’t think backing up the vCenter server itself (at any time) is the root cause of 90% of our VM backups failing.

Cheers,

Kelvin

Userlevel 7
Badge +23

@Kelvin, can you exclude the vCenter from the backup and see if the issue persists? As per the docs, we suggest backing it up on its own at another time.

Userlevel 2
Badge +8

@Mike Struening

 

The policy we have been using to test HotAdd is a non-production VM policy containing only 171 non-production VMs; it does not include the vCenter server itself.

This HotAdd proxy has never passed the test with more than 6 concurrent VM backups: the failure rate was above 50%.

However, with no more than 4 concurrent VM backups, the success rate for the test is very close to 100%.

 

Regards,

Kelvin  

 

 

Userlevel 7
Badge +23

@Kelvin, appreciate the background. I would suggest opening a support case and letting them deep dive into the issue.

Can you share the case number with me here so I can track it?

Userlevel 2
Badge +8

@Mike Struening, I can’t open a case with Commvault support because it is Netbackup HotAdd that we are having the problem with :grinning:

That being said, in about a month’s time, when the new backup VM is built on a shiny new ESXi host in our US Commvault environment, we’ll deploy the same HotAdd technology on it.

And I assume there is a good chance that the problem we are having with Netbackup will happen in the Commvault environment as well.

 

Cheers,

Kelvin  

Userlevel 7
Badge +23

Gotcha. Well, how about we leave this one, and once you have the new setup ready and tested, let us know how it fares?

If it works, great!  If not, we can figure it out.

Userlevel 2
Badge +8

@Mike Struening Fair enough... Meanwhile, I am still working on the issue (with Netbackup), so I’ll give everyone an update if I resolve it.

 

Cheers,

Kelvin  

Userlevel 7
Badge +23

Appreciate that!

At the end of the day, we want this to work for you :nerd:
