Hello All,
For the past 2 weeks, we have been experiencing the following issues.
All IntelliSnap operations are extremely slow.
This affects VMware, Volume Snapshot and Exchange backups.
All of them use IntelliSnap with NetApp storage behind it.
The behavior is the same at every location, both those running ONTAP 9.15 and those running ONTAP 9.11.
Even sites with their own physical media agents are affected.
We have tried everything we can think of. The Commserv is not overloaded. It has 4 sockets, 16 cores and 128 GB RAM.
The SQL database is not under heavy load either.
Steps I have taken so far:
- Moved the Commserv to another VMware cluster.
- Ran DB maintenance.
- Installed SQL and Windows updates.
Afterwards I started a job and saw the same behavior.
Our network team has carried out all kinds of performance tests. No abnormalities.
Commserv and media agents are in the same VLAN.
The support team has opened 2 cases, filed under VMware and File Snapshot. However, I think the two are connected.
The log shows the following anomalies:
17780 610 02/08 16:14:35 16479262 CVSnapClientAPIInternal::initialize() - Request for CVSnapClientAPIInternal::prepareVolumeSnaps - JId [16479262] CCId [0].
17780 610 02/08 16:14:35 16479262 CVSnapClientAPIInternal::getRemoteOpNextStep() - Going to execute it locally
17780 610 02/08 16:14:35 16479262 CVSnapClientAPIInternal::discoverVolumeDetails APP TYPE ID:[13]
17780 610 02/08 16:14:35 16479262 CVSnapOSUtil::getInstance() - Fetching snapOSUtil for Engine [3], App Type [13]
17780 610 02/08 16:14:35 16479262 CVMMSnapAPI::getControlHostInfo() - Got client name [MediaAgent001]
17780 610 02/08 16:14:35 16479262 CVSnapClientAPIInternal::discoverVolumeDetails() - From Snap OS Util VolSnap:Status-[0] Err-[0:].
17780 610 02/08 16:14:35 16479262 CVSnapClientAPIInternal::getRemoteOpNextStep() - Going to Next Step for Zoning
17780 610 02/08 16:14:35 16479262 CVSnapClientAPIInternal::getRemoteOpNextStep() - Going to execute it locally
17780 610 02/08 16:14:35 16479262 CVMMSnapAPI::processVolSnapOperation() - Request to MM -- << OpType[5: Prepare] SnapJob[16479262] SrcClient[10] MountHost[0] VmHostId[0] OpSrc[1: IDA] OpMode[3: HWDB] Flags[0] HopCount[0] Status[0] >>
17780 610 02/08 16:14:35 16479262 CVMMSnapAPI::processVolSnapOperation() - VolSnap to MM -- << [V-1: -1] [MS: 11] [S-1: -1] [MD-0:] [M-1: -1] [C-0:] [E-0:] >>
12480 41e8 02/08 16:19:33 16479261 CCvNetwork::CheckIfDataAvailable() - Thread is waiting for data on a socket. Waited for 600 sec(s). Requested wait time = 43200 sec(s). Remaining wait time = 42600 sec(s). Connection details: commcellname/commcellname.domain.int/SockIP(127.0.0.1)/commcellIP:MediaManager/0/0(MediaManager) MediaManager.exe(17592:1c4c)
17780 610 02/08 16:24:35 16479262 CCvNetwork::CheckIfDataAvailable() - Thread is waiting for data on a socket. Waited for 600 sec(s). Requested wait time = 43200 sec(s). Remaining wait time = 42600 sec(s). Connection details: commcellname/commcellname.domain.int/SockIP(127.0.0.1)/commcellIP:MediaManager/0/0(MediaManager) MediaManager.exe(17592:1c4c)
The process seems to be waiting for something. However, the connections between the media agents and the Commserv work without problems.
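To find where jobs sit idle across larger log files, the gaps between consecutive log lines of the same job can be measured. The following is only a minimal sketch: the line layout (process ID, thread ID, date, time, job ID, message) is an assumption inferred from the excerpt above, and the names `parse_line` and `find_stalls` are made up for illustration and are not part of any Commvault tooling.

```python
import re
from datetime import datetime

# Assumed layout, inferred only from the excerpt above:
# <pid> <thread-hex> <MM/DD> <HH:MM:SS> <job id> <message>
LINE_RE = re.compile(
    r"^(?P<pid>\d+)\s+(?P<tid>[0-9a-fA-F]+)\s+"
    r"(?P<date>\d{2}/\d{2})\s+(?P<time>\d{2}:\d{2}:\d{2})\s+"
    r"(?P<job>\d+)\s+(?P<msg>.*)$"
)


def parse_line(line, year=2000):
    """Return a dict for one log line, or None if it does not match."""
    m = LINE_RE.match(line.strip())
    if m is None:
        return None
    # The log carries no year, so a placeholder year is used for arithmetic.
    ts = datetime.strptime(
        f"{year}/{m['date']} {m['time']}", "%Y/%m/%d %H:%M:%S"
    )
    return {"pid": m["pid"], "tid": m["tid"], "job": m["job"],
            "ts": ts, "msg": m["msg"]}


def find_stalls(lines, min_gap_sec=60):
    """Yield (job, gap_seconds, previous_msg, next_msg) for quiet periods."""
    last = {}  # most recent parsed record per job ID
    for line in lines:
        rec = parse_line(line)
        if rec is None:
            continue
        prev = last.get(rec["job"])
        if prev is not None:
            gap = (rec["ts"] - prev["ts"]).total_seconds()
            if gap >= min_gap_sec:
                yield rec["job"], int(gap), prev["msg"], rec["msg"]
        last[rec["job"]] = rec
```

Applied to the excerpt above, this would report a 600-second gap for job 16479262 between the "VolSnap to MM" request at 16:14:35 and the first CheckIfDataAvailable message at 16:24:35. Comparing the last message before each gap across many jobs might show whether they all stall at the same request to MediaManager.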
I have run out of ideas.
Snapshots that used to take 1 minute now sometimes take 1 to 3 hours.
VMware jobs eventually fail.
Backups with the file system agent, NDMP backup copies and SQL agent backups run without problems.
Do you have any advice on how I can narrow down the error? We are getting nowhere with our partner and the support case, and we have had massive problems and restrictions for 2 weeks.
I am grateful for any help.