This is a follow-up to my initial post about FR22.3.
It is more of a findings/discussion topic.
I had three older Windows Server 2008 R2 (2k8r2) media agents (now replaced) that experienced widespread issues after upgrading to FR22.3.
NOTE: None of these issues are or were recorded with Commvault as confirmed issues. The decision to replace/migrate the OS was made at the 11th hour, after weeks of working on these issues.
On the surface, FR22.3 appears to work fine on 2k8r2: readiness checks pass, services are running, jobs can be run, etc.
The issue we ran into was consistent across all three, and they were the only 2k8r2 media agents in our environment, so I knew the OS was the problem. It seemed too coincidental not to be.
Within 4 hours of the FR22.3 update, our jobs started experiencing some or all of the following errors:
Media mount services
Device not ready
Even when I tried selecting new snap mount hosts for the jobs, I was getting "connection refused" messages in the GxTail event logs.
The most consistent issue I could see across all systems was flapping of the CVD services, most commonly the CVD.EXE service. I went through full software reinstalls, updates, patches, everything. The issue would disappear for about 45-90 minutes, then start again.
We had multiple tickets open with Commvault, but the issue could not be pinpointed. We sent countless logs and had several Zoom calls with support, only to be told the issue would be escalated to Dev. I am not complaining; it's a hard thing to pinpoint.
On a whim, I decided to do an in-place upgrade to Windows Server 2012 R2 on one of the media agents that was low priority but was having the issue. After installing all the updates and drivers, the media agent was up and running, and in the 24 hours since, it has not had a single issue. We didn't reinstall any Commvault software, just upgraded the OS, and that fixed the issues.
We have since replaced all three systems with 2k12 and 2k16 (no 2k19 licenses currently), and everything is back to 100%: SLA is recovering and all jobs are completing without having to restart services or jobs.
So, if you have a 2k8r2 media agent on FR22.3 and are seeing random, inconsistent service issues and failures, the OS is very likely your problem.