I am working on some performance issues with my backups.
Looking at the media/proxy agents during the backups, I am seeing what I think is high latency on the loopback adapter. In the screenshot below, you can see in Windows Resource Monitor that the latency for vsbkp, cvd and other processes is around 30ms. The loopback adapter is internal to the server and never touches an Ethernet switch or wire, so I would think the latency should be much lower, right?
I am wondering what kind of latency other folks are seeing on their media agents and/or proxy agents during backups.
Thank you ahead of time!
Hey @Farmer92! Are you seeing performance issues on all backups for this media agent, or only on VSA backups? I’d like to get some eyes on this, though we need some more context.
I don’t think I’ve ever looked at loopback latency when troubleshooting performance, but you are right: Commvault uses the loopback adapter as its primary way to communicate between processes local to the machine. For example, CVD on a Media Agent can receive a datastream from a client and pass it off to cvmountd for writing; likewise, signatures are received and passed off to SIDB2, etc.
I know that Windows Server 2012 had major enhancements for loopback performance. I can’t tell which OS that screenshot is from, but if it’s 2008, you may see a benefit from moving to 2012 or later.
I believe you can also use iperf3 to measure loopback performance outside of Commvault to see what type of performance you are getting.
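As a rough baseline outside of Commvault, a minimal Python sketch like the one below can time small request/response round trips over 127.0.0.1. It assumes Python 3.8+ and uses only the standard library; the echo server, the 64-byte payload and the round count are illustrative choices of mine, not anything Commvault itself does. It measures round-trip time for a single TCP connection inside one Python process, so it is not necessarily the same metric as the latency figure Resource Monitor reports, and interpreter overhead will inflate the numbers somewhat. For throughput, iperf3 (`iperf3 -s` in one window, `iperf3 -c 127.0.0.1` in another) remains the better tool.

```python
import socket
import statistics
import threading
import time

HOST = "127.0.0.1"    # loopback interface; traffic never leaves the machine
PAYLOAD = b"x" * 64   # small message, so we time latency rather than throughput


def _echo_server(listener):
    """Accept one client and echo everything it sends until it disconnects."""
    conn, _ = listener.accept()
    with conn:
        # Disable Nagle's algorithm so small replies are sent immediately.
        conn.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1)
        while True:
            data = conn.recv(65536)
            if not data:          # empty read means the client closed
                break
            conn.sendall(data)


def _recv_exact(sock, n):
    """Read exactly n bytes, since TCP may deliver a message in pieces."""
    buf = bytearray()
    while len(buf) < n:
        chunk = sock.recv(n - len(buf))
        if not chunk:
            raise ConnectionError("echo server closed the connection early")
        buf.extend(chunk)
    return bytes(buf)


def measure_loopback_rtt(rounds=1000):
    """Return a list of round-trip times in milliseconds over 127.0.0.1."""
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind((HOST, 0))              # port 0: let the OS pick a free port
    listener.listen(1)
    port = listener.getsockname()[1]
    server = threading.Thread(target=_echo_server, args=(listener,), daemon=True)
    server.start()

    samples = []
    try:
        with socket.create_connection((HOST, port)) as client:
            client.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1)
            for _ in range(rounds):
                start = time.perf_counter()
                client.sendall(PAYLOAD)
                _recv_exact(client, len(PAYLOAD))
                samples.append((time.perf_counter() - start) * 1000.0)
    finally:
        server.join(timeout=5)
        listener.close()
    return samples


if __name__ == "__main__":
    rtts = measure_loopback_rtt()
    p99 = statistics.quantiles(rtts, n=100)[98]
    print(f"median {statistics.median(rtts):.3f} ms, "
          f"p99 {p99:.3f} ms, max {max(rtts):.3f} ms")
```

Running it once while the media agent is idle and again during a backup window gives two numbers to compare against what Resource Monitor shows.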
What sort of performance issues are you seeing on your backups? Any particular agents, or across the board?
Thank you for the feedback. I did forget to mention that the media agent is Windows Server 2016, and we are running Commvault version 11.20.60.
Our original performance issue started when we changed to CrowdStrike for our endpoint protection. Backup speeds were more than cut in half for all backups (run times more than doubled). We worked with CrowdStrike support, who said we would not need exclusions, and ended up adding the CV-recommended exclusions. This only helped minimally. After a week of missing our backup deadlines, we uninstalled CrowdStrike. The following round of backups was still very poor. It seems that CrowdStrike did not fully uninstall itself; it really felt like it left some things behind.
We ended up rebuilding the media agents on the same hardware, and we got the performance just about all the way back. Through all of this, I had noticed the loopback latency was over 100ms while CrowdStrike was installed; now that we have gone back to our original endpoint protection, we are at around 40ms, plus or minus.
I would love to hear what other folks see when their backups are running hot and heavy. Are you around 40ms as well? Higher? Lower?
I see a 100GB reference; can you give a bit more detail on the hardware side of this?
I tried to look at one of my last few Windows MAs (we have mostly reinstalled them with Linux) to compare, but mine are only 10G and are underloaded, so the comparison might not be relevant at all. Still, I can see latency of 180ms, mostly for cvfwd, on one of them that is currently performing a VSA backup of 2 VM subclients.
The servers are physical. They have 2 x 10Gb Ethernet cards with dynamic LACP, so each should show a 20Gb link speed.
Thank you for checking your MAs. It is interesting that you see 180ms…
Out of curiosity, why did you change your MAs to Linux? I have thought about it before, wondering if I might get better performance, but my lack of Linux skills has scared me off a bit. Are you seeing better performance with Linux MAs than with Windows MAs?
So strange that the scale is reported in 100GB, then.
Switching to Linux was more for security reasons than for performance.
Honestly, weighing the pros and cons, like you I am more skilled with and comfortable on Windows servers than Linux ones, and I would have kept my Windows OSes if it had not been a top management decision. Security is a bit better, but compared to the Windows MA role, the way logs and performance information are accessed on Linux makes me lose time and efficiency.
Regarding performance, I can’t tell you if there’s an improvement, as we also changed our backup landing zone from NetApp 4x10G NAS in NFS/CIFS to Pure Storage 4x40G FlashBlade in S3 mode. I have almost no ‘error processing chunk’ issues with that setup, while I had too many before.
With most of our VMs being Windows, I also had to set up a dedicated MA for file-level restore from the VSA backups. From what I read, this is not needed anymore since v11.23, but I will upgrade to v11.24 in September, so I’ll provide my feedback at that time.
It is good to hear you don’t get the ‘error processing chunk’ messages with the Pure FlashBlades. We are using FC-connected SAN storage for our MAG space, and it has been pretty solid.
I forgot to mention that my Linux MAs targeting the FlashBlades are in a GridStor of 4 members, which delivers the best performance, with mount path access shared across the whole GridStor. By contrast, SAN storage is mostly held by one MA and shared to the others, so if one MA needs to access a block stored on a volume attached to another MA, performance is a bit degraded because they need to ‘discuss’ together to access the data.
Yup, that is the same as how we have it set up, with the one MA owning the FC MAG and sharing it to the peers in the GridStor.
I also wonder if that is part of my loopback issue. If the MA running the backup also owns the FC storage, it will use more loopback communication between its services, whereas if it has to talk to another MA, it would not use the loopback. Support did give us an additional setting key to try. I cannot find much of a description of what it really does. When we get things caught up, we are going to try it out and see how it goes. It needs a service restart to take effect.
So far we have not put the additional setting on our primary media agents. We are still working through some other issues. When we get to trying it, I will update y’all.
@Mike Struening, thank you for checking in. Unfortunately, we have not had a chance to try it yet; we are still working through some other critical tickets. In addition, it is difficult to get a window of time to test settings like these in a large environment that uses all available time to run backups. I am hoping to try this setting in the next few weeks.
@Mike Struening I am so, so sorry… we are still working on some other critical tickets and did not want to add other changes yet. I hate waiting this long… again, so sorry.
Hi @StanM! This thread hasn’t been too active lately. However, if you are looking to use the setting and share your experience, we might get an answer.