Solved

Monitoring Commvault processes by Nagios


Userlevel 2
Badge +8

Hi all,

does anybody use Nagios to monitor Commvault-related processes on infrastructure or clients?

It would be very useful to have an experience report or some best practices.

 

Thank you

Gaetano

Best answer by Damian Andre 18 January 2023, 02:53

13 replies

Userlevel 6
Badge +15

Good morning.  I am not aware of anyone who has used Nagios to monitor the software.  Are you using this to get telemetry and other details?

Userlevel 2
Badge +8

Hi @Orazan ,

we use it to monitor services and hosts health by grabbing data from them. Often we also check the status of given services by verifying that specific processes are running.

Most of the checks are performed by parsing the output of SNMP queries, others by running specific scripts.

The SNMP or script output is compared with thresholds and, if needed, alerts are triggered.
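The threshold comparison described above can be sketched as a small Nagios-style check. This is a minimal sketch, not the poster's actual plugin: the function names `evaluate` and `run_check` are hypothetical, and it assumes a metric where a higher value is worse. The exit codes (0 OK, 1 WARNING, 2 CRITICAL, 3 UNKNOWN) follow the standard Nagios plugin convention.

```python
# Standard Nagios plugin exit codes.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3
LABELS = {OK: "OK", WARNING: "WARNING", CRITICAL: "CRITICAL", UNKNOWN: "UNKNOWN"}


def evaluate(value, warn, crit):
    """Map a measured value to a Nagios state (assumes higher is worse)."""
    if value >= crit:
        return CRITICAL
    if value >= warn:
        return WARNING
    return OK


def run_check(raw_value, warn, crit):
    """Return (exit_code, status_line) for a raw SNMP or script output string."""
    try:
        value = float(raw_value)
    except (TypeError, ValueError):
        # Output that cannot be parsed is reported as UNKNOWN, not as a failure.
        return UNKNOWN, f"UNKNOWN - cannot parse value {raw_value!r}"
    state = evaluate(value, warn, crit)
    # Text after the pipe is Nagios performance data: label=value;warn;crit
    return state, f"{LABELS[state]} - value={value} | value={value};{warn};{crit}"
```

A wrapper script would print the status line and exit with the returned code, for example `code, line = run_check("85", 80, 90)`, followed by `print(line)` and `sys.exit(code)`.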

Badge

We are also looking for a solution.
We use Nagios, too.

Userlevel 7
Badge +23

I think monitoring the service status and disk space (Index, DDB, Mount paths, CommServe DB etc) would be a good starting point. Anything beyond that is more likely to be picked up by built-in Commvault alerts.

Have you folks checked out netdata? https://www.netdata.cloud/

It's pretty amazing and monitors a bunch of stuff out of the box that you’d spend days or weeks configuring on Nagios. It's a one-command install too.

Userlevel 2
Badge +8

Hi,

besides physical resources (CPU load, disk space, memory, etc.), for the time being we check for the existence of processes by matching the executable path (e.g. /opt/commvault on Linux) in the SNMP query response.

The Nagios configuration for the service looks like this:

define service{
    use                  Availability-Server-Service
    host_name            media_agent
    service_description  CommVault Services
    check_command        check_snmp_process!"/opt/commvault/" -f!0!0
}

I was curious to know if there is any better or more granular way to perform such a check.

 

@Damian Andre , thank you for the suggestion.

 

Have a nice day

Gaetano

Userlevel 2
Badge +8

Hi,

let me update this topic with some more information; maybe it can be of help.

I wanted to go into more detail and monitor the most meaningful processes on our main Linux-based media agent. To achieve this, I first needed to understand which processes are expected to be always running. My current attempt works on this list:

  • Base/CVODS
  • Base/cvlaunchd
  • Base/cvd
  • Base/cvfwd
  • Base/ClMgrS
  • MediaAgent/CvMountd
  • Base/3dnfsd.exe

I am monitoring them with Nagios using built-in commands based on SNMP queries.
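For comparison with the SNMP-based approach, the same check can be sketched as a script that reports which of the listed processes are missing. This is only an illustration under stated assumptions: it assumes the executables live under `/opt/commvault/` (for example `/opt/commvault/Base/cvd`), that the input is a list of command lines such as the output of `ps -eo args`, and the function names are hypothetical.

```python
# Processes expected to be always running, taken from the list above.
EXPECTED = [
    "Base/CVODS",
    "Base/cvlaunchd",
    "Base/cvd",
    "Base/cvfwd",
    "Base/ClMgrS",
    "MediaAgent/CvMountd",
    "Base/3dnfsd.exe",
]


def missing_processes(command_lines, expected=EXPECTED, prefix="/opt/commvault/"):
    """Return the expected processes not found among the running command lines."""
    running = set()
    for line in command_lines:
        parts = line.split()
        exe = parts[0] if parts else ""
        # Assumption: the install prefix plus the list entry gives the full path.
        if exe.startswith(prefix):
            running.add(exe[len(prefix):])
    return [name for name in expected if name not in running]


def check(command_lines):
    """Return (exit_code, status_line) following Nagios plugin conventions."""
    missing = missing_processes(command_lines)
    if missing:
        return 2, "CRITICAL - not running: " + ", ".join(missing)
    return 0, f"OK - all {len(EXPECTED)} expected processes running"
```

Reporting the missing names in the status line gives a more granular alert than a single process count, while still needing only one Nagios service definition.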

 

More to come...

Userlevel 7
Badge +19

Personally, I would take a different approach. Yes, you can monitor each and every process, but that doesn't tell you much about whether they are really functioning as expected. So I would focus only on environmental things such as system resources and whether the Commvault services are up and running. I would combine this with monitoring the actual system state, which can be done by executing a Check Readiness or by retrieving the system status as it is represented within Command Center. Unfortunately, Command Center still lacks a built-in alert for disconnected/offline clients. I hope this is added soon! Anyhow, monitoring from this end gives back the status of the entire chain, from CommServe to client and from client to MediaAgent.

Userlevel 2
Badge +8

Hi @Onno van den Berg ,

I do agree with you. What I am doing now is the very first step of the chain: checking the status of the stable services/processes running the infrastructure.
Your suggestion to proceed with the Check Readiness is very good. Do you know if it is possible to run it from the command line in order to integrate it within Nagios? This would make the information available in a single checkpoint.
 

Thank you for your comment

Gaetano

Userlevel 7
Badge +19

Well, if you check the services, I think that is more than enough, because from the other side you monitor the complete chain, so there is no need to monitor individual processes. In addition, some names have changed in the past and some were merged.

Yes, this is possible, and it should even be possible to run it on the client itself via a qcommand. Parsing the output comes with some challenges, but you should be able to manage it.

I personally bypass Nagios and just send the output directly to the paging system. From the CommCell console there are default alerts available which allow this to be implemented. I myself am waiting for a new alert which can be configured in Command Center so I can use a webhook.

Userlevel 2
Badge +8

Thank you for your suggestion. 
Just to check my understanding: do you mean that it is possible to run a Check Readiness from the Nagios server towards, let's say, a Media Agent? Probably via an SSH session and then issuing a qcommand?

This should allow for granular readiness check…

Userlevel 2
Badge +8

Hi,

yet another update. This is the current result of monitoring a Windows-based CommServe. It required a bit of manual work, but it is detailed.

It was obtained by adding one service definition to the Nagios configuration for each of the services to be checked.
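Since the definitions differ only in the service name, the manual work could be reduced by generating them. The following is a minimal sketch, assuming the block layout used in this post (the `Availability-Server-Service` template and the `check_windows_services` command); the function name `render_services` and the second service name in the usage note are hypothetical placeholders.

```python
# Template for one Nagios service block; doubled braces are literal braces.
TEMPLATE = """define service{{
    use                  {template}
    host_name            {host}
    service_description  Commvault Services - {service}
    check_command        check_windows_services!"{service}" -N 1
}}
"""


def render_services(host, services, template="Availability-Server-Service"):
    """Render one service definition per Windows service name."""
    blocks = [
        TEMPLATE.format(template=template, host=host, service=name)
        for name in services
    ]
    return "\n".join(blocks)
```

For example, `render_services("MY-SERVER", ["Commvault Application Manager", "Example Commvault Service"])` returns two blocks that can be written to a configuration file; the list of real service names has to come from the monitored host.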

Each Nagios service block looks like this:

define service{
    use                  Availability-Server-Service
    host_name            MY-SERVER
    service_description  Commvault Services - Commvault Application Manager
    check_command        check_windows_services!"Commvault Application Manager" -N 1
}

and leads to the following result


Have a nice day

Gaetano

Userlevel 7
Badge +19

@Gaetano I would recommend picking a different approach. Monitoring all these specific services also adds load to the systems you are monitoring. Knowing that a service runs is not important, because that is the expected state; you just need to know which service is not running while it is expected to run. So I would just pick check_services to monitor this.

Userlevel 2
Badge +8

Hi, I can’t find check_services among the tools available in my Nagios installation. Am I looking in the wrong place?

 
