Solved

@Enterprise Admins: How do you monitor (proactively)?

  • 13 July 2023
  • 3 replies
  • 259 views


Hello Guys :) 

 

For large environments: how do you proactively monitor CommVault (CommServe, MediaAgents, xAgents, jobs, logs, dedup engine/performance, alerts, reports, and so on)?

 

I've used Nagios with checkmk. But what is the current "state of the art"? Buzzword: AI. For example, I like how Azure handles all the metrics: it can solve some problems by itself, give you useful information, trigger events/processes, and "learn", and it can, for example, create trouble tickets or something similar.

 

Which tool makes your life easier? Grafana?

 

Looking forward to your feedback :)

 

Cheers!


Best answer by Sean Crifasi 14 July 2023, 03:55


3 replies


I want to monitor the nodes with checkmk/Grafana. But that requires installing an agent on the nodes, which CV does not recommend. How can I do that? What I can do is collect the sar files, but it takes a lot of work to display the data in a useful way. Are there any useful solutions?
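For the sar route, one low-effort option might be to convert the binary files to text with `sadf -d` (part of sysstat), which emits semicolon-separated rows under a `#`-prefixed header, and parse that. The snippet below is only a rough sketch under that assumption: the sample text, the host name `node01`, and the helper names `parse_sadf` and `busy_samples` are made up for illustration, and the exact columns depend on the sysstat version and the options passed.

```python
def parse_sadf(text):
    """Parse semicolon-separated `sadf -d` style text into a list of dicts.

    Assumes a header line starting with '#' that names the columns.
    Repeated header lines (e.g. after a restart) simply reset the column names.
    """
    rows = []
    header = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith("#"):
            header = [h.strip() for h in line.lstrip("#").split(";")]
            continue
        if header is None:
            continue  # data before any header: cannot be interpreted
        values = line.split(";")
        if len(values) != len(header):
            continue  # skip malformed rows instead of guessing
        rows.append(dict(zip(header, values)))
    return rows


def busy_samples(rows, idle_below=20.0):
    """Return (timestamp, %idle) for samples where CPU idle drops below a threshold."""
    return [
        (r["timestamp"], float(r["%idle"]))
        for r in rows
        if float(r["%idle"]) < idle_below
    ]


# Hypothetical sample resembling CPU output from `sadf -d <file> -- -u`.
SAMPLE = """\
# hostname;interval;timestamp;CPU;%user;%nice;%system;%iowait;%steal;%idle
node01;600;2023-07-13 00:10:01 UTC;-1;4.10;0.00;1.20;0.30;0.00;94.40
node01;600;2023-07-13 00:20:01 UTC;-1;71.50;0.00;12.30;6.10;0.00;10.10
"""

if __name__ == "__main__":
    for timestamp, idle in busy_samples(parse_sadf(SAMPLE)):
        print(f"{timestamp}: CPU idle {idle}%")
```

The parsed rows are plain dictionaries, so they could be written out as CSV or pushed to whatever the dashboard reads. This avoids installing a monitoring agent on the nodes, at the cost of only seeing data after the files have been collected.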


Hey @Fusi,

Check out this thread, it covers some of what you mentioned here:

 

In terms of reports, my suggestion is to ensure the SLA is kept and jobs are completing within the required timeframes (if not, then it's time to look at performance, such as the DDB), and to manage by exception (anomaly reports).

 


Hi @Fusi 

In addition to Damian's suggestion, I'd like to mention some worthwhile alerts. In my experience these tend to get overlooked, but they can add peace of mind and provide early detection of potential problems. I recommend browsing the store for various alerts/reports, as new ones are added from time to time that can be helpful in day-to-day operations.

- Monitor for long-running restores or admin jobs to detect a potential issue
https://store.commvault.com/webconsole/softwarestore/store.do#!/137/683/5962
- Monitor for an unusual number of jobs lingering in a queued state without progressing
https://store.commvault.com/webconsole/softwarestore/store.do#!/137/683/5964
- Prevent potential space consumption issues by ensuring data aging runs regularly
https://store.commvault.com/webconsole/softwarestore/store.do#!/137/683/11979
- Alert if the Commcell-level activity control has been disabled, so that backups are not unintentionally left disabled for an extended period of time
https://store.commvault.com/webconsole/softwarestore/store.do#!/137/682/10701
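
To illustrate the idea behind the first two alerts, here is a minimal Python sketch of the threshold logic. It is not how the store alerts are implemented. It assumes job records have already been exported from somewhere as plain dictionaries, and the field names (`id`, `status`, `started`), thresholds, and function names are hypothetical rather than Commvault's schema or API.

```python
from datetime import datetime, timedelta


def long_running(jobs, now, max_runtime):
    """Return ids of running jobs whose runtime exceeds max_runtime."""
    return [
        job["id"]
        for job in jobs
        if job["status"] == "running" and now - job["started"] > max_runtime
    ]


def queue_is_stuck(jobs, now, max_wait, threshold):
    """True when at least `threshold` jobs have sat in the queue longer than max_wait."""
    lingering = [
        job
        for job in jobs
        if job["status"] == "queued" and now - job["started"] > max_wait
    ]
    return len(lingering) >= threshold


# Hypothetical job records; `started` is the submission time.
NOW = datetime(2023, 7, 14, 12, 0)
JOBS = [
    {"id": 101, "status": "running", "started": NOW - timedelta(hours=9)},
    {"id": 102, "status": "running", "started": NOW - timedelta(minutes=30)},
    {"id": 103, "status": "queued", "started": NOW - timedelta(hours=2)},
    {"id": 104, "status": "queued", "started": NOW - timedelta(hours=3)},
    {"id": 105, "status": "queued", "started": NOW - timedelta(minutes=5)},
]

if __name__ == "__main__":
    print("Long running:", long_running(JOBS, NOW, timedelta(hours=8)))
    print("Queue stuck:", queue_is_stuck(JOBS, NOW, timedelta(hours=1), threshold=2))
```

Passing `now` in explicitly keeps the checks deterministic and easy to test. The thresholds themselves would need tuning per environment.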
