
I do full backups of production servers on Friday night and non-production servers Saturday night.  Every Monday my inbox has roughly 20 anomaly emails about aux copy jobs running longer than usual.  This has been going on for months now.  Is it possible to tweak the anomaly thresholds in CommVault so I get fewer emails?

Ken 

Hi Ken_H,

I assume you have one or more of the following alerts enabled. You can edit them to tweak the criteria. 

Aux copy job delayed

Job Management / Auxiliary Copy

This alert notifies the user when an auxiliary copy operation is in a waiting state for four hours.

Aux copy job failed

Job Management / Auxiliary Copy

This alert notifies the user when an auxiliary copy operation meets one of the following criteria:

  • the operation fails to complete, fails to start or is aborted by the system
  • the operation runs late, or is skipped by the system or due to a holiday

Auxiliary copy fallen behind

Job Management / Auxiliary Copy

Any of the following criteria have been met for the selected storage policy copy:

  • To be copied data is over the quantity specified in TB. Default value is 25TB.
  • Jobs are older than the specified number of days and have not yet been fully copied.
  • More than the default value of 48 hrs is required to copy all data, based on the average throughput of previously run Auxiliary Copy jobs. It is recommended that you adjust this value with an estimated completion time so that the alert above will not be triggered.

Note: If you do not select the To be copied data is over option for the storage policy copy, you will receive an alert when the data to be copied exceeds the default threshold of 25TB. Also, a warning icon appears for the storage policy and the storage policy copy.

If an alert is received, give immediate attention to any abnormal condition. Immediate action will prevent the Auxiliary Copy from falling behind the source copy in terms of the amount of data to be copied.

The alert indicates the criteria that triggered the alert, with one alert reason for each criterion above, respectively:

  • to be copied data exceeded
  • old jobs not copied
  • need more time to copy data

Assuming the Auxiliary Copy has had no issues and is expected to run for that duration, use the throughput and application size values from the Auxiliary Copy Job Summary Report to set the value for this third criterion. For example, if the application data size is 1000 GB and the throughput is 500 GB/hr, set the time to 2 hours.

Note:

The example below shows how the throughput is calculated internally and can be followed to set the hours value for the third criterion:

Auxiliary Copy Job1 copied 4 TB in 2 hrs.

Auxiliary Copy Job2 copied 2 TB in 1 hour.

Average Throughput = (Total data copied by all jobs) / (Total time taken by all previous Auxiliary Copy jobs)

In the example,

  • Total data copied = 6 TB
  • Total time taken = 3 hrs

Average throughput is 6/3 = 2 TB/hr.

To copy 4 TB, the calculation is 4/2 = 2, so set the alert to 2 hrs. When the result is not a whole number, round up to the next hour.

If the amount of data to copy increases to 27 TB and throughput is 2 TB/hr, set the alert to 14 hrs (27/2 = 13.5, rounded up).

If throughput increases, for example to 10 TB/hr, and the data to be copied is 27 TB, set the alert to 3 hrs (27/10 = 2.7, rounded up).
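The arithmetic in this example can be sketched in a few lines of Python. This is only an illustration of the calculation described above, not a CommVault API or tool; the function names `average_throughput` and `alert_hours` are made up for this sketch.

```python
import math

def average_throughput(jobs):
    """Average throughput in TB/hr from previous aux copy jobs.

    jobs: list of (data_copied_tb, hours_taken) tuples (hypothetical input format).
    """
    total_data = sum(data for data, _ in jobs)
    total_hours = sum(hours for _, hours in jobs)
    if total_hours <= 0:
        raise ValueError("total time taken must be positive")
    return total_data / total_hours

def alert_hours(data_to_copy, throughput_per_hr):
    """Hours value for the third criterion, rounded up to the next whole hour.

    Both arguments must use the same unit (TB and TB/hr, or GB and GB/hr).
    """
    if throughput_per_hr <= 0:
        raise ValueError("throughput must be positive")
    return math.ceil(data_to_copy / throughput_per_hr)

# Job1 copied 4 TB in 2 hrs, Job2 copied 2 TB in 1 hr
rate = average_throughput([(4, 2), (2, 1)])  # 6 TB / 3 hrs = 2 TB/hr

print(alert_hours(4, rate))    # 4 / 2    -> 2
print(alert_hours(27, rate))   # 27 / 2   -> 13.5, rounded up to 14
print(alert_hours(27, 10))     # 27 / 10  -> 2.7, rounded up to 3
```

The same helper covers the earlier 1000 GB at 500 GB/hr example: `alert_hours(1000, 500)` gives 2.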

Fallen behind alert for Silo copy and Snap copy is not supported.

The interval between alerts for this criterion can be configured in the Media Management Configuration (Auxiliary Copy Configuration) dialog box. The default interval is 24 hours, which is set in the Interval (Hours) between Auxiliary Copy Fallen Behind alerts option.


Thanks @Blaine Williams . 

I’ve got 40 alerts activated and (unfortunately) deleted all the alert emails from last weekend.  I think they may be coming from the Commserv Anomaly Alert.  I’ll dig into this more … probably next Monday. 

Ken


Hi @Ken_H 

Kindly raise a support ticket and ask them to send it to Mrityunjay Upadhyay.

Please attach the alerts you receive on Monday. 

We will check and get back.

regards

Mrityunjay


@Ken_H I have spoken to Mrityunjay offline. Let's first see which alert is doing it and how it is configured, and then fall back on his offer if necessary.


@Blaine Williams This is the type of email I’m getting:

The system detected jobs that are running longer than usual time in inf-srv57.

Auxiliary Copy Jobs:

  • Storage Policy: Tier2_Prod@Main
  • Storage Policy Copy: 2-DASH_to_DRMSA
  • Job ID: 467099
  • Percentage complete: 70
  • Anomaly threshold: 1 hours, 20 minutes
  • Running time: 1 hours, 46 minutes
  • Delay reason: (empty)

Please click here for more details.


@Ken_H, can you set the threshold to 2 hours (or whatever you find acceptable) as per the settings in @Blaine Williams' previous reply?

That should cut down on these, based at least on this example.

