
Currently we have a support case with DELL/EMC due to issues with the NDMP backups. We had several Commvault support cases, all of which concluded that the issue was outside of Commvault.

EMC keeps pushing to Commvault, and they now come with the following remarks:

When we run a backup, the following options are seen in the logs:

Sat Oct  1 22:33:49 2022 (1664656429): Received from  131.224.111.168; Session:64810
Message   : 0x401 (NDMP_DATA_START_BACKUP)
Timestamp : 1664656429
XSequence : 10
RSequence : 0
Error     : 0 (NDMP_NO_ERR)
    Bkup type : dump
    Num Env. Var : 13
        Name (value) : BACKUP_MODE (SNAPSHOT)  
        Name (value) : BACKUP_OPTIONS (2)             
        Name (value) : BASE_DATE (1664569940)   
        Name (value) : DIRECT (Y)
        Name (value) : DMA (COMMVAULT)
        Name (value) : DMA_VERSION (11.0.0(BUILD80))
        Name (value) : ENCODING (UTF8)
        Name (value) : FILESYSTEM (/ifs/redacted])
        Name (value) : HIST (Y)
        Name (value) : MULTI_STREAM (CV5267018:4:1:1:32808_4571_5424)
        Name (value) : MULTI_STREAM_HINT (2)
        Name (value) : RECURSIVE (Y)
        Name (value) : UPDATE (Y)
 

And they sent us a log excerpt from a different customer:

Message   : 0x401 (NDMP_DATA_START_BACKUP)
Timestamp : 1664744375
XSequence : 14
RSequence : 0
Error     : 0 (NDMP_NO_ERR)
  Bkup type : dump
  Num Env. Var : 13
         Name (value) : BACKUP_MODE (SNAPSHOT)
         Name (value) : BACKUP_OPTIONS (2)
         Name (value) : BASE_DATE (1664629729)
         Name (value) : DIRECT (Y)
         Name (value) : DMA (COMMVAULT)
         Name (value) : DMA_VERSION (11.0.0(BUILD80)) <- same version and build of Commvault
         Name (value) : ENCODING (UTF8)
         Name (value) : FILESYSTEM (/ifs/null/null)
         Name (value) : HIST (f)
         Name (value) : MULTI_STREAM (CV128609_327_439_1)  <- this is what we expect to see
         Name (value) : MULTI_STREAM_HINT (4)
         Name (value) : RECURSIVE (Y)
         Name (value) : UPDATE (Y)

 

They point out that the string is formatted differently, with the same version. The version number is very generic, so this does not tell us too much.
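To make the comparison concrete, here is a minimal Python sketch that parses the `Name (value) : KEY (VALUE)` lines from two such log excerpts and lists only the variables that differ. The function names (`parse_env`, `diff_env`) are made up for illustration, the sample lines are copied from the two excerpts above, and the regex assumes any trailing annotation on a line contains no closing parenthesis.

```python
import re

# Matches e.g. "Name (value) : DMA_VERSION (11.0.0(BUILD80))".
# The greedy ".*" runs to the LAST ")" on the line, so nested parentheses
# in the value are kept. Assumption: trailing notes contain no ")".
ENV_LINE = re.compile(r"Name \(value\)\s*:\s*(\w+)\s*\((.*)\)")


def parse_env(log_text):
    """Return {variable name: value} for every env-var line in the excerpt."""
    env = {}
    for line in log_text.splitlines():
        match = ENV_LINE.search(line)
        if match:
            env[match.group(1)] = match.group(2)
    return env


def diff_env(a, b):
    """Return {name: (value in a, value in b)} for variables that differ."""
    names = sorted(set(a) | set(b))
    return {n: (a.get(n), b.get(n)) for n in names if a.get(n) != b.get(n)}


ours = parse_env("""
        Name (value) : BACKUP_OPTIONS (2)
        Name (value) : DMA_VERSION (11.0.0(BUILD80))
        Name (value) : HIST (Y)
        Name (value) : MULTI_STREAM (CV5267018:4:1:1:32808_4571_5424)
        Name (value) : MULTI_STREAM_HINT (2)
""")

theirs = parse_env("""
         Name (value) : BACKUP_OPTIONS (2)
         Name (value) : DMA_VERSION (11.0.0(BUILD80)) <- same version and build of Commvault
         Name (value) : HIST (f)
         Name (value) : MULTI_STREAM (CV128609_327_439_1)
         Name (value) : MULTI_STREAM_HINT (4)
""")

for name, (mine, other) in diff_env(ours, theirs).items():
    print(f"{name}: ours={mine!r} other={other!r}")
```

On these sample lines the differences are HIST, MULTI_STREAM and MULTI_STREAM_HINT, so the MULTI_STREAM format is not the only thing that differs between the two sessions.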

 

My question: does anyone know when (in which SP) the format of this string changed? I do not see why this would cause the observed issues, but I would like to avoid going to CV support again with a wild guess from EMC.

 

Hope someone can help. 

Hi @Marcel Krommenhoek , thanks for the post!

To confirm, you’re referring to the underscores vs. the colons?

Any chance any of your cases with CV support are still open?  If not, open a new case (share the number here so I can track it) so our dev team can look into it closer.

I’m not aware of this, though a deeper dive is best here.


Hi @Mike Struening 

 

Thank you for the reply. We had 3 cases with CV on this issue, and every time the conclusion was that it was not a CV issue. But we will see if we open another case for it.

Hope we can help the customer out in stabilizing the NDMP backups.

Br.

 

Marcel


What’s the actual issue with the NDMP backups though? I believe these fields change with each job, as they refer to some of the job parameters.


Hi @Jordan,

The actual issue is that DELL\EMC is constantly pointing to Commvault. We have been working on this case for over a year now, and several CV cases have been raised for all kinds of "issues" they spotted.

Now they were pointing to this string, which in my opinion was fine, as CV generates it per stream and a good percentage of the streams succeed. But they pulled the other info from a previous case and are not willing to continue until we have confirmed this is not an issue.

The actual issue is that 30-50% of the NDMP job streams stall, with multiple jobs running for over 24 hours with no progress. We can see in the Commvault logs (and netstat) that the connection is up and that keep-alives are sent, but the Isilon is not sending any data.

The latest development is that we are back at the point where they are pushing us to ask Commvault to set the BACKUP_OPTIONS = 7 parameter. We already had a ticket about this in April, and CV support confirmed that they do not have the ability to set the option.

Now we have been diving into the EMC documentation and have found that they state it can only be set in the Isilon console.

I could write a book on this case, but so as not to bore you too much, the short story is that DELL\EMC has been playing the blame game for over a year now.


I had the same issue (and probably reached the same person at Dell support). In our scenario there was an anomalously high CPU condition on the Isilon that went away after a week, and nothing had to be changed in our backup config. In the meantime the case got escalated and both Commvault and Isilon engineering got engaged, and luckily they have a good working relationship. I am pasting in the conclusions we got in case this helps anyone else.

 

General summary (Isilon NDMP backups using Commvault)
 
There are 2 backup methods for NDMP Incremental backups - Timestamp and Snapshot. Both methods are fully supported by both vendors and both methods work equally well for general workloads. The default setting in Commvault is Snapshot (as set with the "Fast Incremental (Use Snapshot for Backup)" checkbox). Changing the setting does not take effect until a Full backup is run. Alternating between settings should not have downstream impacts for Commvault - deduplication, retention, etc. should be the same.
 
There are two main considerations between the two methods. In environments with high daily change rates (over 2-3%) the Timestamp method is more efficient. In these cases Snapshots use more resources since Snapshot backups require taking and comparing differences between 2 snapshots in memory and this can take longer when there is more data to process. Conversely, with smaller change rates Snapshot backups can be faster since Isilon can efficiently identify the small number of file changes between two snapshots, whereas in a Timestamp backup the Isilon needs to evaluate all files to determine what changes occurred since the base time. Dell is seeing more customers with higher change rates, so Timestamp should be strongly considered as an alternative to the default setting.
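As a rough illustration of the rule of thumb above, the following Python sketch computes a daily change rate and suggests a method. The function name `suggest_incremental_method` and the default threshold of 2.5% (the midpoint of the "2-3%" range) are assumptions made for this example, not settings from either vendor; the changed and total sizes would have to come from your own reporting.

```python
def suggest_incremental_method(changed_bytes, total_bytes, threshold=0.025):
    """Return (daily change rate, suggested method).

    threshold is an assumed cut-off: the summary only says "over 2-3%",
    so 0.025 is an illustrative midpoint, not a vendor-defined value.
    """
    if total_bytes <= 0:
        raise ValueError("total_bytes must be positive")
    rate = changed_bytes / total_bytes
    # High change rate: Timestamp avoids comparing two large snapshots.
    # Low change rate: Snapshot finds the few changed files efficiently.
    method = "timestamp" if rate > threshold else "snapshot"
    return rate, method


# Example: 5 TiB changed per day on a 100 TiB file system (5% change rate).
rate, method = suggest_incremental_method(5 * 2**40, 100 * 2**40)
print(f"change rate {rate:.1%} -> consider {method} incrementals")
```

Remember that, per the summary, a change of method only takes effect after the next Full backup.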
 
Although the Commvault default is to use Snapshots, if the Snapshot backup fails for any reason the Isilon will automatically switch over and provide a “token based incremental” Timestamp backup instead. 
 
When using Snapshot Incremental backups there are additional granular parameters that can also be changed but are not usually exposed as settings. Although obscure, these settings have led to misunderstandings by the respective vendor support teams. Commvault software is coded to pass the parameter "2" but there have been circumstances where customers have been asked to use option "7" instead. After reviewing with both vendors we came to the following joint understandings:
- Dell agrees that both option 2 and option 7 are valid and fully supported
- Commvault can provide customers with an Additional Setting that causes Commvault to use Option 7 instead of the default
- Option 2 was selected as the default because it leaves only the prior snapshot from the last successful backup thus conserving Isilon resources while option 7 may keep an unlimited number of snapshots.
- Option 2 depends on agreement between Commvault and Isilon on snapshot timestamps. In cases where there is a mismatch, the fallback process would run a timestamp-based incremental, which can take more time than a snapshot-based incremental if the data change rate is low, or less time if the change rate is higher. Generally, option 2 is recommended unless there is a backup scenario that requires the additional snapshots.
 

 

