Question

NDMP backup of NetApp filer with millions of files

  • 10 January 2023
  • 9 replies
  • 636 views

Userlevel 2
Badge +6

Hi,

One of our customers will soon get an all-flash NetApp filer (SSD disks). They will have one volume on the NetApp system with more than 100 million files.

I proposed using NDMP backups and configuring the NetApp C-Mode Multi-Streaming Backups feature.

However, the customer is concerned about performance issues with NDMP backups because of the high number of files in one volume.

Any advice?

Many thanks.


9 replies

Userlevel 7
Badge +19

@brucquat well, it is a filer, so it is designed to house a lot of files. However, I'm aware of some limits in ONTAP for regular FlexVol volumes, which NetApp tried to address with FlexGroups, which are basically a single representation of multiple volumes. If you use FlexGroups, then I do not see an issue you might hit with NDMP backups. However, I'm not sure NDMP is the right choice these days. You could instead create a NAS client using SMB/NFS and leverage multiple streams and access nodes to add parallelism and thus improve backup and restore performance.
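
To illustrate why multiple streams help with a very large file count, here is a rough sketch of spreading top-level directories across a fixed number of streams so that each stream handles a similar number of files. This is only an illustration of the idea, not how Commvault actually schedules its streams; the function name `assign_to_streams` and the directory names are hypothetical.

```python
import heapq
from typing import Dict, List


def assign_to_streams(dir_file_counts: Dict[str, int], stream_count: int) -> List[List[str]]:
    """Greedy balancing: take the largest directory first and always
    place it on the stream that currently has the fewest files."""
    if stream_count < 1:
        raise ValueError("stream_count must be at least 1")
    # Heap entries are (files assigned so far, stream index).
    heap = [(0, i) for i in range(stream_count)]
    heapq.heapify(heap)
    streams: List[List[str]] = [[] for _ in range(stream_count)]
    by_size = sorted(dir_file_counts.items(), key=lambda kv: kv[1], reverse=True)
    for path, count in by_size:
        load, idx = heapq.heappop(heap)
        streams[idx].append(path)
        heapq.heappush(heap, (load + count, idx))
    return streams


# Hypothetical file counts per top-level directory.
example = {"/vol/a": 50, "/vol/b": 30, "/vol/c": 20, "/vol/d": 10}
plan = assign_to_streams(example, 2)
```

With a single huge directory the balancing cannot help much, which is one reason the file layout matters more than the raw file count.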

Userlevel 2
Badge +6

Thanks for your feedback, but what about the following feature, NetApp C-Mode Multi-Streaming Backups (commvault.com):

For NDMP NetApp C-mode subclients, you can configure multiple data streams to back up an individual content path on a subclient. After you configure multi-streaming for a subclient, the next full backup operation multi-streams the content path. If a content path is not completed during the full backup operation, the next incremental backup operation multi-streams the content path.

 

Thanks

Userlevel 7
Badge +19

@brucquat have you read the Support section? You stated that the system has more than 100 million files. To be able to use this feature, you will need a good understanding of the file layout and directory structure.

 

 

Support

Multi-streaming within an individual content path is supported for the following:

  • Full backup operations when the subclient content paths are full-volume content paths.

  • Volumes must meet the following requirements:

    • The volume must be larger than 300 gigabytes.

    • The average subdirectory size must be larger than 20 gigabytes.

    • The number of subdirectories must be fewer than 5000.

    • The backup content root path must contain fewer than 20,000 files.

    • The volume cannot be a FlexGroup volume.

    • The volume cannot be a read-only volume or a snap-mirror volume.
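
The requirements above can be turned into a quick pre-check before deciding on NDMP multi-streaming. This is a minimal sketch: the `VolumeInfo` class and `multistream_blockers` function are hypothetical names, the figures would have to come from your own inventory of the filer, and treating a gigabyte as 1024^3 bytes is an assumption.

```python
from dataclasses import dataclass
from typing import List

GB = 1024 ** 3  # assumption: binary gigabytes


@dataclass
class VolumeInfo:
    # Hypothetical container for facts you gather about the volume yourself.
    size_bytes: int
    subdirectory_count: int
    avg_subdirectory_bytes: float
    root_file_count: int
    is_flexgroup: bool
    is_read_only: bool
    is_snapmirror: bool


def multistream_blockers(vol: VolumeInfo) -> List[str]:
    """Return the listed requirements the volume fails (empty list = eligible)."""
    blockers = []
    if vol.size_bytes <= 300 * GB:
        blockers.append("volume must be larger than 300 GB")
    if vol.avg_subdirectory_bytes <= 20 * GB:
        blockers.append("average subdirectory size must be larger than 20 GB")
    if vol.subdirectory_count >= 5000:
        blockers.append("number of subdirectories must be fewer than 5000")
    if vol.root_file_count >= 20_000:
        blockers.append("root path must contain fewer than 20,000 files")
    if vol.is_flexgroup:
        blockers.append("volume cannot be a FlexGroup volume")
    if vol.is_read_only or vol.is_snapmirror:
        blockers.append("volume cannot be read-only or a SnapMirror volume")
    return blockers


# Made-up example: a 10 TB volume split into 50,000 small subdirectories.
example = VolumeInfo(
    size_bytes=10_000 * GB,
    subdirectory_count=50_000,
    avg_subdirectory_bytes=10_000 * GB / 50_000,
    root_file_count=100,
    is_flexgroup=False,
    is_read_only=False,
    is_snapmirror=False,
)
problems = multistream_blockers(example)
```

In this made-up case the volume fails on both the subdirectory count and the average subdirectory size, which is the typical outcome for volumes holding many small files.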

Userlevel 4
Badge +11

Mount the volume on a Linux MediaAgent and use the File System Agent to protect it with a large number of streams.

You can also have Commvault mount it automatically when the backup runs by using this syntax in the Content tab:

 

file_server_interface:/NFS_export_path

 

More info in the link below:

https://documentation.commvault.com/2022e/expert/106907_creating_subclient_with_nfs_exports_as_content.html

Userlevel 2
Badge +6

Hi, thanks for the suggestion, but we only have Windows MediaAgents and there will be no Linux MediaAgent.

So the question remains …

Thanks

Userlevel 7
Badge +19

You can also do it with a Windows MediaAgent. Just follow this section → https://documentation.commvault.com/v11/essential/132972_add_nas_file_server.html

Userlevel 2
Badge +6

Hi, apart from NDMP backup or a NAS client using SMB, is there another possible approach?

Many thanks.

Userlevel 7
Badge +19

If you have a second NetApp, you could consider creating snapshots and letting Commvault mirror them to the other array, from which you then take the backup. This offloads the backup load from the "production" filer. However, I would first check whether this is really needed by analyzing the current load on the system and running some tests.

Userlevel 2
Badge +6

Hi,

I’ve also seen the Block-Level backup option “Optimize for file system with large number of files” when using the File System Agent.

Is it different from what was mentioned above, “NAS client using SMB/NFS and leverage multiple streams and access nodes to add parallelism”?

Thanks for your help.
