Question

Best practices for a large linux file server besides File Agent?


Userlevel 2
Badge +9

Our Linux file server is a large VMware VM - 48GB RAM and 8 virtual CPUs running on an Intel(R) Xeon(R) Gold 6248R CPU @ 3.00GHz - with about 500TB of storage attached. Our Linux admin has set that up as 12x 60TB (max size) volumes on our SAN, which communicate back to VMware over iSCSI.

Our Linux users get reasonable performance from this VM for day-to-day usage, but certain customers and projects are reaching crazy levels of unstructured file storage. We have one customer with a folder that consumes 16TB of data across 41 million files, and while that’s our worst offender, the top 5 projects are all pretty similar.

We’ve been using the Linux File Agent installed in this VM since we started using Commvault in 2018. An incremental backup across this file server typically takes about 3 hours, with the majority of that time spent scanning the entire file system; the backup phase itself runs relatively quickly. We run 3x incremental backups per day, at 6am, noon, and 6pm.
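
For a rough sense of scale, here is a back-of-envelope sketch; the stat rate and the assumption that the rest of the server has the same file density as our worst project are guesses on my part, not measurements:

# Back-of-envelope: why the scan phase dominates our incrementals.
# The stat rate and the "uniform density" assumption below are guesses, not measurements.
TB = 1024**4
avg_file_size = (16 * TB) / 41_000_000        # worst project: 16TB across 41M files (~430 KB/file)
est_total_files = (500 * TB) / avg_file_size  # ~1.3 billion files if density were uniform
assumed_stat_rate = 100_000                   # metadata lookups per second, assumed
scan_hours = est_total_files / assumed_stat_rate / 3600
print(f"~{est_total_files / 1e9:.1f}B files -> ~{scan_hours:.1f} h just to walk metadata")

That lands in the same ballpark as the ~3 hours we actually observe, which is why the scan phase is my main concern.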

However, the aforementioned project is starting to change/touch 2 million files per day. The file agent is now taking 10+ hours to complete a backup, and we’re missing a lot of our incrementals and falling behind.

We are wondering what other organizations do with Commvault when their production file servers get this large. In 2018 we were using drive striping, so we couldn’t use IntelliSnap, but we’ve since stopped doing that. Would the recommendation be to move to block-level backups instead of file-level? We did use block-level backups on our Windows file server (which held only about 90TB of data, not the 500TB of the Linux file server), but we found that when we needed to do a Browse action in Commvault, it would take 30-60 minutes just to start the browse. I am wondering if a browse across a 500TB block-level backup would be even worse.

Any suggestions or options for using other components or functions of Commvault to deal with this huge amount of data on a single file server are appreciated.

I figure that in the grand scheme of things 500TB for a file server can’t be that large, right?


12 replies

Userlevel 4
Badge +10

Hi @ZachHeise 

You can try block-level backups, which back up only the changed extents instead of the entire file. It’s a driver-based backup, where a driver monitors the changes and backs up only the specific extents that were modified.
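
Conceptually it works something like the sketch below. To be clear, this is not the actual driver code, just an illustration of extent-level change detection where only extents that differ from the previous backup get sent:

import hashlib

EXTENT_SIZE = 4 * 1024 * 1024  # extent size chosen arbitrarily for the sketch

def extent_hashes(path):
    """Hash every fixed-size extent of a file or block device."""
    hashes = {}
    with open(path, "rb") as f:
        offset = 0
        while chunk := f.read(EXTENT_SIZE):
            hashes[offset] = hashlib.sha1(chunk).hexdigest()
            offset += len(chunk)
    return hashes

def changed_extents(path, previous_hashes):
    """Return only the extents whose contents differ from the previous backup."""
    current = extent_hashes(path)
    return {off: h for off, h in current.items() if previous_hashes.get(off) != h}

# The real driver tracks writes as they happen instead of re-reading and
# re-hashing everything, so only the changed extents are read at backup time.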

You can find more information here -

https://documentation.commvault.com/2022e/expert/24567_block_level_backup.html

Regarding your concern about browse taking a long time due to mounting of a live snapshot: we support inline cataloging with block level, where the entire data is cataloged as part of the block-level job itself.

In a future release, we would support offline cataloging with block level, which will catalog the data outside of the block-level job using the same or a different access node. Browse will be much faster, as items would be fetched from the index instead of mounting a live snapshot.

Disclaimer: Feature support for offline cataloging cannot be claimed officially as of now, as it is targeted for a future release.

Thanks,

Sparsh

Userlevel 2
Badge +9

Hi Sparsh - block-level backups don’t have the ability to exclude/filter folders though, correct? That’s what I’m seeing at https://documentation.commvault.com/2022e/expert/24569_block_level_backup_support.html. We really need that functionality because we work with PHI-restricted data, and it would violate our data providers’ TOS to have any backups of that data. Right now with our file agent, we use the exclusion/filter properties extensively.
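
To give a sense of how much we lean on this, our filters are essentially path patterns like the hypothetical ones below (conceptual Python with made-up paths, not Commvault’s actual filter syntax):

from fnmatch import fnmatch

# Hypothetical paths/patterns, just to illustrate the kind of exclusions we rely on.
EXCLUDE_PATTERNS = [
    "/data/projects/*/phi_restricted/*",  # PHI data we are not allowed to back up
    "/data/scratch/*",                    # throwaway working space
]

def is_excluded(path):
    return any(fnmatch(path, pattern) for pattern in EXCLUDE_PATTERNS)

print(is_excluded("/data/projects/acme/phi_restricted/mri/scan001.dcm"))  # True
print(is_excluded("/data/projects/acme/results/summary.csv"))             # False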

Thank you for your reply, though. Any other thoughts on ways we could improve our Linux backups if block-level backups don’t allow filters?

Userlevel 6
Badge +17

You mention it’s the SCAN time which is the problem.

https://documentation.commvault.com/2022e/expert/115452_file_scan_methods_for_unix_file_system_agent.html

Review the requirements for Optimized Scan to see if that would work for you. It creates a mini database to track file system changes, so you don’t have to run a painful recursive scan across the whole file system on every job.
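
Very roughly, the difference is between the two approaches sketched below. This is illustrative Python only, not how the actual Optimized Scan is implemented; the point is what the cost scales with:

import os

def recursive_scan(root, last_backup_time):
    """Classic scan: stat every file on every job to find what changed."""
    changed = []
    for dirpath, _dirs, files in os.walk(root):
        for name in files:
            full = os.path.join(dirpath, name)
            try:
                if os.stat(full).st_mtime > last_backup_time:
                    changed.append(full)
            except OSError:
                pass
    return changed  # cost scales with the TOTAL file count

def journal_scan(change_journal):
    """Optimized Scan idea: changes were already recorded as they happened,
    so the job just reads that list instead of walking the whole tree."""
    return list(change_journal)  # cost scales with the CHANGED file count only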

Thanks,
Scott

Userlevel 2
Badge +9

Thanks Scott - we’re already using that Optimized Scan option on the linux file server in question, not recursive. I definitely double checked just now though to make sure!

Userlevel 2
Badge +9

Define multiple subclients and name each one to describe the content it backs up. Give each subclient its own schedule window and run it 3 times a day, as you do now.
Since each subclient then covers only part of the file system, its optimized scan will take less time.
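
For example, a simple size-based split of the top-level project folders could look like the sketch below (hypothetical mount point and folder layout; the greedy bin-packing is just one way to balance the subclients):

import heapq
import os

ROOT = "/data/projects"   # hypothetical mount point
NUM_SUBCLIENTS = 4        # however many scans/streams you want running in parallel

def dir_size(path):
    """Total bytes under a directory (slow, but only needed once for planning)."""
    total = 0
    for dirpath, _dirs, files in os.walk(path):
        for name in files:
            try:
                total += os.path.getsize(os.path.join(dirpath, name))
            except OSError:
                pass
    return total

# Greedy bin-packing: always give the next-largest folder to the lightest subclient.
folders = sorted(
    ((dir_size(os.path.join(ROOT, d)), os.path.join(ROOT, d)) for d in os.listdir(ROOT)),
    reverse=True,
)
buckets = [(0, i, []) for i in range(NUM_SUBCLIENTS)]
heapq.heapify(buckets)
for size, path in folders:
    total, idx, content = heapq.heappop(buckets)
    content.append(path)
    heapq.heappush(buckets, (total + size, idx, content))

for total, idx, content in sorted(buckets, key=lambda b: b[1]):
    print(f"Subclient {idx + 1}: ~{total / 1024**4:.1f} TB -> {content}")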

Userlevel 3
Badge +9

Hello @ZachHeise 

What is the backend storage of 500TB? NetApp, JBOD or Isilon?

For NetApp you can probably use IntelliSnap 

Userlevel 7
Badge +19

As @0ber0n already suggested, I would also recommend chopping up your config into multiple subclients and/or configuring it as a NAS and using multiple proxies/access nodes.

Add a NAS File Server (commvault.com)

As for the suggestion to use IntelliSnap, you will hit the same challenge for the backup copy as well. 

Userlevel 2
Badge +9

Hello @ZachHeise 

What is the backend storage of 500TB? NetApp, JBOD or Isilon?

For NetApp you can probably use IntelliSnap 

The backend SAN is a Dell PowerVault ME4084, which has iSCSI connections to VMware vCenter. The Linux file server in question just has lots of VMDK “hard drives” attached to it, which are then shared out from that file server via Samba and NFS to the servers that use the storage.

Userlevel 2
Badge +9

As @0ber0n already suggested, I would also recommend chopping up your config into multiple subclients and/or configuring it as a NAS and using multiple proxies/access nodes.

Add a NAS File Server (commvault.com)

As for the suggestion to use IntelliSnap, you will hit the same challenge for the backup copy as well. 

Hi Onno - so if I’m understanding your post and the link you included correctly, you’re saying that I might get superior performance if, instead of my current approach of using the on-server Linux File System Agent to perform the backup locally, I use my MediaAgent to back up the Linux file server over the network? All my MediaAgents are Windows, so this would be using Samba/CIFS?

This is an intriguing idea that would likely move a lot of the CPU workload from the Linux file server to my MediaAgent, but is there any documentation or estimate of what sort of backup-time performance this might result in? It seems to me like this would be less efficient than running the agent locally on the file server, since 100% of the NAS’s data would need to be examined over the network CIFS connection before determining how it could be compressed, deduplicated, etc., versus our current paradigm where the Commvault agent installed locally on the file server makes those determinations before it ever starts sending data out over the network.
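
Back-of-envelope on the metadata side alone, which is the part I’m worried about (the per-lookup latencies are purely assumed numbers to illustrate the concern, not benchmarks):

files_to_examine = 41_000_000  # just the worst project; the whole server holds far more

# Assumed, illustrative latencies per metadata lookup (stat / directory query):
local_stat_ms = 0.01           # warm local filesystem metadata on the file server
smb_stat_ms = 1.0              # network round trip over SMB/CIFS from a MediaAgent

local_hours = files_to_examine * local_stat_ms / 1000 / 3600
smb_hours = files_to_examine * smb_stat_ms / 1000 / 3600
print(f"local enumeration ~{local_hours:.1f} h, over CIFS ~{smb_hours:.1f} h")
# Even if a NAS-style backup batches directory listings, that metadata traffic now
# crosses the network, which is the trade-off I'm trying to understand.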

Thank you though, it’s worth thinking about!

Userlevel 1
Badge +5

Questions:
1. How many VMDKs are presented to accommodate the 500 TB?
2. Are there any RDM disks provisioned alongside the VMDKs?

Userlevel 2
Badge +9

Hi Ramkumar,

Besides 2x disks for boot and other system applications, there are 11 VMDKs currently attached to the primary Linux file server in question.

No, we are not using any RDMs anymore, solely VMDKs presented to the VM as hard disks through vCenter.

Badge +3

Did you create a different storage controller (VMware Paravirtual) per disk, or are all disks attached to one?

Reply