Best practices for a large linux file server besides File Agent?

  • 14 March 2023
  • 7 replies

Userlevel 2
Badge +9

Our Linux file server is a large VMware VM - 48GB RAM and 8 virtual CPUs on an Intel(R) Xeon(R) Gold 6248R CPU @ 3.00GHz - with about 500TB of storage, which our Linux admin has set up as 12x 60TB (max size) volumes on our SAN, presented to VMware over iSCSI.

We get reasonable performance for our Linux users' day-to-day work on this VM, but certain customers and projects are reaching crazy levels of unstructured file storage. Our worst offender is one customer whose folder consumes 16TB of data across 41 million files, and the top 5 projects are all pretty similar.

We've been using the Linux File Agent installed in this VM since we started with Commvault in 2018. An incremental backup typically takes about 3 hours across this file server, with the majority of that time spent scanning the entire volume; the backup phase itself runs relatively quickly. We run 3 incremental backups per day, at 6am, noon, and 6pm.

However, that aforementioned project is now changing/touching 2 million files per day. The file agent is starting to take 10+ hours per backup, so we're missing a lot of our incrementals and falling behind.

We are wondering what other organizations do with Commvault when their production file servers get this large. In 2018 we were using drive striping, so we couldn't use IntelliSnap, but we've since stopped doing that. Is the recommendation now to move from file-level to block-level backups? We did use block-level backups on our Windows file server (which held only about 90TB of data, not the 500TB of the Linux file server), but we found that when we needed to do a Browse action in Commvault, it would take 30-60 minutes just for the browse to start. I am wondering if a browse across a 500TB block backup would be even worse.

Any suggestions or options for using other components or functions of Commvault to deal with this huge amount of data on a single file server are appreciated.

I figure that in the grand scheme of things 500TB for a file server can’t be that large, right?

7 replies

Userlevel 2
Badge +7

Hi @ZachHeise 


You can try block-level backups, which back up only the changed extents instead of the entire file. It's a driver-based backup: the driver monitors changes and backs up only the specific extents that were modified.
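To illustrate the general idea, here is a minimal Python sketch of changed-extent detection - purely hypothetical, since the actual driver tracks writes in kernel space rather than re-hashing files:

```python
import hashlib
from pathlib import Path

EXTENT_SIZE = 4 * 1024 * 1024  # hypothetical 4 MiB extent size

def extent_hashes(path: Path) -> list[str]:
    """Hash each fixed-size extent of a file."""
    hashes = []
    with path.open("rb") as f:
        while chunk := f.read(EXTENT_SIZE):
            hashes.append(hashlib.sha256(chunk).hexdigest())
    return hashes

def changed_extents(path: Path, previous: list[str]) -> list[int]:
    """Return indexes of extents that differ from the previous backup's hashes."""
    current = extent_hashes(path)
    return [i for i, h in enumerate(current)
            if i >= len(previous) or h != previous[i]]
```

The takeaway: when only a handful of extents in a large file change, the backup moves a few megabytes instead of re-sending the whole file.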


You can find more information in the Commvault documentation on block-level backup.

Regarding your concern about browse taking a long time due to mounting a live snapshot: block level supports inline cataloging, where the entire data set is cataloged as part of the block-level job itself.

In a future release, we plan to support offline cataloging with block level, which will catalog the data outside of the block-level job using the same or a different access node. Browse will be much faster, as items will be fetched from the index instead of by mounting a live snapshot.

Disclaimer: offline cataloging support cannot be officially claimed as of now, as it is targeted for a future release.

Userlevel 2
Badge +9

Hi Sparsh - block-level backups don't have the ability to exclude/filter folders though, from what I'm seeing - is that correct? We really need that functionality because we work with PHI-restricted data, and having any backups of that data would violate our data providers' TOS. Right now with our file agent, we use the exclusion/filter properties extensively.
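For context, what we depend on is pruning entire folders before they are ever read. Conceptually it works like the Python sketch below (hypothetical paths and patterns - in Commvault these are subclient filter properties, not code):

```python
import fnmatch
import os

# Hypothetical filter patterns, standing in for subclient filter properties
EXCLUDES = ["/data/projects/*/phi", "*.tmp"]

def is_excluded(path: str) -> bool:
    return any(fnmatch.fnmatch(path, pattern) for pattern in EXCLUDES)

def walk_with_filters(root: str):
    """Walk a tree, pruning excluded directories so they are never read."""
    for dirpath, dirnames, filenames in os.walk(root):
        dirnames[:] = [d for d in dirnames
                       if not is_excluded(os.path.join(dirpath, d))]
        for name in filenames:
            full = os.path.join(dirpath, name)
            if not is_excluded(full):
                yield full
```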

Thank you for your reply, though. Any other thoughts on ways we could improve our Linux backups, if block-level backups don't allow filters?

Userlevel 5
Badge +14

You mention it's the SCAN time that is the problem.

Review the requirements for Optimized Scan to see if it would work for you. It creates a mini database to track file system changes, so you don't have to run a painful recursive scan across the whole file system on every job.
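To make the idea concrete, here is a loose Python sketch of such a catalog - purely illustrative, not Commvault's actual scan database, and the real Optimized Scan also uses a driver-fed journal so it can skip most of the walk:

```python
import os
import sqlite3

# Hypothetical schema; Commvault's actual scan database is internal
def open_catalog(db_path: str) -> sqlite3.Connection:
    conn = sqlite3.connect(db_path)
    conn.execute("""CREATE TABLE IF NOT EXISTS files
                    (path TEXT PRIMARY KEY, mtime REAL, size INTEGER)""")
    return conn

def incremental_scan(root: str, conn: sqlite3.Connection):
    """Yield files that are new or changed since the catalog was last updated."""
    for dirpath, _dirs, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            st = os.lstat(path)  # lstat: don't follow symlinks
            row = conn.execute("SELECT mtime, size FROM files WHERE path = ?",
                               (path,)).fetchone()
            if row != (st.st_mtime, st.st_size):  # None or stale -> changed
                yield path
            conn.execute("INSERT OR REPLACE INTO files VALUES (?, ?, ?)",
                         (path, st.st_mtime, st.st_size))
    conn.commit()
```

Note this naive version still walks the tree; the driver-fed journal is what eliminates the walk itself, which is where most of your 3 hours are going.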


Userlevel 2
Badge +9

Thanks Scott - we're already using the Optimized Scan option on the Linux file server in question, not recursive scan. I definitely double-checked just now to make sure, though!

Userlevel 2
Badge +8

Define multiple subclients and name each one to describe the content it backs up. Give each subclient its own schedule window and run it 3 times a day, as you do now. Since optimized scan runs per subclient, each subclient scans a smaller slice of the file system and finishes much sooner.
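As a rough illustration of why this helps (hypothetical paths, with a simple file count standing in for the scan phase), the partitions can be scanned concurrently, so wall-clock scan time approaches that of the largest partition rather than the sum of all of them:

```python
import os
from concurrent.futures import ProcessPoolExecutor

# Hypothetical split of one big tree into subclient-like partitions
SUBCLIENTS = {
    "projects_a_m": "/data/projects/a-m",
    "projects_n_z": "/data/projects/n-z",
    "home":         "/data/home",
    "archive":      "/data/archive",
}

def count_files(root: str) -> int:
    """Stand-in for the scan phase: just walk and count."""
    return sum(len(files) for _, _, files in os.walk(root))

if __name__ == "__main__":
    # Each partition is scanned in parallel, like separate subclient jobs
    with ProcessPoolExecutor() as pool:
        for name, total in zip(SUBCLIENTS,
                               pool.map(count_files, SUBCLIENTS.values())):
            print(f"{name}: {total} files")
```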

Userlevel 3
Badge +5

Hello @ZachHeise 

What is the backend storage behind the 500TB - NetApp, JBOD, or Isilon?

For NetApp you can probably use IntelliSnap.

Userlevel 7
Badge +18

As @0ber0n already suggested, I would also recommend chopping up your config into multiple subclients and/or configuring it as a NAS so you can use multiple proxies/access nodes.

See the documentation: Add a NAS File Server.

As for the suggestion to use IntelliSnap: you will hit the same challenge for the backup copy as well.