Solved

SharePoint backup is very slow

  • 2 September 2021
  • 25 replies
  • 2009 views

Userlevel 1
Badge +7

Some facts:

Version 11.24.7

SharePoint 2016 (on-prem)

SP DB backup speeds: 15-25 GB/hr (so not super fast, but at least it finishes in a reasonable time)

SP Documents backup speeds: 1-1.5 GB/hr (so basically super slow)

 

Some background information:

I am a storage guy who’s recently been put in charge of the backup solution due to people leaving. While I’ve worked with other backup solutions before, that was 10+ years ago, so my backup skills are a bit rusty compared to my storage skills; bear with me if I need things spoon-fed :-)

Also, the Documents part of SharePoint has been split into about 10 different subclients, and this performance issue is affecting all of them.

Also, for other backups run by Commvault, I see very different backup speeds. For instance, full backups of some MSSQL servers reach speeds of 500-3000 GB/hr, so I know the backend of the backup solution can handle a lot more than what SharePoint is giving us.

 

Now, if I try to drill down a bit on one of the SharePoint subclients (one of the smaller ones that completes every day), then according to the CVPerfMgr.log (see attachment) it looks to me like Commvault is waiting for the SharePoint server to feed it data, but for some reason it’s not doing that, at least not at an acceptable speed.

On the backup side I’ve tried changing from 2 to 4 threads/workers with seemingly no effect, and on the SharePoint side we’ve tried turning “Http Throttling” off. We will test today with enabling a time window for Large Queries on the SharePoint side, but I have no idea if that will help.

Also, the SharePoint server does not seem to have a resource issue (high CPU or such) while the backup runs.

 

So, I guess the question is: does anyone have any good ideas on how to speed things up? Having backup jobs that take 4-5 days to complete is not really acceptable, and the reported speeds are insanely slow.

 

 

 


Best answer by Bjorn M 17 September 2021, 14:58


25 replies

Userlevel 7
Badge +23

Thanks for the thread @Bjorn M, and welcome to our community! This is a perfect place to visit and discuss; we have experts and neophytes and everything in between, all here to learn and grow!

Regarding SharePoint, I suspect this is a limitation of the SharePoint API (some of the APIs are notoriously slow), though I’m going to talk to some internal folks to see what can be done to maximize throughput for SP documents.

Thanks!

Userlevel 1
Badge +7

Thanks @Mike Struening 

 

Would appreciate some help getting to the bottom of this. Our attempt at enabling a time window for “Large Queries” on the SharePoint side had no effect, so both the Large Queries and Http Throttling changes were a bust.

I did run some Performance Monitoring last night while the backup job was going, and at first glance I cannot see anything special.

I can see when the backup job starts, because %Processor Time increases from about 0% to 10-15% for about 20 minutes, and after that it’s pretty much back to 0%, while the backup job keeps running for a couple more hours.

Avg. Disk Queue Length goes to about 5 around the same time as the CPU increase; otherwise it stays around 0.

 

 

Userlevel 3
Badge +7

Hello Bjørn M,

As Michael mentioned, the SharePoint level backups within Commvault are highly reliant on the underlying Microsoft APIs required to extract the information from the application, and on the data types, size, and quantity involved. The additional subclients you already have in place are the usual starting point for this agent. Subclient content for the SharePoint Agent is processed sequentially, so dividing the content across multiple subclients lets us make multiple requests to the Microsoft APIs and back up additional sites/content in parallel. It also comes with additional space requirements for staging and additional resource usage (processing, memory), so you need to be careful with the number of subclients you create.

We can only pull the data as fast as the Microsoft API allows. This is known as the Export phase and is accomplished with an API method similar to the Microsoft PowerShell command below.

https://docs.microsoft.com/en-us/powershell/module/sharepoint-server/export-spweb?view=sharepoint-ps

Microsoft stages the data for us (inside our Job Results directory location) and, once that completes, we run a Parsing phase against the data, which allows us to provide even more granular restore capabilities.

Userlevel 1
Badge +7

Thanks @Ron Potts!

Do you have any guidelines for how many subclients are supported/recommended?

 

Currently the speed for a single subclient seems very low, at around 1 GB/hour, but I’ll check with the SharePoint guys whether they can test using the Export-SPWeb command.

Could you provide the exact command, with parameters, that Commvault uses to export sites, to get as exact a comparison as possible?

That way we could hopefully eliminate Commvault as the source of the performance issues.

 

 

Userlevel 1
Badge +7

@Mike Struening 

Did you get any info/tips/tricks from the internal folks (or perhaps that was Ron Potts?)

Userlevel 3
Badge +7

Bjørn M wrote:

Thanks @Ron Potts!

Do you have any guidelines for how many subclients are supported/recommended?

Currently the speed for a single subclient seems very low, at around 1 GB/hour, but I’ll check with the SharePoint guys whether they can test using the Export-SPWeb command.

Could you provide the exact command, with parameters, that Commvault uses to export sites, to get as exact a comparison as possible?

That way we could hopefully eliminate Commvault as the source of the performance issues.

 

 

Hello Bjørn,

There is no exact/blanket recommendation that Commvault can provide for subclient creation, since a lot of the configuration is data specific and resource dependent. One suggestion is to place the largest sites into their own individual subclients; this allows them to run as their own stream/entity. You can group smaller sites together into one or more subclients.

Every subclient you create, however, will require more CPU, memory and disk space (the Job Results directory is used for caching by the MSFT API). For jobs running simultaneously, the additional disk space can add up quickly.

 

Example of the Command:
Export-SPWeb -identity http://SITENAMEHERE/SITE/SUBSITE -path \\SERVERNAME\CVExportTst\export -NoFileCompression -IncludeVersions 4 -IncludeUserSecurity -UseSQLSnapshot
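
If you want to put a rough number on the raw export speed outside of Commvault, a timing sketch along these lines could work. The site URL and export share are placeholders, and I left out -UseSQLSnapshot; add it back in if SQL snapshots are available in your environment:

# Rough timing sketch for a manual export (placeholder site URL and export share).
# Load the SharePoint snap-in if not running from the SharePoint Management Shell.
Add-PSSnapin Microsoft.SharePoint.PowerShell -ErrorAction SilentlyContinue

$site   = "http://SITENAMEHERE/SITE/SUBSITE"      # placeholder
$target = "\\SERVERNAME\CVExportTst\export"       # placeholder

$elapsed = Measure-Command {
    Export-SPWeb -Identity $site -Path $target -NoFileCompression `
        -IncludeVersions 4 -IncludeUserSecurity
}

# Compare exported size against elapsed time to get a rough GB/hr figure.
$sizeGB = (Get-ChildItem $target -Recurse -File | Measure-Object Length -Sum).Sum / 1GB
"{0:N2} GB exported in {1:N0} minutes (~{2:N2} GB/hr)" -f $sizeGB, $elapsed.TotalMinutes, ($sizeGB / $elapsed.TotalHours)

That should give a comparable number to the throughput reported for the Commvault job against the same content.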


 

Userlevel 1
Badge +7

Thanks @Ron Potts!

 

In other words, try different configurations and see what works best in our environment :-)

 

This will be “interesting” to sort out, considering we currently run 9 subclients and they have the following completion times on average:

~14 hours (about 6GB/hour)

~ 40 hours (about 1.5GB/hour)

~3 hours (about 1.3GB/hour)

~150 hours (about 1.3 GB/hour)

~15 hours (about 0.5GB/hour)

~5 hours (about 5GB/hour)

~8 hours (about 1.1GB/hour)

~1 hour (about 0.50GB/hour)

~9 hours (about 1.3GB/hour)

 

I guess the two subclients with >24h duration are a good place to start.

Userlevel 1
Badge +7

Talked a bit with our SharePoint admins, and it seems they normally use the Backup-SPSite command rather than Export-SPWeb, and with that they get good performance. They are currently testing running Export-SPWeb manually and have noticed that it seems very slow, and also that it creates a lot of roughly 25 MB files instead of fewer but larger files, which I assume can affect performance at least a little.

It seems you could use the “-CompressionSize” option to change this file size, but is there any way to control what file size Commvault uses (assuming it pretty much runs/calls the Export-SPWeb command with some parameters)?
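
For reference, a manual test with a larger export chunk size might look something like the sketch below. The site URL and export path are placeholders, and as far as I understand -CompressionSize sets the maximum size of each compressed export file (in MB) and only applies when compression is enabled, i.e. without -NoFileCompression:

# Hypothetical manual test: same kind of export, but with larger compressed chunks.
Add-PSSnapin Microsoft.SharePoint.PowerShell -ErrorAction SilentlyContinue

Export-SPWeb -Identity "http://somesite/somesubsite" `
    -Path "\\SERVERNAME\CVExportTst\export-large.cmp" `
    -CompressionSize 250 `
    -IncludeVersions 4 -IncludeUserSecurity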

 

Also, I see there is an additional setting called “dwNumThreads”. Any experience with changing this from the default value of 3? Advantages/disadvantages?

Userlevel 7
Badge +23

Bjørn M wrote:

@Mike Struening

Did you get any info/tips/tricks from the internal folks (or perhaps that was Ron Potts?)

It was @Ron Potts :laughing:  He’s a SharePoint guru!

Userlevel 1
Badge +7

Have moved the largest sites from some subclients to their own subclients, going from 9 to 23 subclients.

So it will be interesting to see how that affects the backup times.

 

@Ron Potts 

Also, I see there is an additional setting called “dwNumThreads”. Do you/Commvault have any recommendations or experience with changing this from the default value of 3? Does it have any effect, and are there any advantages/disadvantages I should be aware of before I possibly try changing it?

Userlevel 3
Badge +7

Bjørn M wrote:

Also, I see there is an additional setting called “dwNumThreads”. Do you/Commvault have any recommendations or experience with changing this from the default value of 3? Does it have any effect, and are there any advantages/disadvantages I should be aware of before I possibly try changing it?

 

When it comes to additional settings for SharePoint, the Farm level and Document level share the same relative path in the Commvault section of the registry, which is MSSharepointDocAgent. The dwNumThreads additional setting, however, is specific to Farm level backups.
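
If you want to check whether the setting already exists on the client, a quick registry read could look like the sketch below. The registry root shown is an assumption for a typical single-instance Windows client; additional settings are normally added through the client’s Additional Settings tab rather than edited directly in the registry.

# Assumption: default Commvault registry root for a single-instance Windows client.
# Adjust Instance001 if the client runs under a different instance name.
$key = 'HKLM:\SOFTWARE\CommVault Systems\Galaxy\Instance001\MSSharepointDocAgent'

# Returns nothing if the dwNumThreads additional setting has never been created.
Get-ItemProperty -Path $key -Name dwNumThreads -ErrorAction SilentlyContinue |
    Select-Object dwNumThreads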

 

“...and it seems they normally use the Backup-SPSite command rather than Export-SPWeb, and with that they get good performance. They are currently testing running Export-SPWeb manually and have noticed that it seems very slow, and also that it creates a lot of roughly 25 MB files instead of fewer but larger files, which I assume can affect performance at least a little.”

 

Backup-SPSite is a different Microsoft PowerShell command that performs a Site Collection level backup. This is a different level of backup altogether, where Microsoft provides only a backup of the entire Site Collection, without any granularity for restoring individual items, sites, etc.

The Document level backups leverage Microsoft’s Export-SPWeb command, as this export-level API is their more granular approach and provides the site/item level recovery options.

 

 

Userlevel 1
Badge +7

@Ron Potts 

Quick question. I’ve deployed the SharePoint agent on the application server in a SharePoint farm (which I believe should support document backup). Everything looks the same as on the client running the frontend (which works), but if I try to browse the content (to, for example, create a new subclient, or browse the content of the default subclient) I just get a popup with the error “Browse request failed”.

 

Any idea/suggestion as to what could cause this, and where to look to fix it?

Userlevel 3
Badge +7

Hello @Bjorn M - Are you able to run a Check Readiness to confirm whether there are any communication issues between the CommServe and the client? Also, is the File System level agent deployed, and are you able to browse subclient content for it? The reason I ask is that it may help narrow down whether SharePoint is failing to enumerate sites on the SharePoint server, or (if the File System level browse also fails) whether it is a more general communication issue preventing subclient browse requests against the SharePoint server.

 

Installing on the SharePoint Application Server should be OK, but this depends on your environment. Often, customizations (templates, forms, etc.) are published within SharePoint, and components for these are referenced by SharePoint although they live on the file system of the SharePoint server. SharePoint admins will often only publish said customizations to the Front End Web Servers, as they are the ones hosting content for the end users. If you are running backups for the SharePoint Application Server and those same customizations aren’t deployed there, it’s possible you may see some errors come back from Microsoft during backups.

Userlevel 1
Badge +7

@Ron Potts

I can browse the contents of the file system without any issues.

 

The “Check Readiness” does, however, report what I guess is the issue:

Client = Ready

Sharepoint = Not Ready (User Account Information not found)

 

Going into “Properties” for the SharePoint server under the client, I see that the “Service account” is empty, so I’ll try to fix that and let you know if that solves the issue.

 

Update:

Running Validate now returns success on everything except “SQL Server Services Account”, which fails.

I’ll doublecheck & verify that the username and rights are correct and in accordance with Preinstallation Checklist for the SharePoint Server Agent (commvault.com)

Userlevel 1
Badge +7

@Ron Potts

Status:

Farm 1: Frontend-1:

Validate account (in SharePoint properties) returns Success for “Local Admin”, “Farm Admin”, “SQL Permissions”, “Access to Simpana Registry/key”, “Simpana Services Account”, “Sharepoint Timer Service Account” and “Web application pool accounts”.

It returns “Failed” for “SQL Server Services Account”. It seems there are missing rights to the job results and/or log files directory. Will fix those.

A regular “Check Readiness” on the client returns:

Client = Ready

Sharepoint = Not Ready (SQL Services Account: Not Ready)

 

Farm 1: App-server:

Validate account (in SharePoint properties) does not complete; it just runs and runs, and has neither finished nor failed in the 30 minutes or so I’ve waited for it. So something is obviously not right there.

A regular “Check Readiness” on the client returns

Client = Ready

Sharepoint = Not Ready (Cannot launch Compatability check tool)

 

Another thing I noticed is that for SharePoint it is stated that the Job Results directory should not be on the C:\ drive (which it currently is).

Any experience/numbers on how much this could affect performance?

The underlying storage for C:\ is SSD based, so it should be pretty fast.

 

 

 

Userlevel 1
Badge +7

Think I found the problem in the end, and Check Readiness now reports success.

 

Now the interesting thing is that I’m testing two subclients with exactly the same settings and the same content, where one is on a front-end server and the other is on an app server, both in the same SharePoint farm.

Still testing, but so far it seems the backup against the app server is quite a lot faster, about 3x the speed.

However, the backup against the app server also has a lot more failed files and folders compared to the subclient running against the front-end.

 

Is this “normal” or expected behaviour?

@Ron Potts 

 

 

Userlevel 3
Badge +7

Bjørn M wrote:

Another thing I noticed is that for SharePoint it is stated that the Job Results directory should not be on the C:\ drive (which it currently is).

Any experience/numbers on how much this could affect performance?

The underlying storage for C:\ is SSD based, so it should be pretty fast.

 

The reason to avoid the C: drive is that during backups, depending on the size of your environment, a lot of data can pass through the Job Results directory. With the OS and possibly the SharePoint application installed on the C: drive, if the drive were to fill up it could cause issues.
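
If the Job Results directory has to stay on C: for now, it may be worth keeping an eye on free space while the larger subclients run; a quick check could be as simple as:

# Quick free-space check on the drive hosting the Job Results directory.
Get-PSDrive -Name C | Select-Object @{n='UsedGB';e={[math]::Round($_.Used/1GB,1)}},
                                    @{n='FreeGB';e={[math]::Round($_.Free/1GB,1)}}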

 

Bjørn M wrote:
However, the backup against the app server also has a lot more failed files and folders compared to the subclient running against the front-end.

Is this “normal” or expected behaviour?

 

Those could very well be the symptoms I mentioned in an earlier post:

Installing on the SharePoint Application Server should be OK, but this depends on your environment. Often, customizations (templates, forms, etc.) are published within SharePoint, and components for these are referenced by SharePoint although they live on the file system of the SharePoint server. SharePoint admins will often only publish said customizations to the Front End Web Servers, as they are the ones hosting content for the end users. If you are running backups for the SharePoint Application Server and those same customizations aren’t deployed there, it’s possible you may see some errors come back from Microsoft during backups.

Userlevel 1
Badge +7

@Ron Potts

Thanks, I’ll check with the SharePoint admins whether that could be the issue.

 

I did run the same job against the app server twice for testing purposes, and it was not the same files that failed on both jobs; it seemed kind of “random”. But I’ll compare the failed files and see how much overlap there is.

 

Userlevel 1
Badge +7

@Ron Potts 

I’ve compared the failed files &amp; directories between two runs of the same full backup job (against the app server) that ran about 30 minutes apart.

There is not a single overlap in the folders or files that failed, so it seems completely random. I guess this makes it unlikely to be the possible issue you described above with customizations not being deployed everywhere.

 

Userlevel 3
Badge +7

Can you share an example of the Failed Files and some of the errors returned on them?

Userlevel 1
Badge +7

Ron Potts wrote:

Can you share an example of the Failed Files and some of the errors returned on them?

@Ron Potts 

 

With regards to the error messages, they all just say “FAILED”, nothing else (when viewing the failed files).

Most seem to be of the following type:

http://somesite/somesubsite/someuser/_catalogs/ with the following subfolders being the most common:

 

theme/15/ - PaletteXXX.spcolor or .spfont files

wp/ - various .webpart and .dwp files

design/ - various .000 files

masterpage/Display Templates/System/ - various .js files

Userlevel 1
Badge +7

@Ron Potts

Looking at the actual log files for the job reveals the following (sanitized somewhat):

 

17904 1438  09/17 11:38:16 1186085 CVSPBackup::BackupItem() - Error encountered for object \MB\http:somesite\subsite username\\_catalogs\theme\15\Palette015.spcolor\1.0
17904 15    09/17 11:38:16 1186085 CVSPExportReader OpenNextFile - Exception opening next export file - System.IO.IOException: The process cannot access the file '\\Sharepoint-App-Server\1811\Some-ID-String\_catalogs\theme\15\Palette020.spcolor\Manifest.xml' because it is being used by another process.
at System.IO.__Error.WinIOError(Int32 errorCode, String maybeFullPath)
at System.IO.FileStream.Init(String path, FileMode mode, FileAccess access, Int32 rights, Boolean useRights, FileShare share, Int32 bufferSize, FileOptions options, SECURITY_ATTRIBUTES secAttrs, String msgPath, Boolean bFromProxy, Boolean useLongPath, Boolean checkHost)
at System.IO.FileStream..ctor(String path, FileMode mode, FileAccess access, FileShare share)
at System.IO.File.Open(String path, FileMode mode, FileAccess access)
at CVSPDocServerNamespace.CVSPExportReader.OpenNextFile(Boolean& error)

Userlevel 3
Badge +7

Does that SharePoint Application Server have AV installed? If so, can you exclude the SharePoint Job Results directory? Seems like something is holding onto that Manifest.xml file during the backup operation at the OS level.
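
If Windows Defender happens to be the AV in use, the exclusions could be added with something like the sketch below; the Job Results path is a placeholder for wherever that directory actually lives, and third-party AV products need the equivalent exclusion configured in their own console:

# Sketch assuming Windows Defender; the Job Results path below is a placeholder.
Add-MpPreference -ExclusionPath 'D:\Commvault\JobResults'

# Optionally exclude the Commvault SharePoint backup process as well.
Add-MpPreference -ExclusionProcess 'CVSPBackup2013.exe'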

Userlevel 1
Badge +7

Ron Potts wrote:

Does that SharePoint Application Server have AV installed? If so, can you exclude the SharePoint Job Results directory? Seems like something is holding onto that Manifest.xml file during the backup operation at the OS level.

I’ll see if I can verify that AV scanning is turned off for the Job Results directory on the app server.

Userlevel 1
Badge +7

@Ron Potts

Your hunch was correct. The Job Results directory was not excluded from antivirus (though the CVSPBackup2013.exe process and C:\Users\Commvault Services account\AppData\Local\Temp were excluded).

 

Ran a new test, and it completed fine. Moving the job from the front-end to the app server also made it a lot faster: ~16 minutes vs 1+ hour.

 

Guess I’ll be busy reconfiguring a lot of SharePoint backups next week :-)

 
