Solved

NFS objectstore performance issues


Userlevel 1
Badge +5

Thanks, Bill, for the detailed documentation. It helped me while setting up NFS for Teradata backup purposes.

I see that performance for jobs written to NFS is not the same as for other jobs using the same disk library.

NFS jobs are running at an average throughput of 100 GB/hr, while other agent jobs writing to the same disk library run at TB/hr rates.

I ran a disk performance check (CVDiskPerf) on the disk cache and that is also much better (throughput in TB/hr).

 

Is there anything further I should check? I have logged a support case, but any advice is appreciated.


Best answer by Mike Struening RETIRED 21 January 2022, 23:54


10 replies

Userlevel 2
Badge +4

@shailu89 

There are many variables that can affect performance. Here are some things to consider:

  • Is the performance low even when there is a lot of free space in the disk cache? If there is enough room in the disk cache (used space below the high water mark), then the speed should mainly be dictated by the disk speed. If the disk has filled up, the speed will be dictated mainly by the backup speed.
  • Is the network speed adequate?
  • Are you writing a lot of small files? The performance is tuned for larger files, so it will be slower if there are many tiny files. You can test whether this makes a difference by writing large files (see the sketch after this list).
  • The data being written to the disk library is post-dedup and post-compression; the data going over NFS is not. So highly dedupable data can sometimes be much slower going into HFS than into the disk library because of this. Measure the disk library using random data to get an accurate comparison.
  • Is the disk library on a dedicated physical drive? Or is there I/O from other services?
  • There should be stats in the log files. If you upload the logs, we can see if there’s anything that’s obviously wrong, like a slow index server.
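
To compare raw write speeds, a rough probe like the sketch below can help; it writes large files of random data to both targets and reports the rate. The mount points are placeholders for this environment, and random data is used so dedup/compression don't flatter the disk library number.

```python
import os
import time

# Placeholder paths -- point these at the NFS ObjectStore mount and at a
# scratch directory on the disk library volume for this environment.
TEST_DIRS = {
    "nfs_objectstore": "/mnt/nfs_objectstore/perf_test",
    "disk_library":    "/mnt/disklib_scratch/perf_test",
}
FILE_SIZE = 2 * 1024**3   # 2 GiB per test file
CHUNK = 8 * 1024**2       # write in 8 MiB chunks

def write_test(path):
    """Write FILE_SIZE bytes of random data to path; return GiB/hr."""
    os.makedirs(os.path.dirname(path), exist_ok=True)
    start = time.time()
    with open(path, "wb") as f:
        written = 0
        while written < FILE_SIZE:
            # Random data keeps dedup/compression from inflating the result.
            # On very fast storage os.urandom itself can become the
            # bottleneck, so treat the number as a lower bound.
            f.write(os.urandom(CHUNK))
            written += CHUNK
        f.flush()
        os.fsync(f.fileno())  # make sure the data actually reached the server
    elapsed = time.time() - start
    return (FILE_SIZE / 1024**3) / (elapsed / 3600)

for name, directory in TEST_DIRS.items():
    rate = write_test(os.path.join(directory, "bigfile.bin"))
    print(f"{name}: {rate:,.0f} GiB/hr")
```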

Bill

Userlevel 1
Badge +5

Thanks, Bill, for your reply.

I am still having this slowness issue and support is still looking into it. Case 211001-132.

To give you more details regarding the points you mentioned:

  • I have a disk cache with 1 TB of space, and disk speed tests on the cache (CVDiskPerf) show a throughput of 3 TB/hr. A daily job of 250 GB is taking more than 2 hours. The default high water mark was 90%; I tried 20% as well, but got the same result.
  • The use case is Teradata writing to this ObjectStore over a 10G network interface on both sides.
  • The 250 GB of data shows as 25K files in the CV job stats (I think it is counting 10 MB extents as files; see the quick arithmetic after this list). The actual file count from the Teradata side may be different.
  • As per the initial analysis from support, the majority of the time is spent in DDB lookups. The same MediaAgent/SP/DDB gives 1 TB/hr throughput for other workloads (SQL, WFS, etc.).
  • Yes, it is a dedicated physical media server for backup.
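
A quick bit of arithmetic on that 25K figure (assuming 250 GB written at the default 10 MB extent size): it works out to roughly 25,600 extents, so the job stats are almost certainly counting extents rather than actual Teradata files.

```python
# Sanity check on the ~25K "file" count reported by the job stats.
data_gb = 250
extent_mb = 10                              # default extent size
extents = data_gb * 1024 / extent_mb
print(f"{data_gb} GB / {extent_mb} MB extents = {extents:,.0f} extents")
# -> 25,600 extents, which matches the ~25K "files" in the job stats.

# What larger (hypothetical) extent sizes would do to the object count:
for size_mb in (64, 256, 1024):
    print(f"{size_mb:4d} MB extents -> {data_gb * 1024 / size_mb:,.0f} objects")
```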

My question is: is there a way to increase the default 10 MB extent size?

Do you think I should test with a storage policy without dedup, or by disabling deduplication?

 

Also, one thing I noticed is that the application size here is less than the data written. Is that normal with NFS ObjectStore?

Userlevel 2
Badge +4

@shailu89 

You can increase the 10 MB extent size with the registry value nObjStoreExtentSizeInMB under the 3Dfs key. The default is 10 and the maximum is 1024.

Testing without dedup would be a good idea to eliminate that as a variable. Perhaps there is a CPU bottleneck? 

One thing that you can try is to empty the cache and then measure the speed before the cache gets filled. This would show whether the problem is NFS ObjectStore itself or the backup speed. Dedupe would come into play during the backup, not while data is being dumped into NFS ObjectStore. To clear the cache, make sure that everything is backed up (you can check the “Uncommitted extents” status value in the Java GUI and confirm it is zero), and then delete the cache subdirectory of wherever you chose the NFS ObjectStore cache to be. We can assist you with that if you’d like.
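
If it helps, here is a minimal sketch of that clean-out step in script form. The cache location is a placeholder (use whatever path was chosen for the NFS ObjectStore cache), and it should only be run after everything is backed up and the “Uncommitted extents” value reads zero.

```python
# Minimal sketch of the "empty the cache" step described above.
# CACHE_ROOT is a placeholder -- substitute the cache location chosen at
# ObjectStore setup time. Run only after confirming "Uncommitted extents"
# in the Java GUI is zero and everything is backed up.
import shutil
from pathlib import Path

CACHE_ROOT = Path("/opt/commvault/3dfs")   # placeholder cache location
cache_dir = CACHE_ROOT / "cache"

answer = input(f"Uncommitted extents is 0 and it is OK to delete {cache_dir}? [yes/no] ")
if answer.strip().lower() == "yes":
    shutil.rmtree(cache_dir)
    cache_dir.mkdir()   # leave an empty cache directory in place
    print("Cache cleared.")
else:
    print("Aborted; nothing was deleted.")
```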

I think the application size may not be too accurate, so I wouldn’t be concerned with that.

Bill

Userlevel 1
Badge +5

Thanks @Bill Katcher ,

Yes, I would appreciate it if you could help identify where the bottleneck is.

I am unable to find the “Uncommitted extents” status to follow the steps.

Userlevel 2
Badge +4

@shailu89 

I’ve sent a message to the support engineer who is handling your case to get the case escalated to engineering so we can provide additional help.

Bill

Userlevel 1
Badge +5

@Bill Katcher Did you get hold of the case?

Just wanted to stress the importance: this is a new use case that could be widely adopted, but only if we are able to fix the performance issue and match the backup speeds we see from other iDataAgents.

 

The customer is looking at other options for Teradata backup, so we need to fix this before we lose the opportunity.

Userlevel 2
Badge +4

@shailu89 

I asked the support engineer to escalate the case and arrange a remote session. Once that is done we can have a call.

Bill

Userlevel 7
Badge +23

Sharing the case resolution:

  • The customer increased the CPU count to 12.
  • Excluded the nfsd (Ganesha) process from the antivirus scan.
  • The Object Store backup throughput has now improved to 1 TB/hr.

Userlevel 1
Badge +5

Thanks for documenting this, @Mike Struening.

We also changed the NFS mount on the client side from “sync” to “async”, and that boosted performance by 2x.
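
For anyone verifying that change, the sketch below (the mount point is a placeholder) just reads /proc/mounts on the client to confirm whether the ObjectStore share is currently mounted sync or async:

```python
# Check which write mode the NFS ObjectStore mount is using on the client.
# MOUNT_POINT is a placeholder for wherever the share is mounted.
MOUNT_POINT = "/mnt/teradata_backup"   # placeholder

with open("/proc/mounts") as mounts:
    for line in mounts:
        device, mountpoint, fstype, options = line.split()[:4]
        if mountpoint == MOUNT_POINT and fstype.startswith("nfs"):
            opts = options.split(",")
            # "async" is the client default and usually not listed explicitly.
            mode = "sync" if "sync" in opts else "async (default)"
            print(f"{device} on {mountpoint} ({fstype}): mounted {mode}")
            print(f"full options: {options}")
```

The speedup comes from the client being allowed to buffer and coalesce writes instead of committing each one to the server immediately, so async trades a little durability on a client crash for throughput.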

One more best practice was to keep the “high water mark” on the file server at 70%, so the NFS cache does not overcommit.

Userlevel 7
Badge +23

Great additional detail, thanks @shailu89 !!
