Replacing Media Agent Hardware

  • 6 December 2021
  • 7 replies
  • 977 views

Userlevel 1
Badge +5

Our Media Agent needs replacing.

The replacement is probably going to be a Dell PowerEdge R5xx or R7xx with dual CPUs and 128GB RAM. The intention is to use M.2/NVMe for the OS/Commvault binaries and the DDB/indexes, but I’d appreciate any guidance and best practice on the build and storage.

We’re doing a lot of synthetic fulls with small nightly incremental backups, and we aux copy about 70TB to LTO8 tape every week.

The disk library we have right now is approx 50TB.

Is there any best practice that would favour NAS/network storage for the disk library over filling the PowerEdge with large SAS disks?

With local disk on Windows Server, is there a preference between NTFS and ReFS, and is there a best practice on mount path size? We’re currently using 4-5TB mount paths carved out as separate Windows volumes on a single underlying hardware RAID virtual disk.

Given modern hardware performance, can anyone see a definite reason to do any more than buy a single PowerEdge for this, other than redundancy/availability of backups?

I’m trying to balance robust with simple :grinning:


7 replies

Userlevel 7
Badge +23

Great post, @Paul Hutchings! I converted it to a Conversation to encourage some member responses, since it’s less an ‘answer’ you need and more ‘advice’.

To get things started, I did want to include the Deduplication Building Block guide:

https://documentation.commvault.com/11.25/expert/12411_deduplication_building_block_guide.html

The IOPS test:

https://documentation.commvault.com/11.25/expert/8825_testing_iops_of_deduplication_database_disk_on_windows_01.html

as well as the Media Agent sizing thread we have in our community:
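On the IOPS point, and purely as an illustration (this is not the Commvault test from the link above, just a minimal Python sketch; the file path is hypothetical, and OS caching will flatter the numbers unless the test file is much larger than RAM):

```python
# Rough random-read IOPS sketch (illustration only, not Commvault's IOPS test).
# Point it at a large file on the candidate DDB volume; a small file will mostly
# be served from the OS cache and overstate the result.
import os
import random
import time

TEST_FILE = r"E:\ddb_test\testfile.bin"   # hypothetical path on the DDB volume
BLOCK_SIZE = 4096                         # 4 KiB random reads
NUM_READS = 20000

size = os.path.getsize(TEST_FILE)
with open(TEST_FILE, "rb", buffering=0) as f:
    start = time.perf_counter()
    for _ in range(NUM_READS):
        offset = random.randrange(0, size - BLOCK_SIZE)
        offset -= offset % BLOCK_SIZE     # align to a block boundary
        f.seek(offset)
        f.read(BLOCK_SIZE)
    elapsed = time.perf_counter() - start

print(f"~{NUM_READS / elapsed:,.0f} random-read IOPS ({BLOCK_SIZE} B blocks)")
```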

I’ll keep an eye on the replies we see.

I bet @Marco Lachance , @Laurent , and @dude would have some valuable thoughts!

Userlevel 1
Badge +5

Mike, thanks. At our size I don’t think there’s any concern that the OS on M.2 drives and the DDB/index on NVMe will be a bottleneck.

That leaves the disk library storage, and I guess there is no substitute for spindles, but on our existing MA, which also has 12 drives, I can regularly see 300MB/sec being pulled during an aux copy in Performance Monitor, and around 500-600GB/hour being aux copied.

Basically the aux is the only thing that mildly concerns me.
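For what it’s worth, here’s the rough arithmetic behind that concern (just a sketch using the figures above; 550GB/hour is simply the mid-point of what we see today):

```python
# Back-of-the-envelope aux copy throughput check (figures from the posts above).
weekly_aux_tb = 70            # ~70 TB aux copied to LTO8 each week
observed_gb_per_hour = 550    # assumed mid-point of the 500-600 GB/hour we see today

hours_needed = weekly_aux_tb * 1000 / observed_gb_per_hour
sustained_mb_s = weekly_aux_tb * 1e6 / (7 * 24 * 3600)

print(f"~{hours_needed:.0f} hours of aux copy per week at current speed")
print(f"~{sustained_mb_s:.0f} MB/s sustained if spread over the whole week")
```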

Can anyone share their disk library config and throughput if they’re running something similar?

Userlevel 6
Badge +15

Hi @Paul Hutchings and @Mike Struening !

Sorry, I’ve been quite busy these days, preparing my FR21 to FR24 upgrade, which is planned for this Thursday the 9th :wink:

 

Paul, from my reading, you’re going to use Windows rather than Linux for the OS.

I won’t get into ransomware protection on either OS, just the technical hassle that using a Windows MA instead of Linux removes: no need to worry about free inodes, or things like that, on the volumes/filesystems hosting the DDB (and DDB backup is required, so that’s a key point).

 

Are you going to use deduplication?

Because if so, I’d guess the ‘poor’ performance of spindle disks in any RAID configuration on a single volume is not a real bottleneck, unless you expect to perform full restores in parallel: data is rehydrated on the fly, and the more streams you use, the less disk performance your RAID can give each of them.
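To illustrate (just a toy sketch, and the total throughput figure is made up, not a measurement):

```python
# Toy estimate of per-stream restore throughput on a shared spindle RAID volume.
# Assumed figure for illustration only; adjust to your own array, and expect
# worse than this simple split once random seeks come into play.
total_random_read_mb_s = 400   # assumed aggregate read rate during rehydration

for streams in (1, 2, 4, 8, 16):
    per_stream = total_random_read_mb_s / streams
    print(f"{streams:2d} parallel restore streams -> ~{per_stream:.0f} MB/s each")
```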

Also, a technical note: try to dedicate a cached controller to this RAID volume, rather than sharing the controller used by your M.2 drives, SSDs or NVMe (if they’re not PCIe cards), as sharing it could affect overall performance.

Today I don’t have a configuration where I back up everything in one go during the night and later copy everything to tape. I have multiple backups, and aux copy schedules running every 30 or 60 minutes.

And the overall performance of such an MA would depend on the number of streams you allow for backups and aux copies.

So it’s hard for me to provide a Performance Monitor sample screenshot of such a configuration (and most of my new MAs are Linux rather than Windows).

I did try to run a Storage Validation like this on a Windows server with almost the same hardware:

 

 

Performance data during that run: around 400MB/s writes, while reads had just started at about 50MB/s per thread.

All with 4 threads, so 4x50=200MB/s, and surely more with only 1 thread.

Storage validation results:

 

And to try to push the disks to their limit, I ran a full data verification of all jobs held on the disks, with no stream limit (so 100 for this MA in my case).

Here’s a screenshot:

 

This killed the MA’s performance :stuck_out_tongue_winking_eye:

But this is roughly what could happen if I were asked to restore all the data backed up by this MA, as some kind of DR.

 

This is acceptable for me and my internal customers.

Of course, and I’m sure you’re already aware of it, if they want the restore time to be shorter, it will cost more: faster disks/arrays, GridStor, more interfaces, more independent arrays…

 

Hope this helps you.

Userlevel 6
Badge +15

Oh, important to mention: the disk array hosting the library has only 1 volume, fully dedicated to the single mount path I configured on this MA.

If you configure 2 volumes, you’d better have 2 RAID volumes, making sure the spindle disks are not shared across both volumes…

I do have that setup on a huge array, and trust me, the performance is poor.

I plan to get rid of it, get a new blank one, and not hash (or slash!) it up this way like the vendor did when I was a Commvault novice… :sweat_smile:

Userlevel 1
Badge +5

@Laurent thanks so much for that :grinning:

The issue I have is that I don’t have the hardware yet, and it’s too late to benchmark it once it’s been purchased.

From those numbers above, “fast enough” looks to be the case, but we are using a global DDB, and I’m guessing from the throughput above that Storage Validation isn’t using dedupe, it’s a regular read/write?

When you say “almost the same hardware”, do you know the disk layout, RAID level and RAID controller in the server you were using, please?

I’m fairly sure that on a Dell PowerEdge, a BOSS card and NVMe don’t touch the PERC.

Userlevel 6
Badge +15

Hi @Paul Hutchings

Yes, you’re right regarding the Dell configuration.

Here’s an example of one of mine:

The BOSS card is a 220GB RAID1, used to store the OS and basic Commvault MA binaries.

On the Dell HBA in slot 3, 2 SSDs of 1.7TB are combined in RAID1, and I split them into 2 volumes: one to store the DDB, the other to store the index cache.

On the PERC H740P, a RAID6 of 7x7.5TB spindle disks including 1 hot spare, configured as a single RAID volume hosting one partition/logical volume, used to store the MA’s mount path (holding the deduplicated backups).
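For reference, a quick sketch of the usable capacity of that mount path volume (7 disks minus the hot spare leaves a 6-disk RAID6; real usable space will be slightly less after formatting):

```python
# Usable capacity of the mount path RAID volume described above.
disk_tb = 7.5
disks_total = 7
hot_spares = 1
raid6_parity_disks = 2   # RAID6 keeps two disks' worth of parity

data_disks = disks_total - hot_spares - raid6_parity_disks
usable_tb = data_disks * disk_tb
print(f"~{usable_tb:.0f} TB usable for the mount path")   # ~30 TB
```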

 

Regarding the figures provided, the full data verification I ran is a good example of IOPS stress, as it accesses the DDB (reads, yes) and checks blocks on the MP (reads as well).

 

Keep in mind that, for backups, MP storage is not the bottleneck if you use deduplication; the DDB storage is, as all block signatures are constantly being compared. Only the new blocks are written to the MP.

As also mentioned previously, MP storage becomes the bottleneck with deduplication when you have to restore data, as the data you need to restore will probably require blocks scattered across the whole MP volume, and with spindle disks we all know random IOPS are not their strong point.
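If it helps to picture it, here’s a toy sketch of why the DDB (signature lookups) dominates backups while the MP (scattered block reads) dominates restores. This is not Commvault’s implementation, just the general idea:

```python
# Toy signature-based dedup: backups mostly hit the signature store (DDB-like),
# restores mostly read blocks back from the block store (mount-path-like).
import hashlib

signature_store = {}   # signature -> block id   (the "DDB" role)
block_store = {}       # block id -> block bytes (the "mount path" role)

def backup(data: bytes, block_size: int = 128 * 1024) -> list:
    """Return a recipe of block ids; only previously unseen blocks are written."""
    recipe = []
    for i in range(0, len(data), block_size):
        block = data[i:i + block_size]
        sig = hashlib.sha256(block).hexdigest()   # one "DDB" lookup per block
        if sig not in signature_store:            # new block: write it to the "MP"
            block_id = len(block_store)
            block_store[block_id] = block
            signature_store[sig] = block_id
        recipe.append(signature_store[sig])
    return recipe

def restore(recipe: list) -> bytes:
    # Rehydration: every block must be read back, wherever it lives on the MP,
    # which is why restores turn into random reads on spindle disks.
    return b"".join(block_store[block_id] for block_id in recipe)

payload = b"hello world" * 100000
recipe = backup(payload)
assert restore(recipe) == payload
```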

 

Userlevel 1
Badge +5

@Laurent thanks, that’s pretty encouraging if that server only has 6 effective spindles (because of the hot spare) and it’s doing that kind of throughput.

My single-server build would have 12 spindles, and I hope the new PERC cards like the H750/H755 are very good.

I don’t see that I can do much more with a single 2U R550/R750 :grinning:

Reply