Interesting discussion - and I am not going to put any vendor or array name into this post, because for backup storage on disk you usually don't need any features beyond the "standard stuff" that every vendor in a given class of storage should be able to provide - in the end, performance comes down to the physical devices actually used and the RAID/pooling etc. technology on top.
So when it comes to choosing a disk-based storage solution for Commvault backup, there are a few questions one should ask themselves:
- Do I need automated failover for Media Agents on backup as well as restore (Commvault GridStor)?
  - This requires all involved Media Agents to be able to read and write to the same storage library at the same time. This can mainly be achieved using NAS or, in some cases, clustered/scale-out filesystems (e.g. Hyperscale).
- Do I want to use partitioned deduplication in Commvault?
  - For this, at least all Media Agents need to be able to read all the data - writes could go to dedicated mount paths per MA. Data Server IP with read-only sharing could be a solution here if DAS/SAN is used for storage. Keep in mind that if the sharing MA is offline you won't be able to guarantee restores/copies - backups will continue to run.
- Do I need to do regular "full" copies (e.g. tape-out) or complete data recoveries?
  - This requires reading all "logical" data blocks - meaning reads will be quite random (dedupe references could be scattered all over the library) and the same physical data may be read many times (a block can be referenced multiple times).
What I have encountered in the past is that especially the last bullet point has not always been thought of when sizing backend storage for Commvault. The classic way of thinking has been "it's backup, so let's just put a few really large drives into RAID 6 and we should be fine", which is definitely true for a non-deduplicated storage pool. Once deduplication comes into play, reads become increasingly random - writes will still be sequential, as they only write new data plus some references and metadata, so writing is always "additive".
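To make the read-randomness point a bit more concrete, here is a purely illustrative Python sketch (not Commvault code, and the chunk size and change rate are just made-up numbers): new chunks land sequentially at the end of the library, but any chunk seen before resolves to its original, older offset, so rehydrating a recent backup means jumping all over the library.

```python
import random

CHUNK = 128 * 1024      # illustrative chunk size, 128 KiB
random.seed(42)

store = {}              # chunk fingerprint -> physical offset in the library
next_offset = 0         # writes are append-only, i.e. "additive"

def write_backup(chunks):
    """Store a logical stream of chunk fingerprints; only new chunks consume space."""
    global next_offset
    for fp in chunks:
        if fp not in store:          # new data lands sequentially at the end
            store[fp] = next_offset
            next_offset += CHUNK

def read_offsets(chunks):
    """Physical offsets touched when rehydrating the logical stream in order."""
    return [store[fp] for fp in chunks]

# 30 daily backups of the same 1000-chunk client, ~10% change per day.
stream = [f"day0-{i}" for i in range(1000)]
write_backup(stream)
for day in range(1, 31):
    stream = [fp if random.random() < 0.9 else f"day{day}-{i}"
              for i, fp in enumerate(stream)]
    write_backup(stream)

offsets = read_offsets(stream)
jumps = sum(1 for a, b in zip(offsets, offsets[1:]) if b != a + CHUNK)
print(f"Rehydrating the latest backup touches {len(offsets)} chunks, "
      f"{jumps} of them via non-sequential seeks across the library")
```

After a month of incrementals, almost every logical chunk sits next to a physical neighbour from a different day, which is exactly why the backend needs to sustain random reads, not just sequential writes.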
Commvault provides a tool that lets you test disk performance of a disk library to find out if it is sufficient for your needs: https://documentation.commvault.com/commvault/v11/article?p=8855.htm
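The link above covers Commvault's own tool; purely as a rough, vendor-neutral illustration of the sequential-vs-random gap described earlier, a sketch like the one below can show how far the two diverge on a given mount path. The path is hypothetical, the test file should be larger than the MA's RAM so the page cache doesn't flatter the numbers, and this is not a substitute for the official tool.

```python
import os
import random
import time

PATH = "/mnt/disklib/perftest.bin"   # hypothetical mount path inside the disk library
FILE_SIZE = 4 * 1024**3              # should exceed the MA's RAM to defeat the page cache
BLOCK = 128 * 1024                   # 128 KiB reads, roughly dedupe-block sized

# Create the test file once (sequential write of incompressible data).
if not os.path.exists(PATH):
    with open(PATH, "wb") as f:
        for _ in range(FILE_SIZE // BLOCK):
            f.write(os.urandom(BLOCK))

def throughput_mb_s(offsets):
    """Read BLOCK bytes at each offset (unbuffered) and return MB/s."""
    start = time.monotonic()
    with open(PATH, "rb", buffering=0) as f:
        for off in offsets:
            f.seek(off)
            f.read(BLOCK)
    return len(offsets) * BLOCK / (time.monotonic() - start) / 1024**2

sequential = [i * BLOCK for i in range(FILE_SIZE // BLOCK)]
scattered = sequential[:]
random.shuffle(scattered)

print(f"sequential read: {throughput_mb_s(sequential):.0f} MB/s")
print(f"random read:     {throughput_mb_s(scattered):.0f} MB/s")
```

On large NL-SAS RAID 6 sets the second number is typically a small fraction of the first, which is the gap that bites during AuxCopies, tape-out and full restores.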
My personal preference for non-Hyperscale backup storage is high-performance NAS, as this gives you sharing of the library across many Media Agents for both read and write, so you can do failovers, run AuxCopies using just the target Media Agent, restore through any MA, etc.
It could also make sense to take a peek at Commvault Hyperscale technology, which runs on many different server hardware platforms - it is really easy to buy, set up and run, and it gives you all the same benefits plus simple scale-out.
I read your post @Joel Lovell and immediately smelled that you have to be working for Nutanix. Sorry to say, but I really do not like to see this kind of advertisement here. Please use LinkedIn, and if you really want to make a statement, then please come prepared and give us some proof showing why a V7000 cannot compete against Nutanix Objects. Do not forget to include a budgetary quote for the setup ;-)
Thanks!
Not an endorsement for any particular vendor, but my personal observation is that NetApp arrays are usually great performers. Isilon can be too, but they often push the archive-class nodes, which are cost-effective and can back up fast OR copy fast - trying to do both at the same time overwhelms the cache and performance suffers.
I have heard good things about Infinidat as well - fast storage and a great UI.
We have decided to migrate the primary landing zone for backup data to a Pure Storage FlashBlade/S array located in one of our DCs. To make sure we always have a copy available, we will create a near-synchronous copy to our large archive tier, which writes the data using replicated EC across 3 DCs. Really looking forward to being able to deliver a fast recovery tier to our customers! We'll definitely share our experiences with you all!
The V5000 arrays are nice, cheap, and deep - but not even the newer-generation V7000s can compete with how Nutanix Objects scales in performance and parallel workloads, which makes it an outstanding Commvault-certified solution. It is super easy to set up and use. You can run a media agent/VSA VM on each node and/or use physical media agent servers:
https://www.nutanix.com/viewer?type=pdf&path=/content/dam/nutanix/resources/solution-briefs/sb-buckets.pdf
https://www.nutanix.com/viewer?type=pdf&lpurl=optimizing-commvault-with-nutanix-and-vsphere.html
Regards,
Joel
Interesting discussion to follow for sure! I still see quite a few people talking about block storage arrays, which is really interesting to me, because the choice of block storage also introduces a challenge if you want your MAs to be HA leveraging partitioned DDBs, just like @Christian Kubik mentions.
My personal order would be to pick something that can deliver:
1) Cloud storage
2) File
3) Block
If I had to design something new and the customer is not already leveraging on-premises cloud storage, then I would definitely try to convince the customer to purchase cloud-based storage like Cloudian/MinIO/StorageGRID/FlashBlade. It offers many advantages, like being less vulnerable to ransomware attacks, WORM, and the ability to have copies in multiple locations without the need to AuxCopy, and an additional selling point is that it can be used to drive software/business development.
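On the WORM point: all of those S3-compatible targets let you lock an object against deletion or overwrite for a retention period at write time, and as I understand it Commvault can drive this itself when WORM is enabled on a cloud library. The boto3 sketch below only shows the underlying mechanism; the endpoint, bucket, keys and retention period are placeholders, and the bucket is assumed to have been created with Object Lock enabled.

```python
from datetime import datetime, timedelta, timezone

import boto3

# Placeholder endpoint and credentials for an S3-compatible target
# (Cloudian, MinIO, StorageGRID, FlashBlade, ...); the bucket must have
# been created with Object Lock (and therefore versioning) enabled.
s3 = boto3.client(
    "s3",
    endpoint_url="https://s3.example.internal",
    aws_access_key_id="BACKUP_ACCESS_KEY",
    aws_secret_access_key="BACKUP_SECRET_KEY",
)

retain_until = datetime.now(timezone.utc) + timedelta(days=30)

# Write an object that cannot be deleted or overwritten until the retention date,
# even by an account whose credentials have been compromised.
s3.put_object(
    Bucket="commvault-copy",
    Key="backups/2024-06-01/chunk-000001",
    Body=b"...backup data...",
    ObjectLockMode="COMPLIANCE",
    ObjectLockRetainUntilDate=retain_until,
)
```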
As for file storage, you can look at the big names NetApp/Dell EMC or just have a look at the portfolio of Tintri. We have good experience with their IntelliFlash arrays. Very interesting from a price point of view, and they offer multi-protocol capabilities.
We moved from Hitachi midrange (HUS150/HUS130) to NetApp AFF A300. They are fantastic arrays in every sense. Worth every penny!
Hi David,
Thanks for the feedback. I love Hitachi arrays but I’ll keep this info in my pocket for the future. : )
Pure Storage - FlashBlade (NFS) for Rapid recovery
I really like the HPE StoreOnce solution, it hit my price point really well, and Commvault can talk natively to the HPE solution. That means I can offload all my de-dupe to HPE, leaving Commvault focused on just data protection. Across all my datasets, I’m getting about 7:1 dedupe ratio, with some stuff getting as high as 11:1.
Note, I’m in the Medium business market, so there are faster solutions out there, but this hit a perfect price vs performance balance for me.
We looked at HPE StoreOnce too but wanted to let Commvault handle the de-dupe. We’re getting 10:1 with our datasets.
We have a customer that has an Isilon for disk storage.
Backup speeds are ok, but DASH to DR and Copy to Tape speeds are terrible. "Last night's" backups copy to DR at no more than 500GB/hr if we're lucky, and copy-to-tape speeds do not exceed 200GB/hr.
Fallen Behind Copies are literally years behind.
Both Commvault and Isilon have checked it out and cannot do anything about it.
We did spec Hyperscale before implementation but we were overruled and now I sit with this issue. Very frustrating.
Open another thread in the proper area and we can discuss it there, to avoid being off-topic here.
This is the perfect example of 'intelligent storage' that delivers performance while writing, but when reading data from outside the 'buffer'/'landing zone' (or whatever they call it), performance degrades, because the data has to be rehydrated by the array before it can be handed to Commvault for processing.
Always try to have only one deduplication level: Commvault's (best) or the hardware's (worst, as you become hardware-dependent) - not both.
We use Hitachi HNAS storage as the backup target; so far, no issues with performance or throughput.
Very good question.
I'll probably surprise some: we now use a Pure Storage FlashBlade array with a 4+1 GridStor setup, configured in S3 mode, using multiple 10G Ethernet attachments for the MAs and multiple 40G for the array.
This array was the replacement for an old NetApp FAS2554 10G NAS that became more and more overloaded as time (and stored data) went by. When we had to perform huge parallel restores, the FAS2554 was killed. This is not the case anymore with the FlashBlade.
Of course, the price is far different, but our ambitions (and budget) were reviewed after this performance issue.
This primary was, and still is, the source for AuxCopies to a SAN spinning-disk array (less performance/more capacity) for a secondary copy, and also the source for a cloud copy.
Top performance now, whereas with the previous FAS2554 as primary, AuxCopies were delayed and piling up because of the concurrent writes (primary) and reads (copy to SAN, copy to cloud).
Hi everyone,
We backup data from the following storage arrays/servers:
- Hitachi VSP F900
- Hitachi VSP G700
- Hitachi VSP G600
- Hitachi HNAS
We have great performance reading from all the Hitachi arrays, and writing to our maglibs which are on the G700 and G600.
Thanks,
Lavanya
Seems like NetApp is popular here! We use the NetApp E 5600 Series and absolutely love it. Previously used (and still do in other places) Quantum QXS, but it could not deliver the performance we needed. We're able to hammer this thing with backups/AuxCopies/DDB verification jobs at the same time and it keeps humming all day long. Will replace the legacy storage with this going forward. Highly recommended!
Have used EMC CLARiiONs and VNXs in the past, however we have moved to Dell EMC Unity All Flash and HP Nimble Hybrid storage systems during the last couple of years.
Performance-wise the difference is night and day. Very happy with the new storage ecosystem.
Hi,
Is it attached via SAN or NAS protocol?
Thanks
@Laurent @JamesS can you share your experiences with FlashBlade so far? We're strongly considering buying the new FlashBlade/S version that was released earlier this month.
Are you leveraging the S3 protocol?
I'm quite happy with Isilon, tbh. The performance issues we are seeing are most of the time DDB-related, not Isilon.
We started using NetApp E 5600 Series and absolutely love them. These things rock and are pretty simple to install/manage.
For simple storage usage without any "smarts", I've noticed many customers are choosing Isilon, especially for large-scale deployments of PBs in size.
Other than that, NetApp has traditionally offered a great balance between performance and capacity.
Lastly, I would caution against cheap/low-end NAS devices like QNAP or Synology, as most of their products have terrible performance when trying to do reads and writes concurrently, as well as weaker support and product reliability.
We generally use NetApp E-Series for short/mid-term dedupe storage and StorageGRID for long-term.
The alternative is an HPE MSA-class array direct-attached to the Media Agent. Something that's scalable is the key.
@Onno van den Berg Good points, although I have found that it is ideal for the primary target to be file, for the sparse attribute support, with the secondary on cloud (for all the reasons described).
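On the sparse attribute point: as I understand it, space reclamation on a file target can punch holes in existing chunk files, so a pruned file keeps its logical size while the filesystem gives the freed blocks back. A tiny illustrative Python sketch (hypothetical path, a sparse-capable Linux filesystem such as ext4 or XFS assumed) of how a sparse file reports logical versus allocated size:

```python
import os

path = "/mnt/disklib/sparse-demo.bin"   # hypothetical path on a sparse-capable filesystem

# Write 1 MiB of data, then skip 1 GiB before writing a final byte.
# The skipped range becomes a hole: it counts toward the file size
# but no blocks are allocated for it.
with open(path, "wb") as f:
    f.write(os.urandom(1024 * 1024))
    f.seek(1024**3, os.SEEK_CUR)
    f.write(b"\0")

st = os.stat(path)
print(f"logical size : {st.st_size / 1024**2:.0f} MiB")
print(f"allocated    : {st.st_blocks * 512 / 1024**2:.0f} MiB")  # far smaller than the logical size
```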