We are looking into our backup strategy and investigating a few scenarios for back-end storage. Right now we are considering object storage via the S3 protocol and file storage (JBOD) over the NFS protocol. The data that will be sent there comes from on-prem filesystems, databases, VMs, etc. Total capacity: over 10 PB.
We have tested some object storage over the S3 protocol, but we ran into issues with data reclamation (garbage collection for expired objects takes far too long, and waiting for capacity to be reclaimed can take a month or more).
Can you share your experience with back-end storage: what challenges did you face, how did you solve the issues mentioned above, and what advantages do you see when comparing the S3 and NFS protocols for backups?
All feedback is much appreciated.
Best answer by Onno van den Berg
You are correct that S3 is less efficient when it comes to data aging. The reason is that we don't have the luxury of sparse files as we do on file/network storage (assuming the underlying storage supports it). S3 data is written in 64 MB blocks (from memory), and every signature in an object needs to be unreferenced before that object can be removed. So a single signature can hold up the whole block; you can optimize that in some ways, but those are the fundamentals.
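To make the mechanics concrete, here is a minimal sketch (not Commvault's actual implementation; all names are illustrative) of why one live signature holds up a whole 64 MB object:

```python
# Illustrative model: deduplicated backup data is packed into large objects,
# and an object can only be deleted once every signature it contains is
# unreferenced. One surviving signature pins the entire object.

OBJECT_SIZE = 64 * 1024 * 1024  # 64 MB objects, per the answer above

class ObjectStore:
    def __init__(self):
        # object_id -> signatures whose data lives inside that object
        self.objects = {}
        # signature -> number of backup jobs still referencing it
        self.refcount = {}

    def write(self, object_id, signatures):
        self.objects[object_id] = set(signatures)
        for sig in signatures:
            self.refcount[sig] = self.refcount.get(sig, 0) + 1

    def expire(self, signatures):
        for sig in signatures:
            self.refcount[sig] -= 1

    def reclaimable(self, object_id):
        # The whole object is held until *all* of its signatures hit zero.
        return all(self.refcount[sig] == 0 for sig in self.objects[object_id])

store = ObjectStore()
store.write("obj-1", ["sig-a", "sig-b", "sig-c"])
store.expire(["sig-a", "sig-b"])   # two of three signatures age out
print(store.reclaimable("obj-1"))  # False: sig-c still pins the object
store.expire(["sig-c"])
print(store.reclaimable("obj-1"))  # True: the 64 MB can now be reclaimed
```

This is why reclamation lags retention: capacity only frees up in whole-object granularity, once the last reference expires.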
Likewise, S3 tends to be a little slower when reading or recovering, simply because it has higher latency than traditional protocols and is more chatty. We made HUGE strides in improving that performance with clever caching and batching, but at some point you hit the limits of physics ;)
On the other hand, S3 can be much easier to manage - it is more abstracted than NFS and typically allows easier expansion if that is something you will be doing often.
So my vote would be to go with direct NFS from a pure technical efficiency and performance standpoint, but that assumes those are the two things you need to optimize for. There are other attributes S3 could bring to your business that could outweigh those advantages.
S3 could also provide a pseudo virtual air-gap solution, in that it eliminates persistent connections to the storage device, offering some layer of security against threats such as ransomware. Additionally, many S3 storage devices have native object-lock capabilities, offering data immutability. Be aware this could add significant cost.
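As a hedged illustration of the object-lock point: with the AWS-style S3 API you set a retention mode and a retain-until date on each object (the bucket must be created with object lock enabled). The bucket and key names below are hypothetical; with boto3 you would pass this parameter dict to `s3.put_object(...)`.

```python
# Sketch: building the object-lock parameters for an S3 PutObject call.
# The 30-day window is an example, not a recommendation.
from datetime import datetime, timedelta, timezone

RETENTION_DAYS = 30

def object_lock_params(bucket, key, body):
    retain_until = datetime.now(timezone.utc) + timedelta(days=RETENTION_DAYS)
    return {
        "Bucket": bucket,
        "Key": key,
        "Body": body,
        # COMPLIANCE mode: nobody, including root, can shorten the retention
        "ObjectLockMode": "COMPLIANCE",
        "ObjectLockRetainUntilDate": retain_until,
    }

params = object_lock_params("backup-bucket", "chunk-0001", b"...data...")
print(sorted(params))
```

Until the retain-until date passes, delete and overwrite requests for that object version are rejected by the storage, which is what gives the immutability against ransomware.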
That said, you can take advantage of our software locks for Windows and Linux data movers to lock NFS-mounted storage and build a hardened solution as well, without the performance drawbacks of S3 storage: https://documentation.commvault.com/11.21/expert/9398_protecting_mount_paths_from_ransomware_01.html
It will come down to requirements. In some cases the org wants to mix and match storage mediums to follow strict 3-2-1 principles; in other cases, due to backup and recovery objectives, the S3 performance impact may not fit the need.
I would say it depends ;-) Are you referring to a local solution offering S3-compatible object storage, or to Amazon S3? In the case of a local solution, are you running it dispersed across multiple sites? And what solution are we talking about in that case: StorageGRID, Cloudian?
What we have seen ourselves is that configuring the option "do not deduplicate against objects older than X days" when using S3(-compatible) object storage definitely helps improve storage efficiency.
To come back to some points:
S3 performance can be massive, but it all depends on the number of streams you can throw at it. If you run it locally, then it really depends on the infrastructure, especially the number of nodes, the network setup, and the performance it can deliver. Of course, the MAs will also have to be able to push all that bandwidth, but we have seen good performance. We do not use disk libraries at all anymore because:
There was an option (I can't find it anymore) that you could turn on, which would use a disk library to cache the metadata of deduplicated data. That already helped improve performance, as it reduces the vast number of small GETs, but it seems it was removed from the product. I played with it in the past, but we use partitioned DDBs in the cloud, where we do not have shared storage to provide the cache across all involved MAs, and you could only define one library. So it was not possible in our case, or we would have had to implement all kinds of workarounds that would take away the benefit of the cache, so for us it was not a valid option.

I was really looking for an option that would just use a "folder" on the MA that you could designate as a cache device; Commvault would then pre-warm the cache by fetching all related metadata locally and keep it up to date at all times. That would remove the disk library and take away the need for shared storage to really benefit from it. Maybe it will come back in the future. That would be nice, as I think cloud storage will become the number-one storage choice at some point in time.
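The wished-for behaviour above could be sketched roughly like this. To be clear, this is a hypothetical illustration, not an existing Commvault feature; all names (`MetadataCache`, `prewarm`, the folder layout) are invented for the example:

```python
# Hypothetical sketch: a local folder on the MA designated as a metadata
# cache, pre-warmed in one bulk fetch so that signature lookups hit local
# disk instead of issuing many small GETs against object storage.
import json
import os
import tempfile

class MetadataCache:
    def __init__(self, cache_dir):
        self.cache_dir = cache_dir
        os.makedirs(cache_dir, exist_ok=True)

    def prewarm(self, fetch_all_metadata):
        # One bulk fetch up front, instead of a small GET per lookup.
        for sig, meta in fetch_all_metadata().items():
            with open(os.path.join(self.cache_dir, sig), "w") as f:
                json.dump(meta, f)

    def lookup(self, sig):
        path = os.path.join(self.cache_dir, sig)
        if os.path.exists(path):          # cache hit: local disk, no GET
            with open(path) as f:
                return json.load(f)
        return None                       # miss: would fall back to S3

cache = MetadataCache(tempfile.mkdtemp())
# Stand-in for "fetch all related metadata from object storage":
cache.prewarm(lambda: {"sig-a": {"object": "obj-1", "offset": 0}})
print(cache.lookup("sig-a"))
```

The design point is the same one made above: because each MA keeps its own local cache folder, no shared storage is needed for the MAs to benefit.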
Moved the new question to its own thread: