Solved

Can we customize the object count or chunk size of each object that gets uploaded to the cloud?

  • 14 June 2023
  • 9 replies
  • 219 views

Badge +5

We require 1 million objects to be ingested from MS SQL into our cloud storage via Commvault. We created a table with 5 million rows of data in MS SQL, but when we perform a full backup from Commvault we only see 20 objects being ingested. Could you suggest any way we can simulate 1 million objects?
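For context, a test table like this can be filled with hard-to-deduplicate rows along these lines. This is only a rough sketch assuming Python with pyodbc; the server, database, table, and column names are placeholders, not anything from our actual environment.

```python
import os
import pyodbc  # assumes the Microsoft ODBC Driver for SQL Server is installed

# Placeholder connection details and table layout, for illustration only.
CONN_STR = (
    "DRIVER={ODBC Driver 17 for SQL Server};"
    "SERVER=sqlhost;DATABASE=BackupTest;UID=user;PWD=secret"
)
ROWS = 5_000_000
BATCH = 10_000

conn = pyodbc.connect(CONN_STR, autocommit=False)
cur = conn.cursor()
cur.fast_executemany = True  # faster bulk parameter binding in pyodbc

cur.execute(
    "IF OBJECT_ID('dbo.TestObjects') IS NULL "
    "CREATE TABLE dbo.TestObjects (Id INT IDENTITY PRIMARY KEY, Payload VARBINARY(1024))"
)
conn.commit()

for start in range(0, ROWS, BATCH):
    # Random payloads do not deduplicate, so the backup has real data to move.
    batch = [(os.urandom(1024),) for _ in range(BATCH)]
    cur.executemany("INSERT INTO dbo.TestObjects (Payload) VALUES (?)", batch)
    conn.commit()

conn.close()
```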


Best answer by Damian Andre 15 June 2023, 03:27


9 replies

Userlevel 6
Badge +15

When you refer to “20 objects being ingested,” do you mean that you see 20 chunks of data written to storage?

 

Badge +5

Hi @Orazan, the total object count we see on our cloud storage is 20.

Userlevel 6
Badge +15

We write chunk files to storage. There is no expectation that the number of rows in a database and the number of chunks of data written will be anywhere near the same. I guess I am not understanding what the issue is. Can you please provide some clarity?

Badge +5

@Orazan We want to ingest a million objects from SQL Server into cloud storage via Commvault, i.e. we want the object count on cloud storage to be one million (1 million chunks of data).

  1. Can we customize the chunk size in Commvault? Currently the chunk size is shown in MB; can we reduce it to KB so that the object/chunk count increases?
  2. Otherwise, is there any way we can set up the data on the SQL Server side that would help us achieve this?

Userlevel 6
Badge +15

Are you looking to just get a certain number of objects into the cloud of any data type, or does it have to be SQL data?

Userlevel 7
Badge +23

With deduplicated data, the maximum object size in cloud libraries is 8 MB; for non-deduplicated data it's 32 MB. This is not end-user configurable.

So you'd need 8 TB of non-dedupable data to get 1 million objects created by Commvault. The type of data or the source of the data doesn't impact how many objects are written or what size they are.
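To put rough numbers on that, here is a back-of-the-envelope sketch using the maximum object sizes above. Real objects can be smaller than the maximum, so treat these as ballpark figures rather than exact Commvault behaviour.

```python
# Approximate amount of unique data needed to produce a target number of
# cloud objects, given the maximum object size for that library type.
MB = 1024 ** 2
TB = 1024 ** 4

def data_needed_tb(target_objects: int, max_object_mb: int) -> float:
    """Unique data (in TB) needed for `target_objects` objects of up to
    `max_object_mb` MB each."""
    return target_objects * max_object_mb * MB / TB

print(data_needed_tb(1_000_000, 8))   # ~7.63 TB (~8 TB) with 8 MB dedupe objects
print(data_needed_tb(1_000_000, 32))  # ~30.5 TB (~32 TB) with 32 MB non-dedupe objects
```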

Badge +5

Are you looking to just get a certain number of objects into the cloud of any data type, or does it have to be SQL data?

Hey Orazan,

We are aiming to get 100,000k objects into the cloud, basically out of a SQL Server, Oracle DB, or a VM backup.

Could you please suggest something here?

We tried modifying the block size, maximum transfer size, and buffer count, but we get the same number of objects as with the defaults.

Badge +5

Hi @Damian Andre, thank you for the clarity. I have a follow-up question on the same topic: could you please explain the difference between object size, chunk size, and block size? Are they configurable?

According to the Commvault documentation below, the default chunk size for cloud backup is 4 GB.

 

Userlevel 7
Badge +23

Hi @Damian Andre, thank you for the clarity. I have a follow-up question on the same topic: could you please explain the difference between object size, chunk size, and block size? Are they configurable?

According to the Commvault documentation below, the default chunk size for cloud backup is 4 GB.

 

Hi Lahri,

Block size is the amount of data used for the deduplication process. This is set at the dedupe store level, and the default is 128 KB for most storage. This means we read 128 KB worth of data at a time and send it to the deduplication database to be signatured: each subsequent 128 KB read is signatured and compared to the other signatures in the database to determine whether it is duplicate data. Higher block sizes can improve performance because there is less overhead (less back and forth for each block), especially in high-latency situations and with cloud storage, but there is a slightly lower chance of a duplicate match when each block is bigger. Of course, in reality it is more complex than this and there are many behind-the-scenes optimizations, but that is generally how it works.
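As a toy illustration of that signature process (not Commvault's actual implementation, just a sketch that hashes fixed-size 128 KB blocks and counts how many are unique; the file name is a placeholder):

```python
import hashlib

BLOCK_SIZE = 128 * 1024  # 128 KB default dedupe block size mentioned above

def dedupe_stats(path: str) -> tuple[int, int]:
    """Return (total_blocks, unique_blocks) for a file, one block at a time."""
    seen: set[bytes] = set()  # stand-in for the deduplication database
    total = 0
    with open(path, "rb") as f:
        while block := f.read(BLOCK_SIZE):
            total += 1
            seen.add(hashlib.sha256(block).digest())  # the block's "signature"
    return total, len(seen)

total, unique = dedupe_stats("sample.bin")  # placeholder file
print(f"{total} blocks read, {unique} unique, {total - unique} deduplicated")
```

A larger BLOCK_SIZE means fewer round trips to the signature store, but each block is less likely to match one already seen, which is the trade-off described above.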

Chunk size does not really matter too much when deduplication is in play; it's more of an under-the-hood logical container that Commvault uses to group blocks of data. In the tape and non-dedupe days it was the maximum size of the file written before moving on to the next file. In some cases an entire chunk has to be read to extract a block of data, but as mentioned, that's not really a big deal today with modern storage and deduplication. That's not to say it won't impact some operations (suspending and resuming an auxcopy, for example, causes it to restart from the last chunk, so a bigger chunk theoretically means more data discarded).

Object size only applies to cloud/object storage, and it's a level below the chunk size. This is the maximum size of an object we write at once. This is significant for data aging: we can only delete an object if all the deduplicated blocks stored in it have expired (i.e., are not being used by any job). The bigger the object size, the less frequently that can happen and the more space you end up consuming; e.g., if a single 128 KB deduplicated block is still in use, the 8 MB object holding it can't be deleted until that last block is no longer needed. The smaller the object size, the more space efficient it is in theory, but performance suffers because there is more I/O overhead in fetching more objects. For the most part, Commvault is tuned out of the box for the best performance/space-efficiency ratio. It's very rare that the default settings need to be modified to yield better results.
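A toy model of that aging behaviour, purely for illustration: it uses the 8 MB object and 128 KB block figures from above, and the block reference counts are invented.

```python
# An object can only be pruned once every deduplicated block inside it has
# expired, i.e. no retained job still references any of its blocks.
BLOCKS_PER_OBJECT = (8 * 1024) // 128  # 64 x 128 KB blocks fill an 8 MB object

def object_deletable(block_refcounts: list[int]) -> bool:
    """True only when no job references any block stored in the object."""
    return all(refs == 0 for refs in block_refcounts)

# 63 of the 64 blocks have expired, one is still referenced by a retained job:
mostly_expired = [0] * (BLOCKS_PER_OBJECT - 1) + [1]
print(object_deletable(mostly_expired))   # False -> the whole 8 MB object stays

fully_expired = [0] * BLOCKS_PER_OBJECT
print(object_deletable(fully_expired))    # True -> the object can now be deleted
```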

For example, the deduplication block size is automatically increased for cloud storage to get better performance as latency to cloud libraries is higher.

I hope that explanation helps to clarify things somewhat.
