In the documentation, it is written: “The DDBs created for a Windows MediaAgent should be formatted at 32 KB block size to reduce the impact of NTFS fragmentation over time. The DDBs created for a Linux MediaAgent should be formatted at 4 KB block size.”
- Why is the default deduplication block size 128 KB, when it is optimal neither for a Windows nor for a Linux MediaAgent, and in addition not optimal for a cloud library?
- I cannot find in the documentation where to set the block size of the DDB. It is in the storage pool properties, but can someone provide the link to the relevant Commvault documentation?
Good morning. Block sizes can be modified from the Data Path Properties dialog box available from the Data Paths tab of the Copy Properties dialog box, for the specific data path.
You can configure block size from the Storage Policy Properties - Advanced tab. When configuring the global deduplication policy, all other storage policy copies that are associated with the global deduplication policy must use the same block size. To modify the block size of global deduplication policy, see Modifying Global Deduplication Policy Settings for instructions.
128 KB is a good-performing value in general for on-premises backup workloads; for cloud you might consider a higher setting, e.g. 512 KB or more.
The reason you do not want such a high block size at the filesystem level in a regular situation, but instead use 32 KB (Windows) and 4 KB (Linux), is mainly to balance space usage against read/write performance. It all depends on the volume size and the type of data stored on the volume.
The setting is in the advanced tab for the Storage Policy or Storage Pool:
Thanks for your answer.
Could you please elaborate a bit on what you mean when you write “The reason you do not want such a high block size at the filesystem level in a regular situation, but instead use 32 KB (Windows) and 4 KB (Linux), is mainly to balance space usage against read/write performance. It all depends on the volume size and the type of data stored on the volume.”
I mean can you please provide a concrete example?
For large disks you might feel the need for a larger block size to boost performance, but looking at the DDB files, there are also a lot of small files. If you put a small file, for example 16 KB, in a 128 KB block, you lose 128 − 16 = 112 KB of space. That doesn’t seem like much, but as the number of files grows it becomes noticeable in your free space.
With larger files this is less of an issue, since you only have a possible loss at the end of the file (unless hole drilling is used, but that’s a whole different story which I won’t go into right now). Take a file of, for example, 9.6 MB, which is 9830.4 KB. That is 9830.4 / 128 = 76.8 blocks, so in practice 77 blocks. The loss of space is 0.2 block = 128 × 0.2 = 25.6 KB.
As you can see, there is quite a difference in wasted space for a single file, relative to its size.
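To make the arithmetic above concrete, here is a small sketch (not a Commvault tool, just an illustration) that computes the slack space lost when a file is stored in whole filesystem blocks. The file sizes and block sizes are the example values from this thread:

```python
# Slack space = space lost because a file occupies whole filesystem blocks.
# Purely illustrative; the sizes below are the examples from this discussion.
import math

def slack_kb(file_kb: float, block_kb: int) -> float:
    """Return the KB wasted when a file of file_kb is stored in block_kb blocks."""
    blocks = math.ceil(file_kb / block_kb)   # a partially used block still costs a full block
    return blocks * block_kb - file_kb

# Small file: 16 KB in 128 KB blocks -> 1 block allocated, 112 KB wasted.
print(slack_kb(16, 128))

# Large file: 9.6 MB (9830.4 KB) in 128 KB blocks -> 77 blocks, ~25.6 KB wasted.
print(slack_kb(9830.4, 128))

# The same 16 KB file with a 4 KB block size wastes nothing.
print(slack_kb(16, 4))
```

The small file wastes 87.5% of its allocation at 128 KB, while the large file wastes under 0.3%, which is exactly why block size should be matched to the typical file sizes on the volume.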
Additionally there are concerns regarding fragmentation and the number of file extents: the more fragmentation or file extents there are, the more I/O is needed, which lowers the maximum achievable performance.
There are a lot of details to cover to fully grasp the concept, too much for me to explain in this thread, but hopefully these articles will give you some insight:
A contemporary investigation of NTFS file fragmentation - ScienceDirect
Everything You Need to Know about SSDs and Fragmentation in 5 Minutes - Condusiv - The Diskeeper Company
Hope this helps.
Many thanks for your answer. Really appreciated.