Sharing the latest update, which seems to explain everything:
The oldest block retained is from 2017, but we no longer reference those blocks because they are past 32 billion IDs, so the primary block distribution will not be uniform. The older blocks from 2017 are retained mostly because valid jobs that depend on them are still retained.
We don’t have the latest DDB dump, but as a rough estimate, blocks written up to Jan 31st 2019 will be referenced until the end of April.
So we can use Jan 31st 2019 as the first value for the setting below, then move to April 2019 and then August 2019 after observing space consumption at each step.
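As a rough illustration of that staged rollout, the sketch below only builds the command strings for each cutoff, oldest first, and does not run anything. It reuses the command form and store ID 66 from the command quoted in step 1 of this message. The exact April date and the midnight timestamps are placeholder assumptions, since only the Jan 31st date and the August timestamp are given here. The helper names `build_command` and `staged_commands` are made up for this example.

```python
from datetime import datetime

STORE_ID = 66  # store ID taken from the command quoted in this message

# Cutoffs to apply one at a time, oldest first.
# Assumption: the April date and the 00:00:00 times are placeholders;
# only Jan 31st 2019 and the August timestamp appear in the message.
STAGED_CUTOFFS = [
    datetime(2019, 1, 31, 0, 0, 0),
    datetime(2019, 4, 30, 0, 0, 0),   # assumed "end of April"
    datetime(2019, 8, 14, 0, 51, 24),
]


def build_command(store_id, cutoff):
    """Return the DDBParam command string for one cutoff time."""
    # Same timestamp layout as the quoted command: "YYYY-MM-DD HH:MM:SS.000"
    stamp = cutoff.strftime("%Y-%m-%d %H:%M:%S") + ".000"
    return (
        "qoperation execscript -sn DDBParam -si set "
        f'-si {store_id} -si DDBDoNotRefBeforeTime -si "{stamp}"'
    )


def staged_commands(cutoffs=STAGED_CUTOFFS, store_id=STORE_ID):
    """Return one command per cutoff, sorted so older dates come first."""
    return [build_command(store_id, c) for c in sorted(cutoffs)]


if __name__ == "__main__":
    # Print only; each command would be run manually, checking space
    # consumption before moving on to the next date.
    for cmd in staged_commands():
        print(cmd)
```

Printing the commands instead of executing them keeps the space-consumption check between stages a manual decision.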
I believe these are the steps to be carried out:
* Two factors together are causing a unique scenario here, where we keep hitting a check that causes the client cache to go out of sync:
a. References to primary blocks written since 2018
b. Fast rate of recycling of primary blocks.
For now, in order to avoid the issue, we are considering the following steps:
1] We need to prevent references to blocks written before Aug 2019, which gives us a window before we hit the problematic check again. To do this, we can run the following command:
qoperation execscript -sn DDBParam -si set -si 66 -si DDBDoNotRefBeforeTime -si "2019-08-14 00:51:24.000"
By avoiding these references, we may need up to 17 TB of space for the blocks that have to be rewritten, based on the dumps we have collected.
As a more conservative approach, we can do this in multiple steps, starting with older dates.
2] We also recommend moving the subclients that are consuming primaries at a fast rate to a different storage pool: