Hi @Shane
Deduplication isn’t quite as simple as comparing one file against another.
Please take a look at Optimize Storage Space Using Deduplication for a very simplified, high-level overview of how deduplication works.
Basically, files are collected from file systems and compressed, and then a signature is generated and compared with the data already seen in the Deduplication DataBase (DDB) to determine whether the data is duplicate or unique.
The same principles still apply to VM backups. Since we’re not collecting individual files for protection, VSA operates at a different level, but deduplication still works on chunks of data: each chunk is compressed first, and then a signature is generated and checked against a DDB.
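To make the chunk, compress, sign, and look-up sequence a bit more concrete, here is a minimal Python sketch. It is only an illustration, not the product's actual implementation: the 4 KB chunk size, the use of zlib and SHA-256, the plain dictionary standing in for the DDB, and the name `dedupe_store` are all assumptions of mine.

```python
import hashlib
import zlib

CHUNK_SIZE = 4096  # hypothetical chunk size, chosen only for illustration


def dedupe_store(data: bytes, ddb: dict) -> tuple:
    """Store only the chunks of `data` whose signature is not yet in `ddb`.

    `ddb` is a plain dict (signature -> compressed chunk) standing in for
    the deduplication database. Returns (unique_chunks, duplicate_chunks).
    """
    unique = duplicate = 0
    for offset in range(0, len(data), CHUNK_SIZE):
        chunk = data[offset:offset + CHUNK_SIZE]
        compressed = zlib.compress(chunk)                   # compress first
        signature = hashlib.sha256(compressed).hexdigest()  # then generate a signature
        if signature in ddb:
            duplicate += 1              # already seen: nothing new to store
        else:
            ddb[signature] = compressed  # unique: store it once
            unique += 1
    return unique, duplicate


ddb = {}
data = b"A" * CHUNK_SIZE + b"B" * CHUNK_SIZE + b"A" * CHUNK_SIZE
first_pass = dedupe_store(data, ddb)   # two unique chunks, one duplicate
second_pass = dedupe_store(data, ddb)  # everything has been seen before
```

The point is that the comparison happens per chunk signature, never per file, so the same logic applies whether the chunks come from a file system or from a virtual disk.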
I hope that helps explain the deduplication process a little.
Your second set of 10,000 files should dedupe well. However, the files laid down on the respective virtual disks may not be physically identical, so it likely won’t be 100% deduped, but it should still achieve a pretty high dedupe ratio.
Of course, if some of that data also exists on other clients, you’ll potentially benefit from dedupe savings many times over.
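If it helps to see why the physical layout matters, here is a toy Python model. It is a sketch under loud assumptions: fixed 4 KB blocks, SHA-256 signatures, 100 made-up "files" of four blocks each, and a 512-byte shift for the last ten files on the second disk. None of these numbers come from the product, and real ratios depend on the file system and the block size in use.

```python
import hashlib
import os
import random

BLOCK = 4096  # hypothetical dedupe block size, for illustration only


def block_signatures(disk: bytes) -> list:
    """Return one SHA-256 signature per fixed-size block of a disk image."""
    return [hashlib.sha256(disk[i:i + BLOCK]).digest()
            for i in range(0, len(disk), BLOCK)]


def dedupe_ratio(baseline: bytes, new: bytes) -> float:
    """Fraction of blocks in `new` whose signature already exists in `baseline`."""
    seen = set(block_signatures(baseline))
    sigs = block_signatures(new)
    return sum(sig in seen for sig in sigs) / len(sigs)


# 100 toy "files", each exactly 4 blocks of random content.
files = [os.urandom(4 * BLOCK) for _ in range(100)]
disk_a = b"".join(files)

# Second disk: same files in a different order, still block-aligned.
shuffled = files[:]
random.Random(0).shuffle(shuffled)
disk_b_aligned = b"".join(shuffled)

# Same again, but the last 10 files land 512 bytes off the block boundary.
disk_b_shifted = b"".join(shuffled[:90]) + b"\x00" * 512 + b"".join(shuffled[90:])

ratio_aligned = dedupe_ratio(disk_a, disk_b_aligned)
ratio_shifted = dedupe_ratio(disk_a, disk_b_shifted)
```

In this model the reordered but aligned disk dedupes completely, while the disk with a few shifted files dedupes well but not fully, because the shifted blocks no longer produce the same signatures even though the file contents are identical.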
Thanks,
Stuart