Many CDC applications use a secret “seed” to randomize chunk sizes. However, this is not secure, and the paper shows an attack on such algorithms.
Padding the output to hide its length prevents the attack, but this is not provably secure.
Perfect length-hiding padding/encryption, where every chunk is padded to the max chunk size, does mitigate the attack, at the cost of increased storage requirements.
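A minimal sketch of what padding to the max chunk size could look like, to make the storage cost concrete. The names `MAX_CHUNK_SIZE`, `pad_chunk` and `unpad_chunk`, the 64 KiB limit and the 4-byte length prefix are all assumptions for illustration, not something the paper specifies. The padded block would still have to be encrypted afterwards, since the zero fill is otherwise visible.

```python
# Hypothetical limit; a real chunker would use its own configured maximum.
MAX_CHUNK_SIZE = 64 * 1024
LEN_PREFIX = 4  # bytes used to store the true chunk length (assumption)


def pad_chunk(chunk: bytes, max_size: int = MAX_CHUNK_SIZE) -> bytes:
    """Pad a chunk so that every output has the same length.

    Layout: 4-byte big-endian length, the data, then zero fill.
    The result must be encrypted before storage to hide the fill.
    """
    if len(chunk) > max_size:
        raise ValueError("chunk exceeds the maximum chunk size")
    fill = bytes(max_size - len(chunk))
    return len(chunk).to_bytes(LEN_PREFIX, "big") + chunk + fill


def unpad_chunk(padded: bytes) -> bytes:
    """Recover the original chunk from a padded block."""
    n = int.from_bytes(padded[:LEN_PREFIX], "big")
    return padded[LEN_PREFIX:LEN_PREFIX + n]


if __name__ == "__main__":
    small = pad_chunk(b"x" * 100)
    large = pad_chunk(b"y" * 50_000)
    # Both occupy the same space, whatever the input length.
    print(len(small), len(large))
```

With this layout, a 100-byte chunk and a 50,000-byte chunk take the same space on disk, which is where the extra storage comes from.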
One way to fix CDC is to apply AES to the rolling hash and then compare the result to decide whether to cut a chunk or not. This adds an overhead of one AES operation per input byte. Modern hardware has dedicated AES instructions, but the overhead is still 50–160%.
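A rough sketch of the idea, under several assumptions: the rolling hash is a simple polynomial hash, and the keyed step uses HMAC-SHA256 from the Python standard library as a stand-in for the single AES block operation (the standard library has no AES). The function names, window size, mask and size limits are all hypothetical and are not taken from the paper.

```python
import hashlib
import hmac
from typing import List

WINDOW = 48            # rolling-hash window in bytes (assumption)
MASK = (1 << 12) - 1   # cut when the low 12 bits of the keyed output are zero
MIN_SIZE = 1024        # hypothetical minimum chunk size
MAX_SIZE = 16384       # hypothetical maximum chunk size
BASE = 257
MOD = (1 << 61) - 1    # hash values fit in 8 bytes


def keyed_prf(key: bytes, value: int) -> int:
    """Stand-in for one AES encryption of the rolling hash (HMAC-SHA256 here)."""
    digest = hmac.new(key, value.to_bytes(8, "big"), hashlib.sha256).digest()
    return int.from_bytes(digest[:8], "big")


def chunk_boundaries(data: bytes, key: bytes) -> List[int]:
    """Return the end offset of every chunk in data."""
    boundaries = []
    start = 0
    h = 0
    top = pow(BASE, WINDOW - 1, MOD)  # weight of the oldest byte in the window
    for i, b in enumerate(data):
        pos = i - start
        if pos >= WINDOW:
            # Slide the window: drop the byte that falls out of it.
            h = (h - data[i - WINDOW] * top) % MOD
        h = (h * BASE + b) % MOD
        size = pos + 1
        # The cut decision looks at the keyed output, never at the raw hash.
        # The keyed step is skipped below MIN_SIZE, where no cut is allowed.
        if size >= MAX_SIZE or (size >= MIN_SIZE and keyed_prf(key, h) & MASK == 0):
            boundaries.append(i + 1)
            start = i + 1
            h = 0
    if start < len(data):
        boundaries.append(len(data))
    return boundaries


if __name__ == "__main__":
    sample = b"".join(hashlib.sha256(i.to_bytes(4, "big")).digest() for i in range(2000))
    print(chunk_boundaries(sample, b"secret key")[:5])
```

In this sketch the keyed function takes the rolling hash as input rather than the raw data bytes, which matches the description above; the cost of the keyed call for nearly every byte is what the quoted overhead refers to.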
(Q: does AES need to be applied to the rolling hash, or can it be applied to the data bytes directly?)