Solved

Dedupe & gzip compression. Does the --rsyncable option help?

  • 1 September 2021
  • 13 replies
  • 1072 views

Userlevel 1
Badge +4

Has anybody used gzip with --rsyncable to increase dedupe efficiency, and does it actually help?

 

People still like to do app/database dump-and-pickup, and it's all good until they compress the dumps and you convert the pickup backup from tape to disk dedupe.  Since dedupe doesn't like compressed files as a source, there are rumours that the --rsyncable option will help here.

This option doesn't change the compression algorithm; it periodically resets the compressor's internal state so that a local change in the input produces only a local change in the compressed output, which is what makes it rsync (and dedupe) friendly. It only increases the compressed size by about 1% compared to the regular flavoured gzip file.
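For anyone wanting to try it, a quick sketch comparing the two (file names are made up for the demo; --rsyncable is in GNU gzip 1.7+ and most distro builds, but check your build first):

```shell
# Sketch: plain gzip vs gzip --rsyncable on the same dump file.
set -eu

# Not every gzip build includes --rsyncable, so check first.
if ! gzip --help 2>&1 | grep -q rsyncable; then
    echo "this gzip build lacks --rsyncable"; exit 0
fi

dump=/tmp/demo-dump.sql
seq 1 200000 > "$dump"                      # stand-in for a real dump

gzip -c "$dump"             > /tmp/demo-plain.gz
gzip -c --rsyncable "$dump" > /tmp/demo-rsyncable.gz

# The rsyncable file is typically only around 1% larger.
ls -l /tmp/demo-plain.gz /tmp/demo-rsyncable.gz
```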

 

 

  


Best answer by JM- 11 November 2021, 08:02


13 replies

Userlevel 7
Badge +23

Hey @JM- !  thanks for the post!  I’m going to see if we have a definitive answer internally first.  If not, I’ll move this as a conversation to our Best Practices section to see if anyone can advise.

Userlevel 1
Badge +4

pigz is a gzip variant more likely to be used on large files; it also has the --rsyncable option.

 

The pigz version is designed to be faster than gzip by using parallel processing.  Might be another one to check.
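A minimal sketch, assuming the pigz package is installed (paths are made up for the demo):

```shell
# Sketch: parallel, rsync-friendly compression with pigz.
set -eu
dump=/tmp/demo-pigz.sql
seq 1 200000 > "$dump"                 # stand-in for a real dump

if command -v pigz >/dev/null 2>&1; then
    # -p picks the thread count, -k keeps the original, -f overwrites;
    # --rsyncable behaves like the gzip option of the same name.
    pigz --rsyncable -p "$(nproc)" -k -f "$dump"
    ls -l "$dump" "$dump.gz"
fi
```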

Userlevel 7
Badge +23


Theoretically it should help if you are dumping databases to flat files and backing up that location. It may depend on your specific workload, so you may need to experiment with the various scenarios to see which works best.

If you are absolutely looking for the smallest backup size, it might even be worth exporting uncompressed and letting the compression happen as part of the backup phase, provided you have enough space to stage the data and it does not affect backup/restore times greatly.
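The two approaches side by side, as a sketch (dump_db is a stand-in for a real exporter such as mysqldump or pg_dump; paths are hypothetical):

```shell
# Sketch: compressed-but-dedupe-friendly export vs uncompressed export.
set -eu
dump_db() { seq 1 100000; }   # stand-in for a real database dump command

# Option 1: compress on export, but keep it dedupe/rsync friendly
# (fall back to plain gzip if this build lacks --rsyncable).
dump_db | gzip --rsyncable > /tmp/mydb.sql.gz 2>/dev/null \
    || dump_db | gzip > /tmp/mydb.sql.gz

# Option 2: export uncompressed and let the backup product
# compress/dedupe during the backup phase.
dump_db > /tmp/mydb.sql

ls -l /tmp/mydb.sql /tmp/mydb.sql.gz
```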

I know our compression for VSA switched from GZIP to LZO a few years ago; I am not sure which we use for databases, though.

Userlevel 1
Badge +4

The (critical) app admins keep a couple of dumps uncompressed, then compress them and retain them on disk for a couple of weeks.

The idea is:

  • The two uncompressed dumps are for emergency restores.
  • The two-week supply of compressed dumps on disk is used for development/testing refreshes, and for possible prod restores if the two most recent uncompressed dumps are of no value.

I never discourage “dump and pickup” because it removes the backup crew from having to do emergency restores from the latest backup, since the app admins/DBAs now own that.  In this case they do their own refreshes for dev/test as well, without involving the backup crew in any restores.

 

Regarding “least backup size”: start with the simple stuff and progressively change as required, like using the --rsyncable option and backing up the dump location separately as an incremental.
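One way to sketch that incremental pickup of the dump location, using GNU tar's listed-incremental mode (all paths here are made up for the demo; the .snar snapshot file records state so the next run only picks up new or changed dumps):

```shell
# Sketch: incremental pickup of a dump directory with GNU tar.
set -eu
mkdir -p /tmp/dumps /tmp/pickup
seq 1 1000 | gzip > /tmp/dumps/db-day1.sql.gz

# First run: full pickup, and it creates the snapshot file.
tar --listed-incremental=/tmp/pickup/dumps.snar \
    -czf /tmp/pickup/dumps-full.tar.gz -C /tmp dumps

# Next day's dump arrives; the second run picks up only the new file.
seq 1 2000 | gzip > /tmp/dumps/db-day2.sql.gz
tar --listed-incremental=/tmp/pickup/dumps.snar \
    -czf /tmp/pickup/dumps-inc.tar.gz -C /tmp dumps
```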

 

Userlevel 7
Badge +23

Hey @JM- , have you had any luck experimenting with the options to see what their effect is?

Thanks!

Userlevel 1
Badge +4

The only thing I can report is that compression time is unaffected.  Still running in a tape-only environment; that will change in the near future when it's replaced by dedupe-capable kit.  When that's done I'll be able to test fully.  Still doing the prep for that.

Userlevel 7
Badge +23

Ok, awesome.  Keep us posted!

Userlevel 7
Badge +23

Hey @JM- , gentle follow up to see how things are going.

Thanks!

Userlevel 7
Badge +23

Hi @JM- , gentle follow up on this issue.  Curious if you had any luck testing everything out.

Userlevel 1
Badge +4

Hopefully we will know within a month - still installing the kit.

Userlevel 7
Badge +23

Hi @JM- , hope all is well!

Following up a month later.  Have you had a chance to test?  If not, let me know and I’ll follow up at a later date!

Userlevel 1
Badge +4

pigz/gzip --rsyncable: that option actually works for dedupe and does a good job. Savings will depend on the original data and how many cycles are stored.

 

Can’t say the same about Windows *.zip files.  Backed up about 6 TB in one directory for the first time; savings were <1%.

 

Userlevel 7
Badge +23

@JM- , thanks for the follow up!

To confirm, are you saying that Windows zip only saved 1% or less?  And what savings percentage did the --rsyncable option get you?

Reply