Question

Data Written vs. Size on Media

  • 24 August 2023

I've seen several discussions on this topic, but I still don't get it.

 

From the docs:

Application Size: 
The size of the data that needs to be backed up from a client computer. Application size is not always identical to the size reported by file or database management tools.


Data Written: 
New data written on media by each backup job.


Size on Media: 
Size of deduplicated data written on the media.

 

Consider the following real job:

Job 1 (deduped disk lib is the destination): 

App Size: 33.4TB

Data Written: 24.93

Size on Media: 7.30
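
As a quick sanity check on the figures above, the ratios between them can be worked out as below. This is only a sketch: it assumes all three values are in TB (the post states the unit for App Size only), and the variable names are illustrative.

```python
# Figures from Job 1.
# ASSUMPTION: all three values are in TB; only App Size has a unit in the post.
app_size_tb = 33.4
data_written_tb = 24.93
size_on_media_tb = 7.30

# Ratios between the three reported figures.
written_vs_app = data_written_tb / app_size_tb
media_vs_app = size_on_media_tb / app_size_tb
media_vs_written = size_on_media_tb / data_written_tb

print(f"Data Written / App Size:       {written_vs_app:.1%}")
print(f"Size on Media / App Size:      {media_vs_app:.1%}")
print(f"Size on Media / Data Written:  {media_vs_written:.1%}")
```

Under that unit assumption, Size on Media is smaller than Data Written for this job.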

 

1st question: what is the total size occupied by this job in the dedup disk library?

2nd question: is Data Written the size before dedup? Does it account for software compression only, or for both compression and dedup?

3rd question: is Size on Media just the size of the unique blocks for the job (disregarding the baseline)?

 

regards,

Pedro


3 replies

Userlevel 3
Badge +8

Hi @PedroRocha 

Application Size is the size of the protected data on the client.

Data Written is the volume of that data written to storage after compression, deduplication, etc.

Size on Media is the size of the backup job (application data and index), or the total size occupied on media. With deduplication, this includes the data written AND the space occupied by aged jobs that are still referenced by other valid jobs.

 

Note: Data Written is the total data written by the currently active jobs, while Size on Media (size on disk) is the total space occupied on disk by those active jobs plus their dependent baseline from aged jobs. Hence Size on Media tends to be higher than Data Written.

 

Example1:

=========

You run a full backup of 100 GB. Later another job runs with an application size of 110 GB, but only 10 GB of data is written; the rest is deduplicated. The first 100 GB job ages off over time.

 

Now, Data Written in the storage policy = 10 GB (the size of the active job)

And Size on Media = 110 GB (active + baseline of 100 GB)
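
The arithmetic in Example 1 can be sketched in a few lines of Python. This is only an illustrative model of the description above, not how the product computes its reports; the `Job` class, its fields, and both helper functions are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class Job:
    name: str
    app_size_gb: int
    written_gb: int                 # new data this job wrote to the library
    aged: bool = False              # job has met retention and aged off
    still_referenced: bool = False  # its blocks are still used by an active job


def data_written(jobs):
    """Sum of data written by active (non-aged) jobs only."""
    return sum(j.written_gb for j in jobs if not j.aged)


def size_on_media(jobs):
    """Active jobs plus the baseline of aged jobs that are still referenced."""
    return sum(j.written_gb for j in jobs if not j.aged or j.still_referenced)


jobs = [
    Job("first full", app_size_gb=100, written_gb=100, aged=True, still_referenced=True),
    Job("second job", app_size_gb=110, written_gb=10),
]

print("Data Written :", data_written(jobs), "GB")
print("Size on Media:", size_on_media(jobs), "GB")
```

With the two jobs from the example, the model gives 10 GB for Data Written and 110 GB for Size on Media, matching the figures above.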

 

Example2:

=======

The following letters represent data on the client:

A B C D E F G

You run your first backup and this becomes your baseline. Let's say this is a Data Written of 7MB, 1MB for each letter. After that backup completes, some of the data changes:

A B C D E H I

When you run your next backup, deduplication ensures that only the changed H and I are written; we already have the other data. The data written for this job is only 2MB.

This goes on for a few weeks: after the initial baseline is created, only changed data is written. Eventually the original job meets retention and ages off. The original data that never changed accounts for 5MB written.

That data is still associated with your newer backups; it just wasn't written again when they ran because of dedupe. You can't remove it and still have a good backup, but because the job that originally wrote it has aged, it's not reflected in your Data Written totals.
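
Example 2 can be modeled at block level with plain sets. This is a simplified sketch, not the actual deduplication engine: it takes the second backup to contain A B C D E H I (F and G replaced by H and I, the reading under which 5MB of the baseline stays referenced), it assumes blocks no longer referenced by any active job are pruned, and `run_backup` is a hypothetical helper.

```python
BLOCK_MB = 1  # each letter stands for 1MB of data


def run_backup(client_blocks, store):
    """Write only blocks not already in the store; return the newly written blocks."""
    new_blocks = set(client_blocks) - store
    store |= new_blocks
    return new_blocks


store = set()
first = set("ABCDEFG")
second = set("ABCDEHI")  # ASSUMPTION: F and G were replaced by H and I

written_first = run_backup(first, store)    # baseline: all 7 blocks
written_second = run_backup(second, store)  # only H and I are new

# The first job ages off. ASSUMPTION: blocks that no active job references
# (F and G here) are pruned; blocks the second job still needs are kept.
store &= second

data_written_mb = len(written_second) * BLOCK_MB      # active job only
baseline_mb = len(written_first & second) * BLOCK_MB  # aged but still referenced
size_on_media_mb = len(store) * BLOCK_MB              # what is still on disk

print("Data Written :", data_written_mb, "MB")
print("Baseline kept:", baseline_mb, "MB")
print("Size on Media:", size_on_media_mb, "MB")
```

In this model the active job reports 2MB of Data Written, while 7MB stays on media because the 5MB baseline written by the aged job is still needed.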

Userlevel 2
Badge +11

Hi! Thanks for the complete answer.

Still, there's something wrong here… Size on Media is smaller than Data Written for all the jobs that we have. It seems that Size on Media lists only the unique data blocks. Does that make sense?

Userlevel 3
Badge +8

@PedroRocha,

 

We can review this for you. To do that, please log a support case and upload the CommserveDB along with additional details such as library/mount paths, storage policy, etc.
