Solved

Where Does Rehydration Occur When deduplicated Hard Disk storage is sent to tape

  • 8 August 2024
  • 5 replies
  • 46 views

Hello,

I have a question regarding secondary aux copy to tape.

We have a storage policy where primary storage on disk with deduplication is on Media Agent 1. We have an Aux Copy to Tape, the tape library is on Media Agent 2. Where is the primary storage data rehydrated and stored before getting sent to the secondary copy?

I guess I am looking for how the data flows from the primary copy to the aux tape copy.

 

Thanks in advance,

Lance

Hello @hiLance 

 

When we read data that has been deduplicated, we do not need to stage it anywhere. We just read the data and write it to the destination. The same is true for a restore or for a copy to tape.

 

I guess the short answer to your question is “in memory” haha. 

 

Kind regards

Albert Williams


But in memory on MediaAgent 1 or 2, if we take the example from the question? Got me curious as well now. Hopefully we can get more info/details.


Same question as mfox: would it be rehydrated on MediaAgent1 or MediaAgent2? This would affect how much data is actually transferred from MA1 to MA2.


Whichever MA is reading it from Primary Storage will do the rehydrating.

Thanks,
Scott
 


 Hello @hiLance 

Commvault will take the path of fewest hops. So if the MA that is writing the data to the tape also has access to the source disk library, it will perform both reads and writes. 

In your case it sounds like that would put the reads over a WAN, which would be very slow. So MA1 will read the data and pass it to MA2, which will write the data to tape.


No data will be staged at all in the entire process.
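
The routing rule described above can be sketched as a small toy model. This is only an illustration of the rule as stated in this thread, not Commvault's actual selection logic; the function names `pick_reader` and `crosses_network` and the MediaAgent labels are hypothetical.

```python
def pick_reader(writer_ma, source_access):
    """Return the MediaAgent that reads (and rehydrates) the source data.

    writer_ma: the MediaAgent attached to the tape library.
    source_access: ordered list of MediaAgents that can reach the disk library.
    """
    if not source_access:
        raise ValueError("no MediaAgent can reach the source library")
    # Fewest hops: the writer reads directly if it can see the source.
    if writer_ma in source_access:
        return writer_ma
    # Otherwise a MediaAgent with source access reads and streams to the writer.
    return source_access[0]


def crosses_network(writer_ma, source_access):
    """True when the read and the write happen on different MediaAgents."""
    return pick_reader(writer_ma, source_access) != writer_ma


# Scenario from the question: only MA1 sees the disk library, MA2 owns the tape.
reader = pick_reader("MA2", ["MA1"])
print(reader, crosses_network("MA2", ["MA1"]))
```

In the scenario from the question the reader would be MA1, so going by Scott's reply the rehydration would happen there and the data would travel to MA2 in rehydrated form.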

 

When we write data, we write into a chunk folder. This folder will contain two kinds of content: raw data that was unique and is referenced as primary records, or a metadata file that points to a different chunk folder containing the real raw data.

So the DDB contains a list of known primary records, and when data is written it is able to either write the raw data or add an entry into the metadata file that points to the real raw data.
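
As a rough illustration of the write path described above, here is a toy sketch. It is not Commvault's on-disk format: the dictionary layout, the `write_block` helper, and the use of SHA-256 as the block signature are all assumptions made for the example.

```python
import hashlib


def write_block(ddb, chunks, chunk_id, name, data):
    """Write one block into chunk folder `chunk_id`.

    ddb: dict mapping signature -> (chunk_id, name) of the primary record.
    chunks: dict mapping chunk_id -> {"raw": {...}, "meta": {...}}.
    Returns "raw" if the data was stored, "pointer" if only a reference was.
    """
    chunk = chunks.setdefault(chunk_id, {"raw": {}, "meta": {}})
    signature = hashlib.sha256(data).hexdigest()  # assumed signature scheme
    if signature in ddb:
        # Known block: record only a pointer to where the raw data lives.
        chunk["meta"][name] = ddb[signature]
        return "pointer"
    # Unique block: store the raw data and register it as the primary record.
    chunk["raw"][name] = data
    ddb[signature] = (chunk_id, name)
    return "raw"


ddb, chunks = {}, {}
first = write_block(ddb, chunks, "chunk_A", "block_0", b"hello")
second = write_block(ddb, chunks, "chunk_B", "block_0", b"hello")
print(first, second, chunks["chunk_B"]["meta"])
```

The second write of the same bytes stores no raw data in `chunk_B`; it only adds a metadata entry pointing back at `chunk_A`.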

 

The CommServe database knows which chunks are required to restore every job. So to perform a restore of data we go to chunk A. We will either get all the data we need from that chunk or find a pointer to the next chunk where the raw data is. This is why we are able to perform restores without the DDB being online at all. DDBs are only used for write operations.
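
The read path described above can be sketched with a toy model. Everything here is hypothetical (the chunk layout, the `job_index` stand-in for the CommServe database, the helper names); the point is only that the reader follows pointers inside the chunks and streams blocks to the destination, with no DDB lookup and no staging area.

```python
import io


def read_chunk(chunks, chunk_id):
    """Yield the rehydrated blocks of one chunk, in order."""
    for kind, value in chunks[chunk_id]:
        if kind == "raw":
            yield value
        else:
            # kind == "ref": value names the chunk and slot holding the raw data.
            other_id, index = value
            ref_kind, ref_value = chunks[other_id][index]
            if ref_kind != "raw":
                raise ValueError("pointer does not resolve to raw data")
            yield ref_value


def restore(chunk_ids, chunks, sink):
    """Stream every block of the listed chunks straight into `sink`."""
    for chunk_id in chunk_ids:
        for block in read_chunk(chunks, chunk_id):
            sink.write(block)  # written as soon as it is read, nothing staged


chunks = {
    "chunk_A": [("raw", b"alpha "), ("raw", b"beta ")],
    "chunk_B": [("ref", ("chunk_A", 0)), ("raw", b"gamma")],
}
job_index = {"job_2": ["chunk_B"]}  # stand-in for the job-to-chunk mapping

tape = io.BytesIO()  # stand-in for the tape or restore destination
restore(job_index["job_2"], chunks, tape)
print(tape.getvalue())
```

Note that no deduplication database appears anywhere in this sketch: the job-to-chunk mapping and the pointers stored in the chunks are enough to rebuild the data.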

 

I hope this explains how we write and read data, and helps you understand why we do not need to stage data. We just follow the cookie trail until we have collected it all.

 

Kind regards

Albert Williams

