Solved

Storage Pool - best practices or no logic?


Badge +4

Hello

Following the Commvault SE recommendations, we created a storage pool of 4 Media Agents with DAS storage. Initially all MAs could read and write data to every mount path, and I noticed that the "LAN-free" logic does not work: each MA tries to access each mount path even when a "closer" or faster path is available. Although our network is 10Gb, data transfer between MAs is slow, very slow. I have now allowed only reads for any MA on any path, and it works better, but still not perfectly. Most importantly, aux copy to tape is very slow. Each policy allows each media agent to access any tape, so my idea was: "OK, let's stop access via IP, so that only the MA that owns the DAS will read/write its data". The result: backups are fast, but aux copy is... failing, because MAs cannot access data on other MAs.

So I am stuck with no ideas, except to stop using the storage pool and go back to using each media agent as a standalone.

Any ideas how to:

- have a storage pool

- force each MA to use their DAS only

- force CV to aux copy using all MAs, so if block1 is on ma1 - it will be written by ma1 to tape, if block2 is on ma2 - it will be written by ma2 to tape.

 

Suggestions? 


Best answer by Roman Melekh 7 July 2022, 15:41


15 replies

Badge +4

This is the Primary_Global Storage Policy

An example of one of the mount path configs:
 

One of the “results” I am fighting with now:


 

  • Storage Policy Name: NCTR-related VMs
  • Copy Name: 2 - Secondary (tape)
  • Start Time: Thu Jun 2 20:51:09 2022
  • Scheduled Time: Thu Jun 2 20:51:09 2022
  • End Time: Thu Jun 2 21:39:09 2022
  • Error Code: [13:138] [40:91] [13:138] [13:138] [13:138]
  • Failure Reason: Error occurred while processing chunk [54068521] in media [V_14481847], at the time of error in library [DAS - d1w-commvma01] and mount path [[d1w-commvma01] C:\CV_Disk\CV_Disk_2], for storage policy [NCTR-related VMs] copy [2 - Secondary (tape)] MediaAgent [d1w-commvma01]: Backup Job [3871594]. Please check if all the disk library mount paths in the dedup storage policy copy are properly configured to read from the MediaAgent. Failed to Copy or verify Chunk [54068521] in media [CV_MAGNETIC], Storage Policy [NCTR-related VMs], Copy [1 - Primary (disk)], Host [d1w-commvma01.ad.umanitoba.ca], Path [C:\CV_Disk\CV_Disk_2\DumpsterDD5\CV_MAGNETIC\V_14481847], File Number [693], Backup Jobs [ 3871594]. Please check if all the disk library mount paths in the dedup storage policy copy are properly configured to read from the MediaAgent. Please check if all the disk library mount paths in the dedup storage policy copy are properly configured to read from the MediaAgent. Error occurred while processing chunk [54067506] in media [V_14481847], at the time of error in library [DAS - d1w-commvma01] and mount path [[d1w-commvma01] C:\CV_Disk\CV_Disk_2], for storage policy [NCTR-related VMs] copy [2 - Secondary (tape)] MediaAgent [d1w-commvma01]: Backup Job [3871594]. Please check if all the disk library mount paths in the dedup storage policy copy are properly configured to read from the MediaAgent. Error occurred while processing chunk [54072835] in media [V_14481847], at the time of error in library [DAS - d1w-commvma01] and mount path [[d1w-commvma01] C:\CV_Disk\CV_Disk_2], for storage policy [NCTR-related VMs] copy [2 - Secondary (tape)] MediaAgent [d1w-commvma01]: Backup Job [3871594]. Please check if all the disk library mount paths in the dedup storage policy copy are properly configured to read from the MediaAgent. 
Error occurred while processing chunk [54025381] in media [V_14478780], at the time of error in library [DAS - d1w-commvma01] and mount path [[d1w-commvma01] C:\CV_Disk\CV_Disk_3], for storage policy [NCTR-related VMs] copy [2 - Secondary (tape)] MediaAgent [d1w-commvma01]: Backup Job [3871594]. Please check if all the disk library mount paths in the dedup storage policy copy are properly configured to read from the MediaAgent.

Copied data size: 0 Bytes

Badge +4

Interestingly, CV sometimes does use 2-3 MAs for aux copy jobs, but the result is not always as good as it could be:
 

 

Badge +4

And one more question: could creating a tape pool resolve the issue? I have two tape libraries of 8 drives each, and I did not find a way to create a pool of two libraries (in case this could resolve the problem...).

Userlevel 7
Badge +19

There is a lot here to unpack, but I wanted to say that the Commvault LAN-free concept does not apply here and perhaps isn't what you think it is. LAN-free essentially means that a client performing a backup will prefer, and default to, the media agent installed on itself rather than another MA in the storage policy. For example, it works in cases where you have a bunch of physical servers that all have FC connectivity to a tape library. Rather than copying over the network to a Media Agent that has access to the tape, you install the MA software on each client machine so it can access the tape drive directly itself, i.e. LAN-free.

It also works in cases where you have an agent like VSA that pulls data from a remote source and can be deployed on a Media Agent: it will use the local MA to transfer the data to storage rather than a remote MA over the network.

 

Any ideas how to:

- have a storage pool

- force each MA to use their DAS only

- force CV to aux copy using all MAs, so if block1 is on ma1 - it will be written by ma1 to tape, if block2 is on ma2 - it will be written by ma2 to tape.

 

I am assuming you are using deduplication? If so, a block could end up on any media agent the first time it is written. The next time we see the block we don't write it again, but when we do an auxcopy or restore we have to read that block from its original location, which could be on any of the DAS mount paths. The Media Agent performing the read has no idea where the block is until we go to read it. With a single storage pool it doesn't matter which MA wrote the job; over time the data is going to be spread out across every mount path, since the first MA to see a block writes it.

So in a single storage pool you still have to configure each media agent to have read access via network share to the other Media Agent's DAS (or using dataserver-IP as you are here).
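
To illustrate why reads end up crossing Media Agents, here is a minimal Python sketch of the "first MA to see the block writes it" behaviour described above. It is a toy model, not Commvault's actual algorithm: the MA names, the job sizes, and the assumption that each job shares half of its blocks with the previous job are all made up for illustration.

```python
from collections import Counter

# Hypothetical MediaAgent names, used only for this illustration.
MEDIA_AGENTS = ["ma1", "ma2", "ma3", "ma4"]


def simulate(num_jobs=8, blocks_per_job=100, overlap=50):
    """Toy dedup model: a block is stored once, on the DAS of the MA
    that wrote the first job containing it."""
    block_location = {}  # block id -> MA whose DAS holds the only copy
    jobs = []
    for job_id in range(num_jobs):
        # Assumption: jobs are spread over the MAs in round-robin order.
        writer = MEDIA_AGENTS[job_id % len(MEDIA_AGENTS)]
        # Assumption: each job repeats `overlap` blocks of the previous job.
        start = job_id * (blocks_per_job - overlap)
        blocks = range(start, start + blocks_per_job)
        for block in blocks:
            # Only the first writer stores the block; later jobs reference it.
            block_location.setdefault(block, writer)
        jobs.append((job_id, writer, blocks))
    return block_location, jobs


def read_sources(job_blocks, block_location):
    """Count how many of a job's blocks live on each MA's DAS."""
    return Counter(block_location[b] for b in job_blocks)


if __name__ == "__main__":
    locations, jobs = simulate()
    for job_id, writer, blocks in jobs:
        print(job_id, writer, dict(read_sources(blocks, locations)))
```

In this model, every job after the first has to read half of its blocks from a different MA's DAS than the one that wrote the job, which is why a copy or restore needs read access to the other MAs' mount paths.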

That auxcopy screenshot (350GB/hr) is about 100MB/sec, which is very slow. I would say the design is OK but there is a bottleneck slowing things down for sure. 100MB/sec is eerily close to 1Gbit - how is the tape drive accessed? FC or iSCSI?
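
As a quick sanity check on those numbers, this small Python sketch converts the throughput units. It assumes decimal units (1 GB = 1000 MB, 1 Gbit/s = 1000 Mbit/s) and ignores protocol overhead; the function names are just illustrative.

```python
def gb_per_hour_to_mb_per_sec(gb_per_hour: float) -> float:
    """Convert GB/hr to MB/sec, assuming 1 GB = 1000 MB."""
    return gb_per_hour * 1000 / 3600


def gbit_to_mb_per_sec(gbit_per_sec: float) -> float:
    """Raw line rate of a link in MB/sec (8 bits per byte, no overhead)."""
    return gbit_per_sec * 1000 / 8


if __name__ == "__main__":
    print(f"aux copy:    {gb_per_hour_to_mb_per_sec(350):.1f} MB/sec")
    print(f"1 Gbit link: {gbit_to_mb_per_sec(1):.1f} MB/sec")
    print(f"10 Gbit link: {gbit_to_mb_per_sec(10):.1f} MB/sec")
```

350 GB/hr works out to roughly 97 MB/sec, which sits just under the 125 MB/sec raw ceiling of a 1 Gbit link and far below the 1250 MB/sec of a 10 Gbit link.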

 

Badge +4

For example, it works in cases where you have a bunch of physical servers that all have FC connectivity to a tape library. Rather than copying over the network to a Media Agent that has access to the tape, you install the MA software on each client machine so it can access the tape drive directly itself, i.e. LAN-free.

……...

So in a single storage pool you still have to configure each media agent to have read access via network share to the other Media Agent's DAS (or using dataserver-IP as you are here).

That auxcopy screenshot (350GB/hr) is about 100MB/sec, which is very slow. I would say the design is OK but there is a bottleneck slowing things down for sure. 100MB/sec is eerily close to 1Gbit - how is the tape drive accessed? FC or iSCSI?

 

Here is the thing - each MA has access to each tape drive, using FC:

 

So, technically, it should work the way I expect, but CV uses the LAN to transfer data between MAs. An example of a task that is running slowly right now:

It reads data from two MAs and uses only one to write data:

And the data is on MA#3, which also has access to the tape drives...:

 

 

I am trying to resolve this problem rather than getting rid of the storage pool.

Userlevel 3
Badge +9

Hi @Roman Melekh,

Can you check under Control Panel → Media Management → Resource Manager tab:

Are the below values set to 0 or 1?

Set the value of these 3 options to 1 and see if this improves the behavior. These options will defer read operations to the paths local to each MA involved.

Badge +4

Thanks for the advice. I will check whether it helps during the weekend’s SyntheticFull and AUX processes and will let you know.

Badge +4

Hi @Roman Melekh,

Can you check under Control Panel → Media Management → Resource Manager tab:

Are the below values set to 0 or 1?

Set the value of these 3 options to 1 and see if this improves the behavior. These options will defer read operations to the paths local to each MA involved.

Hi

I do see some improvement: the media agents no longer try to access libraries by IP, but there is some strange behavior.

They are still sharing data between them (see the source and destination MAs) and still transferring data via LAN:

 

 

We have 5 media agents and 2x8 tape drives, and speeds still leave much to be desired:

 

Badge +4

@Matt Medvedeff More interesting behaviour:

If the data is on the Sinkhole MA, why is d1w-commvma01 the destination?

Userlevel 3
Badge +9

On the Aux copy policy, are you combining source streams, and what is the multiplexing factor?

Badge +4

On the Aux copy policy, are you combining source streams, and what is the multiplexing factor?

@Matt Medvedeff Yes, we do and I have tried various values with no significant change.

Considering our environment, what would be the best-practice approach?

  • We have 5 (will have 6) media agents with DAS; each has a DDB, and each has access to tapes (FC).
  • We have two libraries, 8 drives each; each MA can access each tape drive.
  • Each MA has 2x10Gb LACP LAN access.
  • Each MA has 2 to 8 mount points (RAID 6 disk groups).

What could be an appropriate number of streams and multiplexing factor?

 

Badge +4

An update (probably similar to Monday’s):

I do not see any IP | mountpath connections, but the MAs are still transferring data between them:

Both d1w-commvma01 and 03 have access to the tapes. Why does 03 transfer data to 01 to write it to tape?

Userlevel 6
Badge +14

MediaAgent 1 is most likely set up as the preferred data path.  Can you please verify this?

Badge +4

MediaAgent 1 is most likely set up as the preferred data path.  Can you please verify this?

Yes, it is. But “preferred” should not mean “the only one”.

Anyway, it seems the real issue is not here. With the CV SE's help we discovered that “Data Server IP” access has some issues, and switching to sharing data via UNC solved the aux copy speed problem. CV transfers data between MAs very fast, but due to network outages it needed to re-read data.

 

Some additional LACP tweaks, and our storage pool of 5 MAs with DAS works well!

Userlevel 7
Badge +23

Thanks for the update!!
