Solved

DDB QI time threshold.


Userlevel 1
Badge +9

Hello 

 

I have an issue related to the DDB.

As shown below, the Q&I time is very high. The MediaAgent serves only Oracle and SAP databases with daily full backups, for around 23 Oracle RAC and 18 SAP clients.

The library is on flash storage.

The DDB disks are SSD and were moved to Pure Storage (NVMe disks) due to insufficient space on the local disks.

Any idea how to resolve this?

 


Best answer by J Dodson 6 July 2022, 00:58


18 replies

Userlevel 4
Badge +10

Hi @Muhammad Abdullah 

Q and I time is a metric for the performance of the average lookup in the DDB. This is directly tied to the IOPS capability of the disk hosting the DDB. Per the MA hardware requirements, the DDB should be hosted on a local dedicated SSD. If you are using network-attached storage for the DDB, there will be latency compared to locally attached disk.

Please see the documentation here for the hardware requirements for MAs hosting the DDB - https://documentation.commvault.com/11.24/expert/111985_hardware_specifications_for_deduplication_mode_01.html#cpuram
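
For a rough host-side comparison of lookup-style I/O between the old local SSD path and the SAN-backed path, a small random-read probe can give an indication. This is only a sketch: the test file path is a placeholder, reads go through the OS page cache (so use a test file much larger than RAM), and it is no substitute for a dedicated benchmark tool.

```python
# Rough random-read latency probe for the volume hosting the DDB.
# Assumption: TEST_FILE points to a large existing file (ideally much larger
# than RAM) on the DDB volume; results are cache-affected and only indicative.
import os
import random
import statistics
import time

TEST_FILE = r"E:\DDB\probe.bin"   # placeholder path on the DDB volume
BLOCK_SIZE = 4096                  # DDB lookups are small random reads
SAMPLES = 2000

def random_read_latencies(path, block_size, samples):
    size = os.path.getsize(path)
    latencies_ms = []
    with open(path, "rb", buffering=0) as f:
        for _ in range(samples):
            offset = random.randrange(0, max(1, size - block_size))
            start = time.perf_counter()
            f.seek(offset)
            f.read(block_size)
            latencies_ms.append((time.perf_counter() - start) * 1000.0)
    return latencies_ms

if __name__ == "__main__":
    lat = sorted(random_read_latencies(TEST_FILE, BLOCK_SIZE, SAMPLES))
    print(f"avg {statistics.mean(lat):.2f} ms, "
          f"p95 {lat[int(0.95 * len(lat))]:.2f} ms, "
          f"max {lat[-1]:.2f} ms")
```

Running the same probe on the old local SSD and on the Pure-backed volume gives a like-for-like comparison of small random-read latency from the MA's point of view.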

 

Userlevel 1
Badge +9

Hello @Matt Medvedeff 

 

The DDB was hosted on local SSD disks before and the issue already existed; we had to move the DDB partition to SSD SAN storage due to insufficient space, so the issue is not related to the disks.
The existing SAN storage is NVMe all the way, so it performs pretty well, and we also use it for active-active scenarios.
 

Userlevel 7
Badge +23


It's not just about IOPS, but IOPS and latency. If you can't sustain enough IOPS, the latency will increase - but given this is NVMe, the issue is likely the latency in the connection between the MA and the storage. Once you move the storage outside of the box, you inherit some level of latency, but 6 ms is a lot. How is the Pure Storage SAN connected - FC or iSCSI? If iSCSI, is it sharing the same interface with other network traffic, given that this seems like a workaround?
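
To put rough numbers on why average lookup latency dominates Q&I time: each signature lookup is a small random read, so the sustainable lookup rate per outstanding I/O is roughly 1 / latency. A back-of-the-envelope sketch (the latency values and the single-outstanding-I/O assumption are illustrative only):

```python
# Back-of-the-envelope: lookups/second ≈ concurrency / latency (Little's Law).
# The latency values and concurrency level below are illustrative only.
def lookups_per_second(latency_ms: float, outstanding_ios: int = 1) -> float:
    return outstanding_ios / (latency_ms / 1000.0)

for latency_ms in (0.2, 1.0, 3.0, 6.0):   # local NVMe ... high SAN latency
    print(f"{latency_ms:>4} ms per lookup -> "
          f"~{lookups_per_second(latency_ms):,.0f} lookups/s per outstanding I/O")
```

At 6 ms per lookup a single stream of lookups caps out around 166/s, versus thousands per second at sub-millisecond latency, which is why even modest added latency shows up so strongly in Q&I times.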

 

Badge +2

Hey Muhammad,

 

Can you please generate the DDB stats for 1 day? Here is a screenshot for your reference.

 

 

Userlevel 1
Badge +9


Hello @Damian Andre 

Regarding the latency, I've checked and it's less than 6 ms; it hits 3 ms max, as shown below.
The connection between the MA and the array is FC.

 

 

Userlevel 1
Badge +9


Dear @Sajan  

Kindly find what you asked for.

 

 

Badge +2

Hey Muhammad,

 

Can you please include the Q&I time stats in the chart?

 

Userlevel 1
Badge +9


Hello @Sajan 
 

The previous screenshot was for the whole DDB disk (2 partitions), where the Q&I Time checkbox is not present.

Please find below a screenshot for one of the partitions with the Q&I Time checkbox selected.

 

 

Badge +2

This is a good chart. The Q&I times are up and down, which is a good sign (better than seeing Q&I times stuck at a high level).

You might want to review how the backup jobs are scheduled. Review what Commvault jobs run during the peak times. 

 

Userlevel 1
Badge +9

Hello @Sajan

This MA is dedicated to backing up Oracle and SAP clients.

Our Oracle databases run daily full backups starting at 2 AM. Since the issue showed up, some Oracle jobs take more than 20 hours, so I believe these spikes are from the Oracle clients, which have run slowly since the issue appeared.

For example, this one started 12 hours ago and is still at 37%!

 

 

Badge +2

Thanks for sharing this information. This must be such a pain to manage. 

How many such jobs run? How is the performance when you run only one job? Are there any synthetic full backups running at that time?

Userlevel 1
Badge +9

Dear @Sajan

No synthetic full jobs run on this MA.

Only full backups for Oracle and SAP.
28 individual full backup jobs run at the same time, at 1 AM.

Badge +2

Oh! 28 full backups running at the same time is not a good idea. How many MAs do you have?

Why don't you break them into 4 separate backup schedules?

Alternatively, can you suspend all the full backups except one and test the performance of that single job?
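
As a simple illustration of that staggering idea (client names and start times below are placeholders, and in practice this would be done with schedule policies rather than a script), splitting 28 clients round-robin across 4 start windows looks like this:

```python
# Illustrative only: split 28 clients round-robin across 4 staggered start times.
# Client names and windows are placeholders; real scheduling is done via
# Commvault schedule policies, not this script.
clients = [f"client-{i:02d}" for i in range(1, 29)]   # 28 placeholder clients

start_times = ["01:00", "04:00", "07:00", "10:00"]    # hypothetical windows
schedule = {t: [] for t in start_times}

for idx, client in enumerate(clients):
    schedule[start_times[idx % len(start_times)]].append(client)

for t, group in schedule.items():
    print(f"{t} -> {len(group)} clients: {', '.join(group[:3])}, ...")
```

Each window then carries 7 concurrent fulls instead of 28, which spreads the DDB lookup load across the night.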

Userlevel 1
Badge +9

Hello @Sajan 

I have 3 MAs:

MA1 > SQL, MySQL, MongoDB, Sybase, and dump files
MA2 > VMs, file system
MA3 > Oracle, SAP

Unfortunately, I can't suspend any of these jobs as these are critical databases,
but I will reschedule one of the large databases to run alone at a different time and check the performance.

 

Userlevel 7
Badge +23

@Muhammad Abdullah , following up on this one.

Did the reschedule increase the throughput?

Thanks!

Userlevel 2
Badge +5

One additional point to this discussion: how many DDB partitions, for any and all DDBs, reside on this particular mount path/disk? With DDB v5, Commvault will spawn new DDBs if parameters are met for high Q&I times or a large number of unique blocks, which could spread more work across the disks; and if this disk is a DDB target for other DDBs as well, it could be a contention issue at the disk level.

You can check this by looking at Resource Monitor, if this is a Windows MA, and watching the disk queue depth for that disk during the Q&I time spikes. It may be necessary to distribute those DDBs to other disks/mount paths.
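
If it helps to capture this over a whole spike window rather than watching Resource Monitor live, the standard Windows PhysicalDisk counters can be sampled with typeperf from a short script. A minimal sketch, assuming a Windows MA; the output file name and sampling interval are arbitrary, and you would filter the resulting CSV to the PhysicalDisk instance that hosts the DDB mount path:

```python
# Sample Windows PhysicalDisk counters during a Q&I spike window (Windows MA).
# Assumption: run on the MediaAgent itself; filter the output to the disk
# instance that actually hosts the DDB mount path.
import subprocess

COUNTERS = [
    r"\PhysicalDisk(*)\Current Disk Queue Length",
    r"\PhysicalDisk(*)\Avg. Disk sec/Read",
]

# 60 samples, 5 seconds apart (~5 minutes of data), written to CSV.
cmd = ["typeperf"] + COUNTERS + ["-si", "5", "-sc", "60",
                                 "-o", "ddb_disk_stats.csv", "-y"]
subprocess.run(cmd, check=True)
print("Wrote ddb_disk_stats.csv - review queue length and sec/read during spikes.")
```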

Also, I have extensive first-hand experience that moving from local NVMe disk to Pure LUNs, whether iSCSI- or SAN-attached, will NOT deliver similar, let alone better, performance. In my experience there is at least a 30-40% drop in performance and an increase in latency. Pure itself is a solid-performing storage solution, but DDB lookups are just too intensive for a shared controller, and no one could afford to dedicate a Pure array specifically to DDB operations.

Another contributing factor could be Commvault operations that compete for resources, such as Data Aging (check the SIDBPrune and SIDBPhysicalDeletes logs for activity at the time these servers are being backed up), Data Verification, or Space Reclamation, the latter two of which have job histories. These operations would negatively impact backup operations if they overlapped.
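
One quick way to check for that overlap is to count SIDBPrune and SIDBPhysicalDeletes log entries that fall inside the nightly backup window. A rough sketch, with the log directory and the 'MM/DD HH:MM:SS' timestamp pattern both assumed; adjust them to match your installation:

```python
# Count SIDBPrune / SIDBPhysicalDeletes log entries inside the backup window.
# Assumptions: LOG_DIR matches your installation, and log lines carry an
# "MM/DD HH:MM:SS" timestamp; adjust both for your environment.
import re
from pathlib import Path

LOG_DIR = Path(r"C:\Program Files\Commvault\ContentStore\Log Files")  # assumed
LOG_NAMES = ["SIDBPrune.log", "SIDBPhysicalDeletes.log"]
WINDOW = range(1, 9)   # backup window: 01:00 - 08:59, adjust as needed

TS = re.compile(r"\b\d{2}/\d{2} (\d{2}):\d{2}:\d{2}\b")

for name in LOG_NAMES:
    path = LOG_DIR / name
    if not path.exists():
        print(f"{name}: not found (check LOG_DIR)")
        continue
    hits = 0
    with path.open(errors="ignore") as fh:
        for line in fh:
            m = TS.search(line)
            if m and int(m.group(1)) in WINDOW:
                hits += 1
    print(f"{name}: {hits} entries logged during the backup window")
```

A large count during the window suggests pruning activity is competing with the Oracle/SAP fulls for DDB I/O.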

Userlevel 2
Badge +5

Add DDB backups to the list of processes that could negatively affect backup/lookup performance as well. 

Userlevel 7
Badge +19

@J Dodson your observation of increased I/O latency for DDBs when moving the DDB from a local NVMe drive to an NVMe-based storage solution like Pure Storage over FC/iSCSI is of course expected. The path is longer and traverses a protocol that is not as efficient as a local drive. You might in this case consider using NVMe-oF to reduce the latency to values closer to what you see with a local NVMe drive.

But as @Damian Andre already pointed out, seeing 6 ms of latency is not normal, even for an FC connection to a Pure Storage array. Some areas to look into:

  • FC misconfiguration
  • QoS on the Pure Storage side, e.g. a volume IOPS limit

Also, why not consider using partitioned DDBs so you spread the load across all MAs and introduce some form of HA, even though block storage is not optimal, assuming you are also using the Pure Storage array to store the backup data?

Reply