We have a customer that has an Isilon for disk storage.
Backup speeds are OK, but DASH copy to DR and copy-to-tape speeds are terrible. “Last night’s” backups copy to DR at no more than 500GB/hr if we’re lucky, and copy-to-tape speeds do not exceed 200GB/hr.
Index and DDB are on NVME and tested fine.
Fallen Behind Copies are literally years behind.
Both Commvault and Isilon have checked it out and cannot do anything about it.
We did spec Hyperscale before implementation but we were overruled and now I sit with this issue. Very frustrating.
Has anyone experienced dog-slow Isilon restores, DASH copies or Tape copies?
How were you able to overcome this?
It’s gotten so bad that we are going to ask Commvault to either help us fix it or reconsider certifying it as a DL destination.
Hi Shane,
Thank you for your post.
Without the logs, we would not be able to give you a definitive answer. However, if backups to the primary library show better throughput than the aux copy to the tape library, then we can make an educated guess that the issue is with the write speeds to the tape library.
The CVJobReplicatorODS log on the source Media Agents will show you the read speeds.
The CVD log on the destination Media Agent will show you the write speeds.
Please load an empty tape into one of the drives using Commvault, then go to the properties of the drive and click the Details button to make note of the tape device number, for example \\.\Tape0.
Now open TapeToolGui on the Media Agent and go to the write option. From the drop-down box, select the correct tape device, apply the same settings configured for the aux copy (these can be found in the Data Path tab for the copy) and attempt to write 10GB of data.
The result will show on screen. If the speeds are good, then this needs to be a case with Commvault support. If the speeds are slow, then this needs to be looked into by your tape vendor.
This is very helpful, thank you.
Will report back with the results
This doesn’t paint a pretty picture:
The CVJobReplicatorODS log, filtered:
Job Id [2808483], Bytes [3554012807], Time [150.823810] Sec(s), Average Speed [22.472385] MB/Sec
1320 4020 04/20 20:29:45 2808483 stat- ID [Media Read Speed], Job Id [2808483], Bytes [3569888], Time [6.425903] Sec(s), Average Speed [0.529810] MB/Sec
1320 4020 04/20 20:30:18 2808483 stat- ID [Media Read Speed], Job Id [2808483], Bytes [35959], Time [6.379979] Sec(s), Average Speed [0.005375] MB/Sec
1320 2aa8 04/20 20:30:18 2807768 stat- ID [Media Read Speed], Job Id [2807768], Bytes [2524446018], Time [57.223872] Sec(s), Average Speed [42.071591] MB/Sec
1320 2aa8 04/20 20:30:47 2807768 stat- ID [Media Read Speed], Job Id [2807768], Bytes [407533284], Time [13.753373] Sec(s), Average Speed [28.258815] MB/Sec
1320 2aa8 04/20 20:31:43 2807768 stat- ID [Media Read Speed], Job Id [2807768], Bytes [538582783], Time [20.165669] Sec(s), Average Speed [25.470643] MB/Sec
1320 2aa8 04/20 20:33:33 2807768 stat- ID [Media Read Speed], Job Id [2807768], Bytes [1692375892], Time [73.643517] Sec(s), Average Speed [21.916056] MB/Sec
1320 2aa8 04/20 20:36:54 2807768 stat- ID [Media Read Speed], Job Id [2807768], Bytes [3446449989], Time [136.524469] Sec(s), Average Speed [24.074738] MB/Sec
1320 2aa8 04/20 20:38:49 2807768 stat- ID [Media Read Speed], Job Id [2807768], Bytes [3446384584], Time [80.714226] Sec(s), Average Speed [40.720560] MB/Sec
1320 2aa8 04/20 20:43:20 2807768 stat- ID [Media Read Speed], Job Id [2807768], Bytes [3279470225], Time [199.083116] Sec(s), Average Speed [15.709753] MB/Sec
1320 2aa8 04/20 20:48:44 2807768 stat- ID [Media Read Speed], Job Id [2807768], Bytes [5357413664], Time [163.631480] Sec(s), Average Speed [31.223991] MB/Sec
1320 2aa8 04/20 20:57:02 2807768 stat- ID [Media Read Speed], Job Id [2807768], Bytes [9535605031], Time [290.916488] Sec(s), Average Speed [31.259354] MB/Sec
1320 2aa8 04/20 21:03:12 2807768 stat- ID [Media Read Speed], Job Id [2807768], Bytes [2839144051], Time [268.750481] Sec(s), Average Speed [10.074842] MB/Sec
1320 15b0 04/20 21:05:45 2808562 stat- ID [Media Read Speed], Job Id [2808562], Bytes [909152407], Time [31.214392] Sec(s), Average Speed [27.776780] MB/Sec
1320 2aa8 04/20 21:07:27 2807768 stat- ID [Media Read Speed], Job Id [2807768], Bytes [126195744], Time [10.064503] Sec(s), Average Speed [11.957832] MB/Sec
1320 15b0 04/20 21:08:14 2808562 stat- ID [Media Read Speed], Job Id [2808562], Bytes [190005285], Time [52.600836] Sec(s), Average Speed [3.444872] MB/Sec
1320 2aa8 04/20 21:16:10 2807768 stat- ID [Media Read Speed], Job Id [2807768], Bytes [4526056297], Time [96.700089] Sec(s), Average Speed [44.636812] MB/Sec
1320 2aa8 04/20 21:33:53 2807768 stat- ID [Media Read Speed], Job Id [2807768], Bytes [22591919052], Time [885.865786] Sec(s), Average Speed [24.321216] MB/Sec
1320 43e8 04/20 21:48:05 2808580 stat- ID [Media Read Speed], Job Id [2808580], Bytes [1557346555], Time [163.534822] Sec(s), Average Speed [9.081866] MB/Sec
1320 3e6c 04/20 22:35:37 2808624 stat- ID [Media Read Speed], Job Id [2808624], Bytes [3392684930], Time [889.073823] Sec(s), Average Speed [3.639199] MB/Sec
1320 3e6c 04/20 22:37:04 2808624 stat- ID [Media Read Speed], Job Id [2808624], Bytes [1364], Time [0.000146] Sec(s), Average Speed [8.879496] MB/Sec
1320 2aa8 04/20 22:37:16 2807768 stat- ID [Media Read Speed], Job Id [2807768], Bytes [55096329787], Time [3450.144753] Sec(s), Average Speed [15.229493] MB/Sec
1320 2aa8 04/20 23:06:19 2807768 stat- ID [Media Read Speed], Job Id [2807768], Bytes [21736043560], Time [1619.399755] Sec(s), Average Speed [12.800488] MB/Sec
1320 4250 04/20 23:37:59 2808683 stat- ID [Media Read Speed], Job Id [2808683], Bytes [4887304961], Time [1220.825255] Sec(s), Average Speed [3.817825] MB/Sec
1320 4250 04/20 23:48:20 2808683 stat- ID [Media Read Speed], Job Id [2808683], Bytes [2541425061], Time [430.789823] Sec(s), Average Speed [5.626158] MB/Sec
1320 4250 04/20 23:54:01 2808683 stat- ID [Media Read Speed], Job Id [2808683], Bytes [967632658], Time [275.128657] Sec(s), Average Speed [3.354091] MB/Sec
1320 dbc 04/20 23:59:30 2808683 stat- ID [Media Read Speed], Job Id [2808683], Bytes [300848073], Time [64.534557] Sec(s), Average Speed [4.445852] MB/Sec
1320 2f18 04/21 01:09:56 2808808 stat- ID [Media Read Speed], Job Id [2808808], Bytes [1112128472], Time [258.251365] Sec(s), Average Speed [4.106884] MB/Sec
1320 2c58 04/21 01:50:28 2808827 stat- ID [Media Read Speed], Job Id [2808827], Bytes [1162944605], Time [280.876271] Sec(s), Average Speed [3.948608] MB/Sec
1320 2c58 04/21 02:12:45 2808827 stat- ID [Media Read Speed], Job Id [2808827], Bytes [1393061865], Time [1087.711048] Sec(s), Average Speed [1.221397] MB/Sec
1320 2c58 04/21 02:23:42 2808827 stat- ID [Media Read Speed], Job Id [2808827], Bytes [14573], Time [9.090160] Sec(s), Average Speed [0.001529] MB/Sec
1320 f0c 04/21 03:42:43 2808874 stat- ID [Media Read Speed], Job Id [2808874], Bytes [7378130545], Time [2445.191251] Sec(s), Average Speed [2.877621] MB/Sec
1320 2aa8 04/21 04:17:50 2807768 stat- ID [Media Read Speed], Job Id [2807768], Bytes [167574913710], Time [17754.087481] Sec(s), Average Speed [9.001414] MB/Sec
1320 b90 04/21 04:22:43 2809007 stat- ID [Media Read Speed], Job Id [2809007], Bytes [3643832300], Time [122.084865] Sec(s), Average Speed [28.464047] MB/Sec
1320 2aa8 04/21 04:55:39 2807768 stat- ID [Media Read Speed], Job Id [2807768], Bytes [6002650203], Time [1448.817959] Sec(s), Average Speed [3.951203] MB/Sec
1320 35c0 04/21 04:55:42 2809027 stat- ID [Media Read Speed], Job Id [2809027], Bytes [114125], Time [6.132809] Sec(s), Average Speed [0.017747] MB/Sec
1320 2aa8 04/21 04:59:20 2807768 stat- ID [Media Read Speed], Job Id [2807768], Bytes [1837679], Time [5.702311] Sec(s), Average Speed [0.307340] MB/Sec
1320 35c0 04/21 05:08:16 2809027 stat- ID [Media Read Speed], Job Id [2809027], Bytes [3732997825], Time [482.383491] Sec(s), Average Speed [7.380153] MB/Sec
1320 2aa8 04/21 05:11:00 2807768 stat- ID [Media Read Speed], Job Id [2807768], Bytes [7460494613], Time [577.187766] Sec(s), Average Speed [12.326807] MB/Sec
1320 257c 04/21 05:25:25 2808219 stat- ID [Media Read Speed], Job Id [2808219], Bytes [65818948373], Time [34395.754886] Sec(s), Average Speed [1.824930] MB/Sec
1320 3bb0 04/21 05:49:35 2809083 stat- ID [Media Read Speed], Job Id [2809083], Bytes [4520152292], Time [207.714233] Sec(s), Average Speed [20.753287] MB/Sec
1320 21f0 04/21 05:49:36 2809081 stat- ID [Media Read Speed], Job Id [2809081], Bytes [6019348222], Time [442.818842] Sec(s), Average Speed [12.963536] MB/Sec
1320 257c 04/21 05:49:42 2808219 stat- ID [Media Read Speed], Job Id [2808219], Bytes [3515625379], Time [1195.427071] Sec(s), Average Speed [2.804656] MB/Sec
1320 434c 04/21 07:20:05 2809172 stat- ID [Media Read Speed], Job Id [2809172], Bytes [87469], Time [0.522006] Sec(s), Average Speed [0.159801] MB/Sec
1320 434c 04/21 07:23:51 2809172 stat- ID [Media Read Speed], Job Id [2809172], Bytes [2552744137], Time [230.823423] Sec(s), Average Speed [10.546965] MB/Sec
1320 3b44 04/21 09:52:55 2809175 stat- ID [Media Read Speed], Job Id [2809175], Bytes [3422512], Time [9.055604] Sec(s), Average Speed [0.360436] MB/Sec
1320 3b44 04/21 09:53:56 2809175 stat- ID [Media Read Speed], Job Id [2809175], Bytes [757761354], Time [60.735576] Sec(s), Average Speed [11.898422] MB/Sec
1320 4584 04/21 10:14:26 2808219 stat- ID [Media Read Speed], Job Id [2808219], Bytes [16089697298], Time [3175.778068] Sec(s), Average Speed [4.831676] MB/Sec
1320 4370 04/21 11:29:12 2809343 stat- ID [Media Read Speed], Job Id [2809343], Bytes [10441440139], Time [4279.493356] Sec(s), Average Speed [2.326849] MB/Sec
1320 4370 04/21 11:31:33 2809343 stat- ID [Media Read Speed], Job Id [2809343], Bytes [7805357884], Time [4563.974693] Sec(s), Average Speed [1.630984] MB/Sec
1320 3b44 04/21 11:31:43 2809175 stat- ID [Media Read Speed], Job Id [2809175], Bytes [45820529788], Time [14247.348427] Sec(s), Average Speed [3.067087] MB/Sec
1320 4370 04/21 11:31:43 2809343 stat- ID [Media Read Speed], Job Id [2809343], Bytes [13033779089], Time [4350.706284] Sec(s), Average Speed [2.857003] MB/Sec
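To distill a dump like this, here is a rough Python sketch (not a Commvault tool; it just assumes the “Media Read Speed” line format shown above) that summarizes the per-chunk read speeds:

import re
import statistics

# Rough helper (not a Commvault tool): summarize per-chunk read speeds from a
# CVJobReplicatorODS.log excerpt filtered on "Media Read Speed".
PATTERN = re.compile(r"Average Speed \[([\d.]+)\] MB/Sec")

def summarize(log_text: str) -> None:
    speeds = [float(m.group(1)) for m in PATTERN.finditer(log_text)]
    if not speeds:
        print("no Media Read Speed entries found")
        return
    print(f"samples: {len(speeds)}")
    print(f"min/median/max: {min(speeds):.2f} / "
          f"{statistics.median(speeds):.2f} / {max(speeds):.2f} MB/s")
    # 1 MB/s sustained is roughly 3.5 GB/hr, so a ~10 MB/s median per
    # stream is consistent with aux copies in the low hundreds of GB/hr.

if __name__ == "__main__":
    with open("CVJobReplicatorODS.log", encoding="utf-8", errors="ignore") as f:
        summarize(f.read())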
The TapeToolGui write test came in at 378GB/hr.
Hi Shane,
Thank you for sending this over.
Ok, so rather than a write speed issue, it looks like the issue is with read speeds on the source library.
If possible, I would run this test both while the aux copies are running and again with them suspended.
Please make sure you are writing the output file to an existing location and that the name of the file changes on each test, to prevent overwriting the previous test data.
Furthermore, Isilon performance will drop if the cluster is close to full. Can you confirm how much free space you have on the source library?
Regards,
Chris Sumner
Head Office is 69% full and
DR is 66% full.
DiskPerf is running; it has been going for about 30 minutes now. Will update.
Thanks for all your help.
Thanks Shane,
There should not be any performance impact if the filer has that much free space.
The test writes and reads less than 10GB of data, so if it is running for a while, that may mean there are underlying issues on the source library.
However, let’s wait for the test to complete before making any assumptions.
Regards,
Chris Sumner
Windows or Linux MA’s?
I have found with Isilon not to rely on SmartConnect DNS round-robin, as Windows SMB connections tend to reuse existing open streams rather than open new ones, so you end up accessing fewer nodes to read or write data. That works well if you are servicing hundreds of clients, each opening a single stream, but not so much for a handful of media agents.
I found the best performance is achieved by having one mount path per Isilon node, using the IP address of each node for the mount path location rather than a hostname, and ensuring the library is configured for round-robin. This allows Commvault to load-balance reads and writes across all nodes and bypass some of the SMB connection re-use logic that can limit connections.
I have also seen with some of the ‘archive tier’ Isilon units that performance can be lackluster if you try to back up and copy at the same time. If you suspend backups, copies run fast, and vice versa.
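To make the per-node layout concrete, here is a minimal illustrative sketch (hypothetical node IPs and share name; Commvault does the actual balancing once the mount paths are defined):

from itertools import cycle

# Illustrative only, with hypothetical node IPs and share name: give each
# Isilon node its own mount path addressed by IP instead of the SmartConnect
# hostname, so streams spread across nodes rather than piling onto one
# reused SMB session.
ISILON_NODE_IPS = ["10.10.20.11", "10.10.20.12", "10.10.20.13", "10.10.20.14"]
SHARE = "backups$"

mount_paths = [rf"\\{ip}\{SHARE}" for ip in ISILON_NODE_IPS]

# With mount path allocation set to round-robin (Commvault's "spill and
# fill"), successive streams land on successive mount paths, roughly:
for stream_id, path in zip(range(8), cycle(mount_paths)):
    print(f"stream {stream_id} -> {path}")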
Hi,
I have a lot of customers suffering the same pain! Not enough disks in the Isilon, and the disks are too big, so you end up with too few spindles to achieve good reads once deduplicated data has been spread across all the disks over time!
Windows or Linux MA’s?
Windows MA’s, very strong ones at that.
Not exactly sure how the Isilon is configured in terms of load-balancing, but the tech that did the installation assures us it’s in line with best practice.
DiskPerf with DASH copies running has completed; DiskPerf without them is currently running.
---------------------------------------------------------
Report as of : 21,April,21 09:09:36
Processor : X64
OS : Windows Server 2012
---------------------------------------------------------
DiskPerf Version : 2.2
Path Used : \\commvault1\backups$\gabstentest\DL
Performance type : Create new
Read-Write type : RANDOM
Block Size : 65536
File Count : 6
Thread Count : 6
Block Count : 16384
Total Bytes Written : 6442450944
Total Bytes Read : 6442450944
Total Bytes Deleted : 6442450944
----
Time Taken to Create(S) : 13153.09
Time Taken to Write&flush(S): 92.70
Time Taken to Read(S) : 17035.82
Time Taken to Delete(S) : 1.71
----
Per thread Throughput Create(GB/H) : 0.27
Per thread Throughput Write(GB/H) : 38.84
Per thread Throughput Read(GB/H) : 0.21
Per thread Throughput Delete(GB/H) : 2109.77
----
Throughput Create(GB/H) : 1.64
Throughput Write(GB/H) : 233.02
Throughput Read(GB/H) : 1.27
Throughput Delete(GB/H) : 12658.65
DiskPerf without DASH copies running has also completed:
---------------------------------------------------------
Report as of : 22,April,21 08:43:38
Processor : X64
OS : Windows Server 2012
---------------------------------------------------------
DiskPerf Version : 2.2
Path Used : \\commvault1\backups$\gabstentest\DL
Performance type : Create new
Read-Write type : RANDOM
Block Size : 65536
File Count : 6
Thread Count : 6
Block Count : 16384
Total Bytes Written : 6442450944
Total Bytes Read : 6442450944
Total Bytes Deleted : 6442450944
----
Time Taken to Create(S) : 7341.29
Time Taken to Write&flush(S): 48.47
Time Taken to Read(S) : 4842.48
Time Taken to Delete(S) : 1.10
----
Per thread Throughput Create(GB/H) : 0.49
Per thread Throughput Write(GB/H) : 74.27
Per thread Throughput Read(GB/H) : 0.74
Per thread Throughput Delete(GB/H) : 3264.65
----
Throughput Create(GB/H) : 2.94
Throughput Write(GB/H) : 445.61
Throughput Read(GB/H) : 4.46
Throughput Delete(GB/H) : 19587.89
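As a sanity check on both reports, throughput in GB/hr is just total bytes over elapsed seconds; plugging in the numbers above reproduces the reported figures:

# Sanity-check the DiskPerf throughput math: GB/hr = bytes / seconds,
# scaled from bytes/sec to GiB/hr. Values taken from the two reports above.
BYTES = 6442450944  # exactly 6 GiB per phase

def gb_per_hour(total_bytes: int, seconds: float) -> float:
    return total_bytes / seconds / 1024**3 * 3600

# Read and write phases, with DASH copies running vs. suspended:
print(f"read w/ DASH  : {gb_per_hour(BYTES, 17035.82):.2f} GB/hr")  # ~1.27
print(f"read w/o DASH : {gb_per_hour(BYTES, 4842.48):.2f} GB/hr")   # ~4.46
print(f"write w/ DASH : {gb_per_hour(BYTES, 92.70):.2f} GB/hr")     # ~233
print(f"write w/o DASH: {gb_per_hour(BYTES, 48.47):.2f} GB/hr")     # ~446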
Thanks Shane,
There definitely looks to be a read issue with the source library.
Each thread translates to one stream, so per stream you are seeing suboptimal performance from the library.
Additional settings or a configuration change at this point will not give you the performance boost you need.
I would look at engaging your storage support to see why the read speeds are so slow.
You should also see very slow speeds if you attempt to copy a large file from the storage to the MA via the configured share. This, together with the two reports above, should be more than enough evidence for your storage support to investigate.
Regards,
Chris Sumner
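One simple way to time such a file copy (hypothetical paths shown; any method of timing the copy works just as well):

import os
import shutil
import time

# Minimal timing sketch with hypothetical paths: copy a large file from the
# Isilon share to local disk on the Media Agent and report effective MB/s.
SRC = r"\\commvault1\backups$\testdata\bigfile.bin"  # file on the Isilon share
DST = r"D:\temp\bigfile.bin"                         # local disk on the MA

start = time.monotonic()
shutil.copyfile(SRC, DST)
elapsed = time.monotonic() - start

size_mb = os.path.getsize(DST) / 1024**2
print(f"copied {size_mb:.0f} MB in {elapsed:.1f} s -> {size_mb / elapsed:.1f} MB/s")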
Thanks, Chris. I have asked the customer to write and read a 20GB file in his own way; will update.
Hi Marco,
Do these customers have a plan on how to resolve or work around it?
I came across similar behavior some time ago when using data access nodes to protect CIFS data located on an Isilon filer (read operations). The Isilon administrator at the time told me it could be related to the Isilon SmartConnect DNS setup (and throttling). It could be worth exploring this path further.
Isilon read performance while using one access node was 60.79 GB/hour.
Isilon read performance while using two access nodes was 197.45 GB/hour, or roughly 98GB/hour each.
Can you share some details on how the setup has been done exactly? Are you accessing the Isilon as a single mount path? How many media agents are connected? How many nodes are in the Isilon cluster?
Can you share some details on how the setup has been done exactly?
Is there a simple way for me to get this from the Isilon admin?
We tried a single mount path and when the performance proved to be sub-par we created an additional 4 mount paths, which didn’t make any difference.
But I believe they are using Round Robin DNS for load-balancing.
I am waiting on the customer for the count of nodes and spindles.
Thank you.
How many media agents are connected? How many nodes are in the Isilon cluster?
There are 2 Media Agents connected.
10 nodes in the cluster, each with 20 drives.
They are on OneFS 8.2.1.0, if it matters.
Hello Shane,
Your storage admin would be the best person to answer how exactly it is configured on the backend, as we would not see that from the MA OS side or within the Commvault software. I am sending you a document from Dell’s website about setup and best practices for Isilon with our software; please use the link below. Based on the performance results from the tools that were run, I would strongly suggest reaching out to the vendor to ensure you have the optimal configuration listed in the documentation and to check whether any issues are causing the slowness.