Question

DB2 backup on AIX very slow


Badge +2

Working with DB2 on AIX for our SAP AFS Application. Approx. 4TB database split over 4 logical volumes in an IBM Virtual IO environment on IBM Power. This backup takes 20 to 26 hours. Below the statistics in db2diag after the backup. I’m not sure how to read this, but I think the WaitQ is quite high.
Hope anyone can shed a light on this.

 

2021-03-13-01.02.01.804357+060 E411673A4491 LEVEL: Info
PID : 4718886 TID : 61753 PROC : db2sysc 0
INSTANCE: db2prd NODE : 000 DB : PRD
APPHDL : 0-12866 APPID: *LOCAL.db2prd.210311220047
AUTHID : DB2PRD HOSTNAME: lpdd001
EDUID : 61753 EDUNAME: db2agent (PRD) 0
FUNCTION: DB2 UDB, database utilities, sqluxLogDataStats, probe:395
MESSAGE : Performance statistics
DATA #‌1 : String, 3994 bytes

Parallelism = 40
Number of buffers = 40
Buffer size = 2494464 (609 4kB pages)

BM#‌ Total I/O  MsgQ WaitQ Buffers MBytes
--- -------- -------- -------- -------- -------- --------
000  93654.87 47851.71 43854.53 455.97 1032152 2450793
001 93638.40 5314.21 28579.66 58494.24 309031 733532
002 93627.64 912.52 23500.10 68044.89 58527 138565
003 93503.11 1678.18 28495.65 62110.43 221485 525641
004 93644.75 622.59 23105.48 68747.61 54465 128969
005 93165.99 346.84 12501.31 79156.65 28099 66315
006 93654.74 283.14 14433.68 77780.60 14188 33310
007 93244.06 36.63 2850.77 89202.64 2564 5703
008 93531.91 446.75 17849.08 74075.73 24954 58870
009 93545.34 101.93 3432.56 88856.77 2574 5715
010 93521.20 206.49 12001.90 80156.74 10726 25088
011 93240.49 87.94 4393.93 87603.56 6192 14320
012 93651.29 3.49 1221.64 91272.79 234 170
013 93124.41 113.58 5153.03 86702.00 9173 21398
014 92908.78 70.68 4522.99 87160.84 4633 10934
015 93645.90 0.69 1099.65 91392.15 173 24
016 93597.10 39.78 2567.56 89835.86 1630 3485
017 93239.47 46.48 3816.44 88221.91 4700 10776
018 93503.39 27.49 2128.51 90193.68 1068 2150
019 93653.45 1.24 1100.73 91398.14 176 31
020 92254.68 0.71 16.21 91395.10 7 14
021 92275.78 16.97 701.34 90693.58 401 947
022 92348.10 18.72 1493.37 89899.50 1333 3161
023 93546.50 14.29 1454.45 90924.33 481 756
024 92418.47 0.56 13.68 91397.77 5 8
025 93625.70 39.40 2874.08 89558.39 1570 3342
026 93492.63 0.48 941.87 91396.94 166 8
027 93614.09 3.70 1198.32 91258.53 281 280
028 93176.60 23.49 1422.08 90576.69 734 1357
029 93611.89 0.95 1057.11 91400.47 170 16
030 93613.95 2.82 1189.21 91268.55 238 178
031 93436.97 1.16 883.56 91398.95 173 24
032 93618.84 0.81 1068.24 91396.41 174 27
033 93236.32 0.52 686.91 91395.71 166 8
034 92442.61 1.19 20.41 91390.43 10 21
035 93626.35 0.46 1069.34 91403.11 166 9
036 93573.80 1.55 1020.01 91398.75 173 24
037 92463.75 0.85 9.93 91401.25 5 8
038 92487.93 1.04 15.74 91395.23 8 16
039 92564.58 12.59 1011.56 90387.66 717 1698
--- -------- -------- -------- -------- -------- --------
TOT 3730726.03 58334.80 254756.87 3372200.75 1793722 4247710
MC#‌ Total I/O MsgQ WaitQ Buffers MBytes
--- -------- -------- -------- -------- -------- --------
000 91412.17 62751.83 28626.82 0.00 178543 424734
001 91407.44 64251.43 27114.16 6.90 202906 482691
002 91407.65 64279.45 27074.99 6.90 205911 489840
003 91408.45 64356.74 27015.30 6.89 206245 490635
004 93677.88 65274.64 28331.31 22.06 205430 488691
005 91407.98 62846.17 28522.14 6.88 154640 367871
006 91410.40 62892.63 28471.70 6.88 148478 353212
007 91397.05 62804.50 28565.73 6.90 162703 387052
008 91411.33 62795.12 28563.84 6.90 173397 412492
009 91407.81 62879.88 28481.98 6.90 155480 369869
--- -------- -------- -------- -------- -------- --------
TOT 916348.20 635132.41 280768.01 77.27 1793733 4267094

Thanks in advance..

 


14 replies

Userlevel 2
Badge +2

Hi Marcel,

Can you get the performance statistics from db2sbt.log on the DB2 server? This log will be in the Commvault log directory.

 

Thanks,

Sunil

Badge +2

Hi Sunil,

See attached log file,

 

Thanks in advance,

 

Marcel Huisman

 

 

Badge +3

@Marcel Huisman It appears 30% of the backup time was waiting on a free buffer(MsgQ).
You may need to increase the backup streams and see how that helps.
You could double the streams from 9 to 18 as the parallelism and buffers are already set to a higher value(40)

Ref link - http://www.redbooks.ibm.com/abstracts/tips1344.html?Open

Badge +2

Thanks Mahender, just started a new backup with 18 streams..

I also see below in the db2diag.log:
 

2021-03-14-17.52.37.444843+060 I1331304A1841        LEVEL: Error (OS)
PID     : 15598158             TID : 1              PROC : db2exfmt
INSTANCE: db2prd               NODE : 000           DB   : PRD
APPID   : *LOCAL.db2prd.210314165405
HOSTNAME: lpdd001
EDUID   : 1
FUNCTION: DB2 Common, OSSe, ossErrorIOAnalysis, probe:100
CALLED  : OS, -, open
OSERR   : EACCES (13) "The file access permissions do not allow the specified action."
DATA #1 : String, 111 bytes
A total of 2 analysis will be performed :
 - User info
 - Path access permission

 Target file = /db2exfmt.msg
DATA #2 : String, 188 bytes
  Real user ID of current process       = 1003
  Effective user ID of current process  = 1003
  Real group ID of current process      = 2012
  Effective group ID of current process = 2012
DATA #3 : String, 41 bytes
current sbrk(0) value: 0x0000000113e01900
DATA #4 : String, 209 bytes
Information of each subdirectory leading up to the first inaccessible one is shown in the format below :
   <UID>:<GID>:<permissions> (subdirectories)


 Cannot descend any further than the above subdirectory.
CALLSTCK: (Static functions may not be resolved correctly, as they are resolved to the nearest symbol)
  [0] 0x090000001F1DBB9C pdOSSeLoggingCallback + 0x448
  [1] 0x09000000096772FC oss_log__FP9OSSLogFacUiN32UlN26iPPc + 0x1BC
  [2] 0x0900000009677770 ossLogSysRC + 0x70
  [3] 0x090000000969FC3C ossErrorIOAnalysis__FCPC21OSSErrorAnalysisParam + 0xD7C
  [4] 0x090000001ED256A4 sqloSystemErrorHandler + 0x514
  [5] 0x090000001F200864 sqloopenp + 0xE48
  [6] 0x090000001E89AAFC sqlagf_open_handle__FP12SQLO_FHANDLEPc + 0x84
  [7] 0x090000001E5B9854 sqlaeror__FP5sqlca + 0x108
  [8] 0x090000001E5B837C sqlabndx + 0x208
  [9] 0x090000001EAF54D4 sqlabndx + 0x2E0
  [10] 0x000000010000CD28 main + 0x1CE8
  [11] 0x00000001000002F8 __start + 0x70
 

Not sure which directory is inaccessible..

 

KR,

 

Marcel Huisman

Badge +2

Below message in db2diag.log, does this message mean my UTIL_HEAP_SZ is too low?


Reserved heap size exceeded for Backup/Restore/Util Heap (UTIL_HEAP_SZ) on node 0,
allocating additional unreserved memory.
Requested block size           : 4198399 bytes.
Physical heap size             : 205783040 bytes.
Configured heap size           : 204800000 bytes.
Unreserved memory used by heap : 983040 bytes.
Unreserved memory left in set  : 1972568064 bytes.
 

Badge

Hello Marcel,

Can you inform the status of the backup? And also could you provide information about the DB2 version we are working with here?

 

Thanks,

Raunak Jalan

Badge +2

Hi Raunak,

The backup is still running. will post the statistics when the backup has finished. We use DB2 v10.5.

KR,

Marcel Huisman

Badge

Marcel, once the backup is done, can you update the parameter UTIL_HEAP_SZ to AUTOMATIC?

Ref link:   https://www.ibm.com/support/knowledgecenter/SSEPGG_11.1.0/com.ibm.db2.luw.admin.config.doc/doc/r0000330.html

Badge +2

Last backup statistics. I updated the util_heap_sz to automatic, wil monitor the backup today..

 

2021-03-17-14.05.22.899305+060 E421942A5549         LEVEL: Info
PID     : 4718886              TID : 43695          PROC : db2sysc 0
INSTANCE: db2prd               NODE : 000           DB   : PRD
APPHDL  : 0-5590               APPID: *LOCAL.db2prd.210316101954
AUTHID  : DB2PRD               HOSTNAME: lpdd001
EDUID   : 43695                EDUNAME: db2agent (PRD) 0
FUNCTION: DB2 UDB, database utilities, sqluxLogDataStats, probe:395
MESSAGE : Performance statistics
DATA #1 : String, 5052 bytes

Parallelism       = 48
Number of buffers = 48
Buffer size       = 2068480 (505 4kB pages)

BM#    Total      I/O      MsgQ      WaitQ      Buffers   MBytes
---  --------  --------  --------  --------    --------  --------
000  96316.56  56139.15  38903.69    110.11     1246896   2454309
001  96312.92   1969.70  17285.08  76178.38      268557    528335
002  96305.68   1661.15  13537.22  80280.31       70578    138629
003  96312.73    284.80   7605.03  87609.69       17117     33311
004  96252.02   6151.72  16703.25  72481.56      373321    734569
005  96236.86    647.76  13641.52  81122.37       65934    129421
006  96314.81    196.36   6453.44  88852.73       12940     25088
007  96310.49    119.03   4316.01  91064.60        7484     14348
008  96306.78     48.82   3130.93  92318.12        5670     10776
009  96284.55    112.27   2197.86  93166.51        3106      5717
010  96303.68    667.74  10757.38  84062.47       33911     66344
011  96310.38     15.23    539.28  94948.18         581       756
012  96245.17     44.58   2725.72  92667.31        1967      3486
013  96249.93    471.50  10453.35  84508.70       30106     58874
014  96255.10     24.00   2561.92  92860.24        1802      3161
015  96252.45     13.41   1608.19  93823.33        1066      1711
016  96247.74     17.15   1008.69  94414.66         679       947
017  96252.49      2.14    389.37  95053.65         287       178
018  96225.68      0.54    176.29  95241.77         209        24
019  96206.24      0.59    136.06  95262.47         204        14
020  95823.56    149.12   6710.29  88415.38       10872     21401
021  95853.67      0.53     12.34  95264.89           6         8
022  96315.45      0.68    241.94  95264.86         211        27
023  95975.21      0.62     12.34  95264.82           6         8
024  96316.15      0.64    240.94  95266.42         207        21
025  95904.91     91.46   5084.96  90099.86        5555     10934
026  96228.61     40.88   3456.83  91922.99        3103      5723
027  95950.10     35.75   2496.52  92745.05        1699      3342
028  96313.34      0.50    234.11  95270.38         201         9
029  96007.35     24.94   1828.59  93423.95        1094      2150
030  96306.57      0.58    224.77  95273.88         201         8
031  96316.56      0.55    233.54  95274.02         201         8
032  96308.62      0.53    226.80  95273.84         201         8
033  96306.44      0.28    225.13  95273.72         201         8
034  96308.25      0.43    226.58  95273.68         201         8
035  96301.46      0.27    220.84  95273.31         201         8
036  96312.55      0.00    227.14  95277.79         201         8
037  96307.82      0.38    227.93  95272.00         201         8
038  96252.11     22.94   1327.33  94094.32         887      1357
039  95880.80      0.56      3.84  95273.40           6         8
040  95926.00      0.33      5.18  95272.28           6         8
041  96253.65      4.66    488.09  94953.53         339       280
042  96031.55      2.24    222.66  95052.84          88       170
043  96059.78      0.33     16.26  95261.17           6         8
044  96084.04      0.98     26.12  95250.66           9        15
045  96252.50      0.24    183.68  95261.33         201         8
046  96248.62      0.34    174.69  95266.44         201         8
047  95801.45      0.39      4.25  95273.11           6         8
---  --------  --------  --------  --------    --------  --------
TOT  4617449.62  68969.02  178714.19  4332117.30     2168726   4255570

MC#    Total      I/O      MsgQ      WaitQ      Buffers   MBytes
---  --------  --------  --------  --------    --------  --------
000  95277.43  64668.67  30589.76      0.04      120080    236874
001  95277.43  64659.83  30595.47      3.67      116919    230639
002  95277.32  64529.22  30720.80      3.63      117463    231712
003  95277.41  68329.22  26917.89      3.60      108866    214753
004  95277.95  64615.43  30632.83      3.64      122058    240776
005  95277.49  64674.19  30570.68      3.63      121181    239046
006  95277.45  67338.96  27904.16      3.67      139619    275418
007  95277.39  68281.24  26960.68      3.60      112152    221235
008  95277.40  68443.99  26794.69      3.60      107810    212670
009  95277.58  67631.46  27608.40      3.61      126191    248929
010  95277.70  64729.29  30524.33      3.60      119820    236361
011  95277.40  67777.43  27455.47      3.61      123743    244100
012  95277.95  64955.30  30296.97      3.63      105628    208365
013  95277.70  67547.67  27683.27      3.60      127330    251176
014  95277.79  67298.99  27929.84      3.60      138073    272368
015  95277.42  64641.54  30595.10      3.63      110127    217240
016  95277.79  64658.73  30576.14      3.61      108454    213940
017  96326.16  67573.98  28691.59      4.13      143231    282539
---  --------  --------  --------  --------    --------  --------
TOT  1716044.85  1192355.24  523048.17     62.20     2168745   4278149

2021-03-17-14.05.22.900337+060 E427492A530          LEVEL: Info
PID     : 4718886              TID : 43695          PROC : db2sysc 0
INSTANCE: db2prd               NODE : 000           DB   : PRD
APPHDL  : 0-5590               APPID: *LOCAL.db2prd.210316101954
AUTHID  : DB2PRD               HOSTNAME: lpdd001
EDUID   : 43695                EDUNAME: db2agent (PRD) 0
FUNCTION: DB2 UDB, database utilities, sqluxLogDataStats, probe:421
MESSAGE : Estimated size of backup in bytes:
DATA #1 : unsigned integer, 8 bytes
4462864068608
 

Badge +2

Hi All,

 

In the DB2SBT.log, i also see a lot of messages about waiting for pipeline buffer, this occurs every few minutes:
15859976 1   03/18 09:06:38 4936695 [PIPELAYER  ] ERROR: Waiting for pipeline buf. SndIdx [0/2], RCId [46325766]
8979020 1   03/18 09:07:14 4936695 [PIPELAYER  ] ERROR: Waiting for pipeline buf. SndIdx [1/2], RCId [46325764]
20578730 1   03/18 09:08:59 4936695 [PIPELAYER  ] ERROR: Waiting for pipeline buf. SndIdx [0/2], RCId [46325763]
11010616 1   03/18 09:09:47 4936695 [PIPELAYER  ] ERROR: Waiting for pipeline buf. SndIdx [1/2], RCId [46325760]
14942968 1   03/18 09:09:50 4936695 [PIPELAYER  ] ERROR: Waiting for pipeline buf. SndIdx [0/2], RCId [46325756]
20185514 1   03/18 09:10:43 4936695 [PIPELAYER  ] ERROR: Waiting for pipeline buf. SndIdx [0/2], RCId [46325757]
 

See attached db2sbt.log

​​​​​​KR,

Marcel Huisman

Userlevel 6
Badge +14

@Marcel Huisman , following up on this to ensure you’re getting this resolved.  Might be best at this point to open an incident (unless anyone else has some input).

Userlevel 6
Badge +14

@Marcel Huisman , following up to see how this is looking.  I checked your e-mail address for incidents and am not seeing anything new.

Thanks!

Badge +2

Thanks for your reply Mike,

My colleague started a case through our Commvault reseller ProAct. backup runs now over 30 hours. IBM is also working on the case from DB2 perspective.. Try to keep you posted..

Marcel

 

Userlevel 6
Badge +14

Thanks, @Marcel Huisman !  can you pm me the case number?

Reply