Oracle诊断案例----如何捕获问题SQL解决过度CPU消耗问题

Oracle诊断案例----如何捕获问题SQL解决过度CPU消耗问题

--使用vmstat,top等辅助解决Oracle数据库性能问题

Last Updated: Sunday, 2004-10-24 0:37 Eygle


|

问题描述:
开发人员报告系统运行缓慢,影响用户访问.

1.登陆数据库主机

使用vmstat检查,发现CPU资源已经耗尽,大量任务位于运行队列:

|

> >
>
>
> bash-2.03$ vmstat 3 > procs memory page disk faults cpu > r b w swap free re mf pi po fr de sr s6 s9 s1 sd in sy cs us sy id > 0 0 0 5504232 1464112 0 0 0 0 0 0 0 0 1 1 0 4294967196 0 0 -84 -5 -145 > 131 0 0 5368072 1518360 56 691 0 2 2 0 0 0 1 0 0 3011 7918 2795 97 3 0 > 131 0 0 5377328 1522464 81 719 0 2 2 0 0 0 1 0 0 2766 8019 2577 96 4 0 > 130 0 0 5382400 1524776 67 682 0 0 0 0 0 0 0 0 0 3570 8534 3316 97 3 0 > 134 0 0 5373616 1520512 127 1078 0 2 2 0 0 0 1 0 0 3838 9584 3623 96 4 0 > 136 0 0 5369392 1518496 107 924 0 5 5 0 0 0 0 0 0 2920 8573 2639 97 3 0 > 132 0 0 5364912 1516224 63 578 0 0 0 0 0 0 0 0 0 3358 7944 3119 97 3 0 > 129 0 0 5358648 1511712 189 1236 0 0 0 0 0 0 0 0 0 3366 10365 3135 95 5 0 > 129 0 0 5354528 1511304 120 1194 0 0 0 0 0 0 0 4 0 3235 8864 2911 96 4 0 > 128 0 0 5346848 1507704 99 823 0 0 0 0 0 0 0 3 0 3189 9048 3074 96 4 0 > 125 0 0 5341248 1504704 80 843 0 2 2 0 0 0 6 1 0 3563 9514 3314 95 5 0 > 133 0 0 5332744 1501112 79 798 0 0 0 0 0 0 0 1 0 3218 8805 2902 97 3 0 > 129 0 0 5325384 1497368 107 643 0 2 2 0 0 0 1 4 0 3184 8297 2879 96 4 0 > 126 0 0 5363144 1514320 81 753 0 0 0 0 0 0 0 0 0 2533 7409 2164 97 3 0 > 136 0 0 5355624 1510512 169 566 786 0 0 0 0 0 0 1 0 3002 8600 2810 96 4 0 > 130 1 0 5351448 1502936 267 580 1821 0 0 0 0 0 0 0 0 3126 7812 2900 96 4 0 > 129 0 0 5347256 1499568 155 913 2 2 2 0 0 0 0 1 0 2225 8076 1941 98 2 0 > 116 0 0 5338192 1495400 177 1162 0 0 0 0 0 0 0 1 0 1947 7781 1639 97 3 0 >


2.使用Top命令

观察进程CPU耗用,发现没有明显过高CPU使用的进程
$ top

> > last pid: 28313; load averages: 99.90, 117.54, 125.71 23:28:38 > 296 processes: 186 sleeping, 99 running, 2 zombie, 9 on cpu > CPU states: 0.0% idle, 96.5% user, 3.5% kernel, 0.0% iowait, 0.0% swap > Memory: 4096M real, 1404M free, 2185M swap in use, 5114M swap free >
> PID USERNAME THR PRI NICE SIZE RES STATE TIME CPU COMMAND > 27082 oracle8i 1 33 0 1328M 1309M run 0:17 1.29% oracle > 26719 oracle8i 1 55 0 1327M 1306M sleep 0:29 1.11% oracle > 28103 oracle8i 1 35 0 1327M 1304M run 0:06 1.10% oracle > 28161 oracle8i 1 25 0 1327M 1305M run 0:04 1.10% oracle > 26199 oracle8i 1 45 0 1328M 1309M run 0:42 1.10% oracle > 26892 oracle8i 1 33 0 1328M 1310M run 0:24 1.09% oracle > 27805 oracle8i 1 45 0 1327M 1306M cpu/1 0:10 1.04% oracle > 23800 oracle8i 1 23 0 1327M 1306M run 1:28 1.03% oracle > 25197 oracle8i 1 34 0 1328M 1309M run 0:57 1.03% oracle > 21593 oracle8i 1 33 0 1327M 1306M run 2:12 1.01% oracle > 27616 oracle8i 1 45 0 1329M 1311M run 0:14 1.01% oracle > 27821 oracle8i 1 43 0 1327M 1306M run 0:10 1.00% oracle > 26517 oracle8i 1 33 0 1328M 1309M run 0:33 0.97% oracle > 25785 oracle8i 1 44 0 1328M 1309M run 0:46 0.96% oracle > 26241 oracle8i 1 45 0 1327M 1306M run 0:42 0.96% oracle >
>


3.检查进程数量

> > bash-2.03$ ps -ef|grep ora|wc -l > 258 > bash-2.03$ ps -ef|grep ora|wc -l > 275 > bash-2.03$ ps -ef|grep ora|wc -l > 274 > bash-2.03$ ps -ef|grep ora|wc -l > 278 > bash-2.03$ ps -ef|grep ora|wc -l > 277 > bash-2.03$ ps -ef|grep ora|wc -l > 366 >


发现系统存在大量Oracle进程,大约在300左右,而正常情况下Oracle连接数应该在100左右.

4.检查数据库

查询v$session_wait获取各进程等待事件

> >   > SQL> select sid,event,p1,p1text from v$session_wait; >
> SID EVENT P1 P1TEXT > ---------- ------------------------------ ---------- ---------------------------------------------------------------- > 124 latch free 1.6144E+10 address > 1 pmon timer 300 duration > 2 rdbms ipc message 300 timeout > 3 rdbms ipc message 300 timeout > 11 rdbms ipc message 30000 timeout > 6 rdbms ipc message 180000 timeout > 4 rdbms ipc message 300 timeout > 134 rdbms ipc message 6000 timeout > 147 rdbms ipc message 6000 timeout > 275 rdbms ipc message 17995 timeout > 274 rdbms ipc message 6000 timeout >
> SID EVENT P1 P1TEXT > ---------- ------------------------------ ---------- ---------------------------------------------------------------- > 118 rdbms ipc message 6000 timeout > 7 buffer busy waits 17 file# > 56 buffer busy waits 17 file# > 161 buffer busy waits 17 file# > 195 buffer busy waits 17 file# > 311 buffer busy waits 17 file# > 314 buffer busy waits 17 file# > 205 buffer busy waits 17 file# > 269 buffer busy waits 17 file# > 200 buffer busy waits 17 file# > 164 buffer busy waits 17 file# >
> SID EVENT P1 P1TEXT > ---------- ------------------------------ ---------- ---------------------------------------------------------------- > 140 buffer busy waits 17 file# > 66 buffer busy waits 17 file# > 10 db file sequential read 17 file# > 18 db file sequential read 17 file# > 54 db file sequential read 17 file# > 49 db file sequential read 17 file# > 48 db file sequential read 17 file# > 46 db file sequential read 17 file# > 45 db file sequential read 17 file# > 35 db file sequential read 17 file# > 30 db file sequential read 17 file# >
> SID EVENT P1 P1TEXT > ---------- ------------------------------ ---------- ---------------------------------------------------------------- > 29 db file sequential read 17 file# > 22 db file sequential read 17 file# > 178 db file sequential read 17 file# > 175 db file sequential read 17 file# > 171 db file sequential read 17 file# > 123 db file sequential read 17 file# > 121 db file sequential read 17 file# > 120 db file sequential read 17 file# > 117 db file sequential read 17 file# > 114 db file sequential read 17 file# > 113 db file sequential read 17 file# >
> SID EVENT P1 P1TEXT > ---------- ------------------------------ ---------- ---------------------------------------------------------------- > 111 db file sequential read 17 file# > 107 db file sequential read 17 file# > 80 db file sequential read 17 file# > 222 db file sequential read 17 file# > 218 db file sequential read 17 file# > 216 db file sequential read 17 file# > 213 db file sequential read 17 file# > 199 db file sequential read 17 file# > 198 db file sequential read 17 file# > 194 db file sequential read 17 file# > 192 db file sequential read 17 file# >
> SID EVENT P1 P1TEXT > ---------- ------------------------------ ---------- ---------------------------------------------------------------- > 188 db file sequential read 17 file# > 249 db file sequential read 17 file# > 242 db file sequential read 17 file# > 239 db file sequential read 17 file# > 236 db file sequential read 17 file# > 235 db file sequential read 17 file# > 234 db file sequential read 17 file# > 233 db file sequential read 17 file# > 230 db file sequential read 17 file# > 227 db file sequential read 17 file# > 336 db file sequential read 17 file# >
> SID EVENT P1 P1TEXT > ---------- ------------------------------ ---------- ---------------------------------------------------------------- > 333 db file sequential read 17 file# > 331 db file sequential read 17 file# > 329 db file sequential read 17 file# > 327 db file sequential read 17 file# > 325 db file sequential read 17 file# > 324 db file sequential read 17 file# > 320 db file sequential read 17 file# > 318 db file sequential read 17 file# > 317 db file sequential read 17 file# > 316 db file sequential read 17 file# > 313 db file sequential read 17 file# >
> SID EVENT P1 P1TEXT > ---------- ------------------------------ ---------- ---------------------------------------------------------------- > 305 db file sequential read 17 file# > 303 db file sequential read 17 file# > 301 db file sequential read 17 file# > 293 db file sequential read 17 file# > 290 db file sequential read 17 file# > 288 db file sequential read 17 file# > 287 db file sequential read 17 file# > 273 db file sequential read 17 file# > 271 db file sequential read 17 file# > 257 db file sequential read 17 file# > 256 db file sequential read 17 file# >
> SID EVENT P1 P1TEXT > ---------- ------------------------------ ---------- ---------------------------------------------------------------- > 254 db file sequential read 17 file# > 252 db file sequential read 17 file# > 159 db file sequential read 17 file# > 153 db file sequential read 17 file# > 146 db file sequential read 17 file# > 142 db file sequential read 17 file# > 135 db file sequential read 17 file# > 133 db file sequential read 17 file# > 132 db file sequential read 17 file# > 126 db file sequential read 17 file# > 79 db file sequential read 17 file# >
> SID EVENT P1 P1TEXT > ---------- ------------------------------ ---------- ---------------------------------------------------------------- > 77 db file sequential read 17 file# > 72 db file sequential read 17 file# > 70 db file sequential read 17 file# > 69 db file sequential read 17 file# > 67 db file sequential read 17 file# > 63 db file sequential read 17 file# > 55 db file sequential read 17 file# > 102 db file sequential read 17 file# > 96 db file sequential read 17 file# > 95 db file sequential read 17 file# > 91 db file sequential read 17 file# >
> SID EVENT P1 P1TEXT > ---------- ------------------------------ ---------- ---------------------------------------------------------------- > 81 db file sequential read 17 file# > 15 db file sequential read 17 file# > 19 db file scattered read 17 file# > 50 db file scattered read 17 file# > 285 db file scattered read 17 file# > 279 db file scattered read 17 file# > 255 db file scattered read 17 file# > 243 db file scattered read 17 file# > 196 db file scattered read 17 file# > 187 db file scattered read 17 file# > 170 db file scattered read 17 file# >
> SID EVENT P1 P1TEXT > ---------- ------------------------------ ---------- ---------------------------------------------------------------- > 162 db file scattered read 17 file# > 138 db file scattered read 17 file# > 110 db file scattered read 17 file# > 108 db file scattered read 17 file# > 92 db file scattered read 17 file# > 330 db file scattered read 17 file# > 310 db file scattered read 17 file# > 302 db file scattered read 17 file# > 299 db file scattered read 17 file# > 89 db file scattered read 17 file# > 5 smon timer 300 sleep time >
> SID EVENT P1 P1TEXT > ---------- ------------------------------ ---------- ---------------------------------------------------------------- > 20 SQLNet message to client 1952673792 driver id > 103 SQLNet message to client 1650815232 driver id > .... > 148 SQLNet more data from client 1952673792 driver id > 291 SQLNet more data from client 1952673792 driver id >
> 244 rows selected.


发现存在大量db file scattered read及db file sequential read等待.

5.捕获相关SQL

这里用到了我的以下脚本getsqlbysid.sql:

> > SELECT sql_text > FROM v$sqltext a > WHERE a.hash_value = (SELECT sql_hash_value > FROM v$session b > WHERE b.SID = '&sid') > ORDER BY piece ASC > /


> >   > SQL> @getsql > Enter value for sid: 18 > old 5: where b.sid='&sid' > new 5: where b.sid='18' >
> SQL_TEXT > ---------------------------------------------------------------- > select i.vc2title,i.numinfoguid from hs_info i where i.intenab > ledflag = 1 and i.intpublishstate = 1 and i.datpublishdate <= > sysdate and i.numcatalogguid = 2047 order by i.datpublishdate d > esc, i.numorder desc >
> SQL> / > Enter value for sid: 54 > old 5: where b.sid='&sid' > new 5: where b.sid='54' >
> SQL_TEXT > ---------------------------------------------------------------- > select i.vc2title,i.numinfoguid from hs_info i where i.intenab > ledflag = 1 and i.intpublishstate = 1 and i.datpublishdate <= > sysdate and i.numcatalogguid = 33 order by i.datpublishdate des > c, i.numorder desc >
> SQL> / > Enter value for sid: 49 > old 5: where b.sid='&sid' > new 5: where b.sid='49' >
> SQL_TEXT > ---------------------------------------------------------------- > select i.vc2title,i.numinfoguid from hs_info i where i.intenab > ledflag = 1 and i.intpublishstate = 1 and i.datpublishdate <= > sysdate and i.numcatalogguid = 26 order by i.datpublishdate des > c, i.numorder desc >


对几个全表扫描进程跟踪以后,得到以上SQL语句.
以上语句如果良好编码应该使用绑定变量.但是现在这个不是我们关心的.

使用该应用用户连接,检查其执行计划:

> > SQL> set autotrace trace explain > SQL> select i.vc2title,i.numinfoguid
> 2 from hs_info i where i.intenabledflag = 1
> 3 and i.intpublishstate = 1 and i.datpublishdate <=sysdate
> 4 and i.numcatalogguid = 3475 > 5 order by i.datpublishdate desc, i.numorder desc ; >
> Execution Plan > ---------------------------------------------------------- > 0 SELECT STATEMENT Optimizer=CHOOSE (Cost=228 Card=1 Bytes=106) > 1 0 SORT (ORDER BY) (Cost=228 Card=1 Bytes=106) > 2 1 TABLE ACCESS (FULL) OF 'HS_INFO' (Cost=218 Card=1 Bytes=106) >
> SQL> select count() from hs_info; >
> COUNT(
) > ---------- > 227404 >


该表这里有22万记录,全表扫描已经不再适合.

检查该表,存在以下索引:

> > SQL> select index_name,index_type from user_indexes where table_name='HS_INFO'; >
> INDEX_NAME INDEX_TYPE > ------------------------------ --------------------------- > HSIDX_INFO1 FUNCTION-BASED NORMAL > HSIDX_INFO_SEARCHKEY DOMAIN > PK_HS_INFO NORMAL >
> 检查索引键值: >
> SQL> select index_name,column_name from user_ind_columns where table_name ='HS_INFO'; >
> INDEX_NAME COLUMN_NAME > ------------------------------ -------------------- > HSIDX_INFO1 NUMORDER > HSIDX_INFO1 SYS_NC00024$ > HSIDX_INFO_SEARCHKEY VC2INDEXWORDS > PK_HS_INFO NUMINFOGUID >
> SQL> desc hs_info > Name Null? Type > ----------------------------------------------------------------- -------- --------------------------- > NUMINFOGUID NOT NULL NUMBER(15) > NUMCATALOGGUID NOT NULL NUMBER(15) > INTTEXTTYPE NOT NULL NUMBER(38) > VC2TITLE NOT NULL VARCHAR2(60) > VC2AUTHOR VARCHAR2(100) > NUMPREVINFOGUID NUMBER(15) > NUMNEXTINFOGUID NUMBER(15) > NUMORDER NOT NULL NUMBER(15) > DATPUBLISHDATE NOT NULL DATE > INTPUBLISHSTATE NOT NULL NUMBER(38) > VC2PUBLISHERID VARCHAR2(30) > VC2INDEXWORDS VARCHAR2(200) > VC2WAPPREVPATH VARCHAR2(200) > VC2WEBPREVPATH VARCHAR2(200) > VC2WAP2PREVPATH VARCHAR2(200) > NUMVISITED NOT NULL NUMBER(15) > INTENABLEDFLAG NOT NULL NUMBER(38) > DATCREATETIME NOT NULL DATE > DATMODIFYTIME NOT NULL DATE > VC2NOTES VARCHAR2(1000) > INTINFOTYPE NOT NULL NUMBER(38) > VC2PRIZEFLAG VARCHAR2(1) > VC2DESC VARCHAR2(1000)


6.决定创建新的索引以消除全表扫描

> >   > SQL> create index hs_info_NUMCATALOGGUID on hs_info(NUMCATALOGGUID); >
> Index created. >
> SQL> set autotrace trace explain > SQL> select i.vc2title,i.numinfoguid
> 2 from hs_info i where i.intenabledflag = 1
> 3 and i.intpublishstate = 1 and i.datpublishdate <=sysdate
> 4 and i.numcatalogguid = 3475 > 5 order by i.datpublishdate desc, i.numorder desc ; >
> Execution Plan > ---------------------------------------------------------- > 0 SELECT STATEMENT Optimizer=CHOOSE (Cost=12 Card=1 Bytes=106) > 1 0 SORT (ORDER BY) (Cost=12 Card=1 Bytes=106) > 2 1 TABLE ACCESS (BY INDEX ROWID) OF 'HS_INFO' (Cost=2 Card=1 Bytes=106) > 3 2 INDEX (RANGE SCAN) OF 'HS_INFO_NUMCATALOGGUID' (NON-UNIQUE) (Cost=1 Card=1)
>
>


7.观察系统状况

原大量等待消失

> >   > SQL> select sid,event,p1,p1text from v$session_wait where event not like 'SQL%'; >
> SID EVENT P1 P1TEXT > ---------- ------------------------------ ---------- ---------------------------------------------------------------- > 1 pmon timer 300 duration > 2 rdbms ipc message 300 timeout > 3 rdbms ipc message 300 timeout > 6 rdbms ipc message 180000 timeout > 59 rdbms ipc message 6000 timeout > 118 rdbms ipc message 6000 timeout > 275 rdbms ipc message 30000 timeout > 147 rdbms ipc message 6000 timeout > 62 rdbms ipc message 6000 timeout > 11 rdbms ipc message 30000 timeout > 4 rdbms ipc message 300 timeout >
> SID EVENT P1 P1TEXT > ---------- ------------------------------ ---------- ---------------------------------------------------------------- > 305 db file sequential read 17 file# > 356 db file sequential read 17 file# > 19 db file scattered read 17 file# > 5 smon timer 300 sleep time >
> 15 rows selected.
>


持续观察的CPU使用情况

> >   > bash-2.03$ vmstat 3 > procs memory page disk faults cpu > r b w swap free re mf pi po fr de sr s6 s9 s1 sd in sy cs us sy id > 20 0 0 5421792 1503488 38 434 136 0 0 0 0 0 0 2 0 2931 7795 2622 91 9 0 > 23 1 0 5416080 1500632 95 734 56 0 0 0 0 0 0 0 0 2949 8057 2598 89 11 0 > 26 0 0 5412016 1498480 210 1170 21 5 5 0 0 0 2 1 0 3301 9647 3116 90 10 0 > 25 0 0 5394912 1490160 242 1606 56 0 0 0 0 0 0 1 0 3133 9318 2850 89 11 0 > 40 0 0 5390200 1488112 162 1393 66 0 0 0 0 0 0 0 0 2848 9080 2502 90 10 0 > 40 0 0 5377120 1481792 136 1180 120 2 2 0 0 0 1 1 0 2846 9099 2593 92 8 0 > 36 0 0 5363216 1475168 134 1169 53 0 0 0 0 0 3 2 0 2871 7989 2621 88 12 0 > 39 0 0 5348936 1469160 157 1448 210 0 0 0 0 0 0 0 0 3660 10062 3480 88 12 0 > 35 0 0 5344552 1466472 7 15 56 0 0 0 0 0 0 0 0 2885 7663 2635 92 8 0 > 34 0 0 5343016 1465416 44 386 77 0 0 0 0 0 0 0 0 3197 8486 2902 92 8 0 > 31 0 0 5331568 1459696 178 1491 122 0 0 0 0 0 0 3 0 3237 9461 3005 89 11 0 > 31 0 0 5317792 1453008 76 719 80 0 0 0 0 0 0 0 0 3292 8736 3025 93 7 0 > 31 2 0 5311144 1449552 235 1263 69 2 2 0 0 0 1 0 0 3473 9535 3357 88 12 0 > 25 0 0 5300240 1443920 108 757 18 2 2 0 0 0 1 1 0 2377 7876 2274 95 5 0 > 19 0 0 5295904 1441840 50 377 0 0 0 0 0 0 0 1 0 1915 6598 1599 98 1 0 > ----以上为创建索引之前部分 > ----以下为创建索引之后部分,CPU使用率恢复正常 > procs memory page disk faults cpu > r b w swap free re mf pi po fr de sr s6 s9 s1 sd in sy cs us sy id > 40 1 0 5290040 1439208 315 3894 8 2 2 0 0 0 1 6 0 3631 13414 5206 61 9 30 > 0 1 0 5237192 1414744 731 6749 45 0 0 0 0 0 2 7 0 3264 13558 4941 52 14 34 > 0 0 0 5163632 1380608 747 6585 10 0 0 0 0 0 0 1 0 2617 12291 3901 46 12 41 > 1 0 0 5090224 1348152 712 6079 29 0 0 0 0 0 0 6 0 2825 12416 4178 50 12 39 > 1 0 0 5023672 1317296 714 6183 24 0 0 0 0 0 0 5 0 3166 12424 4745 47 13 40 > 0 0 0 4955872 1287136 737 6258 16 0 0 0 0 0 0 3 0 2890 11777 4432 44 12 44 > 1 0 0 4887888 1256464 809 6234 8 2 2 0 0 0 0 2 0 2809 12066 4247 45 12 43 > 0 0 0 4828912 1228200 312 2364 13 5 5 0 0 0 2 1 0 2410 6816 3492 38 6 57 > 0 0 0 4856816 1240168 8 138 0 0 0 0 0 0 1 0 0 2314 4026 3232 34 4 62 > 0 0 0 4874176 1247712 0 86 0 0 0 0 0 0 0 0 0 2298 3930 3324 35 2 63 > 2 0 0 4926088 1270824 34 560 0 0 0 0 0 0 0 0 0 2192 4694 2612 29 16 55 > 0 0 0 5427320 1512952 53 694 0 0 0 0 0 0 3 2 0 2443 5085 3340 33 12 55 > 0 0 0 5509120 1553136 0 37 0 0 0 0 0 0 0 0 0 2309 3908 3321 35 1 64 > 0 0 0 5562048 1577000 16 234 0 0 0 0 0 0 0 0 0 2507 5187 3433 35 8 57 > 0 0 0 5665672 1623848 252 1896 8 2 2 0 0 0 1 0 0 2091 6548 2939 34 5 61 > 0 0 0 5654752 1618208 5 173 16 0 0 0 0 0 0 0 0 2226 4218 3051 35 4 60 > 0 0 0 5727024 1651120 28 254 0 0 0 0 0 0 0 0 0 2126 4224 2982 38 2 60 > 0 0 0 5723184 1648880 93 562 8 2 2 0 0 0 1 1 0 2371 5140 3432 38 3 59 > 0 0 0 5730744 1652512 7 177 26 2 2 0 0 0 1 0 0 2465 4442 3575 36 3 61


至此,此问题得以解决.

本文作者:
eygle,Oracle技术关注者,来自中国最大的Oracle技术论坛 itpub .
www.eygle.com 是作者的个人站点.你可通过 [email protected] 来联系作者.欢迎技术探讨交流以及链接交换.


原文出处:

http://www.eygle.com/case/How.To.Capture.Problem.SQL.htm


Published At
Categories with 数据库类
Tagged with
comments powered by Disqus