Oracle诊断案例-SGA与Swap之一


link:

http://www.eygle.com/case/sga1.htm

案例描述:

用户报告,服务器启动一段时间以后,无法建立数据库连接
重新启动几分钟以后,再次无法连接

系统无法正常使用.

1.登陆系统

SunOS 5.8

login: root
Password:
Last login: Tue Mar 23 13:56:59 from 172.16.31.41
Sun Microsystems Inc. SunOS 5.8 Generic Patch October 2001
You have new mail.

2.su 为Oracle用户
检查启动的Oracle进程

发现后台进程正常,有一定量的用户连接

wapplatform:/>su - oracle
Sun Microsystems Inc. SunOS 5.8 Generic Patch October 2001
You have new mail.
/export/home1/oracle>ls
admin codesyndealt31 exp.sh local.cshrc local.profile oraclebak oui v6_database
app exp.log jre local.login nsmail oradata swan
export/home1/oracle>cd admin
/export/home1/oracle/admin>ps -ef|grep ora
oracle 25269 25258 0 13:58:36 pts/3 0:00 grep ora
oracle 25257 24906 0 13:58:31 pts/4 0:00 vi alert_HSWAPDB.log
oracle 25267 1 1 13:58:34 ? 0:00 oracleHSWAPDB (LOCAL=NO)
oracle 25184 1 0 13:56:57 ? 0:00 ora_p007_HSWAPDB
oracle 25182 1 0 13:56:57 ? 0:00 ora_p006_HSWAPDB
oracle 25193 1 0 13:57:03 ? 0:01 oracleHSWAPDB (LOCAL=NO)
oracle 25209 1 0 13:57:09 ? 0:00 oracleHSWAPDB (LOCAL=NO)
oracle 25176 1 0 13:56:57 ? 0:00 ora_p003_HSWAPDB
oracle 25180 1 0 13:56:57 ? 0:00 ora_p005_HSWAPDB
oracle 25172 1 0 13:56:56 ? 0:00 ora_p001_HSWAPDB
oracle 25178 1 0 13:56:57 ? 0:00 ora_p004_HSWAPDB
oracle 25170 1 0 13:56:56 ? 0:00 ora_p000_HSWAPDB
oracle 24254 24240 0 12:08:25 pts/2 0:00 -ksh
oracle 25174 1 0 13:56:56 ? 0:00 ora_p002_HSWAPDB
oracle 25244 1 1 13:58:23 ? 0:00 oracleHSWAPDB (LOCAL=NO)
oracle 25218 1 0 13:57:23 ? 0:00 oracleHSWAPDB (LOCAL=NO)
oracle 25159 1 0 13:56:42 ? 0:02 ora_qmn0_HSWAPDB
oracle 25230 1 0 13:57:40 ? 0:01 oracleHSWAPDB (LOCAL=NO)
oracle 25161 1 0 13:56:42 ? 0:00 ora_s000_HSWAPDB
oracle 25149 1 0 13:56:41 ? 0:01 ora_lgwr_HSWAPDB
oracle 25157 1 0 13:56:42 ? 0:00 ora_cjq0_HSWAPDB
oracle 24906 3698 0 13:47:47 pts/4 0:00 -ksh
oracle 25153 1 0 13:56:42 ? 0:01 ora_smon_HSWAPDB
oracle 25058 7464 0 13:55:14 pts/1 0:00 -ksh
oracle 25163 1 0 13:56:42 ? 0:00 ora_d000_HSWAPDB
oracle 25155 1 0 13:56:42 ? 0:00 ora_reco_HSWAPDB
oracle 25151 1 0 13:56:41 ? 0:00 ora_ckpt_HSWAPDB
oracle 25145 1 0 13:56:41 ? 0:00 ora_dbw0_HSWAPDB
oracle 25199 1 15 13:57:04 ? 0:49 ora_j000_HSWAPDB
oracle 4149 4146 0 12:05:11 pts/5 0:00 -ksh
oracle 25232 1 0 13:57:41 ? 0:00 oracleHSWAPDB (LOCAL=NO)
oracle 25119 1 0 13:56:29 ? 0:00 oraclehswapdb (LOCAL=NO)
oracle 25075 1 0 13:55:34 ? 0:00 /export/home1/oracle/app/bin/tnslsnr LISTENER -inherit
oracle 24374 4149 0 12:21:56 pts/5 0:00 sqlplus /nolog
oracle 25143 1 0 13:56:41 ? 0:00 ora_pmon_HSWAPDB
oracle 25258 25242 0 13:58:31 pts/3 0:00 -ksh
/export/home1/oracle/admin>ps -ef|grep ora_
oracle 25275 25258 0 13:58:42 pts/3 0:00 grep ora_
oracle 25184 1 0 13:56:57 ? 0:00 ora_p007_HSWAPDB
oracle 25182 1 0 13:56:57 ? 0:00 ora_p006_HSWAPDB
oracle 25176 1 0 13:56:57 ? 0:00 ora_p003_HSWAPDB
oracle 25180 1 0 13:56:57 ? 0:00 ora_p005_HSWAPDB
oracle 25172 1 0 13:56:56 ? 0:00 ora_p001_HSWAPDB
oracle 25178 1 0 13:56:57 ? 0:00 ora_p004_HSWAPDB
oracle 25170 1 0 13:56:56 ? 0:00 ora_p000_HSWAPDB
oracle 25174 1 0 13:56:56 ? 0:00 ora_p002_HSWAPDB
oracle 25159 1 0 13:56:42 ? 0:02 ora_qmn0_HSWAPDB
oracle 25161 1 0 13:56:42 ? 0:00 ora_s000_HSWAPDB
oracle 25149 1 0 13:56:41 ? 0:01 ora_lgwr_HSWAPDB
oracle 25157 1 0 13:56:42 ? 0:00 ora_cjq0_HSWAPDB
oracle 25153 1 0 13:56:42 ? 0:01 ora_smon_HSWAPDB
oracle 25163 1 0 13:56:42 ? 0:00 ora_d000_HSWAPDB
oracle 25155 1 0 13:56:42 ? 0:00 ora_reco_HSWAPDB
oracle 25151 1 0 13:56:41 ? 0:00 ora_ckpt_HSWAPDB
oracle 25145 1 0 13:56:41 ? 0:00 ora_dbw0_HSWAPDB
oracle 25199 1 13 13:57:04 ? 0:51 ora_j000_HSWAPDB
oracle 25143 1 0 13:56:41 ? 0:00 ora_pmon_HSWAPDB

3.检查Alert.log警报日志文件

/export/home1/oracle/admin>ls
hswapdb
/export/home1/oracle/admin>cd *
/export/home1/oracle/admin/hswapdb>ls
bdump cdump create pfile udump
/export/home1/oracle/admin/hswapdb>cd bdump
/export/home1/oracle/admin/hswapdb/bdump>

/export/home1/oracle/admin/hswapdb/bdump>ls -l *.log

-rw-r--r-- 1 oracle dba 813396 Mar 23 13:57 alert_HSWAPDB.log
/export/home1/oracle/admin/hswapdb/bdump>vi *.log
"alert_HSWAPDB.log" 18888 lines, 813396 characters (115 null)
Tue Jun 24 21:17:14 2003
Starting ORACLE instance (normal)
LICENSE_MAX_SESSION = 0
LICENSE_SESSIONS_WARNING = 0
SCN scheme 3
Using log_archive_dest parameter default value
LICENSE_MAX_USERS = 0
SYS auditing is disabled
Starting up ORACLE RDBMS Version: 9.2.0.3.0.
System parameters with non-default values:
processes = 400
timed_statistics = TRUE
shared_pool_size = 117440512
large_pool_size = 83886080
java_pool_size = 33554432
control_files = /export/home1/oracle/oradata/hswapdb/control01.ctl,

/export/home1/oracle/oradata/hswapdb/control02.ctl,
/export/home1/oracle/oradata/hswapdb/control03.ctl
db_block_size = 8192
db_cache_size = 352321536
compatible = 9.2.0.0.0
db_file_multiblock_read_count= 16
fast_start_mttr_target = 300
undo_management = AUTO
undo_tablespace = UNDOTBS1
undo_retention = 10800
remote_login_passwordfile= EXCLUSIVE
db_domain = eygle.com
instance_name = hswapdb
dispatchers = (PROTOCOL=TCP) (SERVICE=hswapdbXDB)
job_queue_processes = 10
hash_join_enabled = TRUE
background_dump_dest = /export/home1/oracle/admin/hswapdb/bdump
user_dump_dest = /export/home1/oracle/admin/hswapdb/udump
core_dump_dest = /export/home1/oracle/admin/hswapdb/cdump
sort_area_size = 524288
db_name = hswapdb
open_cursors = 300
star_transformation_enabled= FALSE
query_rewrite_enabled = FALSE
pga_aggregate_target = 154140672
aq_tm_processes = 1

.................

Tue Mar 23 13:40:45 2004
skgpspawn failed:category = 27142, depinfo = 12, op = fork, loc = skgpspawn3
skgpspawn failed:category = 27142, depinfo = 12, op = fork, loc = skgpspawn3
skgpspawn failed:category = 27142, depinfo = 12, op = fork, loc = skgpspawn3
skgpspawn failed:category = 27142, depinfo = 12, op = fork, loc = skgpspawn3
skgpspawn failed:category = 27142, depinfo = 12, op = fork, loc = skgpspawn3
skgpspawn failed:category = 27142, depinfo = 12, op = fork, loc = skgpspawn3
skgpspawn failed:category = 27142, depinfo = 11, op = fork, loc = skgpspawn5
skgpspawn failed:category = 27142, depinfo = 12, op = fork, loc = skgpspawn3
skgpspawn failed:category = 27142, depinfo = 12, op = fork, loc = skgpspawn3
Tue Mar 23 13:42:02 2004
skgpspawn failed:category = 27142, depinfo = 12, op = fork, loc = skgpspawn3
skgpspawn failed:category = 27142, depinfo = 12, op = fork, loc = skgpspawn3
skgpspawn failed:category = 27142, depinfo = 12, op = fork, loc = skgpspawn3
skgpspawn failed:category = 27142, depinfo = 12, op = fork, loc = skgpspawn3
Tue Mar 23 13:55:38 2004
Starting ORACLE instance (normal)
Shutting down instance: further logons disabled
Tue Mar 23 13:56:20 2004
Shutting down instance (abort)
License high water mark = 26
Instance terminated by USER, pid = 25112
Tue Mar 23 13:56:37 2004
Starting ORACLE instance (normal)
LICENSE_MAX_SESSION = 0
LICENSE_SESSIONS_WARNING = 0
SCN scheme 3
Using log_archive_dest parameter default value
LICENSE_MAX_USERS = 0
SYS auditing is disabled
Starting up ORACLE RDBMS Version: 9.2.0.3.0.
System parameters with non-default values:
processes = 400
timed_statistics = TRUE
shared_pool_size = 117440512
large_pool_size = 83886080
java_pool_size = 33554432
control_files = /export/home1/oracle/oradata/hswapdb/control01.ctl,

/export/home1/oracle/oradata/hswapdb/control02.ctl,
/export/home1/oracle/oradata/hswapdb/control03.ctl
db_block_size = 8192
db_cache_size = 352321536
compatible = 9.2.0.0.0
db_file_multiblock_read_count= 16
fast_start_mttr_target = 300
undo_management = AUTO
undo_tablespace = UNDOTBS1
undo_retention = 10800
remote_login_passwordfile= EXCLUSIVE
db_domain = eygle.com
instance_name = hswapdb
dispatchers = (PROTOCOL=TCP) (SERVICE=hswapdbXDB)
remote_dependencies_mode = SIGNATURE
job_queue_processes = 10
hash_join_enabled = TRUE
background_dump_dest = /export/home1/oracle/admin/hswapdb/bdump
user_dump_dest = /export/home1/oracle/admin/hswapdb/udump
core_dump_dest = /export/home1/oracle/admin/hswapdb/cdump
sort_area_size = 524288
db_name = hswapdb
open_cursors = 300
star_transformation_enabled= FALSE
parallel_automatic_tuning= TRUE
query_rewrite_enabled = FALSE
pga_aggregate_target = 154140672
aq_tm_processes = 1
PMON started with pid=2
DBW0 started with pid=3
LGWR started with pid=4
CKPT started with pid=5
SMON started with pid=6
RECO started with pid=7
CJQ0 started with pid=8
QMN0 started with pid=9
Tue Mar 23 13:56:42 2004
starting up 1 shared server(s) ...
Tue Mar 23 13:56:42 2004
starting up 1 dispatcher(s) for network address '(ADDRESS=(PARTIAL=YES)(PROTOCOL=TCP))'...
Tue Mar 23 13:56:43 2004
ALTER DATABASE MOUNT
Tue Mar 23 13:56:47 2004
Successful mount of redo thread 1, with mount id 3253076635.
Tue Mar 23 13:56:47 2004
Database mounted in Exclusive Mode.
Completed: ALTER DATABASE MOUNT
Tue Mar 23 13:56:47 2004
Current log# 2 seq# 2136 mem# 0: /export/home1/oracle/oradata/hswapdb/redo02.log
Successful open of redo thread 1.
Tue Mar 23 12:24:54 2004
SMON: enabling cache recovery
Tue Mar 23 12:24:56 2004
Undo Segment 1 Onlined
Undo Segment 2 Onlined
Undo Segment 3 Onlined
Undo Segment 4 Onlined
Undo Segment 5 Onlined
Undo Segment 6 Onlined
Undo Segment 7 Onlined
Undo Segment 8 Onlined
Undo Segment 9 Onlined
Undo Segment 10 Onlined
Successfully onlined Undo Tablespace 1.
Tue Mar 23 12:24:56 2004
SMON: enabling tx recovery
Tue Mar 23 12:24:56 2004
Database Characterset is ZHS16GBK
Tue Mar 23 12:25:01 2004
SMON: Parallel transaction recovery tried
Tue Mar 23 12:25:01 2004
replication_dependency_tracking turned off (no async multimaster replication found)
Completed: ALTER DATABASE OPEN
Tue Mar 23 12:28:26 2004
/* OracleOEM / ALTER DATABASE DATAFILE '/export/home1/oracle/oradata/hswapdb/users01.dbf' RESIZE 2501760K
Tue Mar 23 12:28:26 2004
ORA-3297 signalled during: /
OracleOEM / ALTER DATABASE DATAFILE '/export/h...
Tue Mar 23 12:28:32 2004
/
OracleOEM / ALTER DATABASE DATAFILE '/export/home1/oracle/oradata/hswapdb/users01.dbf' RESIZE 2501760K
ORA-3297 signalled during: /
OracleOEM / ALTER DATABASE DATAFILE '/export/h...
Tue Mar 23 12:28:53 2004
/
OracleOEM / ALTER DATABASE DATAFILE '/export/home1/oracle/oradata/hswapdb/users01.dbf' RESIZE 3501760K
Tue Mar 23 12:28:53 2004
ORA-3297 signalled during: /
OracleOEM */ ALTER DATABASE DATAFILE '/export/h...
Tue Mar 23 13:40:45 2004
skgpspawn failed:category = 27142, depinfo = 12, op = fork, loc = skgpspawn3
skgpspawn failed:category = 27142, depinfo = 12, op = fork, loc = skgpspawn3
skgpspawn failed:category = 27142, depinfo = 12, op = fork, loc = skgpspawn3
skgpspawn failed:category = 27142, depinfo = 12, op = fork, loc = skgpspawn3
skgpspawn failed:category = 27142, depinfo = 12, op = fork, loc = skgpspawn3
skgpspawn failed:category = 27142, depinfo = 12, op = fork, loc = skgpspawn3
skgpspawn failed:category = 27142, depinfo = 11, op = fork, loc = skgpspawn5
skgpspawn failed:category = 27142, depinfo = 12, op = fork, loc = skgpspawn3
skgpspawn failed:category = 27142, depinfo = 12, op = fork, loc = skgpspawn3
Tue Mar 23 13:42:02 2004
skgpspawn failed:category = 27142, depinfo = 12, op = fork, loc = skgpspawn3
:q

发现数据库多次重起,并记录了部分错误信息

该提示说明数据库无法spawn a new session.

quote Yong Huang's comment:

The number in "skgpspawn failed:category = 27142" is probably ORA error:

$ oerr ora 27142
27142, 0000, "could not create new process"
// *Cause: OS system call
// *Action: check errno and if possible increase the number of processes

OSD (OS-dependent) errors are almost always shown as an skg... error (probably means "system, kernel generic").

I don't know what "depinfo = 12" means.

4.尝试连接数据库

收到错误信息,无法连接数据库

$ sqlplus "/ as sysdba"

SQL*Plus: Release 9.2.0.3.0 - Production on 星期二 3月 23 14:14:06 2004

Copyright (c) 1982, 2002, Oracle Corporation. All rights reserved.

ERROR:
ORA-12540: TNS: 超出内部限制

请输入用户名:
ERROR:
ORA-12540: TNS: 超出内部限制

请输入用户名:
ERROR:
ORA-12540: TNS: 超出内部限制

SP2-0157: 在3次尝试之后无法 CONNECT 到 ORACLE, 退出 SQL*Plus

内部限制超过,通常说明某些系统资源不足.

5.检查监听器

发现部分连接被拒绝

/export/home1/oracle>lsnrctl services

LSNRCTL for Solaris: Version 9.2.0.3.0 - Production on 23-3月 -2004 14:37:23

Copyright (c) 1991, 2002, Oracle Corporation. All rights reserved.

正在连接到 (DESCRIPTION=(ADDRESS=(PROTOCOL=IPC)(KEY=EXTPROC)))
服务摘要..
服务 "PLSExtProc" 包含 1 个例程。
例程 "PLSExtProc", 状态 UNKNOWN, 包含此服务的 1 个处理程序...
处理程序:
"DEDICATED" 已建立:0 已被拒绝:0
LOCAL SERVER
服务 "hswapdb.eygle.com" 包含 2 个例程。
例程 "hswapdb", 状态 UNKNOWN, 包含此服务的 1 个处理程序...
处理程序:
"DEDICATED" 已建立:6 已被拒绝:0
LOCAL SERVER
例程 "hswapdb", 状态 READY, 包含此服务的 1 个处理程序...
处理程序:
"DEDICATED" 已建立:21 已拒绝:6 状态:ready
LOCAL SERVER
服务 "hswapdbXDB.eygle.com" 包含 1 个例程。
例程 "hswapdb", 状态 READY, 包含此服务的 1 个处理程序...
处理程序:
"D000" 已建立:0 已被拒绝:0 当前: 0 最大: 972 状态: ready
DISPATCHER

  1<machine: 25839="" pid:="" wapplatform,="">   
  2(ADDRESS=(PROTOCOL=tcp)(HOST=wapplatform)(PORT=32869))   
  3命令执行成功   
  4  
  5  
  6  
  7在listener.log中找到了相关错误信息   
  8  
  9  
 10  
 1123-342 -2004 12:19:40 * (CONNECT_DATA=(SID=hswapdb)(CID=(PROGRAM=C:\WINNT\Microsoft.NET\Framework\v1.1.4322\aspnet_wp.e   
 12xe)(HOST=SWAN)(USER=SYSTEM))) * (ADDRESS=(PROTOCOL=tcp)(HOST=172.16.30.125)(PORT=1291)) * establish * hswapdb * 12500   
 13TNS-12500: TNS324045676444646770335476616   
 1475034   
 15TNS-12540: TNS323436422743767666   
 16TNS-12560: TNS: 05212254674563   
 17TNS-00510: 3436422743767666   
 18Solaris Error: 12: Not enough space   
 1923-342 -2004 12:19:50 * (CONNECT_DATA=(SID=hswapdb)(CID=(PROGRAM=C:\Program Files\PLSQL Developer\PLSQLDev.exe)(HOST=SW   
 20AN)(USER=Administrator))) * (ADDRESS=(PROTOCOL=tcp)(HOST=172.16.30.125)(PORT=1292)) * establish * hswapdb * 12500   
 21TNS-12500: TNS324045676444646770335476616   
 2275034   
 23TNS-12540: TNS323436422743767666   
 24TNS-12560: TNS: 05212254674563   
 25TNS-00510: 3436422743767666   
 26Solaris Error: 12: Not enough space   
 27  
 28/export/home1/oracle/app/network/log&gt;grep -w 12 /usr/include/sys/errno.h   
 29#define ENOMEM 12 /* Not enough core   
 30  
 31  
 32  
 33quote Yong Huang's comment:   
 34  
 35$ grep -w 12 /usr/include/sys/errno.h   
 36#define ENOMEM 12 /* Not enough core */   
 37  
 38Here "core" means memory, including real RAM memory and swap space.   
 39  
 40  
 416.退出Oracle用户检查   
 42  
 43检查系统日志信息,发现大量失败的su操作   
 44有swap区不足的报告   
 45  
 46  
 47  
 48/export/home1/oracle/admin/hswapdb/bdump&gt;exit   
 49wapplatform:/&gt;dmesg   
 50  
 512004年03月23日 星期二 14时00分32秒 CST   
 52Mar 22 22:52:36 wapplatform elfexec: [ID 700856 kern.notice] ps: Cannot find ^?ELF^A^B^A   
 53Mar 22 22:53:00 wapplatform ufs: [ID 845546 kern.notice] NOTICE: alloc: /export/home1: file system full   
 54Mar 22 22:53:09 wapplatform elfexec: [ID 700856 kern.notice] w: Cannot find ^?ELF^A^B^A   
 55Mar 22 22:53:53 wapplatform last message repeated 4 times   
 56Mar 22 22:56:28 wapplatform elfexec: [ID 700856 kern.notice] ipnat: Cannot find ^?ELF^B^B^A   
 57Mar 22 22:58:00 wapplatform ufs: [ID 845546 kern.notice] NOTICE: alloc: /export/home1: file system full   
 58Mar 22 22:59:54 wapplatform elfexec: [ID 700856 kern.notice] ipnat: Cannot find ^?ELF^B^B^A   
 59Mar 22 23:02:26 wapplatform ufs: [ID 845546 kern.notice] NOTICE: alloc: /export/home1: file system full   
 60Mar 22 23:03:00 wapplatform last message repeated 1 time   
 61Mar 22 23:08:00 wapplatform ufs: [ID 845546 kern.notice] NOTICE: alloc: /export/home1: file system full   
 62Mar 22 23:08:34 wapplatform elfexec: [ID 700856 kern.notice] ipnat: Cannot find ^?ELF^B^B^A   
 63Mar 22 23:10:27 wapplatform last message repeated 3 times   
 64Mar 22 23:11:49 wapplatform elfexec: [ID 700856 kern.notice] ipnat: Cannot find ^?ELF^B^B^A   
 65Mar 22 23:11:52 wapplatform last message repeated 1 time   
 66Mar 22 23:13:01 wapplatform ufs: [ID 845546 kern.notice] NOTICE: alloc: /export/home1: file system full   
 67Mar 22 23:18:01 wapplatform last message repeated 1 time   
 68Mar 22 23:23:01 wapplatform ufs: [ID 845546 kern.notice] NOTICE: alloc: /export/home1: file system full   
 69Mar 22 23:28:01 wapplatform last message repeated 1 time   
 70Mar 22 23:33:01 wapplatform ufs: [ID 845546 kern.notice] NOTICE: alloc: /export/home1: file system full   
 71Mar 22 23:38:01 wapplatform last message repeated 1 time   
 72Mar 22 23:43:01 wapplatform ufs: [ID 845546 kern.notice] NOTICE: alloc: /export/home1: file system full   
 73Mar 22 23:48:01 wapplatform last message repeated 1 time   
 74Mar 22 23:53:01 wapplatform ufs: [ID 845546 kern.notice] NOTICE: alloc: /export/home1: file system full   
 75Mar 22 23:58:01 wapplatform last message repeated 1 time   
 76Mar 23 00:00:00 wapplatform ufs: [ID 213553 kern.notice] NOTICE: realloccg /export/home1: file system full   
 77Mar 23 00:00:00 wapplatform sendmail[3075]: [ID 702911 mail.crit] My unqualified host name (wapplatform) unknown; sleeping   
 78  
 79for retry   
 80Mar 23 00:01:00 wapplatform sendmail[3075]: [ID 702911 mail.alert] unable to qualify my own domain name (wapplatform) --   
 81  
 82using short name   
 83Mar 23 00:02:36 wapplatform ufs: [ID 845546 kern.notice] NOTICE: alloc: /export/home1: file system full   
 84Mar 23 00:03:02 wapplatform last message repeated 1 time   
 85Mar 23 00:08:02 wapplatform ufs: [ID 845546 kern.notice] NOTICE: alloc: /export/home1: file system full   
 86....   
 87  
 88Mar 23 10:18:15 wapplatform ufs: [ID 845546 kern.notice] NOTICE: alloc: /export/home1: file system full   
 89Mar 23 10:20:41 wapplatform ufs: [ID 213553 kern.notice] NOTICE: realloccg /export/home1: file system full   
 90Mar 23 10:20:47 wapplatform last message repeated 1 time   
 91Mar 23 10:23:15 wapplatform ufs: [ID 845546 kern.notice] NOTICE: alloc: /export/home1: file system full   
 92Mar 23 10:24:38 wapplatform ufs: [ID 213553 kern.notice] NOTICE: realloccg /export/home1: file system full   
 93Mar 23 10:24:43 wapplatform last message repeated 1 time   
 94Mar 23 10:24:55 wapplatform ufs: [ID 213553 kern.notice] NOTICE: realloccg /export/home1: file system full   
 95Mar 23 10:25:06 wapplatform last message repeated 2 times   
 96Mar 23 11:09:31 wapplatform genunix: [ID 470503 kern.warning] WARNING: Sorry, no swap space to grow stack for pid 3118 (su)   
 97Mar 23 11:09:39 wapplatform genunix: [ID 470503 kern.warning] WARNING: Sorry, no swap space to grow stack for pid 3121 (su)   
 98Mar 23 11:10:48 wapplatform genunix: [ID 470503 kern.warning] WARNING: Sorry, no swap space to grow stack for pid 3137 (su)   
 99Mar 23 11:18:02 wapplatform sshd[3620]: [ID 800047 auth.error] error: grantpt: Not enough space   
100Mar 23 11:18:02 wapplatform sshd[3620]: [ID 800047 auth.error] error: session_pty_req: session 0 alloc failed   
101Mar 23 11:18:43 wapplatform genunix: [ID 470503 kern.warning] WARNING: Sorry, no swap space to grow stack for pid 3636 (su)   
102Mar 23 11:19:47 wapplatform genunix: [ID 470503 kern.warning] WARNING: Sorry, no swap space to grow stack for pid 3672 (su)   
103Mar 23 11:20:20 wapplatform genunix: [ID 470503 kern.warning] WARNING: Sorry, no swap space to grow stack for pid 3694 (su)   
104Mar 23 11:22:23 wapplatform genunix: [ID 470503 kern.warning] WARNING: Sorry, no swap space to grow stack for pid 3736 (sshd)   
105Mar 23 11:23:17 wapplatform tmpfs: [ID 518458 kern.warning] WARNING: /tmp: File system full, swap space limit exceeded   
106Mar 23 11:23:40 wapplatform genunix: [ID 470503 kern.warning] WARNING: Sorry, no swap space to grow stack for pid 3804 (su)   
107Mar 23 11:23:40 wapplatform last message repeated 8 times   
108Mar 23 11:23:56 wapplatform genunix: [ID 470503 kern.warning] WARNING: Sorry, no swap space to grow stack for pid 3806 (ps)   
109Mar 23 11:23:56 wapplatform last message repeated 12 times   
110Mar 23 11:24:01 wapplatform genunix: [ID 470503 kern.warning] WARNING: Sorry, no swap space to grow stack for pid 3808 (w)   
111Mar 23 11:24:01 wapplatform last message repeated 8 times   
112Mar 23 13:40:56 wapplatform su: [ID 810491 auth.crit] 'su root' failed for root on /dev/pts/2   
113Mar 23 13:46:26 wapplatform genunix: [ID 470503 kern.warning] WARNING: Sorry, no swap space to grow stack for pid 24888   
114  
115(sqlplus)   
116Mar 23 13:49:18 wapplatform su: [ID 810491 auth.crit] 'su oracle' failed for root on /dev/pts/6   
117Mar 23 13:54:03 wapplatform genunix: [ID 470503 kern.warning] WARNING: Sorry, no swap space to grow stack for pid 25035 (su)   
118Mar 23 13:54:08 wapplatform genunix: [ID 470503 kern.warning] WARNING: Sorry, no swap space to grow stack for pid 25036 (su)   
119  
120  
121  
122现在基本可以判断是交换区的问题,当然和Oracle SGA设置有关.   
123  
1247.检查系统内存及交换区使用   
125  
126  
127  
128  
129/export/home1/oracle/admin/hswapdb/bdump&gt;exit   
130wapplatform:/&gt;dmesg   
131  
1322004年03月23日 星期二 14时00分32秒 CST   
133Mar 22 22:52:36 wapplatform elfexec: [ID 700856 kern.notice] ps: Cannot find ^?ELF^A^B^A   
134Mar 22 22:53:00 wapplatform ufs: [ID 845546 kern.notice] NOTICE: alloc: /export/home1: file system full   
135Mar 22 22:53:09 wapplatform elfexec: [ID 700856 kern.notice] w: Cannot find ^?ELF^A^B^A   
136Mar 22 22:53:53 wapplatform last message repeated 4 times   
137Mar 22 22:56:28 wapplatform elfexec: [ID 700856 kern.notice] ipnat: Cannot find ^?ELF^B^B^A   
138Mar 22 22:58:00 wapplatform ufs: [ID 845546 kern.notice] NOTICE: alloc: /export/home1: file system full   
139Mar 22 22:59:54 wapplatform elfexec: [ID 700856 kern.notice] ipnat: Cannot find ^?ELF^B^B^A   
140Mar 22 23:02:26 wapplatform ufs: [ID 845546 kern.notice] NOTICE: alloc: /export/home1: file system full   
141Mar 22 23:03:00 wapplatform last message repeated 1 time   
142Mar 22 23:08:00 wapplatform ufs: [ID 845546 kern.notice] NOTICE: alloc: /export/home1: file system full   
143Mar 22 23:08:34 wapplatform elfexec: [ID 700856 kern.notice] ipnat: Cannot find ^?ELF^B^B^A   
144Mar 22 23:10:27 wapplatform last message repeated 3 times   
145Mar 22 23:11:49 wapplatform elfexec: [ID 700856 kern.notice] ipnat: Cannot find ^?ELF^B^B^A   
146Mar 22 23:11:52 wapplatform last message repeated 1 time   
147Mar 22 23:13:01 wapplatform ufs: [ID 845546 kern.notice] NOTICE: alloc: /export/home1: file system full   
148Mar 22 23:18:01 wapplatform last message repeated 1 time   
149Mar 22 23:23:01 wapplatform ufs: [ID 845546 kern.notice] NOTICE: alloc: /export/home1: file system full   
150Mar 22 23:28:01 wapplatform last message repeated 1 time   
151Mar 22 23:33:01 wapplatform ufs: [ID 845546 kern.notice] NOTICE: alloc: /export/home1: file system full   
152Mar 22 23:38:01 wapplatform last message repeated 1 time   
153Mar 22 23:43:01 wapplatform ufs: [ID 845546 kern.notice] NOTICE: alloc: /export/home1: file system full   
154Mar 22 23:48:01 wapplatform last message repeated 1 time   
155Mar 22 23:53:01 wapplatform ufs: [ID 845546 kern.notice] NOTICE: alloc: /export/home1: file system full   
156Mar 22 23:58:01 wapplatform last message repeated 1 time   
157Mar 23 00:00:00 wapplatform ufs: [ID 213553 kern.notice] NOTICE: realloccg /export/home1: file system full   
158Mar 23 00:00:00 wapplatform sendmail[3075]: [ID 702911 mail.crit] My unqualified host name (wapplatform) unknown; sleeping   
159  
160for retry   
161Mar 23 00:01:00 wapplatform sendmail[3075]: [ID 702911 mail.alert] unable to qualify my own domain name (wapplatform) --   
162  
163using short name   
164Mar 23 00:02:36 wapplatform ufs: [ID 845546 kern.notice] NOTICE: alloc: /export/home1: file system full   
165Mar 23 00:03:02 wapplatform last message repeated 1 time   
166Mar 23 00:08:02 wapplatform ufs: [ID 845546 kern.notice] NOTICE: alloc: /export/home1: file system full   
167....   
168  
169Mar 23 10:18:15 wapplatform ufs: [ID 845546 kern.notice] NOTICE: alloc: /export/home1: file system full   
170Mar 23 10:20:41 wapplatform ufs: [ID 213553 kern.notice] NOTICE: realloccg /export/home1: file system full   
171Mar 23 10:20:47 wapplatform last message repeated 1 time   
172Mar 23 10:23:15 wapplatform ufs: [ID 845546 kern.notice] NOTICE: alloc: /export/home1: file system full   
173Mar 23 10:24:38 wapplatform ufs: [ID 213553 kern.notice] NOTICE: realloccg /export/home1: file system full   
174Mar 23 10:24:43 wapplatform last message repeated 1 time   
175Mar 23 10:24:55 wapplatform ufs: [ID 213553 kern.notice] NOTICE: realloccg /export/home1: file system full   
176Mar 23 10:25:06 wapplatform last message repeated 2 times   
177Mar 23 11:09:31 wapplatform genunix: [ID 470503 kern.warning] WARNING: Sorry, no swap space to grow stack for pid 3118 (su)   
178Mar 23 11:09:39 wapplatform genunix: [ID 470503 kern.warning] WARNING: Sorry, no swap space to grow stack for pid 3121 (su)   
179Mar 23 11:10:48 wapplatform genunix: [ID 470503 kern.warning] WARNING: Sorry, no swap space to grow stack for pid 3137 (su)   
180Mar 23 11:18:02 wapplatform sshd[3620]: [ID 800047 auth.error] error: grantpt: Not enough space   
181Mar 23 11:18:02 wapplatform sshd[3620]: [ID 800047 auth.error] error: session_pty_req: session 0 alloc failed   
182Mar 23 11:18:43 wapplatform genunix: [ID 470503 kern.warning] WARNING: Sorry, no swap space to grow stack for pid 3636 (su)   
183Mar 23 11:19:47 wapplatform genunix: [ID 470503 kern.warning] WARNING: Sorry, no swap space to grow stack for pid 3672 (su)   
184Mar 23 11:20:20 wapplatform genunix: [ID 470503 kern.warning] WARNING: Sorry, no swap space to grow stack for pid 3694 (su)   
185Mar 23 11:22:23 wapplatform genunix: [ID 470503 kern.warning] WARNING: Sorry, no swap space to grow stack for pid 3736 (sshd)   
186Mar 23 11:23:17 wapplatform tmpfs: [ID 518458 kern.warning] WARNING: /tmp: File system full, swap space limit exceeded   
187Mar 23 11:23:40 wapplatform genunix: [ID 470503 kern.warning] WARNING: Sorry, no swap space to grow stack for pid 3804 (su)   
188Mar 23 11:23:40 wapplatform last message repeated 8 times   
189Mar 23 11:23:56 wapplatform genunix: [ID 470503 kern.warning] WARNING: Sorry, no swap space to grow stack for pid 3806 (ps)   
190Mar 23 11:23:56 wapplatform last message repeated 12 times   
191Mar 23 11:24:01 wapplatform genunix: [ID 470503 kern.warning] WARNING: Sorry, no swap space to grow stack for pid 3808 (w)   
192Mar 23 11:24:01 wapplatform last message repeated 8 times   
193Mar 23 13:40:56 wapplatform su: [ID 810491 auth.crit] 'su root' failed for root on /dev/pts/2   
194Mar 23 13:46:26 wapplatform genunix: [ID 470503 kern.warning] WARNING: Sorry, no swap space to grow stack for pid 24888   
195  
196(sqlplus)   
197Mar 23 13:49:18 wapplatform su: [ID 810491 auth.crit] 'su oracle' failed for root on /dev/pts/6   
198Mar 23 13:54:03 wapplatform genunix: [ID 470503 kern.warning] WARNING: Sorry, no swap space to grow stack for pid 25035 (su)   
199Mar 23 13:54:08 wapplatform genunix: [ID 470503 kern.warning] WARNING: Sorry, no swap space to grow stack for pid 25036 (su)   
200  
201  
202现在基本可以判断是交换区的问题,当然和Oracle SGA设置有关.   
203  
2047.检查系统内存及交换区使用   
205  
206  
207  
208  
209  
210$ toplast pid: 25456; load averages: 0.67, 0.70, 0.69 14:10:0393 processes: 91 sleeping, 2 on cpuCPU states: 72.7% idle, 14.9% user, 2.7% kernel, 9.7% iowait, 0.0% swapMemory: 1024M real, 34M free, 752M swap in use, 10M swap freePID USERNAME THR PRI NICE SIZE RES STATE TIME CPU COMMAND25199 oracle 1 40 0 674M 631M cpu/2 8:03 16.32% oracle25209 oracle 1 30 0 675M 630M sleep 0:03 0.13% oracle25159 oracle 1 48 0 674M 628M sleep 0:03 0.06% oracle25384 oracle 1 58 0 2632K 1736K cpu/0 0:01 0.05% top25145 oracle 143 58 0 682M 630M sleep 0:01 0.03% oracle25446 oracle 1 58 0 674M 628M sleep 0:00 0.03% oracle25149 oracle 15 58 0 682M 626M sleep 0:00 0.02% oracle25075 oracle 1 48 0 17M 7208K sleep 0:00 0.01% tnslsnr25151 oracle 11 58 0 676M 624M sleep 0:00 0.01% oracle25366 oracle 1 10 0 674M 628M sleep 0:00 0.00% oracle25356 oracle 1 18 0 674M 628M sleep 0:00 0.00% oracle25360 oracle 1 20 0 674M 628M sleep 0:00 0.00% oracle25364 oracle 1 20 0 674M 628M sleep 0:00 0.00% oracle25362 oracle 1 20 0 674M 628M sleep 0:00 0.00% oracle25330 oracle 1 28 0 674M 628M sleep 0:00 0.00% oracle   
211  
212发现物理内存仅为1G,free部分为34M,交换区使用了752M,仅10M free   
213系统内存严重不足,Swap区不足   
214  
215  
216  
2178\. 检查数据库的SGA设置   
218  
219发现SGA设置为: 622299344 bytes   
220接近600M   
221  
222  
223  
224  
225  
226wapplatform:/&gt;su - oracle   
227Sun Microsystems Inc. SunOS 5.8 Generic Patch October 2001   
228You have new mail.   
229/export/home1/oracle&gt;sqlplus "/ as sysdba"   
230  
231SQL*Plus: Release 9.2.0.3.0 - Production on 星期二 3月 23 14:02:30 2004   
232  
233Copyright (c) 1982, 2002, Oracle Corporation. All rights reserved.   
234  
235  
236连接到:   
237Oracle9i Enterprise Edition Release 9.2.0.3.0 - 64bit Production   
238With the Partitioning, OLAP and Oracle Data Mining options   
239JServer Release 9.2.0.3.0 - Production   
240  
241SQL&gt; show sga   
242  
243Total System Global Area 622299344 bytes   
244Fixed Size 731344 bytes   
245Variable Size 268435456 bytes   
246Database Buffers 352321536 bytes   
247Redo Buffers 811008 bytes   
248SQL&gt;   
249  
250  
251  
252对于RAM小于1G的系统,Dedicated模式下,Oracle的SGA一般不应超过1/2物理内存.   
253  
254  
255  
2569.第一步调整   
257减小SGA,为系统保留足够的内存.   
258  
25910.增加swap区   
260  
261  
262  
263  
264  
265  
266wapplatform:/&gt;df -k   
267文件系统 千字节 用了 可用 容量 挂接在   
268/dev/dsk/c0t1d0s0 3099093 105421 2931691 4% /   
269/dev/dsk/c0t2d0s0 10325760 8359637 1862866 82% /usr   
270/proc 0 0 0 0% /proc   
271fd 0 0 0 0% /dev/fd   
272mnttab 0 0 0 0% /etc/mnttab   
273/dev/dsk/c0t1d0s3 1018382 285914 671366 30% /var   
274swap 3904 24 3880 1% /var/run   
275swap 3936 56 3880 2% /tmp   
276/dev/dsk/c0t1d0s5 1671823 459202 1162467 29% /opt   
277/dev/dsk/c0t2d0s7 7087473 6068462 948137 87% /export/home   
278/dev/dsk/c2t1d0s7 17413250 15900222 1338896 93% /export/home2   
279/dev/dsk/c0t3d0s7 17413250 13749782 3489336 80% /export/home1   
280/dev/dsk/c0t1d0s1 771110 382410 334723 54% /usr/openwin   
281/export/home/wapgw/luke   
2827087473 6068462 948137 87% /home/wap   
283  
284wapplatform:/var/swap&gt;cd /export/home1   
285wapplatform:/export/home1&gt;ls   
286TT_DB lost+found oracle oracli9   
287wapplatform:/export/home1&gt;mkdir swap   
288wapplatform:/export/home1&gt;cd swap   
289wapplatform:/export/home1/swap&gt;mkfile -v 1g swapfile1   
290swapfile1 1073741824 bytes   
291wapplatform:/export/home1/swap&gt;id   
292uid=0(root) gid=1(other)   
293wapplatform:/export/home1/swap&gt;swap -a /export/home1/swap/swapfile1   
294wapplatform:/export/home1/swap&gt;swap -s   
295总数:分配了 623160k 字节 + 保留 162704k = 已使用 785864k,1010936k 可用   
296  
297  
298  
299  
30011.连接测试   
301  
302系统恢复正常,问题解决   
303  
304  
305  
306  
307  
308wapplatform:/export/home1/swap&gt;su - oracle   
309Sun Microsystems Inc. SunOS 5.8 Generic Patch October 2001   
310You have new mail.   
311/export/home1/oracle&gt;sqlplus "/ as sysdba"   
312  
313SQL*Plus: Release 9.2.0.3.0 - Production on 星期四 3月 25 11:56:28 2004   
314  
315Copyright (c) 1982, 2002, Oracle Corporation. All rights reserved.   
316  
317  
318连接到:   
319Oracle9i Enterprise Edition Release 9.2.0.3.0 - 64bit Production   
320With the Partitioning, OLAP and Oracle Data Mining options   
321JServer Release 9.2.0.3.0 - Production   
322  
323SQL&gt; exit   
324从Oracle9i Enterprise Edition Release 9.2.0.3.0 - 64bit Production   
325With the Partitioning, OLAP and Oracle Data Mining options   
326JServer Release 9.2.0.3.0 - Production中断开   
327/export/home1/oracle&gt;top   
328  
329last pid: 5372; load averages: 0.25, 0.22, 0.29   
330  
33111:57:58   
332148 processes: 137 sleeping, 9 zombie, 2 on cpu   
333CPU states: 98.8% idle, 0.2% user, 0.7% kernel, 0.2% iowait, 0.0% swap   
334Memory: 1024M real, 17M free, 824M swap in use, 934M swap free   
335  
336PID USERNAME THR PRI NICE SIZE RES STATE TIME CPU COMMAND   
3375363 root 1 58 0 2680K 1736K sleep 0:00 0.24% top   
3385370 oracle 1 58 0 514M 469M sleep 0:00 0.18% oracle   
3395366 oracle 1 28 0 514M 469M sleep 0:00 0.11% oracle   
3405341 oracle 1 58 0 2680K 1736K cpu/2 0:00 0.10% top   
3415372 oracle 1 48 0 61M 3288K cpu/3 0:00 0.06% oracle   
3421288 oracle 1 48 0 514M 468M sleep 5:33 0.05% oracle   
343607 root 12 48 0 2768K 2312K sleep 1:48 0.03% mibiisa   
34425075 oracle 1 48 0 17M 7208K sleep 0:16 0.02% tnslsnr   
3451278 oracle 15 58 0 522M 466M sleep 0:49 0.02% oracle   
346374 root 11 53 0 3504K 2888K sleep 0:16 0.01% nscd   
3471280 oracle 19 58 0 518M 466M sleep 0:28 0.00% oracle   
3485361 root 1 46 0 1024K 680K sleep 0:00 0.00% sleep   
3495362 root 1 46 0 1024K 680K sleep 0:00 0.00% sleep   
3505469 root 1 36 0 1952K 1176K sleep 30:09 0.00% monithttp   
3514167 oracle 1 40 0 515M 471M sleep 29:38 0.00% oracle   
352  
353  
354问题总结:   
355  
356Oracle数据库问题的解决从来就离不开操作系统   
357  
358很多时候我们必须通过操作系统一级的手段来诊断并解决问题.   
359  
360关于操作系统   
361  
362一般Swap区的推荐值为2XRAM   
363如果Ram很大,不一定非要把Swap设置为2xSwap   
364但是通常至少设置Swap = Ram   
365  
366如果Swap区过小,在系统繁忙期间   
367产生大量交换无法换到磁盘,就会出现问题.   
368如本案例就是这样。   
369  
370另外,如果系统Ram较小   
371通常设置SGA &lt; 1/2 Ram   
372  
373要为Server process及OS保留足够的内存空间.</machine:>
Published At
Categories with 数据库类
comments powered by Disqus