link:
http://www.eygle.com/case/sga1.htm
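Case description:
Users reported that some time after the server started, new database connections could no longer be established.
A few minutes after a restart, connections failed again,
and the system was unusable.
1. Log in to the system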
SunOS 5.8
login: root
Password:
Last login: Tue Mar 23 13:56:59 from 172.16.31.41
Sun Microsystems Inc. SunOS 5.8 Generic Patch October 2001
You have new mail.
2. su to the oracle user
Check the running Oracle processes.
The background processes look normal, and a number of user connections exist.
>
> wapplatform:/>su - oracle
> > Sun Microsystems Inc. SunOS 5.8 Generic Patch October 2001
> > You have new mail.
> > /export/home1/oracle>ls
> > admin codesyndealt31 exp.sh local.cshrc local.profile oraclebak oui v6_database
> > app exp.log jre local.login nsmail oradata swan
> > /export/home1/oracle>cd admin
> > /export/home1/oracle/admin>ps -ef|grep ora
> > oracle 25269 25258 0 13:58:36 pts/3 0:00 grep ora
> > oracle 25257 24906 0 13:58:31 pts/4 0:00 vi alert_HSWAPDB.log
> > oracle 25267 1 1 13:58:34 ? 0:00 oracleHSWAPDB (LOCAL=NO)
> > oracle 25184 1 0 13:56:57 ? 0:00 ora_p007_HSWAPDB
> > oracle 25182 1 0 13:56:57 ? 0:00 ora_p006_HSWAPDB
> > oracle 25193 1 0 13:57:03 ? 0:01 oracleHSWAPDB (LOCAL=NO)
> > oracle 25209 1 0 13:57:09 ? 0:00 oracleHSWAPDB (LOCAL=NO)
> > oracle 25176 1 0 13:56:57 ? 0:00 ora_p003_HSWAPDB
> > oracle 25180 1 0 13:56:57 ? 0:00 ora_p005_HSWAPDB
> > oracle 25172 1 0 13:56:56 ? 0:00 ora_p001_HSWAPDB
> > oracle 25178 1 0 13:56:57 ? 0:00 ora_p004_HSWAPDB
> > oracle 25170 1 0 13:56:56 ? 0:00 ora_p000_HSWAPDB
> > oracle 24254 24240 0 12:08:25 pts/2 0:00 -ksh
> > oracle 25174 1 0 13:56:56 ? 0:00 ora_p002_HSWAPDB
> > oracle 25244 1 1 13:58:23 ? 0:00 oracleHSWAPDB (LOCAL=NO)
> > oracle 25218 1 0 13:57:23 ? 0:00 oracleHSWAPDB (LOCAL=NO)
> > oracle 25159 1 0 13:56:42 ? 0:02 ora_qmn0_HSWAPDB
> > oracle 25230 1 0 13:57:40 ? 0:01 oracleHSWAPDB (LOCAL=NO)
> > oracle 25161 1 0 13:56:42 ? 0:00 ora_s000_HSWAPDB
> > oracle 25149 1 0 13:56:41 ? 0:01 ora_lgwr_HSWAPDB
> > oracle 25157 1 0 13:56:42 ? 0:00 ora_cjq0_HSWAPDB
> > oracle 24906 3698 0 13:47:47 pts/4 0:00 -ksh
> > oracle 25153 1 0 13:56:42 ? 0:01 ora_smon_HSWAPDB
> > oracle 25058 7464 0 13:55:14 pts/1 0:00 -ksh
> > oracle 25163 1 0 13:56:42 ? 0:00 ora_d000_HSWAPDB
> > oracle 25155 1 0 13:56:42 ? 0:00 ora_reco_HSWAPDB
> > oracle 25151 1 0 13:56:41 ? 0:00 ora_ckpt_HSWAPDB
> > oracle 25145 1 0 13:56:41 ? 0:00 ora_dbw0_HSWAPDB
> > oracle 25199 1 15 13:57:04 ? 0:49 ora_j000_HSWAPDB
> > oracle 4149 4146 0 12:05:11 pts/5 0:00 -ksh
> > oracle 25232 1 0 13:57:41 ? 0:00 oracleHSWAPDB (LOCAL=NO)
> > oracle 25119 1 0 13:56:29 ? 0:00 oraclehswapdb (LOCAL=NO)
> > oracle 25075 1 0 13:55:34 ? 0:00 /export/home1/oracle/app/bin/tnslsnr LISTENER -inherit
> > oracle 24374 4149 0 12:21:56 pts/5 0:00 sqlplus /nolog
> > oracle 25143 1 0 13:56:41 ? 0:00 ora_pmon_HSWAPDB
> > oracle 25258 25242 0 13:58:31 pts/3 0:00 -ksh
> > /export/home1/oracle/admin>ps -ef|grep ora_
> > oracle 25275 25258 0 13:58:42 pts/3 0:00 grep ora_
> > oracle 25184 1 0 13:56:57 ? 0:00 ora_p007_HSWAPDB
> > oracle 25182 1 0 13:56:57 ? 0:00 ora_p006_HSWAPDB
> > oracle 25176 1 0 13:56:57 ? 0:00 ora_p003_HSWAPDB
> > oracle 25180 1 0 13:56:57 ? 0:00 ora_p005_HSWAPDB
> > oracle 25172 1 0 13:56:56 ? 0:00 ora_p001_HSWAPDB
> > oracle 25178 1 0 13:56:57 ? 0:00 ora_p004_HSWAPDB
> > oracle 25170 1 0 13:56:56 ? 0:00 ora_p000_HSWAPDB
> > oracle 25174 1 0 13:56:56 ? 0:00 ora_p002_HSWAPDB
> > oracle 25159 1 0 13:56:42 ? 0:02 ora_qmn0_HSWAPDB
> > oracle 25161 1 0 13:56:42 ? 0:00 ora_s000_HSWAPDB
> > oracle 25149 1 0 13:56:41 ? 0:01 ora_lgwr_HSWAPDB
> > oracle 25157 1 0 13:56:42 ? 0:00 ora_cjq0_HSWAPDB
> > oracle 25153 1 0 13:56:42 ? 0:01 ora_smon_HSWAPDB
> > oracle 25163 1 0 13:56:42 ? 0:00 ora_d000_HSWAPDB
> > oracle 25155 1 0 13:56:42 ? 0:00 ora_reco_HSWAPDB
> > oracle 25151 1 0 13:56:41 ? 0:00 ora_ckpt_HSWAPDB
> > oracle 25145 1 0 13:56:41 ? 0:00 ora_dbw0_HSWAPDB
> > oracle 25199 1 13 13:57:04 ? 0:51 ora_j000_HSWAPDB
> > oracle 25143 1 0 13:56:41 ? 0:00 ora_pmon_HSWAPDB
3. Check the alert log file (alert.log)
>
> /export/home1/oracle/admin>ls
> > hswapdb
> > /export/home1/oracle/admin>cd *
> > /export/home1/oracle/admin/hswapdb>ls
> > bdump cdump create pfile udump
> > /export/home1/oracle/admin/hswapdb>cd bdump
> > /export/home1/oracle/admin/hswapdb/bdump>
>
> /export/home1/oracle/admin/hswapdb/bdump>ls -l *.log
>
> -rw-r--r-- 1 oracle dba 813396 Mar 23 13:57 alert_HSWAPDB.log
> /export/home1/oracle/admin/hswapdb/bdump>vi *.log
> "alert_HSWAPDB.log" 18888 lines, 813396 characters (115 null)
> Tue Jun 24 21:17:14 2003
> Starting ORACLE instance (normal)
> LICENSE_MAX_SESSION = 0
> LICENSE_SESSIONS_WARNING = 0
> SCN scheme 3
> Using log_archive_dest parameter default value
> LICENSE_MAX_USERS = 0
> SYS auditing is disabled
> Starting up ORACLE RDBMS Version: 9.2.0.3.0.
> System parameters with non-default values:
> processes = 400
> timed_statistics = TRUE
> shared_pool_size = 117440512
> large_pool_size = 83886080
> java_pool_size = 33554432
> control_files = /export/home1/oracle/oradata/hswapdb/control01.ctl,
>
> /export/home1/oracle/oradata/hswapdb/control02.ctl,
> /export/home1/oracle/oradata/hswapdb/control03.ctl
> db_block_size = 8192
> db_cache_size = 352321536
> compatible = 9.2.0.0.0
> db_file_multiblock_read_count= 16
> fast_start_mttr_target = 300
> undo_management = AUTO
> undo_tablespace = UNDOTBS1
> undo_retention = 10800
> remote_login_passwordfile= EXCLUSIVE
> db_domain = eygle.com
> instance_name = hswapdb
> dispatchers = (PROTOCOL=TCP) (SERVICE=hswapdbXDB)
> job_queue_processes = 10
> hash_join_enabled = TRUE
> background_dump_dest = /export/home1/oracle/admin/hswapdb/bdump
> user_dump_dest = /export/home1/oracle/admin/hswapdb/udump
> core_dump_dest = /export/home1/oracle/admin/hswapdb/cdump
> sort_area_size = 524288
> db_name = hswapdb
> open_cursors = 300
> star_transformation_enabled= FALSE
> query_rewrite_enabled = FALSE
> pga_aggregate_target = 154140672
> aq_tm_processes = 1
>
> .................
>
> Tue Mar 23 13:40:45 2004
> skgpspawn failed:category = 27142, depinfo = 12, op = fork, loc = skgpspawn3
> skgpspawn failed:category = 27142, depinfo = 12, op = fork, loc = skgpspawn3
> skgpspawn failed:category = 27142, depinfo = 12, op = fork, loc = skgpspawn3
> skgpspawn failed:category = 27142, depinfo = 12, op = fork, loc = skgpspawn3
> skgpspawn failed:category = 27142, depinfo = 12, op = fork, loc = skgpspawn3
> skgpspawn failed:category = 27142, depinfo = 12, op = fork, loc = skgpspawn3
> skgpspawn failed:category = 27142, depinfo = 11, op = fork, loc = skgpspawn5
> skgpspawn failed:category = 27142, depinfo = 12, op = fork, loc = skgpspawn3
> skgpspawn failed:category = 27142, depinfo = 12, op = fork, loc = skgpspawn3
> Tue Mar 23 13:42:02 2004
> skgpspawn failed:category = 27142, depinfo = 12, op = fork, loc = skgpspawn3
> skgpspawn failed:category = 27142, depinfo = 12, op = fork, loc = skgpspawn3
> skgpspawn failed:category = 27142, depinfo = 12, op = fork, loc = skgpspawn3
> skgpspawn failed:category = 27142, depinfo = 12, op = fork, loc = skgpspawn3
> Tue Mar 23 13:55:38 2004
> Starting ORACLE instance (normal)
> Shutting down instance: further logons disabled
> Tue Mar 23 13:56:20 2004
> Shutting down instance (abort)
> License high water mark = 26
> Instance terminated by USER, pid = 25112
> Tue Mar 23 13:56:37 2004
> Starting ORACLE instance (normal)
> LICENSE_MAX_SESSION = 0
> LICENSE_SESSIONS_WARNING = 0
> SCN scheme 3
> Using log_archive_dest parameter default value
> LICENSE_MAX_USERS = 0
> SYS auditing is disabled
> Starting up ORACLE RDBMS Version: 9.2.0.3.0.
> System parameters with non-default values:
> processes = 400
> timed_statistics = TRUE
> shared_pool_size = 117440512
> large_pool_size = 83886080
> java_pool_size = 33554432
> control_files = /export/home1/oracle/oradata/hswapdb/control01.ctl,
>
> /export/home1/oracle/oradata/hswapdb/control02.ctl,
> /export/home1/oracle/oradata/hswapdb/control03.ctl
> db_block_size = 8192
> db_cache_size = 352321536
> compatible = 9.2.0.0.0
> db_file_multiblock_read_count= 16
> fast_start_mttr_target = 300
> undo_management = AUTO
> undo_tablespace = UNDOTBS1
> undo_retention = 10800
> remote_login_passwordfile= EXCLUSIVE
> db_domain = eygle.com
> instance_name = hswapdb
> dispatchers = (PROTOCOL=TCP) (SERVICE=hswapdbXDB)
> remote_dependencies_mode = SIGNATURE
> job_queue_processes = 10
> hash_join_enabled = TRUE
> background_dump_dest = /export/home1/oracle/admin/hswapdb/bdump
> user_dump_dest = /export/home1/oracle/admin/hswapdb/udump
> core_dump_dest = /export/home1/oracle/admin/hswapdb/cdump
> sort_area_size = 524288
> db_name = hswapdb
> open_cursors = 300
> star_transformation_enabled= FALSE
> parallel_automatic_tuning= TRUE
> query_rewrite_enabled = FALSE
> pga_aggregate_target = 154140672
> aq_tm_processes = 1
> PMON started with pid=2
> DBW0 started with pid=3
> LGWR started with pid=4
> CKPT started with pid=5
> SMON started with pid=6
> RECO started with pid=7
> CJQ0 started with pid=8
> QMN0 started with pid=9
> Tue Mar 23 13:56:42 2004
> starting up 1 shared server(s) ...
> Tue Mar 23 13:56:42 2004
> starting up 1 dispatcher(s) for network address '(ADDRESS=(PARTIAL=YES)(PROTOCOL=TCP))'...
> Tue Mar 23 13:56:43 2004
> ALTER DATABASE MOUNT
> Tue Mar 23 13:56:47 2004
> Successful mount of redo thread 1, with mount id 3253076635.
> Tue Mar 23 13:56:47 2004
> Database mounted in Exclusive Mode.
> Completed: ALTER DATABASE MOUNT
> Tue Mar 23 13:56:47 2004
> Current log# 2 seq# 2136 mem# 0: /export/home1/oracle/oradata/hswapdb/redo02.log
> Successful open of redo thread 1.
> Tue Mar 23 12:24:54 2004
> SMON: enabling cache recovery
> Tue Mar 23 12:24:56 2004
> Undo Segment 1 Onlined
> Undo Segment 2 Onlined
> Undo Segment 3 Onlined
> Undo Segment 4 Onlined
> Undo Segment 5 Onlined
> Undo Segment 6 Onlined
> Undo Segment 7 Onlined
> Undo Segment 8 Onlined
> Undo Segment 9 Onlined
> Undo Segment 10 Onlined
> Successfully onlined Undo Tablespace 1.
> Tue Mar 23 12:24:56 2004
> SMON: enabling tx recovery
> Tue Mar 23 12:24:56 2004
> Database Characterset is ZHS16GBK
> Tue Mar 23 12:25:01 2004
> SMON: Parallel transaction recovery tried
> Tue Mar 23 12:25:01 2004
> replication_dependency_tracking turned off (no async multimaster replication found)
> Completed: ALTER DATABASE OPEN
> Tue Mar 23 12:28:26 2004
> /* OracleOEM */ ALTER DATABASE DATAFILE '/export/home1/oracle/oradata/hswapdb/users01.dbf' RESIZE 2501760K
> Tue Mar 23 12:28:26 2004
> ORA-3297 signalled during: /* OracleOEM */ ALTER DATABASE DATAFILE '/export/h...
> Tue Mar 23 12:28:32 2004
> /* OracleOEM */ ALTER DATABASE DATAFILE '/export/home1/oracle/oradata/hswapdb/users01.dbf' RESIZE 2501760K
> ORA-3297 signalled during: /* OracleOEM */ ALTER DATABASE DATAFILE '/export/h...
> Tue Mar 23 12:28:53 2004
> /* OracleOEM */ ALTER DATABASE DATAFILE '/export/home1/oracle/oradata/hswapdb/users01.dbf' RESIZE 3501760K
> Tue Mar 23 12:28:53 2004
> ORA-3297 signalled during: /* OracleOEM */ ALTER DATABASE DATAFILE '/export/h...
> Tue Mar 23 13:40:45 2004
> skgpspawn failed:category = 27142, depinfo = 12, op = fork, loc = skgpspawn3
> skgpspawn failed:category = 27142, depinfo = 12, op = fork, loc = skgpspawn3
> skgpspawn failed:category = 27142, depinfo = 12, op = fork, loc = skgpspawn3
> skgpspawn failed:category = 27142, depinfo = 12, op = fork, loc = skgpspawn3
> skgpspawn failed:category = 27142, depinfo = 12, op = fork, loc = skgpspawn3
> skgpspawn failed:category = 27142, depinfo = 12, op = fork, loc = skgpspawn3
> skgpspawn failed:category = 27142, depinfo = 11, op = fork, loc = skgpspawn5
> skgpspawn failed:category = 27142, depinfo = 12, op = fork, loc = skgpspawn3
> skgpspawn failed:category = 27142, depinfo = 12, op = fork, loc = skgpspawn3
> Tue Mar 23 13:42:02 2004
> skgpspawn failed:category = 27142, depinfo = 12, op = fork, loc = skgpspawn3
> :q
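The database had been restarted several times, and a number of errors were logged.
The skgpspawn messages indicate that the database could not spawn a new process to serve a session.
Quoting Yong Huang's comment: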
The number in "skgpspawn failed:category = 27142" is probably ORA error:
$ oerr ora 27142
27142, 0000, "could not create new process"
// *Cause: OS system call
// *Action: check errno and if possible increase the number of processes
OSD (OS-dependent) errors are almost always shown as an skg... error (probably means "system, kernel generic").
I don't know what "depinfo = 12" means.
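One plausible reading, offered here as an assumption rather than something the quoted comment confirms, is that `depinfo` carries the operating system's `errno` from the failed `fork`. That would agree with the `Solaris Error: 12: Not enough space` entries recorded in listener.log. A minimal Python sketch that tallies the `depinfo` values found in alert-log lines and looks each one up in the errno table of the machine running the script (the names `tally_depinfo` and `errno_name` are hypothetical; errno numbering differs between operating systems, so the lookup is only meaningful on the affected host or on one with the same numbering):

```python
import errno
import re
from collections import Counter

# Matches lines such as:
# skgpspawn failed:category = 27142, depinfo = 12, op = fork, loc = skgpspawn3
SPAWN_RE = re.compile(
    r"skgpspawn failed:category = (\d+), depinfo = (\d+), op = (\w+)"
)


def tally_depinfo(lines):
    """Count skgpspawn failures per depinfo value."""
    counts = Counter()
    for line in lines:
        match = SPAWN_RE.search(line)
        if match:
            counts[int(match.group(2))] += 1
    return counts


def errno_name(code):
    """Symbolic errno name on the local OS, or 'unknown' if unmapped.

    Assumption: depinfo is an errno value; the alert log does not say so.
    """
    return errno.errorcode.get(code, "unknown")


sample = [
    "> skgpspawn failed:category = 27142, depinfo = 12, op = fork, loc = skgpspawn3",
    "> skgpspawn failed:category = 27142, depinfo = 11, op = fork, loc = skgpspawn5",
    "> skgpspawn failed:category = 27142, depinfo = 12, op = fork, loc = skgpspawn3",
]
for code, count in sorted(tally_depinfo(sample).items()):
    print(code, errno_name(code), count)
```

On Solaris, errno 12 is ENOMEM and errno 11 is EAGAIN, both of which `fork` can return when memory or swap is exhausted.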
4. Try to connect to the database
The connection attempt fails with an error.
>
> $ sqlplus "/ as sysdba"
>
> SQL*Plus: Release 9.2.0.3.0 - Production on Tue Mar 23 14:14:06 2004
>
> Copyright (c) 1982, 2002, Oracle Corporation. All rights reserved.
>
> ERROR:
> ORA-12540: TNS:internal limit restriction exceeded
>
>
> Enter user-name:
> ERROR:
> ORA-12540: TNS:internal limit restriction exceeded
>
>
> Enter user-name:
> ERROR:
> ORA-12540: TNS:internal limit restriction exceeded
>
>
> SP2-0157: unable to CONNECT to ORACLE after 3 attempts, exiting SQL*Plus
>
>
>
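An "internal limit restriction exceeded" error usually means that some system resource is exhausted.
5. Check the listener
Some connections were refused.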
> /export/home1/oracle>lsnrctl services
>
> LSNRCTL for Solaris: Version 9.2.0.3.0 - Production on 23-MAR-2004 14:37:23
>
> Copyright (c) 1991, 2002, Oracle Corporation. All rights reserved.
>
> Connecting to (DESCRIPTION=(ADDRESS=(PROTOCOL=IPC)(KEY=EXTPROC)))
> Services Summary...
> Service "PLSExtProc" has 1 instance(s).
> Instance "PLSExtProc", status UNKNOWN, has 1 handler(s) for this service...
> Handler(s):
> "DEDICATED" established:0 refused:0
> LOCAL SERVER
> Service "hswapdb.eygle.com" has 2 instance(s).
> Instance "hswapdb", status UNKNOWN, has 1 handler(s) for this service...
> Handler(s):
> "DEDICATED" established:6 refused:0
> LOCAL SERVER
> Instance "hswapdb", status READY, has 1 handler(s) for this service...
> Handler(s):
> "DEDICATED" established:21 refused:6 state:ready
> LOCAL SERVER
> Service "hswapdbXDB.eygle.com" has 1 instance(s).
> Instance "hswapdb", status READY, has 1 handler(s) for this service...
> Handler(s):
> "D000" established:0 refused:0 current:0 max:972 state:ready
> DISPATCHER <machine: wapplatform, pid: 25839>
> (ADDRESS=(PROTOCOL=tcp)(HOST=wapplatform)(PORT=32869))
> The command completed successfully
>

---

The corresponding errors appear in listener.log:

> 23-MAR-2004 12:19:40 * (CONNECT_DATA=(SID=hswapdb)(CID=(PROGRAM=C:\WINNT\Microsoft.NET\Framework\v1.1.4322\aspnet_wp.exe)(HOST=SWAN)(USER=SYSTEM))) * (ADDRESS=(PROTOCOL=tcp)(HOST=172.16.30.125)(PORT=1291)) * establish * hswapdb * 12500
> TNS-12500: TNS:listener failed to start a dedicated server process
> TNS-12540: TNS:internal limit restriction exceeded
> TNS-12560: TNS:protocol adapter error
> TNS-00510: Internal limit restriction exceeded
> Solaris Error: 12: Not enough space
> 23-MAR-2004 12:19:50 * (CONNECT_DATA=(SID=hswapdb)(CID=(PROGRAM=C:\Program Files\PLSQL Developer\PLSQLDev.exe)(HOST=SWAN)(USER=Administrator))) * (ADDRESS=(PROTOCOL=tcp)(HOST=172.16.30.125)(PORT=1292)) * establish * hswapdb * 12500
> TNS-12500: TNS:listener failed to start a dedicated server process
> TNS-12540: TNS:internal limit restriction exceeded
> TNS-12560: TNS:protocol adapter error
> TNS-00510: Internal limit restriction exceeded
> Solaris Error: 12: Not enough space
>
> /export/home1/oracle/app/network/log>grep -w 12 /usr/include/sys/errno.h
> #define ENOMEM 12 /* Not enough core */
>

---

Quoting Yong Huang's comment:

$ grep -w 12 /usr/include/sys/errno.h
#define ENOMEM 12 /* Not enough core */

Here "core" means memory, including real RAM memory and swap space.


6. Exit the oracle user and check the system

The system log shows a large number of failed su operations
and reports of insufficient swap space.

>
> /export/home1/oracle/admin/hswapdb/bdump>exit
>
> wapplatform:/>dmesg
>
> Tue Mar 23 14:00:32 CST 2004
> Mar 22 22:52:36 wapplatform elfexec: [ID 700856 kern.notice] ps: Cannot find ^?ELF^A^B^A
> Mar 22 22:53:00 wapplatform ufs: [ID 845546 kern.notice] NOTICE: alloc: /export/home1: file system full
> Mar 22 22:53:09 wapplatform elfexec: [ID 700856 kern.notice] w: Cannot find ^?ELF^A^B^A
> Mar 22 22:53:53 wapplatform last message repeated 4 times
> Mar 22 22:56:28 wapplatform elfexec: [ID 700856 kern.notice] ipnat: Cannot find ^?ELF^B^B^A
> Mar 22 22:58:00 wapplatform ufs: [ID 845546 kern.notice] NOTICE: alloc: /export/home1: file system full
> Mar 22 22:59:54 wapplatform elfexec: [ID 700856 kern.notice] ipnat: Cannot find ^?ELF^B^B^A
> Mar 22 23:02:26 wapplatform ufs: [ID 845546 kern.notice] NOTICE: alloc: /export/home1: file system full
> Mar 22 23:03:00 wapplatform last message repeated 1 time
> Mar 22 23:08:00 wapplatform ufs: [ID 845546 kern.notice] NOTICE: alloc: /export/home1: file system full
> Mar 22 23:08:34 wapplatform elfexec: [ID 700856 kern.notice] ipnat: Cannot find ^?ELF^B^B^A
> Mar 22 23:10:27 wapplatform last message repeated 3 times
> Mar 22 23:11:49 wapplatform elfexec: [ID 700856 kern.notice] ipnat: Cannot find ^?ELF^B^B^A
> Mar 22 23:11:52 wapplatform last message repeated 1 time
> Mar 22 23:13:01 wapplatform ufs: [ID 845546 kern.notice] NOTICE: alloc: /export/home1: file system full
> Mar 22 23:18:01 wapplatform last message repeated 1 time
> Mar 22 23:23:01 wapplatform ufs: [ID 845546 kern.notice] NOTICE: alloc: /export/home1: file system full
> Mar 22 23:28:01 wapplatform last message repeated 1 time
> Mar 22 23:33:01 wapplatform ufs: [ID 845546 kern.notice] NOTICE: alloc: /export/home1: file system full
> Mar 22 23:38:01 wapplatform last message repeated 1 time
> Mar 22 23:43:01 wapplatform ufs: [ID 845546 kern.notice] NOTICE: alloc: /export/home1: file system full
> Mar 22 23:48:01 wapplatform last message repeated 1 time
> Mar 22 23:53:01 wapplatform ufs: [ID 845546 kern.notice] NOTICE: alloc: /export/home1: file system full
> Mar 22 23:58:01 wapplatform last message repeated 1 time
> Mar 23 00:00:00 wapplatform ufs: [ID 213553 kern.notice] NOTICE: realloccg /export/home1: file system full
> Mar 23 00:00:00 wapplatform sendmail[3075]: [ID 702911 mail.crit] My unqualified host name (wapplatform) unknown; sleeping for retry
> Mar 23 00:01:00 wapplatform sendmail[3075]: [ID 702911 mail.alert] unable to qualify my own domain name (wapplatform) -- using short name
> Mar 23 00:02:36 wapplatform ufs: [ID 845546 kern.notice] NOTICE: alloc: /export/home1: file system full
> Mar 23 00:03:02 wapplatform last message repeated 1 time
> Mar 23 00:08:02 wapplatform ufs: [ID 845546 kern.notice] NOTICE: alloc: /export/home1: file system full
> ....
>
> Mar 23 10:18:15 wapplatform ufs: [ID 845546 kern.notice] NOTICE: alloc: /export/home1: file system full
> Mar 23 10:20:41 wapplatform ufs: [ID 213553 kern.notice] NOTICE: realloccg /export/home1: file system full
> Mar 23 10:20:47 wapplatform last message repeated 1 time
> Mar 23 10:23:15 wapplatform ufs: [ID 845546 kern.notice] NOTICE: alloc: /export/home1: file system full
> Mar 23 10:24:38 wapplatform ufs: [ID 213553 kern.notice] NOTICE: realloccg /export/home1: file system full
> Mar 23 10:24:43 wapplatform last message repeated 1 time
> Mar 23 10:24:55 wapplatform ufs: [ID 213553 kern.notice] NOTICE: realloccg /export/home1: file system full
> Mar 23 10:25:06 wapplatform last message repeated 2 times
> Mar 23 11:09:31 wapplatform genunix: [ID 470503 kern.warning] WARNING: Sorry, no swap space to grow stack for pid 3118 (su)
> Mar 23 11:09:39 wapplatform genunix: [ID 470503 kern.warning] WARNING: Sorry, no swap space to grow stack for pid 3121 (su)
> Mar 23 11:10:48 wapplatform genunix: [ID 470503 kern.warning] WARNING: Sorry, no swap space to grow stack for pid 3137 (su)
> Mar 23 11:18:02 wapplatform sshd[3620]: [ID 800047 auth.error] error: grantpt: Not enough space
> Mar 23 11:18:02 wapplatform sshd[3620]: [ID 800047 auth.error] error: session_pty_req: session 0 alloc failed
> Mar 23 11:18:43 wapplatform genunix: [ID 470503 kern.warning] WARNING: Sorry, no swap space to grow stack for pid 3636 (su)
> Mar 23 11:19:47 wapplatform genunix: [ID 470503 kern.warning] WARNING: Sorry, no swap space to grow stack for pid 3672 (su)
> Mar 23 11:20:20 wapplatform genunix: [ID 470503 kern.warning] WARNING: Sorry, no swap space to grow stack for pid 3694 (su)
> Mar 23 11:22:23 wapplatform genunix: [ID 470503 kern.warning] WARNING: Sorry, no swap space to grow stack for pid 3736 (sshd)
> Mar 23 11:23:17 wapplatform tmpfs: [ID 518458 kern.warning] WARNING: /tmp: File system full, swap space limit exceeded
> Mar 23 11:23:40 wapplatform genunix: [ID 470503 kern.warning] WARNING: Sorry, no swap space to grow stack for pid 3804 (su)
> Mar 23 11:23:40 wapplatform last message repeated 8 times
> Mar 23 11:23:56 wapplatform genunix: [ID 470503 kern.warning] WARNING: Sorry, no swap space to grow stack for pid 3806 (ps)
> Mar 23 11:23:56 wapplatform last message repeated 12 times
> Mar 23 11:24:01 wapplatform genunix: [ID 470503 kern.warning] WARNING: Sorry, no swap space to grow stack for pid 3808 (w)
> Mar 23 11:24:01 wapplatform last message repeated 8 times
> Mar 23 13:40:56 wapplatform su: [ID 810491 auth.crit] 'su root' failed for root on /dev/pts/2
> Mar 23 13:46:26 wapplatform genunix: [ID 470503 kern.warning] WARNING: Sorry, no swap space to grow stack for pid 24888 (sqlplus)
> Mar 23 13:49:18 wapplatform su: [ID 810491 auth.crit] 'su oracle' failed for root on /dev/pts/6
> Mar 23 13:54:03 wapplatform genunix: [ID 470503 kern.warning] WARNING: Sorry, no swap space to grow stack for pid 25035 (su)
> Mar 23 13:54:08 wapplatform genunix: [ID 470503 kern.warning] WARNING: Sorry, no swap space to grow stack for pid 25036 (su)

---

At this point it is fairly clear that the problem is swap space, and that it is tied to the Oracle SGA settings.

7. Check system memory and swap usage


>
> $ top
>
> last pid: 25456; load averages: 0.67, 0.70, 0.69 14:10:03
> 93 processes: 91 sleeping, 2 on cpu
> CPU states: 72.7% idle, 14.9% user, 2.7% kernel, 9.7% iowait, 0.0% swap
> Memory: 1024M real, 34M free, 752M swap in use, 10M swap free
>
> PID USERNAME THR PRI NICE SIZE RES STATE TIME CPU COMMAND
> 25199 oracle 1 40 0 674M 631M cpu/2 8:03 16.32% oracle
> 25209 oracle 1 30 0 675M 630M sleep 0:03 0.13% oracle
> 25159 oracle 1 48 0 674M 628M sleep 0:03 0.06% oracle
> 25384 oracle 1 58 0 2632K 1736K cpu/0 0:01 0.05% top
> 25145 oracle 143 58 0 682M 630M sleep 0:01 0.03% oracle
> 25446 oracle 1 58 0 674M 628M sleep 0:00 0.03% oracle
> 25149 oracle 15 58 0 682M 626M sleep 0:00 0.02% oracle
> 25075 oracle 1 48 0 17M 7208K sleep 0:00 0.01% tnslsnr
> 25151 oracle 11 58 0 676M 624M sleep 0:00 0.01% oracle
> 25366 oracle 1 10 0 674M 628M sleep 0:00 0.00% oracle
> 25356 oracle 1 18 0 674M 628M sleep 0:00 0.00% oracle
> 25360 oracle 1 20 0 674M 628M sleep 0:00 0.00% oracle
> 25364 oracle 1 20 0 674M 628M sleep 0:00 0.00% oracle
> 25362 oracle 1 20 0 674M 628M sleep 0:00 0.00% oracle
> 25330 oracle 1 28 0 674M 628M sleep 0:00 0.00% oracle
>

---

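Physical memory is only 1 GB, with 34 MB free; 752 MB of swap is in use and only 10 MB of swap is free.
The system is badly short of memory, and swap space is nearly exhausted.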
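The `Memory:` line from `top` can be turned into numbers for a quick check. A minimal Python sketch, assuming the line format printed by the `top` build used in this case (other versions format it differently); `parse_top_memory` and `swap_free_fraction` are hypothetical helper names:

```python
import re

# Multipliers that convert each unit suffix to megabytes.
UNIT_TO_MB = {"K": 1 / 1024, "M": 1, "G": 1024}

MEM_RE = re.compile(
    r"Memory: (\d+)([KMG]) real, (\d+)([KMG]) free, "
    r"(\d+)([KMG]) swap in use, (\d+)([KMG]) swap free"
)


def parse_top_memory(line):
    """Return real/free/swap figures in MB from a top 'Memory:' line."""
    match = MEM_RE.search(line)
    if match is None:
        raise ValueError("not a top Memory line")
    groups = match.groups()
    keys = ("real", "free", "swap_in_use", "swap_free")
    return {
        key: int(groups[2 * i]) * UNIT_TO_MB[groups[2 * i + 1]]
        for i, key in enumerate(keys)
    }


def swap_free_fraction(mem):
    """Fraction of (in use + free) swap that is still free."""
    total = mem["swap_in_use"] + mem["swap_free"]
    return mem["swap_free"] / total if total else 0.0


before = parse_top_memory(
    "Memory: 1024M real, 34M free, 752M swap in use, 10M swap free"
)
print(before)
print(f"swap free: {swap_free_fraction(before):.1%}")
```

With the figures from this case, only a little over 1% of swap remains free, which matches the "no swap space" warnings in the system log.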

8. Check the database SGA settings

The SGA is set to 622299344 bytes,
close to 600 MB.


>
> > wapplatform:/>su - oracle
> > Sun Microsystems Inc. SunOS 5.8 Generic Patch October 2001
> > You have new mail.
> > /export/home1/oracle>sqlplus "/ as sysdba"
>
> SQL*Plus: Release 9.2.0.3.0 - Production on Tue Mar 23 14:02:30 2004
>
> Copyright (c) 1982, 2002, Oracle Corporation. All rights reserved.
>
>
> Connected to:
> Oracle9i Enterprise Edition Release 9.2.0.3.0 - 64bit Production
> With the Partitioning, OLAP and Oracle Data Mining options
> JServer Release 9.2.0.3.0 - Production
>
> SQL> show sga
>
> Total System Global Area 622299344 bytes
> Fixed Size 731344 bytes
> Variable Size 268435456 bytes
> Database Buffers 352321536 bytes
> Redo Buffers 811008 bytes
> SQL>

---



For systems with 1 GB of RAM or less running in dedicated server mode, the Oracle SGA should generally not exceed half of physical memory.
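The half-of-RAM guideline can be checked against this case with a little arithmetic. A minimal Python sketch, using the four component sizes from the `show sga` output and the 1024M of physical memory reported by `top`; the helper name `sga_within_half_ram` is hypothetical:

```python
MIB = 1024 * 1024

# Component sizes in bytes, copied from the `show sga` output.
sga = {
    "Fixed Size": 731344,
    "Variable Size": 268435456,
    "Database Buffers": 352321536,
    "Redo Buffers": 811008,
}
sga_total = sum(sga.values())
ram_bytes = 1024 * MIB  # "1024M real" as reported by top


def sga_within_half_ram(sga_bytes, ram):
    """True if the SGA fits within half of physical memory."""
    return sga_bytes <= ram / 2


print(f"SGA total: {sga_total} bytes ({sga_total / MIB:.1f} MB)")
print(f"share of RAM: {sga_total / ram_bytes:.1%}")
print("within half of RAM:", sga_within_half_ram(sga_total, ram_bytes))
```

The components sum to the reported total of 622299344 bytes, which is well over half of the 1 GB of RAM, so on this host the SGA alone leaves too little room for server processes and the OS.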

9. First adjustment
Reduce the SGA so that enough memory is left for the system.

10. Add swap space


>
> > wapplatform:/>df -k
> > Filesystem kbytes used avail capacity Mounted on
> > /dev/dsk/c0t1d0s0 3099093 105421 2931691 4% /
> > /dev/dsk/c0t2d0s0 10325760 8359637 1862866 82% /usr
> > /proc 0 0 0 0% /proc
> > fd 0 0 0 0% /dev/fd
> > mnttab 0 0 0 0% /etc/mnttab
> > /dev/dsk/c0t1d0s3 1018382 285914 671366 30% /var
> > swap 3904 24 3880 1% /var/run
> > swap 3936 56 3880 2% /tmp
> > /dev/dsk/c0t1d0s5 1671823 459202 1162467 29% /opt
> > /dev/dsk/c0t2d0s7 7087473 6068462 948137 87% /export/home
> > /dev/dsk/c2t1d0s7 17413250 15900222 1338896 93% /export/home2
> > /dev/dsk/c0t3d0s7 17413250 13749782 3489336 80% /export/home1
> > /dev/dsk/c0t1d0s1 771110 382410 334723 54% /usr/openwin
> > /export/home/wapgw/luke
> > 7087473 6068462 948137 87% /home/wap
>
> wapplatform:/var/swap>cd /export/home1
> wapplatform:/export/home1>ls
> TT_DB lost+found oracle oracli9
> wapplatform:/export/home1>mkdir swap
> wapplatform:/export/home1>cd swap
> wapplatform:/export/home1/swap>mkfile -v 1g swapfile1
> swapfile1 1073741824 bytes
> wapplatform:/export/home1/swap>id
> uid=0(root) gid=1(other)
> wapplatform:/export/home1/swap>swap -a /export/home1/swap/swapfile1
> wapplatform:/export/home1/swap>swap -s
> total: 623160k bytes allocated + 162704k reserved = 785864k used, 1010936k available
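The `swap -s` summary line can be checked mechanically after adding the swap file. A small Python sketch, assuming the line always carries four `<number>k` fields in the order allocated, reserved, used, available, as it does in this transcript; `parse_swap_s` is a hypothetical helper name, and the sample line uses the English-locale wording:

```python
import re


def parse_swap_s(line):
    """Return (allocated, reserved, used, available) in KB from `swap -s`.

    Only the order of the four '<number>k' fields is used, so the parse
    does not depend on the locale of the surrounding words.
    """
    fields = [int(n) for n in re.findall(r"(\d+)k", line)]
    if len(fields) != 4:
        raise ValueError("expected four '<number>k' fields")
    return tuple(fields)


line = (
    "total: 623160k bytes allocated + 162704k reserved = "
    "785864k used, 1010936k available"
)
allocated, reserved, used, available = parse_swap_s(line)

# The summary line is self-consistent: allocated + reserved = used.
print("consistent:", allocated + reserved == used)
print(f"available: {available / 1024:.0f} MB")
```

After the 1 GB swap file is activated, close to 1 GB of virtual swap is available again, compared with the 10 MB that `top` showed as free before the change.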

---


11. Connection test

The system is back to normal and the problem is solved.

>
> > wapplatform:/export/home1/swap>su - oracle
> > Sun Microsystems Inc. SunOS 5.8 Generic Patch October 2001
> > You have new mail.
> > /export/home1/oracle>sqlplus "/ as sysdba"
>
> SQL*Plus: Release 9.2.0.3.0 - Production on Thu Mar 25 11:56:28 2004
>
> Copyright (c) 1982, 2002, Oracle Corporation. All rights reserved.
>
>
> Connected to:
> Oracle9i Enterprise Edition Release 9.2.0.3.0 - 64bit Production
> With the Partitioning, OLAP and Oracle Data Mining options
> JServer Release 9.2.0.3.0 - Production
>
> SQL> exit
> Disconnected from Oracle9i Enterprise Edition Release 9.2.0.3.0 - 64bit Production
> With the Partitioning, OLAP and Oracle Data Mining options
> JServer Release 9.2.0.3.0 - Production
> /export/home1/oracle>top
>
> last pid: 5372; load averages: 0.25, 0.22, 0.29 11:57:58
> 148 processes: 137 sleeping, 9 zombie, 2 on cpu
> CPU states: 98.8% idle, 0.2% user, 0.7% kernel, 0.2% iowait, 0.0% swap
> Memory: 1024M real, 17M free, 824M swap in use, 934M swap free
>
> PID USERNAME THR PRI NICE SIZE RES STATE TIME CPU COMMAND
> 5363 root 1 58 0 2680K 1736K sleep 0:00 0.24% top
> 5370 oracle 1 58 0 514M 469M sleep 0:00 0.18% oracle
> 5366 oracle 1 28 0 514M 469M sleep 0:00 0.11% oracle
> 5341 oracle 1 58 0 2680K 1736K cpu/2 0:00 0.10% top
> 5372 oracle 1 48 0 61M 3288K cpu/3 0:00 0.06% oracle
> 1288 oracle 1 48 0 514M 468M sleep 5:33 0.05% oracle
> 607 root 12 48 0 2768K 2312K sleep 1:48 0.03% mibiisa
> 25075 oracle 1 48 0 17M 7208K sleep 0:16 0.02% tnslsnr
> 1278 oracle 15 58 0 522M 466M sleep 0:49 0.02% oracle
> 374 root 11 53 0 3504K 2888K sleep 0:16 0.01% nscd
> 1280 oracle 19 58 0 518M 466M sleep 0:28 0.00% oracle
> 5361 root 1 46 0 1024K 680K sleep 0:00 0.00% sleep
> 5362 root 1 46 0 1024K 680K sleep 0:00 0.00% sleep
> 5469 root 1 36 0 1952K 1176K sleep 30:09 0.00% monithttp
> 4167 oracle 1 40 0 515M 471M sleep 29:38 0.00% oracle

---


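Summary:

Solving Oracle database problems can never be separated from the operating system.

Very often we have to use OS-level tools to diagnose and fix a problem.

About the operating system

The generally recommended swap size is 2 x RAM.
If RAM is large, swap does not have to be 2 x RAM,
but it should usually be at least equal to RAM.

If swap is too small, then during busy periods
heavy paging activity cannot be written out to disk and problems appear,
as happened in this case.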
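The swap rule of thumb can be written as a small check. A hedged Python sketch: the function names are hypothetical, and total swap is approximated as the "swap in use" plus "swap free" figures that `top` reported in this case, which is an assumption about how those figures relate to configured swap:

```python
def swap_bounds_mb(ram_mb):
    """(minimum, recommended) swap in MB: at least RAM, ideally 2 x RAM."""
    return ram_mb, 2 * ram_mb


def classify_swap(swap_mb, ram_mb):
    """Place a swap size relative to the rule-of-thumb bounds."""
    minimum, recommended = swap_bounds_mb(ram_mb)
    if swap_mb < minimum:
        return "below minimum"
    if swap_mb < recommended:
        return "above minimum, below recommended"
    return "meets recommendation"


ram_mb = 1024
print("before:", classify_swap(752 + 10, ram_mb))   # top figures before the fix
print("after: ", classify_swap(824 + 934, ram_mb))  # top figures after the fix
```

By this measure the host started out with less swap than RAM, and the added 1 GB swap file lifts it above the minimum without reaching 2 x RAM.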

In addition, if the system has little RAM,
the SGA is usually set to less than 1/2 of RAM,

so that enough memory is left for server processes and the OS.