Oracle诊断案例-Spfile案例一则
link:
http://www.eygle.com/case/spfile.htm
情况说明:
系统:SUN Solaris8
数据库版本:9203
问题描述:工程人员报告,数据库在重新启动时无法正常启动.检查发现UNDO表空间丢失.
问题诊断及解决过程如下:
1. 登陆系统检查alert.log文件
检查alert.log文件是通常是我们诊断数据库问题的第一步
> SunOS 5.8
>
> login: root
> Password:
> Last login: Thu Apr 1 11:39:16 from 10.123.7.162
> Sun Microsystems Inc. SunOS 5.8 Generic Patch October 2001
> You have new mail.
> # su - oracle
> bash-2.03$ cd $ORACLE_BASE/admin/*/bdump
> bash-2.03$ vi *.log
>
> "alert_gzhs.log" 7438 lines, 283262 characters
> Sat Feb 7 20:30:06 2004
> Starting ORACLE instance (normal)
> LICENSE_MAX_SESSION = 0
> LICENSE_SESSIONS_WARNING = 0
> SCN scheme 3
> Using log_archive_dest parameter default value
> LICENSE_MAX_USERS = 0
> SYS auditing is disabled
> Starting up ORACLE RDBMS Version: 9.2.0.3.0.
> System parameters with non-default values:
> processes = 150
> timed_statistics = TRUE
> shared_pool_size = 1157627904
> large_pool_size = 16777216
> java_pool_size = 637534208
> control_files = /u01/oradata/gzhs/control01.ctl,
> /u02/oradata/gzhs/control02.ctl,
> /u03/oradata/gzhs/control03.ctl
> db_block_size = 8192
> db_cache_size = 2516582400
> compatible = 9.2.0.0.0
> log_archive_start = TRUE
> log_archive_dest_1 = LOCATION=/u06/oradata/gzhs/arch
> log_archive_format = %t_%s.dbf
> db_file_multiblock_read_count= 16
> fast_start_mttr_target = 300
> undo_management = AUTO
> undo_tablespace = UNDOTBS1
> undo_retention = 10800
> remote_login_passwordfile= EXCLUSIVE
> db_domain =
> instance_name = gzhs
> dispatchers = (PROTOCOL=TCP) (SERVICE=gzhsXDB)
> job_queue_processes = 10
> hash_join_enabled = TRUE
> background_dump_dest = /oracle/admin/gzhs/bdump
> user_dump_dest = /oracle/admin/gzhs/udump
> core_dump_dest = /oracle/admin/gzhs/cdump
> sort_area_size = 524288
> db_name = gzhs
> open_cursors = 300
> star_transformation_enabled= FALSE
> query_rewrite_enabled = FALSE
> pga_aggregate_target = 838860800
> aq_tm_processes = 1
> PMON started with pid=2
> DBW0 started with pid=3
> LGWR started with pid=4
> CKPT started with pid=5
> SMON started with pid=6
> "alert_gzhs.log" 7438 lines, 283262 characters
> USER: terminating instance due to error 30012
> Instance terminated by USER, pid = 26433
> ORA-1092 signalled during: ALTER DATABASE OPEN...
> Thu Apr 1 11:11:08 2004
> Starting ORACLE instance (normal)
> LICENSE_MAX_SESSION = 0
> LICENSE_SESSIONS_WARNING = 0
> SCN scheme 3
> Using log_archive_dest parameter default value
> LICENSE_MAX_USERS = 0
> SYS auditing is disabled
> Starting up ORACLE RDBMS Version: 9.2.0.3.0.
> System parameters with non-default values:
> processes = 150
> timed_statistics = TRUE
> shared_pool_size = 1157627904
> large_pool_size = 16777216
> java_pool_size = 637534208
> control_files = /u01/oradata/gzhs/control01.ctl, /u02/oradata/gzhs/control02.ctl, /u03/oradata/gzhs/control03.ctl
> db_block_size = 8192
> db_cache_size = 2516582400
> compatible = 9.2.0.0.0
> log_archive_start = TRUE
> log_archive_dest_1 = LOCATION=/u06/oradata/gzhs/arch
> log_archive_format = %t_%s.dbf
> db_file_multiblock_read_count= 16
> fast_start_mttr_target = 300
> undo_management = AUTO
> undo_tablespace = UNDOTBS1
> undo_retention = 10800
> remote_login_passwordfile= EXCLUSIVE
> db_domain =
> instance_name = gzhs
> dispatchers = (PROTOCOL=TCP) (SERVICE=gzhsXDB)
> job_queue_processes = 10
> hash_join_enabled = TRUE
> background_dump_dest = /oracle/admin/gzhs/bdump
> user_dump_dest = /oracle/admin/gzhs/udump
> core_dump_dest = /oracle/admin/gzhs/cdump
> sort_area_size = 524288
> db_name = gzhs
> open_cursors = 300
> star_transformation_enabled= FALSE
> query_rewrite_enabled = FALSE
> pga_aggregate_target = 838860800
> aq_tm_processes = 1
> PMON started with pid=2
> DBW0 started with pid=3
> LGWR started with pid=4
> CKPT started with pid=5
> SMON started with pid=6
> RECO started with pid=7
> CJQ0 started with pid=8
> Thu Apr 1 11:11:13 2004
> starting up 1 shared server(s) ...
> QMN0 started with pid=9
> Thu Apr 1 11:11:13 2004
> starting up 1 dispatcher(s) for network address '(ADDRESS=(PARTIAL=YES)(PROTOCOL=TCP))'...
> ARCH: STARTING ARCH PROCESSES
> ARC0 started with pid=12
> ARC0: Archival started
> ARC1 started with pid=13
> Thu Apr 1 11:11:13 2004
> ARCH: STARTING ARCH PROCESSES COMPLETE
> Thu Apr 1 11:11:13 2004
> ARC0: Thread not mounted
> Thu Apr 1 11:11:13 2004
> ARC1: Archival started
> ARC1: Thread not mounted
> Thu Apr 1 11:11:14 2004
> ALTER DATABASE MOUNT
> Thu Apr 1 11:11:18 2004
> Successful mount of redo thread 1, with mount id 1088380178.
> Thu Apr 1 11:11:18 2004
> Database mounted in Exclusive Mode.
> Completed: ALTER DATABASE MOUNT
> Thu Apr 1 11:11:27 2004
> alter database open
> Thu Apr 1 11:11:27 2004
> Beginning crash recovery of 1 threads
> Thu Apr 1 11:11:27 2004
> Started first pass scan
> Thu Apr 1 11:11:28 2004
> Completed first pass scan
> 1 redo blocks read, 0 data blocks need recovery
> Thu Apr 1 11:11:28 2004
> Started recovery at
> Thread 1: logseq 177, block 2, scn 0.33104793
> Recovery of Online Redo Log: Thread 1 Group 3 Seq 177 Reading mem 0
> Mem# 0 errs 0: /u01/oradata/gzhs/redo03.log
> Thu Apr 1 11:11:28 2004
> Completed redo application
> Thu Apr 1 11:11:28 2004
> Ended recovery at
> Thread 1: logseq 177, block 3, scn 0.33124794
> 0 data blocks read, 0 data blocks written, 1 redo blocks read
> Crash recovery completed successfully
> Thu Apr 1 11:11:28 2004
> LGWR: Primary database is in CLUSTER CONSISTENT mode
> Thread 1 advanced to log sequence 178
> Thread 1 opened at log sequence 178
> Current log# 1 seq# 178 mem# 0: /u01/oradata/gzhs/redo01.log
> Successful open of redo thread 1.
> Thu Apr 1 11:11:28 2004
> ARC0: Evaluating archive log 3 thread 1 sequence 177
> Thu Apr 1 11:11:28 2004
> ARC0: Beginning to archive log 3 thread 1 sequence 177
> Creating archive destination LOG_ARCHIVE_DEST_1: '/u06/oradata/gzhs/arch/1_177.dbf'
> Thu Apr 1 11:11:28 2004
> SMON: enabling cache recovery
> ARC0: Completed archiving log 3 thread 1 sequence 177
> Thu Apr 1 11:11:28 2004
> Errors in file /oracle/admin/gzhs/udump/gzhs_ora_27781.trc:
> ORA-30012: \263\267\317\373\261\355\277\325\274\344 'UNDOTBS1' \262\273\264\346\324\332\273\362\300\340\320\315\262\273\325\375\310\
> 267
> Thu Apr 1 11:11:28 2004
> Error 30012 happened during db open, shutting down database
> USER: terminating instance due to error 30012
> Instance terminated by USER, pid = 27781
> ORA-1092 signalled during: alter database open...
> :q
>
>
>
> .............
>
在警报日志末尾显示了数据库在Open状态因为错误而异常终止.
2. 尝试重新启动数据库
>
>
> > bash-2.03$ sqlplus "/ as sysdba"
>
> SQL*Plus: Release 9.2.0.3.0 - Production on 星期四 4月 1 11:43:52 2004
>
> Copyright (c) 1982, 2002, Oracle Corporation. All rights reserved.
>
> 已连接到空闲例程。
>
> SQL> startup
> ORACLE 例程已经启动。
>
> Total System Global Area 4364148184 bytes
> Fixed Size 736728 bytes
> Variable Size 1845493760 bytes
> Database Buffers 2516582400 bytes
> Redo Buffers 1335296 bytes
> 数据库装载完毕。
> ORA-01092: ORACLE 例程终止。强行断开连接
>
>
>
> > .............
> >
工程人员报告的问题重现.
3. 检查数据文件
>
>
> > bash-2.03$ cd /u01/ oradata/gzhs
> > bash-2.03$ ls -l
> > total 55702458
> > -rw-r----- 1 oracle dba 1073750016 Apr 1 11:44 UNDOTBS2.dbf
> > -rw-r----- 1 oracle dba 1073750016 Apr 1 11:44 WAP12_BILLINGDETAIL.dbf
> > -rw-r----- 1 oracle dba 1073750016 Apr 1 11:44 WAP12_MAIN.dbf
> > -rw-r----- 1 oracle dba 2097160192 Apr 1 11:44 WAP12_MAIN10.dbf
> > -rw-r----- 1 oracle dba 2097160192 Apr 1 11:44 WAP12_MAIN11.dbf
> > -rw-r----- 1 oracle dba 2097160192 Apr 1 11:44 WAP12_MAIN2.dbf
> > -rw-r----- 1 oracle dba 2097160192 Apr 1 11:44 WAP12_MAIN3.dbf
> > -rw-r----- 1 oracle dba 2097160192 Apr 1 11:44 WAP12_MAIN4.dbf
> > -rw-r----- 1 oracle dba 2097160192 Apr 1 11:44 WAP12_MAIN5.dbf
> > -rw-r----- 1 oracle dba 2097160192 Apr 1 11:44 WAP12_MAIN6.dbf
> > -rw-r----- 1 oracle dba 2097160192 Apr 1 11:44 WAP12_MAIN7.dbf
> > -rw-r----- 1 oracle dba 2097160192 Apr 1 11:44 WAP12_MAIN8.dbf
> > -rw-r----- 1 oracle dba 2097160192 Apr 1 11:44 WAP12_MAIN9.dbf
> > -rw-r----- 1 oracle dba 1073750016 Apr 1 11:44 WAP12_MVIEW.dbf
> > -rw-r----- 1 oracle dba 1073750016 Mar 24 17:15 WAP12_TEMP1.dbf
> > .........................
> > .............
> >
发现存在文件UNDOTBS2.dbf
4. mount数据库,检查系统参数
>
>
> >
> bash-2.03$ sqlplus "/ as sysdba"
>
> SQL*Plus: Release 9.2.0.3.0 - Production on 星期四 4月 1 11:46:20 2004
> > Copyright (c) 1982, 2002, Oracle Corporation. All rights reserved.
> 已连接到空闲例程。
>
> SQL>
> SQL>
> SQL> startup mount;
> >
> ORACLE 例程已经启动。
> >
> Total System Global Area 4364148184 bytes
> Fixed Size 736728 bytes
> Variable Size 1845493760 bytes
> Database Buffers 2516582400 bytes
> Redo Buffers 1335296 bytes
> 数据库装载完毕。
> >
> SQL> select name from v$datafile;
> NAME
> --------------------------------------------------------------------------------
> /u01/oradata/gzhs/system01.dbf
> /u01/oradata/gzhs/cwmlite01.dbf
> /u01/oradata/gzhs/drsys01.dbf
> /u01/oradata/gzhs/example01.dbf
> /u01/oradata/gzhs/indx01.dbf
> /u01/oradata/gzhs/odm01.dbf
> /u01/oradata/gzhs/tools01.dbf
> /u01/oradata/gzhs/users01.dbf
> /u01/oradata/gzhs/xdb01.dbf
> .........................
> /u01/oradata/gzhs/UNDOTBS2.dbf
>
> >
> 已选择23行。
> >
> SQL>
> SQL> show parameter undo
> NAME TYPE VALUE
> >
> ------------------------------------ ----------- ------------------------------
> undo_management string AUTO
> undo_retention integer 10800
> undo_suppress_errors boolean FALSE
> > undo_tablespace string UNDOTBS1
> >
> SQL> show parameter spfile
> >
> NAME TYPE VALUE
> >
> ------------------------------------ ----------- ------------------------------
> > spfile string
> >
>
> > .........................
> > .............
> >
发现系统没有使用spfile,而初始化参数设置的undo表空间为UNDOTBS1
5. 检查参数文件
>
>
> >
> >
> bash-2.03$ cd $ORACLE_HOME/dbs
> bash-2.03$ ls
> >
> init.ora initgzhs.ora initgzhs.ora.old orapwgzhs
> initdw.ora initgzhs.ora.hurray lkGZHS snapcf_gzhs.f
> bash-2.03$ vi initgzhs.ora
> >
> "initgzhs.ora" [Incomplete last line] 105 lines, 3087 characters
> >
> ####################################################
> > # Copyright (c) 1991, 2001, 2002 by Oracle Corporation
> > ####################################################
> >
> ###########################################
> > # Archive
> > ###########################################
> > log_archive_dest_1='LOCATION=/u06/oradata/gzhs/arch'
> > log_archive_format=%t_%s.dbf
> log_archive_start=true
> ###########################################
> > # Cache and I/O
> > ###########################################
> > db_block_size=8192
> > db_cache_size=2516582400
> > db_file_multiblock_read_count=16
> > ###########################################
> > # Cursors and Library Cache
> > ###########################################
> > open_cursors=300
> > ......................
> >
> >
> ###########################################
> > # System Managed Undo and Rollback Segments
> > ###########################################
> > undo_management=AUTO
> > undo_retention=10800
> > undo_tablespace=UNDOTBS1
> >
> :q!
> > .............
> >
这个设置是极其可疑的.
怀疑参数文件和实际数据库设置不符.
6. 再次检查alert文件
查找对于UNDO表空间的操作
第一部分,创建数据库时的信息:
>
>
> >
> >
> Sat Feb 7 20:30:12 2004
> CREATE DATABASE gzhs
> MAXINSTANCES 1
> MAXLOGHISTORY 1
> MAXLOGFILES 5
> MAXLOGMEMBERS 3
> MAXDATAFILES 100
> DATAFILE '/u01/oradata/gzhs/system01.dbf' SIZE 500M REUSE AUTOEXTEND ON NEXT 10240K MAXSIZE UNLIMITED
> EXTENT MANAGEMENT LOCAL
> DEFAULT TEMPORARY TABLESPACE TEMP TEMPFILE '/u01/oradata/gzhs/temp01.dbf' SIZE 1000M REUSE AUTOEXTEND
> ON NEXT 250M MAXSIZE UNLIMITED
> UNDO TABLESPACE "UNDOTBS1" DATAFILE '/u01/oradata/gzhs/undotbs01.dbf' SIZE 1000M REUSE AUTOEXTEND ON
> NEXT 100M MAXSIZE UNLIMITED
> > CHARACTER SET ZHS16GBK
> > NATIONAL CHARACTER SET AL16UTF16
> > LOGFILE GROUP 1 ('/u01/oradata/gzhs/redo01.log') SIZE 256M,
> > GROUP 2 ('/u01/oradata/gzhs/redo02.log') SIZE 256M,
> > GROUP 3 ('/u01/oradata/gzhs/redo03.log') SIZE 256M
> > .............
> >
注意,这也是OCP教材上提到的两种创建UNDO表空间的方式之一
第二部分,发现创建UNDOTBS2的记录信息:
>
>
> >
> Wed Mar 24 20:20:58 2004
> /* OracleOEM / CREATE UNDO TABLESPACE "UNDOTBS2"
> DATAFILE '/u01/oradata/gzhs/UNDOTBS2.dbf' SIZE 1024M AUTOEXTEND ON NEXT 100M MAXSIZE UNLIMITED
> Wed Mar 24 20:22:37 2004
> Created Undo Segment _SYSSMU11$
> Created Undo Segment _SYSSMU12$
> Created Undo Segment _SYSSMU13$
> Created Undo Segment _SYSSMU14$
> Created Undo Segment _SYSSMU15$
> Created Undo Segment _SYSSMU16$
> Created Undo Segment _SYSSMU17$
> Created Undo Segment _SYSSMU18$
> Created Undo Segment _SYSSMU19$
> Created Undo Segment _SYSSMU20$
> Completed: / OracleOEM */ CREATE UNDO TABLESPACE "UNDOTBS2"
> > Wed Mar 24 20:24:25 2004
> > Undo Segment 11 Onlined
> > Undo Segment 12 Onlined
> > Undo Segment 13 Onlined
> > Undo Segment 14 Onlined
> > Undo Segment 15 Onlined
> > Undo Segment 16 Onlined
> > Undo Segment 17 Onlined
> > Undo Segment 18 Onlined
> > Undo Segment 19 Onlined
> > Undo Segment 20 Onlined
> > Successfully onlined Undo Tablespace 15.
> > Undo Segment 1 Offlined
> > Undo Segment 2 Offlined
> > Undo Segment 3 Offlined
> > Undo Segment 4 Offlined
> > Undo Segment 5 Offlined
> > Undo Segment 6 Offlined
> > Undo Segment 7 Offlined
> > Undo Segment 8 Offlined
> > Undo Segment 9 Offlined
> > Undo Segment 10 Offlined
> > Undo Tablespace 1 successfully switched out.
> > .............
> >
第三部分,新的UNDO表空间被应用
Wed Mar 24 20:24:25 2004
ALTER SYSTEM SET undo_tablespace='UNDOTBS2' SCOPE=MEMORY;
我们发现问题就在这里,创建了新的UNDO表空间以后,因为使用的是pfile文件,修改的只对当前实例生效,操作人员忘记了修改pfile文件.
如果使用spfile,缺省的修改范围是both,会同时修改spfile文件,就可以避免以上问题的出现.
第四部分,删除了UNDOTBS1的信息
>
> Wed Mar 24 20:25:01 2004
> /* OracleOEM / DROP TABLESPACE "UNDOTBS1" INCLUDING CONTENTS AND DATAFILES CASCADE CONSTRAINTS
> Wed Mar 24 20:25:03 2004
> Deleted file /u01/oradata/gzhs/undotbs01.dbf
> Completed: / OracleOEM */ DROP TABLESPACE "UNDOTBS1" INCLUDI
> > .............
> >
这样再次重新启动数据库的时候,问题出现了,pfile中定义的UNDOTBS1找不到了,而且操作实在很久以前,没人能回忆起来,甚至无法得知是什么人的操作。
7. 更改pfile,启动数据库
修改undo表空间
>
> ###########################################
> # System Managed Undo and Rollback Segments
> ###########################################
> undo_management=AUTO
> undo_retention=10800
> undo_tablespace=UNDOTBS2
>
> ....
>
> bash-2.03$ sqlplus "/ as sysdba"
>
> SQL*Plus: Release 9.2.0.3.0 - Production on 星期四 4月 1 11:55:11 2004
>
> Copyright (c) 1982, 2002, Oracle Corporation. All rights reserved.
>
>
> 连接到:
> Oracle9i Enterprise Edition Release 9.2.0.3.0 - 64bit Production
> With the Partitioning, OLAP and Oracle Data Mining options
> JServer Release 9.2.0.3.0 - Production
>
> SQL> select * from v$version;
>
> BANNER
> ----------------------------------------------------------------
> Oracle9i Enterprise Edition Release 9.2.0.3.0 - 64bit Production
> PL/SQL Release 9.2.0.3.0 - Production
> CORE 9.2.0.3.0 Production
> TNS for Solaris: Version 9.2.0.3.0 - Production
> NLSRTL Version 9.2.0.3.0 - Production
>
> SQL> exit
> 从Oracle9i Enterprise Edition Release 9.2.0.3.0 - 64bit Production
> With the Partitioning, OLAP and Oracle Data Mining options
> JServer Release 9.2.0.3.0 - Production中断开
> bash-2.03$
>
在这里我们可以看到,使用spfile可以免去手工修改pfile文件的麻烦,减少了犯错的可能。
既然Oracle9i给我们提供了这个新特性,就值得我们学习使用它.