Oracle诊断案例-Spfile案例一则

Oracle诊断案例-Spfile案例一则

link:

http://www.eygle.com/case/spfile.htm

情况说明:
系统:SUN Solaris8
数据库版本:9203
问题描述:工程人员报告,数据库在重新启动时无法正常启动.检查发现UNDO表空间丢失.
问题诊断及解决过程如下:

1. 登陆系统检查alert.log文件

检查alert.log文件是通常是我们诊断数据库问题的第一步

> SunOS 5.8 > > login: root
> Password:
> Last login: Thu Apr 1 11:39:16 from 10.123.7.162
> Sun Microsystems Inc. SunOS 5.8 Generic Patch October 2001
> You have new mail.
> # su - oracle
> bash-2.03$ cd $ORACLE_BASE/admin/*/bdump
> bash-2.03$ vi *.log > > "alert_gzhs.log" 7438 lines, 283262 characters
> Sat Feb 7 20:30:06 2004
> Starting ORACLE instance (normal)
> LICENSE_MAX_SESSION = 0
> LICENSE_SESSIONS_WARNING = 0
> SCN scheme 3
> Using log_archive_dest parameter default value
> LICENSE_MAX_USERS = 0
> SYS auditing is disabled
> Starting up ORACLE RDBMS Version: 9.2.0.3.0.
> System parameters with non-default values:
> processes = 150
> timed_statistics = TRUE
> shared_pool_size = 1157627904
> large_pool_size = 16777216
> java_pool_size = 637534208
> control_files = /u01/oradata/gzhs/control01.ctl,
> /u02/oradata/gzhs/control02.ctl,
> /u03/oradata/gzhs/control03.ctl
> db_block_size = 8192
> db_cache_size = 2516582400
> compatible = 9.2.0.0.0
> log_archive_start = TRUE
> log_archive_dest_1 = LOCATION=/u06/oradata/gzhs/arch
> log_archive_format = %t_%s.dbf
> db_file_multiblock_read_count= 16
> fast_start_mttr_target = 300
> undo_management = AUTO
> undo_tablespace = UNDOTBS1
> undo_retention = 10800
> remote_login_passwordfile= EXCLUSIVE
> db_domain =
> instance_name = gzhs
> dispatchers = (PROTOCOL=TCP) (SERVICE=gzhsXDB)
> job_queue_processes = 10
> hash_join_enabled = TRUE
> background_dump_dest = /oracle/admin/gzhs/bdump
> user_dump_dest = /oracle/admin/gzhs/udump
> core_dump_dest = /oracle/admin/gzhs/cdump
> sort_area_size = 524288
> db_name = gzhs
> open_cursors = 300
> star_transformation_enabled= FALSE
> query_rewrite_enabled = FALSE
> pga_aggregate_target = 838860800
> aq_tm_processes = 1
> PMON started with pid=2
> DBW0 started with pid=3
> LGWR started with pid=4
> CKPT started with pid=5
> SMON started with pid=6
> "alert_gzhs.log" 7438 lines, 283262 characters
> USER: terminating instance due to error 30012
> Instance terminated by USER, pid = 26433
> ORA-1092 signalled during: ALTER DATABASE OPEN...
> Thu Apr 1 11:11:08 2004
> Starting ORACLE instance (normal)
> LICENSE_MAX_SESSION = 0
> LICENSE_SESSIONS_WARNING = 0
> SCN scheme 3
> Using log_archive_dest parameter default value
> LICENSE_MAX_USERS = 0
> SYS auditing is disabled
> Starting up ORACLE RDBMS Version: 9.2.0.3.0.
> System parameters with non-default values:
> processes = 150
> timed_statistics = TRUE
> shared_pool_size = 1157627904
> large_pool_size = 16777216
> java_pool_size = 637534208
> control_files = /u01/oradata/gzhs/control01.ctl, /u02/oradata/gzhs/control02.ctl, /u03/oradata/gzhs/control03.ctl
> db_block_size = 8192
> db_cache_size = 2516582400
> compatible = 9.2.0.0.0
> log_archive_start = TRUE
> log_archive_dest_1 = LOCATION=/u06/oradata/gzhs/arch
> log_archive_format = %t_%s.dbf
> db_file_multiblock_read_count= 16
> fast_start_mttr_target = 300
> undo_management = AUTO
> undo_tablespace = UNDOTBS1
> undo_retention = 10800
> remote_login_passwordfile= EXCLUSIVE
> db_domain =
> instance_name = gzhs
> dispatchers = (PROTOCOL=TCP) (SERVICE=gzhsXDB)
> job_queue_processes = 10
> hash_join_enabled = TRUE
> background_dump_dest = /oracle/admin/gzhs/bdump
> user_dump_dest = /oracle/admin/gzhs/udump
> core_dump_dest = /oracle/admin/gzhs/cdump
> sort_area_size = 524288
> db_name = gzhs
> open_cursors = 300
> star_transformation_enabled= FALSE
> query_rewrite_enabled = FALSE
> pga_aggregate_target = 838860800
> aq_tm_processes = 1
> PMON started with pid=2
> DBW0 started with pid=3
> LGWR started with pid=4
> CKPT started with pid=5
> SMON started with pid=6
> RECO started with pid=7
> CJQ0 started with pid=8
> Thu Apr 1 11:11:13 2004
> starting up 1 shared server(s) ...
> QMN0 started with pid=9
> Thu Apr 1 11:11:13 2004
> starting up 1 dispatcher(s) for network address '(ADDRESS=(PARTIAL=YES)(PROTOCOL=TCP))'...
> ARCH: STARTING ARCH PROCESSES
> ARC0 started with pid=12
> ARC0: Archival started
> ARC1 started with pid=13
> Thu Apr 1 11:11:13 2004
> ARCH: STARTING ARCH PROCESSES COMPLETE
> Thu Apr 1 11:11:13 2004
> ARC0: Thread not mounted
> Thu Apr 1 11:11:13 2004
> ARC1: Archival started
> ARC1: Thread not mounted
> Thu Apr 1 11:11:14 2004
> ALTER DATABASE MOUNT
> Thu Apr 1 11:11:18 2004
> Successful mount of redo thread 1, with mount id 1088380178.
> Thu Apr 1 11:11:18 2004
> Database mounted in Exclusive Mode.
> Completed: ALTER DATABASE MOUNT
> Thu Apr 1 11:11:27 2004
> alter database open
> Thu Apr 1 11:11:27 2004
> Beginning crash recovery of 1 threads
> Thu Apr 1 11:11:27 2004
> Started first pass scan
> Thu Apr 1 11:11:28 2004
> Completed first pass scan
> 1 redo blocks read, 0 data blocks need recovery
> Thu Apr 1 11:11:28 2004
> Started recovery at
> Thread 1: logseq 177, block 2, scn 0.33104793
> Recovery of Online Redo Log: Thread 1 Group 3 Seq 177 Reading mem 0
> Mem# 0 errs 0: /u01/oradata/gzhs/redo03.log
> Thu Apr 1 11:11:28 2004
> Completed redo application
> Thu Apr 1 11:11:28 2004
> Ended recovery at
> Thread 1: logseq 177, block 3, scn 0.33124794
> 0 data blocks read, 0 data blocks written, 1 redo blocks read
> Crash recovery completed successfully
> Thu Apr 1 11:11:28 2004
> LGWR: Primary database is in CLUSTER CONSISTENT mode
> Thread 1 advanced to log sequence 178
> Thread 1 opened at log sequence 178
> Current log# 1 seq# 178 mem# 0: /u01/oradata/gzhs/redo01.log
> Successful open of redo thread 1.
> Thu Apr 1 11:11:28 2004
> ARC0: Evaluating archive log 3 thread 1 sequence 177
> Thu Apr 1 11:11:28 2004
> ARC0: Beginning to archive log 3 thread 1 sequence 177
> Creating archive destination LOG_ARCHIVE_DEST_1: '/u06/oradata/gzhs/arch/1_177.dbf'
> Thu Apr 1 11:11:28 2004
> SMON: enabling cache recovery
> ARC0: Completed archiving log 3 thread 1 sequence 177
> Thu Apr 1 11:11:28 2004
> Errors in file /oracle/admin/gzhs/udump/gzhs_ora_27781.trc:
> ORA-30012: \263\267\317\373\261\355\277\325\274\344 'UNDOTBS1' \262\273\264\346\324\332\273\362\300\340\320\315\262\273\325\375\310\
> 267
> Thu Apr 1 11:11:28 2004
> Error 30012 happened during db open, shutting down database
> USER: terminating instance due to error 30012
> Instance terminated by USER, pid = 27781
> ORA-1092 signalled during: alter database open...
> :q
> > >
> .............
>


在警报日志末尾显示了数据库在Open状态因为错误而异常终止.

2. 尝试重新启动数据库

> >
> > bash-2.03$ sqlplus "/ as sysdba" > > SQL*Plus: Release 9.2.0.3.0 - Production on 星期四 4月 1 11:43:52 2004 > > Copyright (c) 1982, 2002, Oracle Corporation. All rights reserved. > > 已连接到空闲例程。 > > SQL> startup
> ORACLE 例程已经启动。 > > Total System Global Area 4364148184 bytes
> Fixed Size 736728 bytes
> Variable Size 1845493760 bytes
> Database Buffers 2516582400 bytes
> Redo Buffers 1335296 bytes
> 数据库装载完毕。
> ORA-01092: ORACLE 例程终止。强行断开连接 >
>
>
> > .............
> >


工程人员报告的问题重现.

3. 检查数据文件

> >
> > bash-2.03$ cd /u01/ oradata/gzhs
> > bash-2.03$ ls -l
> > total 55702458
> > -rw-r----- 1 oracle dba 1073750016 Apr 1 11:44 UNDOTBS2.dbf
> > -rw-r----- 1 oracle dba 1073750016 Apr 1 11:44 WAP12_BILLINGDETAIL.dbf
> > -rw-r----- 1 oracle dba 1073750016 Apr 1 11:44 WAP12_MAIN.dbf
> > -rw-r----- 1 oracle dba 2097160192 Apr 1 11:44 WAP12_MAIN10.dbf
> > -rw-r----- 1 oracle dba 2097160192 Apr 1 11:44 WAP12_MAIN11.dbf
> > -rw-r----- 1 oracle dba 2097160192 Apr 1 11:44 WAP12_MAIN2.dbf
> > -rw-r----- 1 oracle dba 2097160192 Apr 1 11:44 WAP12_MAIN3.dbf
> > -rw-r----- 1 oracle dba 2097160192 Apr 1 11:44 WAP12_MAIN4.dbf
> > -rw-r----- 1 oracle dba 2097160192 Apr 1 11:44 WAP12_MAIN5.dbf
> > -rw-r----- 1 oracle dba 2097160192 Apr 1 11:44 WAP12_MAIN6.dbf
> > -rw-r----- 1 oracle dba 2097160192 Apr 1 11:44 WAP12_MAIN7.dbf
> > -rw-r----- 1 oracle dba 2097160192 Apr 1 11:44 WAP12_MAIN8.dbf
> > -rw-r----- 1 oracle dba 2097160192 Apr 1 11:44 WAP12_MAIN9.dbf
> > -rw-r----- 1 oracle dba 1073750016 Apr 1 11:44 WAP12_MVIEW.dbf
> > -rw-r----- 1 oracle dba 1073750016 Mar 24 17:15 WAP12_TEMP1.dbf
> > .........................
> > .............
> >


发现存在文件UNDOTBS2.dbf

4. mount数据库,检查系统参数

> >
> > > bash-2.03$ sqlplus "/ as sysdba" >
> SQL*Plus: Release 9.2.0.3.0 - Production on 星期四 4月 1 11:46:20 2004
> > Copyright (c) 1982, 2002, Oracle Corporation. All rights reserved. > 已连接到空闲例程。 >
> SQL> > SQL> > SQL> startup mount;
> > > ORACLE 例程已经启动。
> > > Total System Global Area 4364148184 bytes > Fixed Size 736728 bytes > Variable Size 1845493760 bytes > Database Buffers 2516582400 bytes > Redo Buffers 1335296 bytes > 数据库装载完毕。
> > > SQL> select name from v$datafile; > NAME > -------------------------------------------------------------------------------- > /u01/oradata/gzhs/system01.dbf > /u01/oradata/gzhs/cwmlite01.dbf > /u01/oradata/gzhs/drsys01.dbf > /u01/oradata/gzhs/example01.dbf > /u01/oradata/gzhs/indx01.dbf > /u01/oradata/gzhs/odm01.dbf > /u01/oradata/gzhs/tools01.dbf > /u01/oradata/gzhs/users01.dbf > /u01/oradata/gzhs/xdb01.dbf > ......................... > /u01/oradata/gzhs/UNDOTBS2.dbf >
> > > 已选择23行。
> > > SQL> > SQL> show parameter undo > NAME TYPE VALUE
> > > ------------------------------------ ----------- ------------------------------ > undo_management string AUTO > undo_retention integer 10800 > undo_suppress_errors boolean FALSE
> > undo_tablespace string UNDOTBS1
> > > SQL> show parameter spfile
> > > NAME TYPE VALUE
> > > ------------------------------------ ----------- ------------------------------
> > spfile string
> > >
> > .........................
> > .............
> >


发现系统没有使用spfile,而初始化参数设置的undo表空间为UNDOTBS1

5. 检查参数文件

> >
> >
> > > bash-2.03$ cd $ORACLE_HOME/dbs > bash-2.03$ ls
> > > init.ora initgzhs.ora initgzhs.ora.old orapwgzhs > initdw.ora initgzhs.ora.hurray lkGZHS snapcf_gzhs.f > bash-2.03$ vi initgzhs.ora
> > > "initgzhs.ora" [Incomplete last line] 105 lines, 3087 characters
> > > ####################################################
> > # Copyright (c) 1991, 2001, 2002 by Oracle Corporation
> > ####################################################
> > > ###########################################
> > # Archive
> > ###########################################
> > log_archive_dest_1='LOCATION=/u06/oradata/gzhs/arch'
> > log_archive_format=%t_%s.dbf > log_archive_start=true > ###########################################
> > # Cache and I/O
> > ###########################################
> > db_block_size=8192
> > db_cache_size=2516582400
> > db_file_multiblock_read_count=16
> > ###########################################
> > # Cursors and Library Cache
> > ###########################################
> > open_cursors=300
> > ......................
> >
> > > ###########################################
> > # System Managed Undo and Rollback Segments
> > ###########################################
> > undo_management=AUTO
> > undo_retention=10800
> > undo_tablespace=UNDOTBS1
> > > :q!
> > .............
> >


这个设置是极其可疑的.
怀疑参数文件和实际数据库设置不符.

6. 再次检查alert文件
查找对于UNDO表空间的操作

第一部分,创建数据库时的信息:

> >
> >
> > > Sat Feb 7 20:30:12 2004 > CREATE DATABASE gzhs > MAXINSTANCES 1 > MAXLOGHISTORY 1 > MAXLOGFILES 5 > MAXLOGMEMBERS 3 > MAXDATAFILES 100 > DATAFILE '/u01/oradata/gzhs/system01.dbf' SIZE 500M REUSE AUTOEXTEND ON NEXT 10240K MAXSIZE UNLIMITED > EXTENT MANAGEMENT LOCAL > DEFAULT TEMPORARY TABLESPACE TEMP TEMPFILE '/u01/oradata/gzhs/temp01.dbf' SIZE 1000M REUSE AUTOEXTEND > ON NEXT 250M MAXSIZE UNLIMITED > UNDO TABLESPACE "UNDOTBS1" DATAFILE '/u01/oradata/gzhs/undotbs01.dbf' SIZE 1000M REUSE AUTOEXTEND ON > NEXT 100M MAXSIZE UNLIMITED
> > CHARACTER SET ZHS16GBK
> > NATIONAL CHARACTER SET AL16UTF16
> > LOGFILE GROUP 1 ('/u01/oradata/gzhs/redo01.log') SIZE 256M,
> > GROUP 2 ('/u01/oradata/gzhs/redo02.log') SIZE 256M,
> > GROUP 3 ('/u01/oradata/gzhs/redo03.log') SIZE 256M
> > .............
> >


注意,这也是OCP教材上提到的两种创建UNDO表空间的方式之一

第二部分,发现创建UNDOTBS2的记录信息:

> >
> > > Wed Mar 24 20:20:58 2004 > /* OracleOEM / CREATE UNDO TABLESPACE "UNDOTBS2" > DATAFILE '/u01/oradata/gzhs/UNDOTBS2.dbf' SIZE 1024M AUTOEXTEND ON NEXT 100M MAXSIZE UNLIMITED > Wed Mar 24 20:22:37 2004 > Created Undo Segment _SYSSMU11$ > Created Undo Segment _SYSSMU12$ > Created Undo Segment _SYSSMU13$ > Created Undo Segment _SYSSMU14$ > Created Undo Segment _SYSSMU15$ > Created Undo Segment _SYSSMU16$ > Created Undo Segment _SYSSMU17$ > Created Undo Segment _SYSSMU18$ > Created Undo Segment _SYSSMU19$ > Created Undo Segment _SYSSMU20$ > Completed: / OracleOEM */ CREATE UNDO TABLESPACE "UNDOTBS2"
> > Wed Mar 24 20:24:25 2004
> > Undo Segment 11 Onlined
> > Undo Segment 12 Onlined
> > Undo Segment 13 Onlined
> > Undo Segment 14 Onlined
> > Undo Segment 15 Onlined
> > Undo Segment 16 Onlined
> > Undo Segment 17 Onlined
> > Undo Segment 18 Onlined
> > Undo Segment 19 Onlined
> > Undo Segment 20 Onlined
> > Successfully onlined Undo Tablespace 15.
> > Undo Segment 1 Offlined
> > Undo Segment 2 Offlined
> > Undo Segment 3 Offlined
> > Undo Segment 4 Offlined
> > Undo Segment 5 Offlined
> > Undo Segment 6 Offlined
> > Undo Segment 7 Offlined
> > Undo Segment 8 Offlined
> > Undo Segment 9 Offlined
> > Undo Segment 10 Offlined
> > Undo Tablespace 1 successfully switched out.
> > .............
> >


第三部分,新的UNDO表空间被应用

Wed Mar 24 20:24:25 2004
ALTER SYSTEM SET undo_tablespace='UNDOTBS2' SCOPE=MEMORY;

我们发现问题就在这里,创建了新的UNDO表空间以后,因为使用的是pfile文件,修改的只对当前实例生效,操作人员忘记了修改pfile文件.

如果使用spfile,缺省的修改范围是both,会同时修改spfile文件,就可以避免以上问题的出现.

第四部分,删除了UNDOTBS1的信息

> > Wed Mar 24 20:25:01 2004 > /* OracleOEM / DROP TABLESPACE "UNDOTBS1" INCLUDING CONTENTS AND DATAFILES CASCADE CONSTRAINTS > Wed Mar 24 20:25:03 2004 > Deleted file /u01/oradata/gzhs/undotbs01.dbf > Completed: / OracleOEM */ DROP TABLESPACE "UNDOTBS1" INCLUDI
> > .............
> >


这样再次重新启动数据库的时候,问题出现了,pfile中定义的UNDOTBS1找不到了,而且操作实在很久以前,没人能回忆起来,甚至无法得知是什么人的操作。

7. 更改pfile,启动数据库

修改undo表空间

>
> ###########################################
> # System Managed Undo and Rollback Segments
> ###########################################
> undo_management=AUTO
> undo_retention=10800
> undo_tablespace=UNDOTBS2
>
> ....
>
> bash-2.03$ sqlplus "/ as sysdba"
>
> SQL*Plus: Release 9.2.0.3.0 - Production on 星期四 4月 1 11:55:11 2004
>
> Copyright (c) 1982, 2002, Oracle Corporation. All rights reserved.
>
>
> 连接到:
> Oracle9i Enterprise Edition Release 9.2.0.3.0 - 64bit Production
> With the Partitioning, OLAP and Oracle Data Mining options
> JServer Release 9.2.0.3.0 - Production
>
> SQL> select * from v$version;
>
> BANNER
> ----------------------------------------------------------------
> Oracle9i Enterprise Edition Release 9.2.0.3.0 - 64bit Production
> PL/SQL Release 9.2.0.3.0 - Production
> CORE 9.2.0.3.0 Production
> TNS for Solaris: Version 9.2.0.3.0 - Production
> NLSRTL Version 9.2.0.3.0 - Production
>
> SQL> exit
> 从Oracle9i Enterprise Edition Release 9.2.0.3.0 - 64bit Production
> With the Partitioning, OLAP and Oracle Data Mining options
> JServer Release 9.2.0.3.0 - Production中断开
> bash-2.03$
>


在这里我们可以看到,使用spfile可以免去手工修改pfile文件的麻烦,减少了犯错的可能。

既然Oracle9i给我们提供了这个新特性,就值得我们学习使用它.

Published At
Categories with 数据库类
Tagged with
comments powered by Disqus