jueves, 7 de noviembre de 2013

ASM block corruption with ORA-600

The last week i got some problems with the database because the server was shutdown by an outage.

I tried to startup manually when server was operating again.

SQL> startup
ORACLE instance started.

Total System Global Area  849530880 bytes
Fixed Size                  1339824 bytes
Variable Size             528485968 bytes
Database Buffers          314572800 bytes
Redo Buffers                5132288 bytes
ORA-00205: error in identifying control file, check alert log for more info

so, let's check alertFran.log:

ALTER DATABASE   MOUNT
ORA-00210: cannot open the specified control file
ORA-00202: control file: '+FRA/fran/controlfile/control01.ctl'
ORA-27048: skgfifi: file header information is invalid
ORA-205 signalled during: ALTER DATABASE   MOUNT...

Checker run found 1 new persistent data failures
ORA-205 signalled during: ALTER DATABASE   MOUNT...

controlfile01.ctl is invalid so let's repair them. Two ways:

1) you have multiplixed controlfile:

$export ORACLE_SID=+ASM
$export ORACLE_HOME=/u01/app/grid
$sqlplus / as sysasm
SQL> alter database nomount;
ORACLE instance started.

Total System Global Area  849530880 bytes
Fixed Size                  1339824 bytes
Variable Size             528485968 bytes
Database Buffers          314572800 bytes
Redo Buffers                5132288 bytes
SQL> exit
$asmcmd
ASMCMD> rm -rf '<path_invalid_controlfile>;
ASMCMD> cp '<path_valid_controlfile>' '<path_invalid_controlfile>;

2) You haven't multiplexed controlfile, create manually one:
SQL> CREATE CONTROLFILE
     SET DATABASE Fran
     LOGFILE GROUP 1 ('+FRA/fran/onlinelog/group_1.A',
                      '+FRAN_DATA/fran/onlinelog/group_1.B'),
             GROUP 2 ('+FRA/fran/onlinelog/group_2.A',
                      '+FRAN_DATA/fran/onlinelog/group_2.B'),
             GROUP 3 ('+FRA/fran/onlinelog/group_3.A',
                      '+FRAN_DATA/fran/onlinelog/group_3.B'),
             GROUP 4 ('+FRA/fran/onlinelog/group_4.A'),
                      '+FRAN_DATA/fran/onlinelog/group_4.B')
     RESETLOGS                
     DATAFILE '+FRAN_DATA/fran/datafiles/system01.dbf' SIZE 150M,
              '+FRAN_DATA/fran/datafiles/rbs01.dbs' SIZE 512M,
              '+FRAN_DATA/fran/datafiles/temp01.dbs' SIZE 2G,
              '+FRAN_DATA/fran/datafiles/sysaux01.dbf' SIZE 30M,
              '+FRAN_DATA/fran/datafiles/user01.dbf' SIZE 100M,
              '+FRAN_DATA/fran/datafiles/test01.dbf' SIZE 512,
              '+FRAN_DATA/fran/datafiles/fran01.dbf' SIZE 1G;

**if you modified or create controlfile remember that you must modify the new path at pfile. if you create pfile from spfile, open pfile and you see something like this:
**control_files = ('+FRAN_DATA/fran/controlfile/control01.ctl','+FRA/fran/controlfile/control02.ctl');
**modify manually with vi or textpad (depends OS) changing its paths
**control_files = ('new_path/control01.ctl','new_path/control02.ctl');
**SQL> STARTUP PFILE='path_pfile';
**SQL> create spfile from pfile;

we can now try to open database.

$sqlplus / as sysdba
SQL> ALTER DATABASE OPEN;


**It should open without problem, but like my issue is different i got a new error

ORA-00338: log 3 of thread 2 is more recent than control file
ORA-00312: online log 3 thread 2: '+FRAN_DATA/Fran/onlinelog/group_3.A'
ORA-00338: log 3 of thread 2 is more recent than control file
ORA-00312: online log 3 thread 2: '+FRA/Fran/onlinelog/group_3.B'

the solution was restore/recover database from last backup until 10 minutes before the outage:
RMAN> SHUTDOWN IMMEDIATE;
RMAN> STARTUP MOUNT;
RMAN> RUN{
  ALLOCATE CHANNEL c1 FORMAT '<path_of_backup>';
  set until time "to_date('2013-11-01:11:27:00', 'yyyy-mm-dd:hh24:mi:ss')";
  restore database;
  recover database;
    release channel c1;
 }

after restore recover finished succesfully i got a new error when we opened database (ALTER DATABASE OPEN RESETLOGS)

ORA-600 : internal error code, arguments: [krsi_al_hdr_update.15], [4294967295], [], [], [], [], [], [], [], [], [], []

this error are serious, we searched by this arguments at ORA-600 aplication of MOS (My Oracle Support) but no luck....

searching at alert.log from asm instance we found the error:

ORA-600 : internal error code, arguments: [krsi_al_hdr_update.15], [4294967295], [], [], [], [], [], [], [], [], [], []
ORA-15032: not all alterations performed
ORA-15130: diskgroup "FRA" is being dismounted
ORA-15066: offlining disk "FRA_0004" in group
"FRA" may result in a data loss
ORA-15196: invalid ASM block header [kfc.c:26076] [hard_kfbh] [11223]
[2147483648] [34 != 130]
ORA-15196: invalid ASM block header [kfc.c:26076] [hard_kfbh] [11223]
[2147483648] [34 != 130]

It is an ASM corruption block!! Well, breathe, everything has a solution in that life...


SQL> SELECT count(*) from v$database_block_corruption;

 COUNT(*)
---------
     217    <-- confirmed, 217 blocks corrupted.

solution--> move everything of this asm diskgroup to another asm diskgroup or copy them at OS level. If you have enterprise edition licence you can easily repair with "RMAN> blockrecover corruption list;" (you must need a valid backup, of course), like we have standard edition we haven't any other option....

At this ASM diskgroup we have logfiles, controlfile, backups and archivelogs. We aren't worry about backups and archivelogs because they are copying in other server every 1 hour. So we just moved logfiles and controlfile:

$sqlplus / as sysasm
*REMEMBER: I had database on mount state. So we can do it without any problem.

SQL> ALTER SYSTEM SET control_files = ('+FRAN_DATA/fran/controlfile/control01.ctl','/backup/controlfile/control02.ctl') scope=spfile;

SQL> select member from v$logfile;

MEMBER
--------------------------------------------------------------------------------
+FRA/fran/onlinelog/group_1.A
+FRA/fran/onlinelog/group_2.A
+FRAN_DATA/fran/onlinelog/group_2.B
+FRA/fran/onlinelog/group_3.A
+FRAN_DATA/fran/onlinelog/group_3.B
+FRA/fran/onlinelog/group_4.A
+FRAN_DATA/fran/onlinelog/group_4.B
+FRAN_DATA/fran/onlinelog/group_1.B

SQL> ALTER DATABASE RENAME FILE '+FRA/fran/onlinelog/group_1.A', '+FRAN_DATA/fran/onlinelog/group_1.B' TO '+FRAN_DATA/ASM/ONLINELOG/group_1NEW_A', '+FRAN_DATA/ASM/ONLINELOG/group_1NEW_B';

Database altered.

SQL> ALTER DATABASE RENAME FILE '+FRA/fran/onlinelog/group_2.A', '+FRAN_DATA/fran/onlinelog/group_2.B' TO '+FRAN_DATA/ASM/ONLINELOG/group_2NEW_A', '+FRAN_DATA/ASM/ONLINELOG/group_2NEW_B';

Database altered.

SQL> ALTER DATABASE RENAME FILE '+FRA/fran/onlinelog/group_3.A', '+FRAN_DATA/fran/onlinelog/group_3.B' TO '+FRAN_DATA/ASM/ONLINELOG/group_3NEW_A', '+FRAN_DATA/ASM/ONLINELOG/group_3NEW_B';

Database altered.

SQL> ALTER DATABASE RENAME FILE '+FRA/fran/onlinelog/group_4.A', '+FRAN_DATA/fran/onlinelog/group_4.B' TO '+FRAN_DATA/ASM/ONLINELOG/group_4NEW_A', '+FRAN_DATA/ASM/ONLINELOG/group_4NEW_B';

Database altered.


SQL> select member from v$logfile;

MEMBER
--------------------------------------------------------------------------------
+FRAN_DATA/asm/onlinelog/group_1new_A
+FRAN_DATA/asm/onlinelog/group_2new_A
+FRAN_DATA/asm/onlinelog/group_2new_B
+FRAN_DATA/asm/onlinelog/group_3new_A
+FRAN_DATA/asm/onlinelog/group_3new_B
+FRAN_DATA/asm/onlinelog/group_4new_A
+FRAN_DATA/asm/onlinelog/group_4new_B
+FRAN_DATA/asm/onlinelog/group_1new_B

8 rows selected.

SQL> alter database open resetlogs
2 ;

Database altered.

WOW, opened!! good work!!

Now, i recommend drop and recreate diskgroup +FRA, move again logfiles and controlfile to +FRA and forgot all about corruption blocks.... for the moment....

2 comentarios:

  1. Good job! But what is the reason behind the corruption?

    ResponderEliminar
  2. Hardware failure, power disruption, I/O problems, block header haven't enough time to be updated, or not enough time to row data to be populated..... are the most common causes

    ResponderEliminar