Rebuild Failed Primary Database In Oracle

Database failover is one the critical process that DBA need to perform because in real time when the primary database is gone, The entire application team start approaching DBA and during that time apart from communicating with clients in the outage call DBA also need to perform the failover operation as quickly as possible and once the failover will done then after that the primary task for the DBAs is to reinstate the failed primary database and bring it back as physical standby database in order to run the primary database under dataguard configuration but prior to reinstate failed primary database we’ve to remember below facts:

 

NOTE A: Reinstate can only be possible if the flashback was turned ON in failed primary database during the time of failover operation.


NOTE B: if the flashback was not ON in failed primary database during the time of database failover, then we only have one option to bring back the failed primary database as physical standby is to rebuild the entire database from scratch.

  

Primary Database After Failover

Database That Need To Reinstate

US_PRIM

IND_DR

 

Step 1st:  Start listener in failed primary database server


lsnrctl status listener_usprim

LSNRCTL for Linux: Version 19.0.0.0.0 - Production on 26-JAN-2024 17:51:57

Copyright (c) 1991, 2019, Oracle.  All rights reserved.

Connecting to (DESCRIPTION=(ADDRESS=(PROTOCOL=TCP)(HOST=oratest)(PORT=1521)(IP=FIRST)))

STATUS of the LISTENER

------------------------

Alias                     listener_usprim

Version                   TNSLSNR for Linux: Version 19.0.0.0.0 - Production

Start Date                26-JAN-2024 17:18:31

Uptime                    0 days 0 hr. 33 min. 26 sec

Trace Level               off

Security                  ON: Local OS Authentication

SNMP                      OFF

Listener Parameter File   /u01/app/oracle/product/19.0.0/dbhome_1/network/admin/listener.ora

Listener Log File         /u01/app/oracle/diag/tnslsnr/oratest/listener_usprim/alert/log.xml

Listening Endpoints Summary...

(DESCRIPTION=(ADDRESS=(PROTOCOL=tcp)(HOST=192.168.1.9)(PORT=1521)))

Services Summary...

Service "USPRIM" has 1 instance(s).

  Instance "USPRIM", status UNKNOWN, has 1 handler(s) for this service...

The command completed successfully


step 2nd: Start failed primary database in mount mode

 

SQL> startup mount

ORACLE instance started.

Total System Global Area 1728050736 bytes

Fixed Size                      9135664 bytes

Variable Size             419430400 bytes

Database Buffers     1291845632 bytes

Redo Buffers                 7639040 bytes

Database mounted.

 

step 3rd: Check if flashback is ON in failed primary database

 

SQL> select name,database_role,open_mode,flashback_on from v$database;

NAME              DATABASE_ROLE    OPEN_MODE                  FLASHBACK_ON

--------- ---------------- -------------------- ------------------

US_PRIM        PRIMARY                             MOUNTED                         YES

 

SQL> select switchover_status from v$database;

 

SWITCHOVER_STATUS

--------------------

NOT ALLOWED

 

NOTE: The switchover status will display as NOT ALLOWED because in the current state the database is in primary mode and there is no physical standby database of it.


Step4th: Retrieve SCN information from new primary database to perform flashback and reinstate failed primary database


NOTE: Prior to start failover operation we must have to Check alert log of new primary database where we can find SCN no. when the standby became primary under the database failover operation "Standby became primary SCN: 2170691"

 

SQL> select name,role,thread#,sequence#,block# from v$dataguard_process;

 

NAME  ROLE                                     THREAD#  SEQUENCE# BLOCK#

----- ------------------------ ---------- ---------- ----------

TT00  gap manager                         0             0             0

TT01  redo transport timer             0             0             0

ARC0  archive local                         0             0             0

ARC1  archive redo                          0             0             0

NET   recovery coordinator            1            29          139 è This column indicate the  sequence# correspond to the SCN when standby became primary

TT02  async ORL multi                   1             2             0

TT03  async ORL single                  1             2             0


SQL>select sequence#,RESETLOGS_CHANGE#,FIRST_CHANGE#,NEXT_CHANGE#,END_OF_REDO,END_OF_REDO_TYPE from v$archived_log where sequence#=29;

 

 SEQUENCE#|RESETLOGS_CHANGE#|FIRST_CHANGE#|NEXT_CHANGE#|END_OF_REDO     |END_OF_RED

----------|-----------------|-------------|------------|-------------- |------

       29|         1920977 | 2170403     |    2170694  |YES             |TERMINAL

 

Step 5th: Login into failed primary database and perform flashback database with the help of SCN when standby became primary as shown in step 4

 

SQL> flashback database to scn 2170691;

Flashback complete.

 

            Alert log of failed primary after flashback opertion

flashback database to scn 2170691

2024-01-26T18:40:04.170181+05:30

Flashback Restore Start

2024-01-26T18:40:05.784562+05:30

Flashback Restore Complete

Flashback Media Recovery Start

max_pdb is 3

2024-01-26T18:40:06.048239+05:30

Parallel Media Recovery started with 2 slaves

Flashback Media Recovery Log

Recovery of Online Redo Log: Thread 1 Group 1 Seq 28 Reading mem 0

  Mem# 0: /u01/app/oracle/oradata/US_PRIM/onlinelog/o1_mf_1_lrj32ptk_.log

  Mem# 1: /u01/app/oracle/fast_recovery_area/US_PRIM/onlinelog/o1_mf_1_lrj33bcq_.log

2024-01-26T18:40:10.455803+05:30

Incomplete Recovery applied until change 2170692 time 01/26/2024 17:38:12

2024-01-26T18:40:10.588145+05:30

Flashback Media Recovery Complete

2024-01-26T18:40:10.978268+05:30

Completed: flashback database to scn 2170691

2024-01-26T18:41:02.175373+05:30

 rfs (PID:6857): Database mount ID mismatch [0x2936bef7:0x2936870c] (691453687:691439372)


NOTE: Only displayed required data in the alert log above for better understanding

 

step 6th: Convert failed primary into physical standby

 

SQL> alter database convert to physical standby;

Database altered.

 

                        Alert log of new physical standby after convert

alter database convert to physical standby

2024-01-26T18:44:00.418583+05:30

ALTER DATABASE CONVERT TO PHYSICAL STANDBY (USPRIM)

2024-01-26T18:44:00.538603+05:30

standby redo logfiles that match the primary database:

ALTER DATABASE ADD STANDBY LOGFILE 'srl1.f' SIZE 209715200;

ALTER DATABASE ADD STANDBY LOGFILE 'srl2.f' SIZE 209715200;

ALTER DATABASE ADD STANDBY LOGFILE 'srl3.f' SIZE 209715200;

ALTER DATABASE ADD STANDBY LOGFILE 'srl4.f' SIZE 209715200;

Offline data file 5 marked as online during convert to standby or switchover to standby.

Restore of backup may be required if the file is not physically accessible.

Offline data file 6 marked as online during convert to standby or switchover to standby.

Restore of backup may be required if the file is not physically accessible.

Offline data file 8 marked as online during convert to standby or switchover to standby.

Restore of backup may be required if the file is not physically accessible.

.... (PID:6280): Database role changed from PRIMARY to PHYSICAL STANDBY [kcvs.c:8842]

.... (PID:6280): RT: Role transition work is not done

Physical Standby Database mounted.

2024-01-26T18:44:00.582599+05:30

CONVERT TO PHYSICAL STANDBY: Complete - Database mounted as physical standby

Completed: alter database convert to physical standby

 

NOTE: Only displayed required data in the alert log above for better understanding

 

step 7th: Validate new physical standby database after reinstate

 

SQL>select name,database_role,open_mode,host_name,instance_name,status,switchover_status,flashback_on from v$database,v$instance;

 

NAME              DATABASE_ROLE    OPEN_MODE                  HOST_NAME                                                                                    INSTANCE_NAME   STATUS       SWITCHOVER_STATUS    FLASHBACK_ON

--------- ---------------- -------------------- -------------------------------------

US_PRIM   PHYSICAL STANDBY MOUNTED              oratest.com                                                                           USPRIM         MOUNTED      RECOVERY NEEDED           YES


SQL> select name,role,thread#,sequence#,block# from v$dataguard_process;

 

NAME  ROLE                                     THREAD#  SEQUENCE# BLOCK#

----- ------------------------ ---------- ---------- ----------

rfs   RFS ping                                    1             2             0

ARC1  archive redo                          0             0             0

ARC0  archive redo                          0             0             0

rfs   RFS async                                  1             2         16174

TT00  gap manager                         0             0             0

 

Step 8th: Start MRP process in new standby database


SQL> alter database recover managed standby database disconnect from session;

Database altered.


Step 9th: Perform some log switch from new primary database


SQL> show parameter db_unique_name

 

NAME                                                    |TYPE         |VALUE

------------------------------------|-----------|------------------------------

db_unique_name                                |string        |IND_DR

 

SQL> alter system switch logfile;

 

System altered.

 

SQL> /

 

System altered.

 

SQL> select name,role,thread#,sequence#,block# from v$dataguard_process;

 

NAME |ROLE                             |   THREAD#| SEQUENCE#|  BLOCK#

-----|------------------------|----------|----------|----------

LGWR |log writer          |     0|           0|           0

TMON |redo transport monitor  |             0|           0|           0

TT00 |gap manager       |     1|           10|          0

TT01 |redo transport timer    |       0|           0|           0

ARC0 |archive local       |     0|           0|           0

ARC1 |archive redo        |     0|           0|           0

ARC2 |archive redo       |     0|           0|           0

ARC3 |archive redo       |     0|           0|           0

NET  |recovery coordinator    |      1|           29|        139

TT02 |async ORL multi            |     1|           10|         47

TT04 |heartbeat redo informer |   0|           0|           0


Step 10th: Validate now whether new physical standby is in sync after reinstate operation

 

SQL> select name,role,thread#,sequence#,block# from v$dataguard_process;

 

NAME  ROLE                                     THREAD#  SEQUENCE# BLOCK#

----- ------------------------ ---------- ---------- ----------

PR02  recovery apply slave            0             0             0

PR01  recovery apply slave             0             0             0

MRP0  managed recovery              0             0             0

rfs   RFS async                                  1            10           92

rfs   RFS archive                               0             0             0

PR00  recovery logmerger              1            10           92

rfs   RFS ping                                    1            10            0

ARC1  archive redo                          0             0             0

ARC0  archive redo                          0             0             0

TT00  gap manager                         0             0             0

 



Post a Comment

Previous Post Next Post