Thursday , August 18 2022

Failover A Physical Standby Database With SQLPLUS Commands

In today’s article, we will learn to Failover A Physical Standby Database With SQLPLUS Commands.

In a system with a Data Guard Environment consisting of a dual-node Primary Database, a dual-node Physical Standby Database and a dual-node Logical Standby Database in a RAC structure, we perform a Failover operation to the Physical Standby Database.

We prefer Flashback because it is fast as a method of converting to Physical Standby Database, which will become DISABLE after failover.

This process will consist of 5 steps.

A. Preliminary Preparation
B. Failover to Physical Standby
C. Checks After Failover
C. Physical Standby of Old Primary with Flashback
D. Final Checks

A. PRELIMINARY PREPARATION

1. We check the parameters about whether Flashback is turned on in the databases and how long the Flashback logs will be stored.

2. We activate flashback.

3. We check the flashback status.

4. We check whether flashback logs are created.

5. We start Redo Apply operations on Standby Databases.

6. We check if there is Redo and Transport LAG after recovery.

7. After Flashback and Redo Apply, we check the accuracy of everything with DDL & DML processes.

This error is received because the Physical Standby database is in Mount mode. If we want to see the results here, then Active Data Guard license is obtained and the database is OPEN.

B. FAILOVER TO PHYSICAL STANDBY DATABASE

We should try to fix the problem by making the necessary interventions during this time, perhaps taking into account some interruptions in the Primary Database, if not, we should consider the Failover process as a last resort.

Because after Failover, our Primary Database will be DISABLE. If we do not have a third standby database in our Data Guard Environment until we prepare it as Physical Standby, we will continue with a single system.

Now let’s see how the Failover process works.

1. Let’s assume that we can mount the Primary Database due to the problem, but it does not come to OPEN mode.

In this case, if we are not using the Maximum Protection mode, the first thing to do is to send archives and current redos from Primary to Physical Standby, if any, in order to avoid data loss.

If I succeed in this, it means that I will perform a Failover operation with zero data loss, which is a great thing.

For this, let’s set up our test environment first.

a. I want to break the sync of the standby side with the Primary. For this reason, I prevent certain archives from going there.

b. We learn the current archive redo log numbers of the databases in the Data Guard Environment.

The reason we put a WHERE condition here is because RESETLOGS is done in the database for previous tests.

Because of this operation, I find the most up-to-date RESETLOGS_TIME from VIEW and write that value to the query, since the queue is back to the beginning.

c. With the Log Switch operation, I check whether the archives are going to standby sides.

Although LOG_ARCHIVE_DEST_STATE_2 is DEFERed, the reason why sequence# increases is because the logs accumulated until DEFER must be sent.

DEFER will not send the next logs instantly.

ç. Before the tables to be created for test purposes, the tables belonging to the user are queried to see if the Redos are Transport and Apply.

d. We create a new table.

e.We check the archive numbers by performing the Log Switch operation.

f. We create another table.

g. We check the archive numbers by performing the Log Switch operation.

ğ. We create another table

h. We are doing Log Switching.

I. I assume that the Primary does not switch to OPEN mode at the time of the disaster, and I put the Primary Database in MOUNT mode.

i.I find the highest sequence# in each thread in Physical Standby and understand which archives are missing on the Standby side.

j. Before moving on to failover actions, question the status and role of the Databases in the environment one last time.

2. The job of installing the Test Environment is finished here. Now I am doing the job of sending the missing archive logs and CURRENT REDOs from the Primary on the Standby side.

a. We send Current Redos and Archive Logs to the Standby side.

b. This error is received when all instances are open in RAC structure. That’s why I’m closing Node 2.

c. I try again to post archive and current redos.

ç. The reason for this error is that the LOG_ARCHIVE_DEST_STATE_2 parameter, which will send LOGs to the Target it will fail, is deferred.

D. The reason for this error is that it shows that Redo-Transport is done but Redo-Apply is not. Below are the logs falling into ALERT_LOG.

[Primary-1]—–>ALERT LOG
[Standby-1]—–>ALERT LOG
[Standby-2]—–>ALERT LOG
e. We are launching Redo-Apply.
[Standby-1]—–>ALERT LOG
[Standby-2]—–>ALERT LOG
f. We check whether the tables created after LOG_ARCHIVE_DEST_STATE_2 DEFER.
3. Suppose we could not put the Primary Database in MOUNT mode or the ALTER SYSTEM FLUSH REDO command failed.

In this case, I can minimize data loss by querying the largest SEQUENCE# in each THREAD in the Primary and Physical Standby Database and copying the missing archives to the Physical Standby Database at the operating system level and REGISTER. To test this,

I first set up a test environment.

a. I want to break the sync of the standby side with the Primary. For this reason, I prevent certain archives from going there.

b. We are querying the user’s existing tables.

c. We are doing Log Switching.

ç. We are querying the existence of the tables by dropping a table.

d. We are doing Log Switching.

e. We are deleting another table.

f. We are doing another Log Switch operation.

4. We are starting to work on sending the archive logs to the other Standby side.

a. We create folders in which archives will be saved. These folders are created in the sharing area so that both databases can access them.

b. We copy the archives missing from ASM on the primary side to the folder created in the sharing area.

[Primary-1]

 

c. On the standby side, we copy the archive logs to the relevant folder in ASM.

[Standby-1] c. We REGISTER to APPLY the archives.
[Standby-1]—–>ALERT LOG
d. We are checking the Archive Log Sequence#s.
e. After LOG_ARCHIVE_DEST_2 is DEFERED to set up the Test Environment, we check whether the deleted tables are also deleted from the Physical Standby.
5. Now that I have prevented data loss, I can start the preparations for the Switchover process. First, we stop the RECOVER operation in the Physical Standby Database.
6. We finish the RECOVER process in the Physical Standby Database.
[Standby-1]—–>ALERT LOG
If an error is received as a result of this operation, it means that there is a GAP and this GAP cannot be closed. As long as the GAP cannot be closed, there is data loss.

If the GAP cannot be closed despite the operations, then the following command is executed considering the data loss.

7. If there is no need for the ACTIVATE PHYSICAL STANDBY command, we now query the status of the Physical Standby Database for Switchover.

8. We do Physical Standby Database PRIMARY with Switchover process.

[Standby-1]—–>ALERT LOG

9. We open the Physical Standby Databases.

10. When turning the old Physical Standby Database into OPEN mode for Primary Mode, errors are received about the old Primary Database as can be seen from the logs.

Therefore, the old Primary Database should be closed and the log flow should be stopped.

With this process, the following error in the logs disappeared.

[Standby-1]—–>ALERT LOG

C. CHECKS AFTER FAILOVER

1. We are querying the Roles, Recovery Modes and Apply Modes of the databases.

2. We start Redo Apply in Logical Standby Database.

Although Redo Apply is started, IDLE appears again when the recovery mode is queried. Then we check from ALERT LOGS. It is seen that the following error is entered in the logs.

[Logical-1]—–>ALERT LOG
It is clear from the error that LogMiner is trying to process an SCN much ahead of the time it is RESETLOGS, but Redo Apply must start from an earlier moment to avoid data inconsistency.

Therefore, it is necessary to wrap the Logical Standby Database just before the SCN with RESETLOGS_CHANGE.

3. Although it is seen that there is no problem in Redo Transport, this is still seen with the Log Switch operation.

[Standby-1]—–>ALERT LOG

[Logical-2]—–>ALERT LOG

[Standby-2]—–>ALERT LOG

4. After the Log Switch operation, we check whether the Redos go to Logical Standby.

As it can be understood from here, REDOs are going but cannot be processed.

5. We learn the SCN just before RESETLOGS.

6. We learn the current SCN of the Logical Standby Database.

7. Logical Standby is wrapped with Database Flashback just before RESETLOGS (SCN in item 5).

8. We put the Database MOUNT mode.

9. Again with the Logical Standby Database Flashback, it is wrapped just before the RESETLOGS (SCN in the 5th item).

10. We issue the command as follows.

11. We query the relevant VIEW to go back to the earliest Flashback Logs and learn this information.

12. We are flashbacking the database with the oldest SCN.

13. We open the database with RESETLOGS.

14. We query Recovery Mode and Apply LAG.

Apply Lag, which was 8 hours after the problem, was reduced to 48 minutes.

15. We conclude the test by creating a table.

16. It is also seen that the logs are processed by performing the Log Switch operation.

17. We see that all logs are processed, not just the current logs.

Ç. BECOME A PHYSICAL STANDBY OF OLD PRIMARY WITH FLASHBACK

After failover, my Primary Database was DISABLE. Therefore, in order to use this Database as Physical Standby again,

I will either take Standby Backup from the new Primary Database and set up a new Physical Standby Database here, or if I have Flashback Logs before Failover, I will quickly do Physical Standby with Flaschback.

As we have done both long and before, Backup management will not be tried and Flashback feature will be used in this scenario.

1. We learn when Physical Standby becomes Primary from the new Primary Database.

2. We mount the old Primary Database.

3. The old Primary Database is wrapped in the SCN where the Physical Standby is Primary with Flashback.

[Primary-1]—–>ALERT LOG

4. The old Primary Database is converted to Physical Standby.

[Primary-1]—–>ALERT LOG

5. We question the status of the new Physical Standby.

6. We open the new Physical Standby in MOUNT mode.

7. The Role and Status of the New Physical Standby is questioned.

8. LOG_ARCHIVE_DEST_STATE_2 is ENABLED so that the archive logs can go to the new Physical Standby.

9. We start RECOVERY in the new Physical Standby.

D. FINAL CHECKS

After failover, we check the healthy functioning of Redo Transport and Redo Apply after the old Primary database is Physical Standby and the old Physical Standby is Primary.

1. Log Switch operation is performed on both 2 Primary Nodes.

2. We check the sequence# of the archive logs in the databases.

Now we write a new RESETLOGS_TIME for the WHERE condition in the queries.

3. We check for lag.

4. Finally, we check whether Real-Time Apply is active by DROPing the existing table. For this, we first query the existing tables belonging to the user.

Query could not be performed because the database is in MOUNT mode. If there is an ACTIVE DATA GUARD license, query can be made by pulling into OPEN mode.

5. The table is DROPed and queried from the standbys.