In today’s article, I will tell you about Data Guard Broker – Monitoring The Environment.
After installing the Data Guard Environment and enabling the Broker Configuration, we need to check the operability of the system. Below are the DGMGRL commands to make these checks.
[Commands run from DGMGRL are all run from Primary-1 by connecting to DGMGRL]1. Our first check query is to query the entire configuration.
1 2 3 4 5 6 7 8 9 10 11 12 13 14 | DGMGRL> show configuration Configuration - Broker_Configuration Protection Mode: MaxPerformance Databases: primary - Primary database standby - Physical standby database logical - Logical standby database Fast-Start Failover: DISABLED Configuration Status: SUCCESS |
From here;
Protection Mode,
Which DBs are included in the Broker Environment,
The status of Fast-Start-Failover,
We can see the general status of the configuration.
2. General checks can be made at the database level.
1 2 3 4 5 6 7 8 9 10 11 12 | DGMGRL> show database primary; Database - primary Role: PRIMARY Intended State: TRANSPORT-ON Instance(s): primary1 primary2 Database Status: SUCCESS |
From here;
We learn the status of Redo Transport services.
We learn the status of the database.
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 | DGMGRL> show database standby; Database - standby Role: PHYSICAL STANDBY Intended State: APPLY-ON Transport Lag: 0 seconds (computed 2 seconds ago) Apply Lag: 0 seconds (computed 2 seconds ago) Apply Rate: 517.00 KByte/s Real Time Query: ON Instance(s): primary1 (apply instance) primary2 Database Status: SUCCESS |
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 | DGMGRL> show database logical; Database - logical Role: LOGICAL STANDBY Intended State: APPLY-ON Transport Lag: 0 seconds (computed 1 second ago) Apply Lag: 4 seconds (computed 1 second ago) Apply Rate: 862.71 MByte/s Instance(s): primary1v primary2 (apply instance) Database Status: SUCCESS |
From here;
The role of databases,
Whether or not Redo-Apply is active,
Redo and Transport Lag information,
Whether Real-Time-Query (Active Data Guard) is active for the Physical Standby Database,
On which instances the Apply Services are running,
We learn information about the status of databases.
3. If Database or Configuration status is ERROR, we create an error condition to see the parameters to be checked.
While the Broker Configuration is ENABLE, we are changing the parameters from SQLPLUS.
1 2 3 4 5 6 7 8 9 10 | [Primary-2] SQL> show parameter log_archive_dest_state_3 NAME TYPE VALUE ------------------------------------ ----------- ------------------------------ log_archive_dest_state_3 string ENABLE log_archive_dest_state_30 string enable log_archive_dest_state_31 string enable [Primary-2] SQL> alter system set log_archive_dest_state_2=DEFER scope=both sid='*'; System altered. |
We are querying the Broker Configuration.
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 | DGMGRL> show configuration Configuration - Broker_Configuration Protection Mode: MaxPerformance Databases: primary - Primary database Error: ORA-16764: redo transport service to a standby database is not running standby - Physical standby database logical - Logical standby database Fast-Start Failover: DISABLED Configuration Status: ERROR |
4. The error may not always be obvious as above. Below are the parameters we will look at in such cases.
We’re looking at the broker’s logs.
1 2 3 4 | [oracle@primary2 ~]$ tail -100f /u01/app/oracle/diag/rdbms/primary/primary2/trace/drcprimary2.log 01/16/2017 10:12:02 Redo transport problem detected: redo transport to database 'standby' is unexpectedly offline: DEFERRED |
Note: It appears from the Error that the LOG_ARCHIVE_DEST_STATE_2 parameter is DEFERred.
We can see the error with the “StatusReport” property without looking at the log.
1 2 3 4 5 | DGMGRL> show database primary 'StatusReport'; STATUS REPORT INSTANCE_NAME SEVERITY ERROR_TEXT primary1 ERROR ORA-16738: redo transport service for database "standby" is not running primary2 ERROR ORA-16738: redo transport service for database "standby" is not running |
We also question other properties related to transport services.
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 | DGMGRL> show database primary 'LogXptStatus'; LOG TRANSPORT STATUS PRIMARY_INSTANCE_NAME STANDBY_DATABASE_NAME STATUS primary1 standby primary1 logical primary2 standby primary2 logical DGMGRL> show database primary 'InconsistentProperties'; INCONSISTENT PROPERTIES INSTANCE_NAME PROPERTY_NAME MEMORY_VALUE SPFILE_VALUE BROKER_VALUE DGMGRL> Show database primary 'InconsistentLogXptProps'; INCONSISTENT LOG TRANSPORT PROPERTIES INSTANCE_NAME STANDBY_NAME PROPERTY_NAME MEMORY_VALUE BROKER_VALUE |
LogXptStatus: Shows Transport Errors caught by all instances in the Primary Database.
InconsistentProperties: Lists if there is an inconsistent situation in the broker configuration and database settings.
InconsistentLogXptProps: Lists the inconsistencies between the Broker configuration and Redo Transport settings of parameters related to Redo Transport in Standby Databases.
5. Suppose there are many log switch operations until the error is noticed. In this case, we can see which archive files are not processed as follows.
We are doing the Log Switch operation for testing.
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 | [Primary-2] SQL> alter system switch logfile; System altered. [Primary-2] SQL> alter system switch logfile; System altered. [Primary-2] SQL> alter system switch logfile; System altered. [Primary-2] SQL> alter system switch logfile; System altered. |
We check the Sequence numbers on the Primary and Standby sides.
1 2 3 4 5 6 | [Primary-2] SQL> select max(sequence#),thread# from v$archived_log group by thread#; MAX(SEQUENCE#) THREAD# -------------- ---------- 572 1 332 2 |
1 2 3 4 5 6 | [Physical-2] SQL> select max(sequence#),thread# from v$archived_log group by thread#; MAX(SEQUENCE#) THREAD# -------------- ---------- 572 1 329 2 |
1 2 3 4 5 6 | [Logical-2] SQL> SELECT max(sequence#),thread#,applied FROM dba_logstdby_log group by thread#, applied; MAX(SEQUENCE#) THREAD# APPLIED -------------- ---------- -------- 572 1 YES 332 2 YES |
It is seen that some archives do not go to the Physical Standby side. These archives are seen by querying the “SendQEntries” property.
1 2 3 4 5 6 7 8 | DGMGRL> show database primary 'SendQEntries'; PRIMARY_SEND_QUEUE STANDBY_NAME STATUS RESETLOGS_ID THREAD LOG_SEQ TIME_GENERATED TIME_COMPLETED FIRST_CHANGE# NEXT_CHANGE# SIZE (KBs) standby ARCHIVED 932222810 2 330 01/16/2017 10:55:27 01/16/2017 10:55:35 6074895 6074922 8 standby ARCHIVED 932222810 2 331 01/16/2017 10:55:35 01/16/2017 10:55:37 6074922 6074929 1 standby ARCHIVED 932222810 2 332 01/16/2017 10:55:37 01/16/2017 10:55:40 6074929 6074940 1 CURRENT 932222810 1 573 01/16/2017 10:55:34 6074936 2515 CURRENT 932222810 2 333 01/16/2017 10:55:40 6074940 2967 |
In the STATUS column here,
CURRENT: Shows the Redo Log where currently writing is done.
ARCHIVED: Indicates that online Redo Log’s writing process is completed, local archive is created but archive is not sent to standby side.
NOT_ARCHIVED: Indicates that the online Redo Log’s writing process has been completed, but the local archive has not yet been retrieved.
We ENABLE the DEFERRED LOG_ARCHIVE_DEST_STATE_2 parameter and query the status of the configuration.
1 2 3 | [Primary-1] SQL> alter system set log_archive_dest_state_2=enable scope=both sid='*'; System altered. |
1 2 3 4 5 6 7 8 9 10 11 12 13 14 | DGMGRL> show configuration Configuration - Broker_Configuration Protection Mode: MaxPerformance Databases: primary - Primary database standby - Physical standby database logical - Logical standby database Fast-Start Failover: DISABLED Configuration Status: SUCCESS |
1 2 3 4 5 6 | [Primary-2] SQL> select max(sequence#),thread# from v$archived_log group by thread#; MAX(SEQUENCE#) THREAD# -------------- ---------- 573 1 333 2 |
1 2 3 4 5 6 | [Physical-2] SQL> select max(sequence#),thread# from v$archived_log group by thread#; MAX(SEQUENCE#) THREAD# -------------- ---------- 573 1 333 2 |
1 2 3 4 5 6 | [Logical-2] SQL> SELECT max(sequence#),thread#,applied FROM dba_logstdby_log group by thread#, applied; MAX(SEQUENCE#) THREAD# APPLIED -------------- ---------- -------- 573 1 YES 333 2 YES |
1 2 3 4 5 | DGMGRL> show database primary 'SendQEntries'; PRIMARY_SEND_QUEUE STANDBY_NAME STATUS RESETLOGS_ID THREAD LOG_SEQ TIME_GENERATED TIME_COMPLETED FIRST_CHANGE# NEXT_CHANGE# SIZE (KBs) CURRENT 932222810 1 574 01/16/2017 11:21:41 6080641 36 CURRENT 932222810 2 334 01/16/2017 11:21:45 6080637 32 |
6. We first set up a test environment to learn about the property, where we can see which archives that come to the standby side are not APPLY. Then the relevant property is queried.
On the logic side, we question the status of Redo -Apply.
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 | DGMGRL> show database logical; Database - logical Role: LOGICAL STANDBY Intended State: APPLY-ON Transport Lag: 0 seconds (computed 1 second ago) Apply Lag: 0 seconds (computed 1 second ago) Apply Rate: -939900928 Byte/s Instance(s): primary1 primary2 (apply instance) Database Status: SUCCESS |
For double control, a table is created on the Primary side and it is seen whether it goes to the Standby sides.
1 2 3 4 5 6 7 | [Primary-2] SQL> select table_name from dba_tables where owner='TEST'; TABLE_NAME ------------------------------ EMPLOYEES_YEDEK LOCATIONS_YEDEK REGIONS_YEDEK |
1 2 3 4 5 6 7 | [Physical-2] SQL> select table_name from dba_tables where owner='TEST'; TABLE_NAME ------------------------------ EMPLOYEES_YEDEK LOCATIONS_YEDEK REGIONS_YEDEK |
1 2 3 4 5 6 7 | [Logical-2] SQL> select table_name from dba_tables where owner='TEST'; TABLE_NAME ------------------------------ REGIONS_YEDEK EMPLOYEES_YEDEK LOCATIONS_YEDEK |
1 2 3 4 5 6 7 8 9 10 11 12 | [Primary-2] SQL> create table test.departments_yedek as select * from hr.departments; Table created. [Primary-2] SQL> select table_name from dba_tables where owner='TEST'; TABLE_NAME ------------------------------ EMPLOYEES_YEDEK LOCATIONS_YEDEK REGIONS_YEDEK DEPARTMENTS_YEDEK |
1 2 3 4 5 6 7 8 | [Physical-2] SQL> select table_name from dba_tables where owner='TEST'; TABLE_NAME ------------------------------ EMPLOYEES_YEDEK LOCATIONS_YEDEK REGIONS_YEDEK DEPARTMENTS_YEDEK |
1 2 3 4 5 6 7 8 | [Logical-2] SQL> select table_name from dba_tables where owner='TEST'; TABLE_NAME ------------------------------ REGIONS_YEDEK EMPLOYEES_YEDEK LOCATIONS_YEDEK DEPARTMENTS_YEDEK |
On the logic side, we stop the Redo-Apply service.
1 2 | DGMGRL> edit database logical set state='APPLY-OFF'; Succeeded. |
[Logical-1]—>ALERT LOG
1 2 3 4 | 01/16/2017 12:16:53 EDIT DATABASE logical SET STATE = APPLY-OFF 01/16/2017 12:16:55 Command EDIT DATABASE logical SET STATE = APPLY-OFF completed |
We are performing the Log Switch process so that archive logs can be created.
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 | [Primary-2] SQL> alter system switch logfile; System altered. [Primary-2] SQL> alter system switch logfile; System altered. [Primary-2] SQL> alter system switch logfile; System altered. [Primary-2] SQL> alter system switch logfile; System altered. |
We are querying the Redo Apply status of the Logical Database.
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 | DGMGRL> show database logical Database - logical Role: LOGICAL STANDBY Intended State: APPLY-OFF Transport Lag: 0 seconds (computed 1 second ago) Apply Lag: 39 seconds (computed 1 second ago) Apply Rate: (unknown) Instance(s): primary1 primary2 (apply instance) Database Status: SUCCESS |
We learn which archives are not applied.
1 2 3 4 5 6 7 8 | DGMGRL> show database logical 'RecvQEntries'; STANDBY_RECEIVE_QUEUE STATUS RESETLOGS_ID THREAD LOG_SEQ TIME_GENERATED TIME_COMPLETED FIRST_CHANGE# NEXT_CHANGE# SIZE (KBs) PARTIALLY_APPLIED 932222810 1 575 01/16/2017 12:08:19 01/16/2017 12:19:01 6089898 6091869 420 PARTIALLY_APPLIED 932222810 2 336 01/16/2017 12:08:22 01/16/2017 12:18:55 6089892 6091799 900 NOT_APPLIED 932222810 2 337 01/16/2017 12:18:55 01/16/2017 12:19:02 6091799 6091851 32 NOT_APPLIED 932222810 2 338 01/16/2017 12:19:02 01/16/2017 12:19:05 6091851 6091863 3 NOT_APPLIED 932222810 2 339 01/16/2017 12:19:05 01/16/2017 12:19:09 6091863 6091873 3 |
The SQLPLUS command of the above DGMGRL query is below.
1 2 3 4 5 6 7 8 9 | [Logical-2] SQL> SELECT max(sequence#),thread#,applied FROM dba_logstdby_log group by thread#, applied; MAX(SEQUENCE#) THREAD# APPLIED -------------- ---------- -------- 575 1 CURRENT 574 1 YES 339 2 NO 335 2 YES 336 2 CURRENT |
7. We learn the Transaction that causes Log Apply services to stop in Logical Standby Database as follows.
1 2 | DGMGRL> show database logical 'LsbyFailedTxnInfo'; LsbyFailedTxnInfo = '' |
8. In the Logical Standby Database, the processes used by Log Apply services and the MAX_SGA information are learned as follows.
1 2 | DGMGRL> show database logical 'LsbyParameters'; LsbyParameters = 'MAX_SGA=30###MAX_SERVERS=14###MAX_EVENTS_RECORDED=10000###PRESERVE_COMMIT_ORDER=TRUE###RECORD_SKIP_ERRORS=Y###RECORD_SKIP_DDL=Y###RECORD_APPLIED_DDL=N' |
9. We see TopWaitEvents as follows.
1 2 3 4 5 6 7 8 | DGMGRL> show instance primary1 'TopWaitEvents' on database primary; TOP SYSTEM WAIT EVENTS Event Wait Time rdbms ipc message 1503759311 class slave wait 297257530 SQL*Net message from client 148557969 gcs remote message 84543254 DIAG idle wait 84519976 |
1 2 3 4 5 6 7 8 | DGMGRL> show instance primary1 'TopWaitEvents' on database standby; TOP SYSTEM WAIT EVENTS Event Wait Time rdbms ipc message 810213291 class slave wait 311501944 parallel recovery slave next change 104565552 gcs remote message 52278608 DIAG idle wait 52247124 |
1 2 3 4 5 6 7 8 | DGMGRL> show instance primary1 'TopWaitEvents' on database logical; TOP SYSTEM WAIT EVENTS Event Wait Time rdbms ipc message 1409929720 class slave wait 135264994 SQL*Net message from client 132785675 wait for unread message on broadcast channel 88204824 gcs remote message 88160985 |
This information is fetched from V$SYSTEM_EVENT View.
10. With the example below, we can see the situations where the broker configuration file and the SPFILE parameter file are inconsistent with each other.
We are querying the Broker Configuration.
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 | DGMGRL> show configuration Configuration - Broker_Configuraiton Protection Mode: MaxPerformance Databases: primary - Primary database Warning: ORA-16809: multiple warnings detected for the database standby - Physical standby database logical - Logical standby database Fast-Start Failover: DISABLED Configuration Status: WARNING |
We question what is causing the error.
1 2 3 4 5 | DGMGRL> show database primary 'StatusReport'; STATUS REPORT INSTANCE_NAME SEVERITY ERROR_TEXT primary1 WARNING ORA-16714: the value of property LogArchiveTrace is inconsistent with the database setting primary2 WARNING ORA-16714: the value of property LogArchiveTrace is inconsistent with the database setting |
We compare the values of the LogArchiveTrace parameter with the Broker configuration file in the Database.
1 2 3 4 5 | DGMGRL> show database primary 'InconsistentProperties'; INCONSISTENT PROPERTIES INSTANCE_NAME PROPERTY_NAME MEMORY_VALUE SPFILE_VALUE BROKER_VALUE primary1 LogArchiveTrace 6 0 0 primary2 LogArchiveTrace 6 0 0 |
We also check if this is the case by checking the values of the parameters.
1 2 3 4 5 6 7 8 9 | DGMGRL> show instance primary1 'LogArchiveTrace' on database primary; LogArchiveTrace = '0' ********** [Primary-1] SQL> show parameter log_archive_trace NAME TYPE VALUE ------------------------------------ ----------- ------------------------------ log_archive_trace integer 6 |
We reset the parameter value in the Database.
1 2 3 | [Primary-1] SQL> alter system set log_archive_trace=0 scope=both sid='*'; System altered. |
We check if the configuration is healthy.
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 | [Primary-1] SQL> show parameter log_archive_trace NAME TYPE VALUE ------------------------------------ ----------- ------------------------------ log_archive_trace integer 0 DGMGRL> show configuration Configuration - Broker_Configuraiton Protection Mode: MaxPerformance Databases: primary - Primary database standby - Physical standby database logical - Logical standby database Fast-Start Failover: DISABLED Configuration Status: SUCCESS DGMGRL> show database primary 'StatusReport'; STATUS REPORT INSTANCE_NAME SEVERITY ERROR_TEXT |