12 Tune and Troubleshoot Oracle Data Guard

When redo transport, redo apply, or role transitions are not meeting your expected requirements, use the following guidelines to help you tune and troubleshoot your deployment.

Overview of Oracle Data Guard Tuning and Troubleshooting

To get the best performance from your Oracle Data Guard configuration, use the following Oracle MAA best practices for monitoring, assessment, and performance tuning.

  • Ensure that Oracle Database and Oracle Data Guard configuration best practices are in place.

    The assumption when assessing and tuning is that all of the Oracle Database and Data Guard configuration best practices are already integrated in the environment. Evaluate the adherence to those best practices before doing any tuning.

  • Assess and tune redo transport services

    Oracle Data Guard automatically tunes redo transport to optimize performance. However, if you observe performance issues, you can monitor and tune redo transport services.

    Asynchronous redo transport with Maximum Performance data protection mode is the default Oracle Data Guard configuration. Tuning asynchronous redo transport consists mainly of ensuring that the primary, standby, and network resources are sufficient for handling the workload, and that you monitor those resources for bottlenecks.

    Synchronous redo transport does sacrifice some performance for zero data loss; however, using sound MAA recommended methods, you can monitor and assess the impact and distribute resources appropriately.

  • Assess and tune redo apply

    In most cases, the default Oracle settings result in satisfactory performance for media recovery when the standby is always up to date. However, as applications and databases increase in size and throughput, media recovery operations can benefit from additional tuning to further optimize recovery time or redo apply throughput on a standby database.

  • Assess and tune role transitions

    With proper planning and implementation, Oracle Data Guard and Active Data Guard role transitions can effectively minimize downtime and ensure that the database environment is restored with minimal impact on the business. Performance tests using a physical standby database and Oracle Maximum Availability Architecture (MAA) best practices have shown that switchover and failover can be reduced to seconds.

Redo Transport Troubleshooting and Tuning

Oracle Data Guard redo transport performance is directly dependent on the performance of the primary and standby systems, the network that connects them, and the I/O subsystem.

Understanding the topology of the Data Guard configuration and its relevance to Data Guard performance helps eliminate infrastructure weaknesses that are often incorrectly attributed to the Data Guard architecture.

System and Network Performance Prerequisites

Oracle Data Guard architecture is very streamlined and efficient; however, like any application, there are reasonable system and network prerequisites to achieve satisfactory performance. After all of the configuration best practices have been implemented as described in Oracle Data Guard Configuration Best Practices, consider the following recommendations.

Primary Database System

  • Sufficient CPU resources so that LGWR can post foreground processes efficiently

  • Sufficient I/O bandwidth so local log writes maintain low I/O latency during peak rates

  • Network interfaces that can handle peak redo rate volumes combined with any other network activity across the same interface

  • Primary AWR, ASH, and OSwatcher data gathered for tuning and troubleshooting

Standby Database System

  • Sufficient CPU resources for RFS (the Data Guard process that receives redo at the standby database) to efficiently write to standby redo logs

  • Sufficient I/O bandwidth to enable local log writes to maintain low I/O latency during peak rates

  • A network interface that can receive the peak redo rate volumes combined with any other network activity across the same interface

  • Standby AWR, ASH, and OSwatcher data should be gathered

Note:

The top problem encountered with the standby database is poor standby log write latency due to insufficient I/O bandwidth. This problem can be mitigated by using Data Guard Fast Sync.

Network

  • For synchronous redo transport, the round trip network latency (RTT) between the primary database and the target standby database or target far sync standby server should typically be less than 2 ms to minimize the impact on primary database performance.

    Oracle is often asked what the maximum latency is that can be supported, or whether there is an equation that can be used to project performance impact. Unfortunately, every application has a different tolerance to network latency. Differences in application concurrency, number of sessions, transaction size in bytes, how often sessions commit, and log switch frequency can all cause the application performance impact to differ.

  • Sufficient network bandwidth to support peak redo rates (steady state and when resolving gaps) combined with any other network activity that shares the same network.

    Note that your point-to-point network bandwidth is throttled by the network segment, switch, router, or interface with the lowest network bandwidth. For example, if 90% of your network route is 10 GigE but an existing switch or network interface only supports 1 GigE, then your maximum network bandwidth is 1 GigE.

  • Netstat and/or any network monitoring statistics should be gathered

Note:

The top network problems encountered are inconsistent network response and insufficient network bandwidth.

Monitor System Resources

Monitor system resources on primary and standby hosts and the network using the utilities and statistics recommended here.

Monitoring CPU

  • When all of a system's CPU cores are occupied running work for processes, other processes must wait until a CPU core or thread becomes available, or the scheduler switches a CPU to run their code. A bottleneck can be created if too many processes are queued too often.

    You can use the uptime, mpstat, sar, dstat, and top utilities to monitor CPU usage.

  • The commands mpstat -P ALL and sar -u -P ALL display CPU usage statistics for each CPU core and an average across all CPU cores.

  • The %idle value shows the percentage of time that a CPU is not running system code or process code. If the value of %idle is near 0% most of the time on all CPU cores, the system is CPU-bound for the workload that it is running. The percentage of time spent running system code (%system or %sys) should not usually exceed 30%, especially if %idle is close to 0%.

  • The system load average represents the number of processes that are running on CPU cores, waiting to run, or waiting for disk I/O activity to complete, averaged over a period of time. On a busy system, the load average reported by uptime or sar -q should not exceed two times the number of CPU cores. The system is overloaded if the load average exceeds four times the number of CPU cores for long periods.

  • In addition to load averages (ldavg-*), the sar -q command reports the number of processes currently waiting to run (the run-queue size, runq-sz) and the total number of processes (plist_sz). The value of runq-sz also provides an indication of CPU saturation.

  • Determine the system's average load under normal loads when users and applications do not experience problems with system responsiveness, and then look for deviations from this benchmark over time. A dramatic rise in the load average can indicate a serious performance problem.
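
For example, the following commands (the 5-second interval and 12-sample count are illustrative) collect the CPU statistics described above; adjust the interval and count to cover the workload period being assessed.

mpstat -P ALL 5 12     # per-core and average CPU usage, including %idle and %sys
sar -u -P ALL 5 12     # the same CPU usage data collected through sar
sar -q 5 12            # run-queue size (runq-sz), process list size (plist_sz), and load averages
uptime                 # quick check of the 1-, 5-, and 15-minute load averages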

Monitoring Memory Usage

  • When a system runs out of real or physical memory and starts using swap space, its performance deteriorates dramatically. If you run out of swap space, your programs or the entire operating system are likely to crash.

    If free or top indicate that little swap space remains available, this is also an indication you are running low on memory.

  • The sar -r command reports memory utilization statistics, including %memused, which is the percentage of physical memory in use.

  • The sar -B command reports memory paging statistics, including pgscank/s, which is the number of memory pages scanned by the kswapd daemon per second, and pgscand/s, which is the number of memory pages scanned directly per second.

  • The sar -W command reports swapping statistics, including pswpin/s and pswpout/s, which are the numbers of pages swapped in and out per second.

  • The system has a memory shortage if %memused is near 100% and the scan rate is continuously over 200 pages per second.

  • The output from the dmesg command might include notification of any problems with physical memory that were detected at boot time.
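
For example, the following commands (intervals and counts are illustrative) report the memory, paging, and swapping statistics described above.

sar -r 5 12     # memory utilization, including %memused
sar -B 5 12     # paging statistics, including pgscank/s and pgscand/s
sar -W 5 12     # swapping statistics, including pswpin/s and pswpout/s
free -m         # quick check of remaining physical memory and swap space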

Monitoring I/O

  • Use iostat, sar, and iotop utilities to monitor I/O.

  • The iostat command monitors the loading of block I/O devices by observing the time that the devices are active relative to the average data transfer rates. You can use this information to adjust the system configuration to balance the I/O loading across disks and host adapters.

    iostat -x reports extended statistics about block I/O activity at one second intervals, including %util, which is the percentage of CPU time spent handling I/O requests to a device, and avgqu-sz, which is the average queue length of I/O requests that were issued to that device.

    If %util approaches 100% or avgqu-sz is greater than 1, device saturation is occurring and the storage I/O bandwidth needs to be augmented by adding disks or storage.

  • You can also use the sar -d command to report on block I/O activity, including values for %util and avgqu-sz.

  • The iotop utility can help you identify which processes are responsible for excessive disk I/O. The iotop user interface is similar to top.

    In its upper section, iotop displays the total disk input and output usage in bytes per second. In its lower section, iotop displays I/O information for each process, including disk input-output usage in bytes per second, the percentage of time spent swapping in pages from disk or waiting on I/O, and the command name.

    Use the left and right arrow keys to change the sort field, and press A to toggle the I/O units between bytes per second and total number of bytes, or O to toggle between displaying all processes or only those processes that are performing I/O.
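
For example, the following commands (intervals and counts are illustrative) collect the block I/O statistics described above.

iostat -x 5 12     # extended per-device statistics, including %util and avgqu-sz
sar -d 5 12        # block device activity history, also reporting %util and avgqu-sz
iotop -o           # per-process I/O, limited to processes currently performing I/O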

Monitoring the Network

  • Use ip, ss, and sar to monitor the network.

  • The ip -s link command displays network statistics and errors for all network devices, including the numbers of bytes transmitted (TX) and received (RX). The dropped and overrun fields provide an indicator of network interface saturation.

  • The ss -s command displays summary statistics for each protocol.

  • Use the sar -n DEV command to monitor the rates currently being transmitted and received on each network interface.
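
For example, the following commands (the sar interval and count are illustrative) display the network statistics described above.

ip -s link         # per-interface TX/RX byte counts, plus dropped and overrun counters
ss -s              # summary statistics for each protocol
sar -n DEV 5 12    # per-interface transmit and receive rates (rxkB/s and txkB/s)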

Assess Database Wait Events

Once you verify that the system or network resources are not bottlenecked, you can assess database wait events for performance flags related to your Oracle Data Guard configuration.

On the primary database, assess database wait events with AWR reports, and on the standby database you can use standby AWR for Oracle Database 18c and later, or statspack reports for older Oracle Database releases.

Table 12-1 Wait Events Relevant to Oracle Data Guard

Event Name Description
ARCH Remote Write The time (in centi-seconds) that ARCn background processes spend blocked waiting for network write operations to complete
ASYNC Remote Write The time (in centi-seconds) for asynchronous streaming RFSWRITE operations

This includes stall reaps and streaming network submission time. This time is accumulated by TTnn (Redo Transport Slave) background processes.

Redo Transport Attach The time spent (in centi-seconds) doing Connect, Logon, and RFSATTACH for any network process
Redo Transport Close The time spent (in centi-seconds) by ARCn, NSSn, and TTnn processes doing RFSCLOSE and RFSRGSTR operations
Redo Transport Detach The time spent (in centi-seconds) doing RFSDETACH and Disconnect for any network process
Redo Transport Open The time spent (in centi-seconds) by ARCn, NSSn, and TTnn background processes doing RFSCREAT and RFSANNCE operations
Redo Transport Ping The time spent (in centi-seconds) by ARCn background processes doing RFSPING operations
Redo Transport Slave Shutdown The time spent (in centi-seconds) by LGWR doing NSSn and TTnn process shutdown and termination
Redo Transport Slave Startup The time spent (in centi-seconds) by LGWR doing NSSn and TTnn process startup and initialization
Redo Writer Remote Sync Complete The time spent (in centi-seconds) by LGWR reaping completed network writes to remote destinations
Redo Writer Remote Sync Notify The time spent (in centi-seconds) by LGWR issuing network writes to remote destinations
Remote SYNC Ping The time spent (in centi-seconds) by the LGWR and NSSn background processes doing synchronous PING operations
SYNC Remote Write The time spent by LGWR doing SYNC RFSWRITE operations
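
As a starting point, a query such as the following sketch (assuming access to V$SYSTEM_EVENT on the primary database) lists cumulative totals for the transport-related wait events above. Note that AVERAGE_WAIT is reported in centi-seconds.

SQL> SELECT event, total_waits, time_waited, average_wait
     FROM v$system_event
     WHERE event LIKE 'Redo Transport%'
        OR event LIKE '%Remote Write%'
        OR event LIKE 'Redo Writer Remote Sync%'
     ORDER BY time_waited DESC;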

Wait events specific to Oracle Data Guard with Oracle Database 11.2 are described in the table below.

Table 12-2 Wait Events Relevant to Oracle Data Guard 11.2

Event Name Description
ARCH wait on ATTACH Monitors the amount of time spent by all archiver processes to spawn a new RFS connection
ARCH wait on SENDREQ Monitors the amount of time spent by all archiver processes to write archive logs to the local disk as well as write them remotely
ARCH wait on DETACH Monitors the amount of time spent by all archiver processes to delete an RFS connection
LNS wait on ATTACH Monitors the amount of time spent by all LNS processes to spawn a new RFS connection
LNS wait on SENDREQ Monitors the amount of time spent by all LNS processes to write the received redo to disk as well as open and close the remote archived redo logs
LNS wait on DETACH Monitors the amount of time spent by all LNS processes to delete an RFS connection
LGWR wait on LNS Monitors the amount of time spent by the log writer (LGWR) process waiting to receive messages from LNS processes
LNS wait on LGWR Monitors the amount of time spent by LNS processes waiting to receive messages from the log writer (LGWR) process
LGWR-LNS wait on channel Monitors the amount of time spent by the log writer (LGWR) process or the LNS processes waiting to receive messages

See Installing and Using Standby Statspack (Doc ID 454848.1) for information about Standby Statspack.

Assess Synchronous Redo Transport

The following topics describe how to assess redo transport, specifically for synchronous redo transport.

Understanding How Synchronous Transport Ensures Data Integrity

The following algorithms ensure data consistency in an Oracle Data Guard synchronous configuration.

  • LGWR redo write on the primary database and the Data Guard NSS network redo write are identical.

  • The Data Guard Managed Recovery Process (MRP) at the standby database cannot apply redo unless the redo has been written to the primary's online redo log, with the only exception being during a Data Guard failover operation (primary is gone).

    In addition to shipping redo synchronously, NSS and LGWR exchange information regarding the safe redo block boundary that standby recovery can apply up to from its standby redo logs. This prevents the standby from applying redo it may have received, but which the primary has not yet acknowledged as committed to its own online redo logs.

The possible failure scenarios include:

  • If the primary database's LGWR cannot write to the online redo log, then LGWR and the instance will crash. Instance or crash recovery will recover to the last committed transaction in the online redo log and roll back any uncommitted transactions. The current log will be completed and archived.

  • On the standby, the partial standby redo log is completed with the correct size value to match the corresponding online redo log. If any redo blocks are missing from the standby redo log (SRL), they are shipped over (without reshipping the entire redo log).

  • If the primary database crashes resulting in an automatic or manual zero data loss failover, then part of the Data Guard failover operation will do "terminal recovery" and read and recover the current standby redo log.

    Once recovery finishes applying all of the redo in the standby redo logs, the new primary database comes up and archives the newly completed log group. All new and existing standby databases discard any redo in the online redo logs, flashback to consistent SCN, and only apply the archives coming from the new primary database. Once again the Data Guard environment is in sync with the (new) primary database.

Assessing Performance in a Synchronous Redo Transport Environment

When assessing performance in a synchronous redo transport environment it is important that you know how the different wait events relate to each other. The impact of enabling synchronous redo transport varies between applications.

To understand why, consider the following description of work the log writer process (LGWR) performs when a commit is issued.

  1. Foreground process posts LGWR for commit ("log file sync" starts). If there are concurrent commit requests queued, LGWR will batch all outstanding commit requests together resulting in a continuous strand of redo.

  2. LGWR waits for CPU.

  3. LGWR starts redo write ("redo write time" starts).

  4. For Oracle RAC, LGWR broadcasts the current write to other instances.

  5. After preprocessing, if there is a SYNC standby, LGWR starts the remote write ("SYNC remote write" starts).

  6. LGWR issues local write ("log file parallel write").

  7. If there is a SYNC standby, LGWR waits for the remote write to complete.

  8. After checking the I/O status, LGWR ends "redo write time / SYNC remote write".

  9. For Oracle RAC, LGWR waits for the broadcast ack.

  10. LGWR updates the on-disk SCN.

  11. LGWR posts the foregrounds.

  12. Foregrounds wait for CPU.

  13. Foregrounds end "log file sync".

Use the following approaches to assess performance.

  • For batch loads the most important factor is to monitor the elapsed time, because most of these processes must be completed in a fixed period of time. The database workloads for these operations are very different than the normal OLTP workloads. For example, the size of the writes can be significantly larger, so using log file sync averages does not give you an accurate view or comparison.

  • For OLTP workloads, monitor the volume of transactions per second (from AWR) and the redo rate (redo size per second) from the AWR report. This information gives you a clear picture of the application throughput and how it is impacted by enabling synchronous redo transport.

Why the log file sync Wait Event is Misleading

Typically, the log file sync wait event on the primary database is the first place administrators look when they want to assess the impact of enabling synchronous redo transport.

If the average log file sync wait before enabling synchronous redo transport (SYNC) was 3ms, and after was 6ms, then the assumption is that SYNC impacted performance by one hundred percent. Oracle does not recommend using log file sync wait times to measure the impact of SYNC because the averages can be very deceiving, and the actual impact of SYNC on response time and throughput may be much lower than the event indicates.

When a user session commits, LGWR will go through the process of getting on the CPU, submitting the I/O, waiting for the I/O to complete, and then getting back on the CPU to post foreground processes that their commit has completed. This whole time period is covered by the log file sync wait event. While LGWR is performing its work there are, in most cases, other sessions committing that must wait for LGWR to finish before processing their commits. The number of sessions waiting is determined by how many sessions an application has and how frequently those sessions commit. This batching up of commits is generally referred to as application concurrency.

For example, assume that it normally takes 0.5ms to perform log writes (log file parallel write), 1ms to service commits (log file sync), and on average you are servicing 100 sessions for each commit. If there was an anomaly in the storage tier, and the log write I/O for one commit took 20ms to complete, then you could have up to 2,000 sessions waiting on log file sync, while there would only be 1 long wait attributed to log file parallel write. Having a large number of sessions waiting on one long outlier can greatly skew the log file sync averages.

The output from V$EVENT_HISTOGRAM for the log file sync wait event for a particular period in time is shown in the following table.

Table 12-3 V$EVENT_HISTOGRAM Output for the Log File Sync Wait Event

Milliseconds Number of Waits Percent of Total Waits
1 17610 21.83%
2 43670 54.14%
4 8394 10.41%
8 4072 5.05%
16 4344 5.39%
32 2109 2.61%
64 460 0.57%
128 6 0.01%

The output shows that 92% of the log file sync wait times are less than 8ms, with the vast majority less than 4ms (86%). Waits over 8ms are outliers and only make up 8% of wait times overall, but because of the number of sessions waiting on those outliers (because of batching of commits) the averages get skewed. The skewed averages are misleading when log file sync average waits times are used as a metric for assessing the impact of SYNC.
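
To examine the distribution on your own system, a query like the following sketch (assuming access to V$EVENT_HISTOGRAM) returns the data behind a table such as the one above.

SQL> SELECT wait_time_milli, wait_count
     FROM v$event_histogram
     WHERE event = 'log file sync'
     ORDER BY wait_time_milli;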

Understanding What Causes Outliers

Any disruption to the I/O on the primary or standby, or spikes in network latency, can cause high log file sync outliers with synchronous redo transport. You can see this effect when the standby system's I/O subsystem is inferior to that of the primary system.

Often administrators host multiple databases such as development and test on standby systems, which can impair I/O response. It is important to monitor I/O using iostat as described in Monitoring I/O to determine if the disks reach maximum IOPS, because this affects the performance of SYNC writes.

Frequent log switches are a significant cause of outliers. Consider what occurs on the standby when a log switch on the primary occurs, as follows.

  1. RFS on the standby must finish updates to the standby redo log header.

  2. RFS then switches into a new standby redo log with additional header updates.

  3. Switching logs forces a full checkpoint on the standby. This causes all dirty buffers in the buffer cache to be written to disk, causing a spike in write I/O. In a non-symmetric configuration where the standby storage subsystem does not have the same performance as the primary database, this results in higher I/O latencies.

  4. The previous standby redo log must be archived, increasing both read and write I/O.

Effects of Synchronous Redo Transport Remote Writes

When you enable synchronous redo transport (SYNC), you introduce a remote write (RFS write to a standby redo log) in addition to the normal local write for commit processing.

This remote write, depending on network latency and remote I/O bandwidth, can increase commit processing time. Because commit processing takes longer, you observe more sessions waiting on LGWR to finish its work and begin work on the commit request; that is, application concurrency has increased. You can observe increased application concurrency by analyzing database statistics and wait events. Consider the example in the following table.

Table 12-4 Effect of SYNC Transport Increasing Application Concurrency

SYNC Redo Rate Network Latency TPS from AWR log file sync average (ms) log file parallel write average (ms) RFS random I/O SYNC remote write average (ms) Redo write size (KB) Redo writes
Defer 25MB 0 5,514.94 0.74 0.47 NA NA 10.58 2,246,356
Yes 25MB 0 5,280.20 2.6 .51 .65 .95 20.50 989,791
Impact 0 - -4% +251% +8.5% NA NA +93.8% -55.9%

In the above example, enabling SYNC reduced the number of redo writes, but increased the size of each redo write. Because the size of the redo write increased, you can expect the time spent doing the I/O (both local and remote) to increase. The log file sync wait time is higher because there is more work per wait.

However, at the application level, the impact on the transaction rate or the transaction response time might change very little as more sessions are serviced per commit. This is why it is important to measure the impact of SYNC at the application level, and not depend entirely on database wait events. It is also a perfect example of why the log file sync wait event is a misleading indicator of the actual impact SYNC has on the application.

Example of Synchronous Redo Transport Performance Troubleshooting

To look at synchronous redo transport performance, calculate the local redo write latency, the remote write latency, the average redo write size per write, and the average commit latency, as shown here.

Use the following wait events to do the calculations.

  • local redo write latency = 'log file parallel write'

  • remote write latency = 'SYNC remote write'

  • average redo write size per write = 'redo size' / 'redo writes'

  • average commit latency seen by foregrounds = 'log file sync'
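
As a sketch, the underlying values can be pulled from V$SYSSTAT and V$SYSTEM_EVENT (AVERAGE_WAIT is in centi-seconds); the average redo write size is then 'redo size' divided by 'redo writes'. An AWR report captures the same statistics for a bounded interval, which is usually preferable to cumulative values accumulated since instance startup.

SQL> SELECT name, value FROM v$sysstat WHERE name IN ('redo size', 'redo writes');

SQL> SELECT event, total_waits, time_waited, average_wait
     FROM v$system_event
     WHERE event IN ('log file sync', 'log file parallel write')
        OR event LIKE '%Remote Write%';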

Statistics from an AWR report on an Oracle database are provided in the following table. Synchronous redo transport (SYNC) was enabled to a local standby with a 1ms network latency to compare the performance impact to a baseline with SYNC disabled.

Table 12-5 Assessing Synchronous Redo Transport Performance with Oracle Database

Metric Baseline (No SYNC) SYNC Impact
redo rate (MB/s) 25 25 no change
log file sync average (ms) 0.68 4.60 +576%
log file parallel write average (ms) 0.57 0.62 +8.8%
TPS 7,814.92 6,224.03 -20.3%
RFS random I/O (ms) NA 2.89 NA
SYNC remote write average (ms) NA 3.45 NA
redo writes 2,312,366 897,751 -61.2%
redo write size (KB) 10.58 20.50 +93.8%

In the above example, observe that log file sync wait averages increased dramatically after enabling SYNC. While the local writes remained fairly constant, the biggest factor in increasing log file sync was the addition of the SYNC remote write. The network latency component of the SYNC remote write is minimal, so focusing on the remote write into the standby redo log shows an average time of 2.89ms. This is an immediate red flag given that the primary and standby were using the same hardware, and the SYNC remote write average time should be similar to the primary's log file parallel write average time.

In the above example the standby redo logs had multiple members and were placed in a slower performing disk group. After reducing the standby redo logs to a single member and placing them in a fast disk group, you can see results such as those shown in the following table.

Table 12-6 SYNC Performance After Reducing Standby Redo Logs to a Single Member and Placing on a Fast Disk Group

Metric Baseline (No SYNC) SYNC Impact
redo rate (MB/s) 25 25 no change
log file sync average (ms) 0.67 1.60 +139%
log file parallel write average (ms) 0.51 0.63 +23.5%
TPS 7,714.36 7,458.08 -3.3%
RFS random I/O (ms) NA 0.89 NA
SYNC remote write average (ms) NA 1.45 NA
redo writes 2,364,388 996,532 -57.9%
redo write size (KB) 10.61 20.32 +91.5%

Redo Apply Troubleshooting and Tuning

Redo apply performance depends mainly on the type of workload that is being recovered and the system resources allocated to recovery.

Recovering an OLTP workload can be very I/O intensive because there are a large number of small random reads and writes. The higher the number of IOPS (I/O per second) that a storage subsystem can handle, the faster the recovery rate can be.

In contrast, recovering batch workloads is more efficient because it consists of large sequential reads and writes, resulting in much faster recovery rates than OLTP workloads running on equivalent system resources. In addition, batch direct load operation recovery optimizations result in greater efficiency and even higher recovery rates.

The difference between OLTP and batch recovery performance profiles explains why applications with variation in their mixtures of OLTP and batch workloads can have different recovery rates at a standby database, even if the primary database redo generation rates are similar.

As changes occur on the primary database, redo is generated and sent to the standby database. The frequency of shipping redo to the standby is determined by whether the remote destination is using synchronous or asynchronous redo transport.

If redo apply was started using real-time apply, redo generated by the primary database is applied to the standby database as soon as it is received (that is, there is no wait for the database to switch logs). An Oracle Active Data Guard standby database that is open read-only while recovery is active enables users to query current data.
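
One way to confirm that redo apply was started with real-time apply is to check the RECOVERY_MODE column of V$ARCHIVE_DEST_STATUS on the standby database; a value of MANAGED REAL TIME APPLY indicates that real-time apply is active (a quick check, assuming the standby instance is mounted or open).

SQL> SELECT dest_id, status, recovery_mode FROM v$archive_dest_status WHERE status = 'VALID';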

If your standby database is lagging relative to the primary database, you should focus on the following main areas:

  • Determining the transport lag

  • Determining the apply lag

  • Determining the query SCN or time

  • Determining how far behind the standby data is compared to the primary database

Monitor Apply Lag

For standby databases on symmetric hardware and configuration, the apply lag should be less than 10 seconds.

If you observe high redo transport lag, tune the network transport first and address any insufficient network bandwidth or any overhead resulting from redo encryption or compression.

To monitor apply lag on the standby database query the V$DATAGUARD_STATS view.

SQL> SELECT name,value,time_computed,datum_time FROM v$dataguard_stats WHERE name LIKE '%lag';

The DATUM_TIME column is the local time on the standby database when the datum used to compute the metric was received. The lag metrics are computed based on data that is periodically received from the primary database. An unchanging value in this column across multiple queries indicates that the standby database is not receiving data from the primary database. The potential data loss in this scenario would be from the last datum time from V$DATAGUARD_STATS to the current time on the standby.

To obtain a histogram that shows the history of apply lag values since the standby instance was last started, query the V$STANDBY_EVENT_HISTOGRAM view.

SQL> SELECT * FROM v$standby_event_histogram WHERE name like '%lag' and count >0;

To evaluate the apply lag over a period of time, take a snapshot of V$STANDBY_EVENT_HISTOGRAM at the beginning of the time period and compare that snapshot with one taken at the end of the time period.

SQL> col NAME format a10
SQL> SELECT NAME,TIME,UNIT,COUNT,LAST_TIME_UPDATED FROM V$STANDBY_EVENT_HISTOGRAM WHERE name like '%lag' and count >0 ORDER BY LAST_TIME_UPDATED;

NAME             TIME UNIT                  COUNT LAST_TIME_UPDATED
---------- ---------- ---------------- ---------- --------------------
apply lag          41 seconds                   3 04/05/2018 16:30:59
apply lag          44 seconds                   1 04/05/2018 16:31:02
apply lag          45 seconds                   2 04/05/2018 16:31:03
apply lag          46 seconds                   2 04/05/2018 16:31:04

Evaluate Redo Apply Rate if the Apply Lag Is High

In an Oracle Data Guard physical standby environment, it is important to determine if the standby database can recover redo as fast as, or faster than, the primary database can generate redo.

If the apply lag is above expectations, then evaluate redo apply performance by querying the V$RECOVERY_PROGRESS view. This view contains the columns described in the following table.

Table 12-7 V$RECOVERY_PROGRESS View Columns

Column Description
Average Apply Rate Redo Applied / Elapsed Time includes time spent actively applying redo and time spent waiting for redo to arrive.
Active Apply Rate Redo Applied / Active Time is a moving average over the last 3 minutes. The rate does not include time spent waiting for redo to arrive.
Maximum Apply Rate Redo Applied / Active Time is peak measured throughput or maximum rate achieved over a moving average over last 3 minutes. The rate does not include time spent waiting for redo to arrive.
Redo Applied Represents the total amount of data in bytes that has been applied.
Last Applied Redo SCN and Timestamp of last redo applied. This is the time as stored in the redo stream, so it can be used to compare where the standby database is relative to the primary.
Apply Time per Log Average time spent actively applying redo in a log file.
Checkpoint Time per Log Average time spent for a log boundary checkpoint.
Active Time Represents the total time spent actively applying redo, not including time spent waiting for redo to arrive.
Elapsed Time Represents the total apply duration, including time spent waiting for redo to arrive.
Standby Apply Lag Represents the number of seconds for which Redo Apply has not applied redo, indicating that the standby may be behind the primary.
Log Files Represents the number of log files applied so far.

The most useful statistic is the Active Apply Rate, because the Average Apply Rate includes idle time spent waiting for redo to arrive, making it less indicative of apply performance.

The simplest way to determine application throughput in terms of redo volume is to collect Automatic Workload Repository (AWR) reports on the primary database during normal and peak workloads, and determine the number of bytes per second of redo data the production database is producing. You can then compare the speed at which redo is being generated with the Active Apply Rate columns in the V$RECOVERY_PROGRESS view to determine if the standby database is able to maintain the pace.
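
For example, a query such as the following sketch against V$RECOVERY_PROGRESS on the standby returns the apply rate and timing items described in Table 12-7.

SQL> SELECT item, units, sofar
     FROM v$recovery_progress
     WHERE item LIKE '%Apply Rate%' OR item LIKE '%Time%' OR item = 'Redo Applied';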

Tune Redo Apply by Evaluating Database Wait Events

Once you have verified that you are not bottlenecked on any system or network resources you are ready to assess database wait events. On the primary database this is done using AWR reports while on the standby database you will use standby AWR.

Before assessing database wait events, it is important that you understand the process flow involved in recovery. In general there are three distinct phases to standby recovery: the log read phase, the apply phase, and the checkpoint phase.

  1. Redo is received on the standby by the RFS (Remote File Server) process.

    The RFS process writes newly received redo for each thread into the current standby redo log for that thread. The RFS write operation is tracked by the rfs random I/O wait event.

  2. Once redo has been written the recovery coordinator process (pr00) will read the redo from the standby redo logs for each thread.

    This read I/O is tracked by the log file sequential read operation. The recovery coordinator then merges redo from all threads together and places the redo into memory buffers for the recovery slaves. The waits for writing and reading recovery memory buffers are tracked by the parallel recovery read buffer free and parallel recovery change buffer free wait events.

  3. The recovery processes retrieve redo or change vectors from the memory buffers and begin the process of applying the changes to data blocks.

    First the recovery slaves determine which data blocks need to be recovered and read those blocks into the buffer cache if they are not already present. This read I/O by the recovery slaves is tracked by the recovery read wait event.

  4. When a log is switched on the primary for any thread the standby will coordinate a switch of the standby redo log for that thread at the same time.

    A log switch on a standby will force a full checkpoint which will result in flushing all dirty buffers from the buffer cache out to the data files on the standby. A checkpoint on the standby is currently more expensive than on a primary. Multiple DB writer processes (DBWR) will write the data file blocks down to the data files with its write time tracked by the db file parallel write wait event. The total time for the checkpoint to complete is covered by the checkpoint complete wait event.

During the apply phase it is normal to observe that the recovery coordinator process (pr00) has high utilization on a single CPU, while during the checkpoint phase you normally see an increase in the write I/O to the data files.

The following table provides a description as well as tuning advice for wait events involved in the recovery process.

Table 12-8 Recovery Process Wait Events

Wait Event Description Tuning Recommendations
Logfile sequential read The parallel recovery coordinator is waiting on I/O from the online redo log or the archived redo log. Tune or increase the I/O bandwidth for the ASM disk group where the archive logs or online redo logs reside.
Parallel recovery read buffer free This event indicates that all read buffers are being used by slaves, and usually indicates that the recovery slaves lag behind the coordinator. Increase _log_read_buffers to max 256
Parallel recovery change buffer free The parallel recovery coordinator is waiting for a change buffer to be released by one of the recovery slaves. Again, this is a sign that the recovery slaves are behind the coordinator. Tune or increase the I/O bandwidth for the ASM disk group where data files reside.
Data file init write The parallel recovery coordinator is waiting for a file resize to finish, as would occur with file auto extend. Tune or increase the I/O bandwidth for the ASM disk group where data files reside.
Parallel recovery control message reply The parallel recovery coordinator is waiting for all recovery slaves to respond to a synchronous control message. This is a non-tunable event.
Parallel recovery slave next change The parallel recovery slave is waiting for a change to be shipped from the coordinator. This is in essence an idle event for the recovery slave. To determine the amount of CPU a recovery slave is using, divide the time spent in this event by the number of slaves started and subtract that value from the total elapsed time. The result is only approximate, because some waits are involved. N/A. This is an idle event.
DB File Sequential Read A parallel recovery slave (or serial recovery process) is waiting for a batch of synchronous data block reads to complete. Tune or increase the I/O bandwidth for the ASM disk group where data files reside.
Checkpoint completed Recovery is waiting for checkpointing to complete, and Redo Apply is not applying any changes currently.

Tune or increase the I/O bandwidth for the ASM disk group where data files reside.

Also, increase the number of db_writer_processes until the checkpoint completed wait event is lower than the db file parallel write wait event. Consider also increasing the online log file size on the primary and standby to decrease the number of full checkpoints at log switch boundaries.

Recovery read A parallel recovery slave is waiting for a batched data block I/O. Tune or increase the I/O bandwidth for the ASM disk group where data files reside.
Parallel recovery change buffer free (MIRA) The parallel recovery coordinator is waiting for a change mapping buffer to be released by one of the recovery slaves. Increase _change_vector_buffers to 2 or 4
Recovery apply pending and/or recovery receive buffer free (MIRA)

Recovery apply pending: the time the logmerger process waited (in centiseconds) for apply slaves to apply all pending changes up to a certain SCN.

Recovery receive buffer free: the time (in centiseconds) spent by the receiver process on instance waiting for apply slaves to apply changes from received buffers so that they can be freed for the next change.

Increase _mira_num_local_buffers and _mira_num_receive_buffers

Note that these parameters use space from the shared pool equal to the sum of their values (in MB) multiplied by the number of apply instances.

See How to Generate AWRs in Active Data Guard Standby Databases (Doc ID 2409808.1) for more information about generating AWRs on the standby database.

Enable Multi-Instance Redo Apply if Required

Under extremely heavy workloads, single-instance redo apply is not sufficient.

If there are insufficient CPU resources on the standby redo apply database server, or if the recovery coordinator process (pr00) is CPU bound (as observed using OS utilities), then multi-instance redo apply allows you to scale redo apply across Oracle RAC instances.

Multi-instance redo apply greatly improves the scalability of redo apply for Oracle RAC databases. Rather than merging all threads of redo into a single apply process, multiple apply instances divide the threads of redo between them. For example, when two apply nodes are used to apply four threads of redo from the primary, each apply instance will apply two threads.

When the workload is well balanced between all threads of the primary, the scalability of multi-instance redo apply is predictable (2x, 4x and so on). Unbalanced workloads, where one instance does more work than another, get mixed scaling results compared to single instance redo apply.
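
The following commands are a sketch of enabling multi-instance redo apply; standby_db is a placeholder for the standby database name, the instance counts are illustrative, and the exact syntax should be verified for your release. With the broker, set the ApplyInstances database property; without the broker, restart managed recovery with the INSTANCES clause.

DGMGRL> EDIT DATABASE 'standby_db' SET PROPERTY ApplyInstances = 2;

SQL> ALTER DATABASE RECOVER MANAGED STANDBY DATABASE CANCEL;
SQL> ALTER DATABASE RECOVER MANAGED STANDBY DATABASE DISCONNECT FROM SESSION INSTANCES ALL;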

Redo Apply Performance Tuning Example

The following is an example which applies the redo apply tuning discussed in this document.

The first step to assess performance and determine steps to improve performance is to take standby statspack snaps that capture statistics on the standby database.

After generating a standby statspack report based on those snapshots, the next step is to examine the load profile section.

Load Profile                   Total         Per Second
~~~~~~~~~~~~       ------------------  -----------------
      DB time(s):                9.2                0.0
       DB CPU(s):                0.7                0.0
 Redo MB applied:           14,497.8                9.7
   Logical reads:              471.0                0.3
  Physical reads:        6,869,213.0            4,570.3
 Physical writes:        9,722,929.0            6,469.0
      User calls:              510.0                0.3
          Parses:            1,192.0                0.8
     Hard parses:                3.0                0.0
W/A MB processed:               35.9                0.0
          Logons:               45.0                0.0
        Executes:            1,410.0                0.9
       Rollbacks:                0.0                0.0

From the load profile shown above, you can see the amount of redo recovered during the report period and the amount of redo recovered on a per second basis. In addition you can see the number of physical reads and writes being performed.

Compare the number of reads and writes to the baseline reports that showed acceptable performance. Increased reads could be coming from read-only queries that might not have been seen previously, while increased writes could indicate a change in the type of workload being recovered.

The top five wait events section is the most important section to review. This section details where the majority of the time is spent waiting, as shown below.

Top 5 Timed Events                                                    Avg %Total
~~~~~~~~~~~~~~~~~~                                                   wait   Call
Event                                            Waits    Time (s)   (ms)   Time
----------------------------------------- ------------ ----------- ------ ------
checkpoint completed                             4,261       8,210   1927   42.1
db file parallel write                          20,982       2,658    127   13.6
lreg timer                                         501       1,502   2998    7.7
free buffer waits                              119,108       1,278     11    6.5
parallel recovery read buffer free               8,046       1,101    137    5.6

In the output above, 42% of the call time is spent waiting for checkpoint completed. The second wait event, db file parallel write, is associated with the checkpoint completed wait because it relates to DBWR writing dirty buffers from the buffer cache. The free buffer waits wait event indicates that the buffer cache is full and that recovery slaves reading buffers into the buffer cache are stalled.

This wait event profile indicates that you should increase the number of DBWR processes to help increase the rate that buffers can be flushed from the buffer cache.

Before making any changes, consult the statspack report to make note of the recovery active apply rate and the time being spent in the apply phase versus the checkpoint phase. After you adjust the number of DBWR processes, compare the subsequent standby statspack reports to these numbers to see improvement.

Recovery Start Time Item                       Sofar Units   Redo Timestamp
------------------- ----------------- -------------- ------- ------------------
21-Oct-15 08:03:56  Log Files                      6 Files
21-Oct-15 08:03:56  Active Apply Rate         10,809 KB/sec
21-Oct-15 08:03:56  Average Apply Rat         10,708 KB/sec
21-Oct-15 08:03:56  Maximum Apply Rat         80,592 KB/sec
21-Oct-15 08:03:56  Redo Applied              15,111 Megabyt
21-Oct-15 08:03:56  Last Applied Redo              0 SCN+Tim 20-Oct-15 12:48:25
21-Oct-15 08:03:56  Active Time                1,408 Seconds
21-Oct-15 08:03:56  Apply Time per Lo            121 Seconds
21-Oct-15 08:03:56  Checkpoint Time p             14 Seconds
21-Oct-15 08:03:56  Elapsed Time               1,445 Seconds
21-Oct-15 08:03:56  Standby Apply Lag         70,785 Seconds

It is important that you make only one change at a time, and obtain new standby statspack reports to assess any improvement based on that change. For each change, follow the same methodology of assessing the physical reads and writes, the top wait events, and the time spent in the different recovery phases.
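
For example, if the analysis points to DBWR as the bottleneck, the single change might be to increase db_writer_processes. The parameter is static, so the new value (8 here is purely illustrative) takes effect only after the standby instances are restarted.

SQL> ALTER SYSTEM SET db_writer_processes=8 SCOPE=SPFILE SID='*';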

Role Transition Assessment and Tuning

With thorough planning, configuration, and tuning, Oracle Data Guard role transitions can effectively minimize downtime and ensure that the database environment is restored with minimal impact on the business.

Using a physical standby database, Oracle MAA testing has determined that switchover and failover times with Oracle Data Guard have been reduced to seconds. The following topics describe best practices for both switchover and failover. While following best practices, switchover times of approximately 34 seconds for Oracle RAC and 19 seconds for a single instance database have been observed.

Validate Database Switchover and Failover Readiness

The Oracle Data Guard broker VALIDATE DATABASE command gathers information related to switchover and failover readiness.

The validation verifies that the standby and primary database are reachable and the apply lag is less than ApplyLagThreshold for the target database. If these data points are favorable, the command output displays Ready for Failover: Yes as shown below. In addition, if redo transport is running, the command output displays Ready for Switchover: Yes.

DGMGRL> validate database [verbose] database_name

Database Role: Physical standby database
 Primary Database: primary db_unique_name

Ready for Switchover: Yes
 Ready for Failover: Yes (Primary Running)

VALIDATE DATABASE checks additional information that can impact switchover time and database performance, such as whether the online redo logs have been cleared, number of temporary tablespaces, parameter mismatches between primary and standby, and the status of flashback databases.

In most failover cases the primary database has crashed or become unavailable. The Ready for Failover output indicates if the primary database is running when VALIDATE DATABASE was issued. This state does not prevent a failover, but it is recommended that you stop the primary database before issuing a failover to avoid a split-brain scenario where the configuration has two primary databases. The broker only guarantees split-brain avoidance on failover when Fast-Start Failover is used.

You can also run VALIDATE DATABASE periodically as a configuration monitoring tool.

Use the Broker to Initiate Switchover and Failover

Use the Oracle Data Guard broker SWITCHOVER command to initiate switchover, and the FAILOVER command to initiate failover.

As part of a switchover or failover operation the broker does the following.

  • Configures redo transport from the new primary database
  • Starts redo apply on the new standby database
  • Ensures that other standby databases in the broker configuration are viable and receiving redo from the new primary
  • Integrates Oracle Clusterware and Global Data Services to ensure the correct services are started after role change

To use the broker to initiate a switchover, run

DGMGRL> SWITCHOVER TO database_name;

To use the broker to initiate a failover, run

DGMGRL> FAILOVER TO database_name [IMMEDIATE];

By default FAILOVER applies all redo that was received before failing over. The IMMEDIATE clause skips the pending redo and fails over immediately.

The SWITCHOVER and FAILOVER commands are idempotent and can be re-issued in the unlikely event of a failed transition.

Optimize Failover Processing

Implement the following best practices to optimize failover processing.

  • Enable Flashback Database so that you can reinstate the failed primary database after a failover operation has completed. Flashback Database facilitates fast point-in-time recovery from failures caused by user error or logical corruption.

  • Enable real-time apply, which allows apply services to apply the redo on the standby databases as soon as it is received.

  • Consider configuring multiple standby databases to maintain data protection following a failover.

  • Use Oracle Managed Files to pre-clear and pre-create online redo logs on the standby database.

    As part of a failover, the standby database must clear its online redo logs before opening as the primary database. The time needed to complete this I/O can add significantly to the overall failover time. With Oracle Managed Files, the standby pre-creates the online redo logs the first time the managed redo process (MRP) is started.

    • Alternatively set the LOG_FILE_NAME_CONVERT parameter. You can also pre-create empty online redo logs by issuing the SQL*Plus ALTER DATABASE CLEAR LOGFILE statement on the standby database.

  • Use fast-start failover. To optimize switchover processing, ensure, if possible, that the databases are synchronized before the switchover operation. Real-time apply applies redo as it is received, which enables the fastest switchover.

Enable Fast-Start Failover and Leverage Best Practices

Fast-start failover automatically, quickly, and reliably fails over to a designated standby database if the primary database fails, without requiring manual intervention to execute the failover. You can use fast-start failover only in an Oracle Data Guard configuration that is managed by Oracle Data Guard broker.

The Data Guard configuration can be running in either the maximum availability or maximum performance mode with fast-start failover. When fast-start failover is enabled, the broker ensures fast-start failover is possible only when the configured data loss guarantee can be upheld. Maximum availability mode provides an automatic failover environment guaranteed to lose no data. Maximum performance mode provides an automatic failover environment guaranteed to lose no more than the amount of data (in seconds) specified by the FastStartFailoverLagLimit configuration property.
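
As a sketch, fast-start failover is typically enabled with DGMGRL commands such as the following; the property values shown are illustrative, FastStartFailoverLagLimit is honored only in maximum performance mode, and an observer must also be running for automatic failover to occur.

DGMGRL> EDIT CONFIGURATION SET PROPERTY FastStartFailoverThreshold = 30;
DGMGRL> EDIT CONFIGURATION SET PROPERTY FastStartFailoverLagLimit = 30;
DGMGRL> ENABLE FAST_START FAILOVER;
DGMGRL> START OBSERVER;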

Adopt fast-start failover best practices discussed in Configure Fast Start Failover.

Assess Time Management Interface Event Alerts to Troubleshoot Role Transition Timings

The Time Management Interface (TMI) event is a low overhead event which adds a line to the alert log whenever certain calls are executed in Oracle.

These entries in the alert log, or tags, delineate the beginning and end of a call. The tables in the topics below depict the delineation of key switchover and failover operations. This method is the most accurate for determining where time is being spent.

Set the database-level event 16453 trace name context forever, level 15 on all databases. There are two methods of enabling this trace: using the EVENT database parameter, or setting EVENTS at the system level. The difference is that the EVENT parameter is not dynamic but is persistent across restarts, while SET EVENTS is dynamic but not persistent across database restarts. See the following examples.

ALTER SYSTEM SET EVENT='16453 trace name context forever, level 15' SCOPE=SPFILE SID='*';
ALTER SYSTEM SET EVENTS '16453 trace name context forever, level 15';

Key Switchover Operations and Alert Log Tags

Switchover is broken down into four main steps as follows.

  1. Convert to Standby - kill any existing production sessions, convert the control file into a standby control file, and send a message to the standby to continue the switchover.

    The Convert to Standby steps are found in the alert log of the original primary. All remaining steps are found in the alert log of the original standby.

  2. Cancel Recovery - apply remaining redo and stop recovery.

  3. Convert to Primary - a two-step close (to the mounted state) of instances (one instance, then all others), clear online redo logs, convert the control file to a primary control file, and Data Guard broker bookkeeping.

  4. Open New Primary - parallel open of all instances.

Table 12-9 Alert Log Tags Defining the Steps with Time Management Interface Event Enabled

Step Stage Time Management Interface Event Enabled
Convert to Standby (primary alert log) BEGIN TMI: dbsdrv switchover to target BEGIN <DATE> <TIMESTAMP>
Convert to Standby (primary alert log) END TMI: kcv_switchover_to_target send 'switchover to primary' msg BEGIN <DATE> <TIMESTAMP>
Cancel Recovery (standby alert log) BEGIN TMI: kcv_commit_to_so_to_primary wait for MRP to die BEGIN <DATE> <TIMESTAMP>
Cancel Recovery (standby alert log) END TMI: kcv_commit_to_so_to_primary wait for MRP to die END <DATE> <TIMESTAMP>
Convert to Primary (standby alert log) BEGIN TMI: kcv_commit_to_so_to_primary BEGIN CTSO to primary <DATE> <TIMESTAMP>
Convert to Primary (standby alert log) END TMI: adbdrv BEGIN 10 <DATE> <TIMESTAMP>
Open Primary (standby alert log) BEGIN TMI: adbdrv BEGIN 10 <DATE> <TIMESTAMP>
Open Primary (standby alert log) END TMI: adbdrv END 10 <DATE> <TIMESTAMP>

Key Failover Operations and Alert Log Tags

All failover steps are documented in the alert log of the target standby where the failover was performed.

  1. Cancel Recovery - Stop recovery and close all instances (to mounted) in parallel.

  2. Terminal Recovery - Archive standby redo logs and recover any unapplied redo.

  3. Convert to Primary - Clear online redo logs and convert the control file to a primary control file.

  4. Open Primary - Open all instances in parallel.

Table 12-10 Failover Alert Log Tags Defining the Steps with Time Management Interface Event Enabled

Step Stage Time Management Interface Event Enabled
Cancel Recovery BEGIN TMI: adbdrv termRecovery BEGIN <DATE> <TIMESTAMP>
Cancel Recovery END TMI: adbdrv termRecovery END <DATE> <TIMESTAMP>
Terminal Recovery BEGIN TMI: krdsmr full BEGIN Starting media recovery <DATE> <TIMESTAMP>
Terminal Recovery END TMI: krdemr full END end media recovery <DATE> <TIMESTAMP>
Convert to Primary BEGIN TMI: kcv_commit_to_so_to_primary BEGIN CTSO to primary <DATE> <TIMESTAMP>
Convert to Primary END TMI: adbdrv BEGIN 10 <DATE> <TIMESTAMP>
Open Primary BEGIN TMI: adbdrv BEGIN 10 <DATE> <TIMESTAMP>
Open Primary END TMI: adbdrv END 10 <DATE> <TIMESTAMP>