12 Tune and Troubleshoot Oracle Data Guard
When redo transport, redo apply, or role transitions are not meeting your expected requirements, use the following guidelines to help you tune and troubleshoot your deployment.
Overview of Oracle Data Guard Tuning and Troubleshooting
To get the best performance from your Oracle Data Guard configuration, use the following Oracle MAA best practices for monitoring, assessment, and performance tuning.
-
Ensure that Oracle Database and Oracle Data Guard configuration best practices are in place.
The assumption when assessing and tuning is that all of the Oracle Database and Data Guard configuration best practices are already integrated in the environment. Evaluate the adherence to those best practices before doing any tuning.
-
Assess and tune redo transport services
Oracle Data Guard automatically tunes redo transport to optimize performance. However, if you observe performance issues, you can monitor and tune redo transport services.
Asynchronous redo transport with Maximum Performance data protection mode is the default Oracle Data Guard configuration. Tuning asynchronous redo transport consists mainly of ensuring that the primary, standby, and network resources are sufficient for handling the workload, and that you monitor those resources for bottlenecks.
Synchronous redo transport does sacrifice some performance for zero data loss; however, using sound MAA recommended methods, you can monitor and assess the impact and distribute resources appropriately.
-
Assess and tune redo apply
In most cases, the default Oracle settings result in satisfactory performance for media recovery when the standby is always up to date. However, as applications and databases increase in size and throughput, media recovery operations can benefit from additional tuning to further optimize recovery time or redo apply throughput on a standby database
-
Assess and tune role transitions
With proper planning and implementation, Oracle Data Guard and Active Data Guard role transitions can effectively minimize downtime and ensure that the database environment is restored with minimal impact on the business. Performance tests using a physical standby database and Oracle Maximum Availability Architecture (MAA) best practices have shown that switchover and failover can be reduced to seconds.
Redo Transport Troubleshooting and Tuning
Oracle Data Guard redo transport performance is directly dependent on the performance of the primary and standby systems, the network that connects them, and the I/O subsystem.
Understanding the topology of the Data Guard configuration and its relevance to Data Guard performance helps eliminate infrastructure weaknesses that are often incorrectly attributed to the Data Guard architecture.
System and Network Performance Prerequisites
Oracle Data Guard architecture is very streamlined and efficient; however, like any application, there are reasonable system and network prerequisites to achieve satisfactory performance. After all of the configuration best practices have been implemented as described in Oracle Data Guard Configuration Best Practices, consider the following recommendations.
Primary Database System
-
Sufficient CPU utilization for LGWR to post foregrounds efficiently
-
Sufficient I/O bandwidth so local log writes maintain low I/O latency during peak rates
-
Network interfaces that can handle peak redo rate volumes combined with any other network activity across the same interface
-
Primary AWR, ASH, and OSwatcher data gathered for tuning and troubleshooting
Standby Database System
-
Sufficient CPU utilization for RFS (the Data Guard process that receives redo at the standby database) to efficiently write to standby redo logs
-
Sufficient I/O bandwidth to enable local log writes to maintain low I/O latency during peak rates
-
A network interface that can receive the peak redo rate volumes combined with any other network activity across the same interface
-
Standby AWR, ASH, and OSwatcher data should be gathered
Note:
The top problem encountered with the standby database is poor standby log write latency due to insufficient I/O bandwidth. This problem can be mitigated by using Data Guard Fast Sync.Network
-
For synchronous redo transport, the round trip network latency (RTT) between the primary database and target standby database or target far sync standby server should typical less than 2 ms to minimize primary database performance impact.
Oracle is often asked what the maximum latency is that can be supported or whether there is an equation that can be used to project performance impact. Unfortunately every application has a different tolerance to network latency. Differences in application concurrency, number of sessions, the transaction size in bytes, how often sessions commit, and log switch frequency can cause differences application performance impact.
-
Sufficient network bandwidth to support peak redo rates (steady state and when resolving gaps) combined with any other network activity that shares the same network.
Note that your point to point network bandwidth is throttled by the network segment, switch, router, or interface with the lowest network bandwidth. For example, if you have 10gigE for 90% of your network route and your existing switch or network interface only supports 1 GigE, then your maximum network bandwidth is 1 GigE.
-
Netstat and/or any network monitoring statistics should be gathered
Note:
The top network problems encountered are inconsistent network response and insufficient network bandwidth.Monitor System Resources
Monitor system resources on primary and standby hosts and the network using the utilities and statistics recommended here.
Monitoring CPU
-
When all of a system's CPU cores are occupied running work for processes, other processes must wait until a CPU core or thread becomes available, or the scheduler switches a CPU to run their code. A bottleneck can be created if too many processes are queued too often.
You can use the
uptime
,mpstat
,sar
,dstat
, andtop
utilities to monitor CPU usage. -
The commands
mpstat -P ALL
andsar -u -P ALL
display CPU usage statistics for each CPU core and an average across all CPU cores. -
The
%idle
value shows the percentage of time that a CPU is not running system code or process code. If the value of%idle
is near 0% most of the time on all CPU cores, the system is CPU-bound for the workload that it is running. The percentage of time spent running system code (%systemor %sys
) should not usually exceed 30%, especially if%idle
is close to 0%. -
The system load average represents the number of processes that are running on CPU cores, waiting to run, or waiting for disk I/O activity to complete, averaged over a period of time. On a busy system, the load average reported by
uptime
orsar -q
should not exceed two times the number of CPU cores. The system is overloaded if the load average exceeds four times the number of CPU cores for long periods. -
In addition to load averages (
ldavg-*
), thesar -q
command reports the number of processes currently waiting to run (the run-queue size,runq-sz
) and the total number of processes (plist_sz
). The value ofrunq-sz
also provides an indication of CPU saturation. -
Determine the system's average load under normal loads when users and applications do not experience problems with system responsiveness, and then look for deviations from this benchmark over time. A dramatic rise in the load average can indicate a serious performance problem.
Monitoring Memory Usage
-
When a system runs out of real or physical memory and starts using swap space, its performance deteriorates dramatically. If you run out of swap space, your programs or the entire operating system are likely to crash.
If
free
ortop
indicate that little swap space remains available, this is also an indication you are running low on memory. -
The
sar -r
command reports memory utilization statistics, including%memused
, which is the percentage of physical memory in use. -
The
sar -B
command reports memory paging statistics, includingpgscank/s
, which is the number of memory pages scanned by thekswapd
daemon per second, andpgscand/s
, which is the number of memory pages scanned directly per second. -
The
sar -W
command reports swapping statistics, includingpswpin/s
andpswpout/s
, which are the numbers of pages per second swapped in and out per second. -
The system has a memory shortage if
%memused
is near 100% and the scan rate is continuously over 200 pages per second. -
The output from the
dmesg
command might include notification of any problems with physical memory that were detected at boot time.
Monitoring I/O
-
Use
iostat
,sar
, andiotop
utilities to monitor I/O. -
The
iostat
command monitors the loading of block I/O devices by observing the time that the devices are active relative to the average data transfer rates. You can use this information to adjust the system configuration to balance the I/O loading across disks and host adapters.iostat -x
reports extended statistics about block I/O activity at one second intervals, including%util
, which is the percentage of CPU time spent handling I/O requests to a device, andavgqu-sz
, which is the average queue length of I/O requests that were issued to that device.If
%util
approaches 100% oravgqu-sz
is greater than 1, device saturation is occurring and the storage I/O bandwidth needs to be augmented by adding disks or storage. -
You can also use the
sar -d
command to report on block I/O activity, including values for%util
andavgqu-sz
. -
The
iotop
utility can help you identify which processes are responsible for excessive disk I/O. Theiotop
user interface is similar totop
.In its upper section,
iotop
displays the total disk input and output usage in bytes per second. In its lower section, iotop displays I/O information for each process, including disk input-output usage in bytes per second, the percentage of time spent swapping in pages from disk or waiting on I/O, and the command name.Use the left and right arrow keys to change the sort field, and press A to toggle the I/O units between bytes per second and total number of bytes, or O to toggle between displaying all processes or only those processes that are performing I/O.
Monitoring the Network
-
Use
ip
,ss
, andsar
to monitor the network. -
The
ip -s
link command displays network statistics and errors for all network devices, including the numbers of bytes transmitted (TX
) and received (RX
). The dropped and overrun fields provide an indicator of network interface saturation. -
The
ss -s
command displays summary statistics for each protocol. -
Use the
sar –n DEV
command to monitor the current rate transmitted using an interface.
Assess Database Wait Events
Once you verify that the system or network resources are not bottlenecked, you can assess database wait events for performance flags related to your Oracle Data Guard configuration.
On the primary database, assess database wait events with AWR reports, and on the standby
database you can use standby AWR for Oracle Database 18c and later, or
statspack
reports for older Oracle Database releases.
Table 12-1 Wait Events Relevant to Oracle Data Guard
Event Name | Description |
---|---|
ARCH Remote Write | The time (in centi-seconds) that ARCn background processes spend blocked waiting for network write operations to complete |
ASYNC Remote Write | The time (in centi-seconds) for asynchronous streaming RFSWRITE
operations
This includes stall reaps and streaming network submission time. This time is accumulated by TTnn (Redo Transport Slave) background processes. |
Redo Transport Attach | The time spent (in centi-seconds) doing Connect, Logon, and RFSATTACH for any network process |
Redo Transport Close | The time spent (in centi-seconds) by ARCn, NSSn, and TTnn processes doing RFSCLOSE and RFSRGSTR operations |
Redo Transport Detach | The time spent (in centi-seconds) doing RFSDETACH and Disconnect for any network process |
Redo Transport Open | The time spent (in centi-seconds) by ARCn, NSSn, and TTnn background processes doing RFSCREAT and RFSANNCE operations |
Redo Transport Ping | The time spent (in centi-seconds) by ARCn background processes doing RFSPING operations |
Redo Transport Slave Shutdown | The time spent (in centi-seconds) by LGWR doing NSSn and TTnn process shutdown and termination |
Redo Transport Slave Startup | The time spent (in centi-seconds) by LGWR doing NSSn and TTnn process startup and initialization |
Redo Writer Remote Sync Complete | The time spent (in centi-seconds) by LGWR reaping completed network writes to remote destinations |
Redo Writer Remote Sync Notify | The time spent (in centi-seconds) by LGWR issuing network writes to remote destinations |
Remote SYNC Ping | The time spent (in centi-seconds) by the LGWR and NSSn background processes doing synchronous PING operations |
SYNC Remote Write | The time spent by LGWR doing SYNC RFSWRITE operations |
Wait events specific to Oracle Data Guard with Oracle Database 11.2 are described in the table below.
Table 12-2 Wait Events Relevant to Oracle Data Guard 11.2
Event Name | Description |
---|---|
ARCH wait on ATTACH | Monitors the amount of time spent by all archiver processes to spawn a new RFS connection |
ARCH wait on SENDREQ | Monitors the amount of time spent by all archiver processes to write archive logs to the local disk as well as write them remotely |
ARCH wait on DETACH | Monitors the amount of time spent by all archiver processes to delete an RFS connection |
LNS wait on ATTACH | Monitors the amount of time spent by all LNS processes to spawn a new RFS connection |
LNS wait on SENDREQ | Monitors the amount of time spent by all LNS processes to write the received redo to disk as well as open and close the remote archived redo logs |
LNS wait on DETACH | Monitors the amount of time spent by all LNS processes to delete an RFS connection |
LGWR wait on LNS | Monitors the amount of time spent by the log writer (LGWR) process waiting to receive messages from LNS processes |
LNS wait on LGWR | Monitors the amount of time spent by LNS processes waiting to receive messages from the log writer (LGWR) process |
LGWR-LNS wait on channel | Monitors the amount of time spent by the log writer (LGWR) process or the LNS processes waiting to receive messages |
See Installing and Using Standby Statspack (Doc ID 454848.1) for information about Standby Statspack.
Assess Synchronous Redo Transport
The following topics describe how to assess redo transport, specifically for synchronous redo transport.
Understanding How Synchronous Transport Ensures Data Integrity
The following algorithms ensure data consistency in an Oracle Data Guard synchronous configuration.
-
LGWR redo write on the primary database and the Data Guard NSS network redo write are identical.
-
The Data Guard Managed Recovery Process (MRP) at the standby database cannot apply redo unless the redo has been written to the primary's online redo log, with the only exception being during a Data Guard failover operation (primary is gone).
In addition to shipping redo synchronously, NSS and LGWR exchange information regarding the safe redo block boundary that standby recovery can apply up to from its standby redo logs. This prevents the standby from applying redo it may have received, but which the primary has not yet acknowledged as committed to its own online redo logs.
The possible failure scenarios include:
-
If primary database's LGWR cannot write to online redo log, then LGWR and instance will crash. Instance or crash recovery will recover to the last committed transaction in the online redo log and roll back any uncommitted transactions. The current log will be completed and archived.
-
On the standby, the partial standby redo log completes with the correct value for the size to match the corresponding online redo log. If any redo blocks are missing from the SRL, those are shipped over (without reshipping the entire redo log).
-
If the primary database crashes resulting in an automatic or manual zero data loss failover, then part of the Data Guard failover operation will do "terminal recovery" and read and recover the current standby redo log.
Once recovery finishes applying all of the redo in the standby redo logs, the new primary database comes up and archives the newly completed log group. All new and existing standby databases discard any redo in the online redo logs, flashback to consistent SCN, and only apply the archives coming from the new primary database. Once again the Data Guard environment is in sync with the (new) primary database.
Assessing Performance in a Synchronous Redo Transport Environment
When assessing performance in a synchronous redo transport environment it is important that you know how the different wait events relate to each other. The impact of enabling synchronous redo transport varies between applications.
To understand why, consider the following description of work the log writer process (LGWR) performs when a commit is issued.
-
Foreground process posts LGWR for commit ("log file sync" starts). If there are concurrent commit requests queued, LGWR will batch all outstanding commit requests together resulting in a continuous strand of redo.
-
LGWR waits for CPU.
-
LGWR starts redo write ("redo write time" starts).
-
For Oracle RAC, LGWR broadcasts the current write to other instances.
-
After preprocessing, if there is a
SYNC
standby, LGWR starts the remote write (“SYNC remote write” starts). -
LGWR issues local write ("log file parallel write").
-
If there is a
SYNC
standby, LGWR waits for the remote write to complete. -
After checking the I/O status, LGWR ends "redo write time / SYNC remote write".
-
For Oracle RAC, LGWR waits for the broadcast
ack
. -
LGWR updates the on-disk SCN.
-
LGWR posts the foregrounds.
-
Foregrounds wait for CPU.
-
Foregrounds ends "log file sync".
Use the following approaches to assess performance.
-
For batch loads the most important factor is to monitor the elapsed time, because most of these processes must be completed in a fixed period of time. The database workloads for these operations are very different than the normal OLTP workloads. For example, the size of the writes can be significantly larger, so using log file sync averages does not give you an accurate view or comparison.
-
For OLTP workloads, monitor the volume of transactions per second (from AWR) and the redo rate (redo size per second) from the AWR report. This information gives you a clear picture of the application throughput and how it is impacted by enabling synchronous redo transport.
Why the log file sync Wait Event is Misleading
Typically, the log file sync
wait event on the primary
database is the first place administrators look when they want to assess the impact of
enabling synchronous redo transport.
If the average log file sync
wait before enabling synchronous redo
transport (SYNC
) was 3ms, and after was 6ms, then the assumption is
that SYNC
impacted performance by one hundred percent. Oracle does not
recommend using log file sync
wait times to measure the impact of
SYNC
because the averages can be very deceiving, and the actual
impact of SYNC
on response time and throughput may be much lower than
the event indicates.
When a user session commits, LGWR will go through the process of getting on the CPU,
submitting the I/O, waiting for the I/O to complete, and then getting back on the CPU to
post foreground processes that their commit has completed. This whole time period is
covered by the log file sync
wait event. While LGWR is performing its
work there are, in most cases, other sessions committing that must wait for LGWR to
finish before processing their commits. The size and number of sessions waiting are
determined by how many sessions an application has and how frequently those sessions
commit. This batching up of commits is generally referred to as application
concurrency.
For example, assume that it normally takes 0.5ms to perform log writes (log file
parallel write
), 1ms to service commits (log file sync
),
and on average you are servicing 100 sessions for each commit. If there was an anomaly
in the storage tier, and the log write I/O for one commit took 20ms to complete, then
you could have up to 2,000 sessions waiting on log file sync
, while
there would only be 1 long wait attributed to log file parallel write
.
Having a large number of sessions waiting on one long outlier can greatly skew the
log file sync
averages.
The output from V$EVENT_HISTOGRAM
for the log file sync
wait event for a particular period in time is shown in the following table.
Table 12-3 V$EVENT_HISTOGRAM Output for the Log File Sync Wait Event
Milliseconds | Number of Waits | Percent of Total Waits |
---|---|---|
1 | 17610 | 21.83% |
2 | 43670 | 54.14% |
4 | 8394 | 10.41% |
8 | 4072 | 5.05% |
16 | 4344 | 5.39% |
32 | 2109 | 2.61% |
64 | 460 | 0.57% |
128 | 6 | 0.01% |
The output shows that 92% of the log file sync
wait times are less than
8ms, with the vast majority less than 4ms (86%). Waits over 8ms are outliers and only
make up 8% of wait times overall, but because of the number of sessions waiting on those
outliers (because of batching of commits) the averages get skewed. The skewed averages
are misleading when log file sync
average waits times are used as a
metric for assessing the impact of SYNC
.
Understanding What Causes Outliers
Any disruption to the I/O on the primary or standby, or spikes in network latency, can cause high log file sync outliers with synchronous redo transport. You can see this effect when the standby system's I/O subsystem is inferior to that of the primary system.
Often administrators host multiple databases such as development and test on standby
systems, which can impair I/O response. It is important to monitor I/O using iostat as
described in Monitoring I/O to determine if the disks reach maximum IOPS, because this
affects the performance of SYNC
writes.
Frequent log switches are significant cause of outliers. Consider what occurs on the standby when a log switch on the primary occurs, as follows.
-
RFS on the standby must finish updates to the standby redo log header.
-
RFS then switches into a new standby redo log with additional header updates.
-
Switching logs forces a full checkpoint on the standby. This causes all dirty buffers in the buffer cache to be written to disk, causing a spike in write I/O. In a non-symmetric configuration where the standby storage subsystem does not have the same performance as the primary database, this results in higher I/O latencies.
-
The previous standby redo log must be archived, increasing both read and write I/O.
Effects of Synchronous Redo Transport Remote Writes
When you enable synchronous redo transport (SYNC
), you
introduce a remote write (RFS write to a standby redo log) in addition to the normal local
write for commit processing.
This remote write, depending on network latency and remote I/O bandwidth, can make commit processing time increase. Because commit processing takes longer, you observe more sessions waiting on LGWR to finish its work and begin work on the commit request, that is, application concurrency has increased. You can observe increased application concurrency by analyzing database statistics and wait events. Consider the example in the following table.
Table 12-4 Affect of Sync Transport Increasing Application Concurrency
SYNC | Redo Rate | Network Latency | TPS from AWR | log file sync average (ms) | log file parallel write average (ms) | RFS random I/O | SYNC remote write average (ms) | Redo write size (KB) | Redo writes |
---|---|---|---|---|---|---|---|---|---|
Defer | 25MB | 0 | 5,514.94 | 0.74 | 0.47 | NA | NA | 10.58 | 2,246,356 |
Yes | 25MB | 0 | 5,280.20 | 2.6 | .51 | .65 | .95 | 20.50 | 989,791 |
Impact | 0 | - | -4% | +251% | +8.5% | NA | NA | +93.8% | -55.9% |
In the above example, enabling SYNC
reduced the number of redo writes,
but increased the size of each redo write. Because the size of the redo write increased,
you can expect the time spent doing the I/O (both local and remote) to increase. The
log file sync
wait time is higher because there is more work per
wait.
However, at the application level, the impact on the transaction rate or the transaction
response time might change very little as more sessions are serviced per commit. This is
why it is important to measure the impact of SYNC
at the application
level, and not depend entirely on database wait events. It is also a perfect example of
why the log file sync
wait event is a misleading indicator of the
actual impact SYNC
has on the application.
Example of Synchronous Redo Transport Performance Troubleshooting
To look at synchronous redo transport performance, calculate the time spent for local redo writes latency, average redo write size per write, and overall redo write latency, as shown here.
Use the following wait events to do the calculations.
-
local redo write latency = 'log file parallel write'
-
remote write latency = ‘SYNC remote write’
-
average redo write size per write = ‘redo size’ / ‘redo writes’
-
average commit latency seen by foregrounds = 'log file sync'
Statistics from an AWR report on an Oracle database are provided in the following table.
Synchronous redo transport (SYNC
) was enabled to a local standby with a
1ms network latency to compare the performance impact to a baseline with
SYNC
disabled.
Table 12-5 Assessing Synchronous Redo Transport Performance with Oracle Database
Metric | Baseline (No SYNC) | SYNC | Impact |
---|---|---|---|
redo rate (MB/s) | 25 | 25 | no change |
log file sync | 0.68 | 4.60 | +576% |
log file parallel write average (ms) | 0.57 | 0.62 | +8.8% |
TPS | 7,814.92 | 6224.03 | -20.3% |
RFS random I/O | NA | 2.89 | NA |
SYNC remote write average (ms) | NA | 3.45 | NA |
redo writes | 2,312,366 | 897,751 | -61,2% |
redo write size (KB) | 10.58 | 20.50 | +93.8% |
In the above example observe that log file sync
waits averages increased
dramatically after enabling SYNC
. While the local writes remained
fairly constant, the biggest factor in increasing log file sync
was the
addition of the SYNC
remote write. Of the SYNC
remote
write the network latency is zero, so focusing on the remote write into the standby redo
log shows an average time of 2.89ms. This is an immediate red flag given that the
primary and standby were using the same hardware, and the SYNC
remote
write average time should be similar to the primary's log file parallel
write
average time.
In the above example the standby redo logs have multiple members, and they are placed in a slower performing disk group. After reducing the standby redo logs to a single member and placing them in a fast disk group the you can see results such as those shown in the following table.
Table 12-6 SYNC Performance After Reducing Standby Redo Logs to a Single Member and Placing on a Fast Disk Group
Metric | Baseline (No SYNC) | SYNC | Impact |
---|---|---|---|
redo rate (MB/s) | 25 | 25 | no change |
log file sync | 0.67 | 1.60 | +139% |
log file parallel write | 0.51 | 0.63 | +23.5% |
TPS | 7714.36 | 7458.08 | -3.3% |
RFS random I/O | NA | .89 | NA |
SYNC remote write average (ms) | NA | 1.45 | NA |
redo writes | 2,364,388 | 996,532 | -57.9% |
redo write size (KB) | 10.61 | 20.32 | +91.5% |
Redo Apply Troubleshooting and Tuning
Redo apply performance depends mainly on the type of workload that is being recovered and the system resources allocated to recovery.
Recovering an OLTP workload can be very I/O intensive because there are a large number of small random reads and writes. The higher the number of IOPS (I/O per second) that a storage subsystem can handle, the faster the recovery rate can be.
In contrast, recovering batch workloads is more efficient because it consists of large sequential reads and writes, resulting in much faster recovery rates than OLTP workloads running on equivalent system resources. In addition, batch direct load operation recovery optimizations result in greater efficiency and even higher recovery rates.
The difference between OLTP and batch recovery performance profiles explains why applications with variation in their mixtures of OLTP and batch workloads can have different recovery rates at a standby database, even if the primary database redo generation rates are similar.
As changes occur on the primary database, redo is generated and sent to the standby database. The frequency of shipping redo to the standby is determined by whether the remote destination is using synchronous or asynchronous redo transport.
If redo apply was started using real-time apply, redo generated by the primary database is applied to the standby database as soon as it is received (that is, there is no wait for the database to switch logs). An Oracle Active Data Guard standby database that is open read-only while recovery is active enables users to query current data.
If your standby database is lagging relative to the primary database, you should focus on the following main areas:
-
Determining the transport lag
-
Determining the apply lag
-
Determining the query SCN or time
-
Determining how far behind the standby data is compared to the primary database
Monitor Apply Lag
For standby databases on symmetric hardware and configuration, the apply lag should less than 10 seconds.
If you observe high redo transport lag, tune the network transport first and address any insufficient network bandwidth or any overhead resulting from redo encryption or compression.
To monitor apply lag on the standby database query the V$DATAGUARD_STATS
view.
SQL> SELECT name,value,time_computed,datum_time FROM v$dataguard_stats WHERE name=’%lag’;
The DATUM_TIME
column is the local time on the standby database when the
datum used to compute the metric was received. The lag metrics are computed based on
data that is periodically received from the primary database. An unchanging value in
this column across multiple queries indicates that the standby database is not receiving
data from the primary database. The potential data loss in this scenario would be from
the last datum time from V$DATAGUARD_STATS
to the current time on the
standby.
To obtain a histogram that shows the history of apply lag values since the standby
instance was last started, query the V$STANDBY_EVENT_HISTOGRAM
view.
SQL> SELECT * FROM v$standby_event_histogram WHERE name like '%lag' and count >0;
To evaluate the apply lag over a period of time, take a snapshot of
V$STANDBY_EVENT_HISTOGRAM
at the beginning of the time period and
compare that snapshot with one taken at the end of the time period.
SQL> col NAME format a10
SQL> SELECT NAME,TIME,UNIT,COUNT,LAST_TIME_UPDATED FROM V$STANDBY_EVENT_HISTOGRAM WHERE name like '%lag' and count >0 ORDER BY LAST_TIME_UPDATED;
NAME TIME UNIT COUNT LAST_TIME_UPDATED
---------- ---------- ---------------- ---------- --------------------
apply lag 41 seconds 3 04/05/2018 16:30:59
apply lag 44 seconds 1 04/05/2018 16:31:02
apply lag 45 seconds 2 04/05/2018 16:31:03
apply lag 46 seconds 2 04/05/2018 16:31:04
Evaluate Redo Apply Rate if the Apply Lag Is High
In an Oracle Data Guard physical standby environment, it is important to determine if the standby database can recover redo as fast as, or faster than, the primary database can generate redo.
If the apply lag is above expectations, then evaluate redo apply performance by querying
the V$RECOVERY_PROGRESS
view. This view contains the columns described
in the following table.
Table 12-7 V$RECOVERY_PROGRESS View Columns
Column | Description |
---|---|
Average Apply Rate | Redo Applied / Elapsed Time includes time spent actively applying redo and time spent waiting for redo to arrive. |
Active Apply Rate | Redo Applied / Active Time is a moving average over the last 3 minutes. The rate does not include time spent waiting for redo to arrive. |
Maximum Apply Rate | Redo Applied / Active Time is peak measured throughput or maximum rate achieved over a moving average over last 3 minutes. The rate does not include time spent waiting for redo to arrive. |
Redo Applied | Represents the total amount of data in bytes that has been applied. |
Last Applied Redo | SCN and Timestamp of last redo applied. This is the time as stored in the redo stream, so it can be used to compare where the standby database is relative to the primary. |
Apply Time per Log | Average time spent actively applying redo in a log file. |
Checkpoint Time per Log | Average time spent for a log boundary checkpoint. |
Active Time | Represents the total duration applying the redo , but not waiting for redo |
Elapsed Time | Represents the total duration applying the redo , including waiting for redo |
Standby Apply Lag | Represents Redo Apply has not applied for N seconds, possible standby is behind the primary. |
Log Files | Represents Number of Log files applied so far. |
The most useful statistic is the Active Apply rate because the Average Apply Rate includes idle time spent waiting for redo to arrive making it less indicative of apply performance.
The simplest way to determine application throughput in terms of redo volume is to
collect Automatic Workload Repository (AWR) reports on the primary database during
normal and peak workloads, and determine the number of bytes per second of redo data the
production database is producing. You can then compare the speed at which redo is being
generated with the Active Apply Rate columns in the V$RECOVERY_PROGRESS
view to determine if the standby database is able to maintain the pace.
Tune Redo Apply by Evaluating Database Wait Events
Once you have verified that you are not bottlenecked on any system or network resources you are ready to assess database wait events. On the primary database this is done using AWR reports while on the standby database you will use standby AWR.
Before assessing database wait events, it is important that you understand the process flow involved in recovery. In general there are three distinct phases to standby recovery; the log read phase, the apply phase, and the checkpoint phase.
-
Redo is received on the standby by the RFS (Remote File Server) process.
The RFS process writes newly received redo for each thread into the current standby redo log for that thread. The RFS write operation is tracked by the
rfs random I/O
wait event. -
Once redo has been written the recovery coordinator process (pr00) will read the redo from the standby redo logs for each thread.
This read I/O is tracked by the
log file sequential read
operation. The recovery coordinator then merges redo from all threads together and place the redo into memory buffers for the recovery slaves. The wait events for writing and reading into recovery memory buffers is tracked by theparallel recovery read buffer free
andparallel recovery change buffer free
wait events. -
The recovery processes retrieve redo or change vectors from the memory buffers and begin the process applying the changes to data blocks.
First the recovery slaves determine which data blocks need to be recovered and reads those into the buffer cache if it’s not already present. This read I/O by the recovery slaves is tracked by the
recovery read
wait event. -
When a log is switched on the primary for any thread the standby will coordinate a switch of the standby redo log for that thread at the same time.
A log switch on a standby will force a full checkpoint which will result in flushing all dirty buffers from the buffer cache out to the data files on the standby. A checkpoint on the standby is currently more expensive than on a primary. Multiple DB writer processes (DBWR) will write the data file blocks down to the data files with its write time tracked by the
db file parallel write
wait event. The total time for the checkpoint to complete is covered by thecheckpoint complete
wait event.
During the apply phase it is normal to observe that the recovery coordinator process (pr00) has high utilization on a single CPU, while during the checkpoint phase you normally see an increase in the write I/O to the data files.
The following table provides a description as well as tuning advice for wait events involved in the recovery process.
Table 12-8 Recovery Process Wait Events
Column | Description | Tuning Recommendations |
---|---|---|
Logfile sequential read | The parallel recovery coordinator is waiting on I/O from the online redo log or the archived redo log. | Tune or increase the I/O bandwidth for the ASM disk group where the archive logs or online redo logs reside. |
Parallel recovery read buffer free | This event indicates that all read buffers are being used by slaves, and usually indicates that the recovery slaves lag behind the coordinator. | Increase _log_read_buffers to max 256
|
Parallel recovery change buffer free | Redo Applied / Active Time is peak measured throughput or maximum rate achieved over a moving average over last 3 minutes. The rate does not include time spent waiting for redo to arrive. | Tune or increase the I/O bandwidth for the ASM disk group where data files reside. |
Data file init write | The parallel recovery coordinator is waiting for a buffer to be released by a recovery slave. Again, this is a sign the recovery slaves are behind the coordinator. | Tune or increase the I/O bandwidth for the ASM disk group where data files reside. |
Parallel recovery control message reply | The parallel recovery coordinator is waiting for a file resize to finish, as would occur with file auto extend. | This is a non-tunable event. |
Parallel recovery slave next change | The parallel recovery slave is waiting for a change to be shipped from the coordinator. This is in essence an idle event for the recovery slave. To determine the amount of CPU a recovery slave is using, divide the time spent in this event by the number of slaves started and subtract that value from the total elapsed time. This may be close, because there are some waits involved. | N/A. This is an idle event. |
DB File Sequential Read | A parallel recovery slave (or serial recovery process) is waiting for a batch of synchronous data block reads to complete. | Tune or increase the I/O bandwidth for the ASM disk group where data files reside. |
Checkpoint completed | Recovery is waiting for checkpointing to complete, and Redo Apply is not applying any changes currently. |
Tune or increase the I/O bandwidth for the ASM disk group where data files reside. Also, increase the number of |
Recovery read | A parallel recovery slave is waiting for a batched data block I/O. | Tune or increase the I/O bandwidth for the ASM disk group where data files reside. |
Parallel recovery change buffer free (MIRA) | The parallel recovery coordinator is waiting for a change mapping buffer to be released by one of the recovery slaves. | Increase _change_vector_buffers to 2 or 4
|
Recovery apply pending and/or recovery receive buffer free (MIRA) |
Recovery apply pending: the time the Recovery receive buffer free: the time (in centiseconds) spent by the receiver process on instance waiting for apply slaves to apply changes from received buffers so that they can be freed for the next change. |
Increase Note that these parameters use space from the shared pool equal to the sum of their values (in MB) multiplied by the number of apply instances. |
See How to Generate AWRs in Active Data Guard Standby Databases (Doc ID 2409808.1) for more information about generating AWRs on the standby database.
Enable Multi-Instance Redo Apply if Required
Under extremely heavy workloads, a single instance redo apply is not sufficient.
If there are insufficient CPU resources on the standby redo apply database server, or if the Recovery Coordinator process (pr00) is CPU bound (by checking OS utilities), then multi-instance redo apply will allow you to scale across Oracle RAC instances.
Multi-instance redo apply greatly improves the scalability of redo apply for Oracle RAC databases. Rather than merging all threads of redo into a single apply process, multiple apply instances divide the threads of redo between them. For example, when two apply nodes are used to apply four threads of redo from the primary, each apply instance will apply two threads.
When the workload is well balanced between all threads of the primary, the scalability of multi-instance redo apply is predictable (2x, 4x and so on). Unbalanced workloads, where one instance does more work than another, get mixed scaling results compared to single instance redo apply.
Redo Apply Performance Tuning Example
The following is an example which applies the redo apply tuning discussed in this document.
The first step to assess performance and determine steps to improve performance is to
take standby statspack
snaps that capture statistics on the standby
database.
After generating a standby statspack
report based off of those snaps the
next step is to examine the load profile section.
Load Profile Total Per Second
~~~~~~~~~~~~ ------------------ -----------------
DB time(s): 9.2 0.0
DB CPU(s): 0.7 0.0
Redo MB applied: 14,497.8 9.7
Logical reads: 471.0 0.3
Physical reads: 6,869,213.0 4,570.3
Physical writes: 9,722,929.0 6,469.0
User calls: 510.0 0.3
Parses: 1,192.0 0.8
Hard parses: 3.0 0.0
W/A MB processed: 35.9 0.0
Logons: 45.0 0.0
Executes: 1,410.0 0.9
Rollbacks: 0.0 0.0
From the load profile shown above, you can see the amount of redo recovered during the report period and the amount of redo recovered on a per second basis. In addition you can see the number of physical reads and writes being performed.
Compare the number of reads and writes to the baseline reports that showed acceptable performance. Increased reads could be coming from read-only queries that might not have been seen previously, while increased writes could indicate a change in the type of workload being recovered.
The top five wait events section is the most important section to review. This section details where the majority of the time is spent waiting, as shown below.
Top 5 Timed Events Avg %Total
~~~~~~~~~~~~~~~~~~ wait Call
Event Waits Time (s) (ms) Time
----------------------------------------- ------------ ----------- ------ ------
checkpoint completed 4,261 8,210 1927 42.1
db file parallel write 20,982 2,658 127 13.6
lreg timer 501 1,502 2998 7.7
free buffer waits 119,108 1,278 11 6.5
parallel recovery read buffer free 8,046 1,101 137 5.6
In the output above the 42% call time is spent waiting for checkpoint
completed
, with the second wait event db file parallel
write
being associated with the checkpoint completed
wait
as it relates to DBWR performing writes from the buffer cache. The free buffer
waits
wait event indicates that the buffer cache is full and recovery
slaves reading buffers into the buffer cache are stalled.
This wait event profile indicates that you should increase the number of DBWR processes to help increase the rate that buffers can be flushed from the buffer cache.
Before making any changes, consult the statspack
report to make note of
the recovery active apply rate
and the time being spent in the apply
phase versus the checkpoint phase. After you adjust the number of DBWR processes,
compare the subsequent standby statspack
reports to these numbers to
see improvement.
Recovery Start Time Item Sofar Units Redo Timestamp
------------------- ----------------- -------------- ------- ------------------
21-Oct-15 08:03:56 Log Files 6 Files
21-Oct-15 08:03:56 Active Apply Rate 10,809 KB/sec
21-Oct-15 08:03:56 Average Apply Rat 10,708 KB/sec
21-Oct-15 08:03:56 Maximum Apply Rat 80,592 KB/sec
21-Oct-15 08:03:56 Redo Applied 15,111 Megabyt
21-Oct-15 08:03:56 Last Applied Redo 0 SCN+Tim 20-Oct-15 12:48:25
21-Oct-15 08:03:56 Active Time 1,408 Seconds
21-Oct-15 08:03:56 Apply Time per Lo 121 Seconds
21-Oct-15 08:03:56 Checkpoint Time p 14 Seconds
21-Oct-15 08:03:56 Elapsed Time 1,445 Seconds
21-Oct-15 08:03:56 Standby Apply Lag 70,785 Seconds
It’s important that you make only one change at a time, and obtain new standby
statspack
reports to assess any improvement based on that change.
For each change follow the same methodology of assessing the physical reads and writes,
the top wait events, and the time spent in the different recovery phases.
Role Transition Assessment and Tuning
With thorough planning, configuration, and tuning, Oracle Data Guard role transitions can effectively minimize downtime and ensure that the database environment is restored with minimal impact on the business.
Using a physical standby database, Oracle MAA testing has determined that switchover and failover times with Oracle Data Guard have been reduced to seconds. The following topics describe best practices for both switchover and failover. While following best practices, switchover times of approximately 34 seconds for Oracle RAC and 19 seconds for a single instance database have been observed.
Validate Database Switchover and Failover Readiness
The Oracle Data Guard broker VALIDATE DATABASE
command
gathers information related to switchover and failover readiness.
The validation verifies that the standby and primary database are reachable and the apply
lag is less than ApplyLagThreshold
for the target database. If these
data points are favorable, the command output displays Ready for Failover:
Yes
as shown below. In addition, if redo transport is running, the command
output displays Ready for Switchover: Yes
.
DGMGRL> validate database [verbose] database_name
Database Role: Physical standby database
Primary Database: standby db_unique_name
Ready for Switchover: Yes
Ready for Failover: Yes (Primary Running)
VALIDATE DATABASE
checks additional information that can impact
switchover time and database performance, such as whether the online redo logs have been
cleared, number of temporary tablespaces, parameter mismatches between primary and
standby, and the status of flashback databases.
In most failover cases the primary database has crashed or become unavailable. The
Ready for Failover
output indicates if the primary database is
running when VALIDATE DATABASE
was issued. This state does not prevent
a failover, but it is recommended that you stop the primary database before issuing a
failover to avoid a split-brain scenario where the configuration has two primary
databases. The broker only guarantees split-brain avoidance on failover when Fast-Start
Failover is used.
You can also run VALIDATE DATABASE
periodically as a configuration
monitoring tool.
Use the Broker to Initiate Switchover and Failover
Use the Oracle Data Guard broker SWITCHOVER command to initiate switchover, and the FAILOVER command to initiate failover.
As part of a switchover or failover operation the broker does the following.
- Configures redo transport from the new primary database
- Starts redo apply on the new standby database
- Ensures that other standby databases in the broker configuration are viable and receiving redo from the new primary
- Integrates Oracle Clusterware and Global Data Services to ensure the correct services are started after role change
To configure broker to initiate switchover, run
DGMGRL> SWITCHOVER TO database_name;
To configure broker to initiate failover, run
DGMGRL> FAILOVER TO database_name [IMMEDIATE];
By default FAILOVER
applies all redo that was received before failing
over. The IMMEDIATE
clause skips the pending redo and fails over
immediately.
The SWITCHOVER
and FAILOVER
commands are idempotent and
can be re-issued in the unlikely event of a failed transition.
Optimize Failover Processing
Implement the following best practices to optimize failover processing.
-
Enable Flashback Database to reinstate the failed primary databases after a failover operation has completed. Flashback Database facilitates fast point-in-time recovery from failures caused by user error or logical corruption.
-
Enable real-time apply, which allows apply services to apply the redo on the standby databases as soon as it is received.
-
Consider configuring multiple standby databases to maintain data protection following a failover.
-
Use Oracle Managed Files to pre-clear and pre-create online redo logs on the standby database.
As part of a failover, the standby database must clear its online redo logs before opening as the primary database. The time needed to complete this I/O can add significantly to the overall failover time. With Oracle Managed Files, the standby pre-creates the online redo logs the first time the managed redo process (MRP) is started.
-
Alternatively set the
LOG_FILE_NAME_CONVERT
parameter. You can also pre-create empty online redo logs by issuing the SQL*PlusALTER DATABASE CLEAR LOGFILE
statement on the standby database.
-
-
Use fast-start failover. If possible, to optimize switchover processing, ensure that the databases are synchronized before the switchover operation. Real-time apply ensures that redo is applied as received and ensures the fastest switchover.
Enable Fast-Start Failover and Leverage Best Practices
Fast-start failover automatically, quickly, and reliably fails over to a designated standby database if the primary database fails, without requiring manual intervention to execute the failover. You can use fast-start failover only in an Oracle Data Guard configuration that is managed by Oracle Data Guard broker.
The Data Guard configuration can be running in either the maximum availability or maximum
performance mode with fast-start failover. When fast-start failover is enabled, the
broker ensures fast-start failover is possible only when the configured data loss
guarantee can be upheld. Maximum availability mode provides an automatic failover
environment guaranteed to lose no data. Maximum performance mode provides an automatic
failover environment guaranteed to lose no more than the amount of data (in seconds)
specified by the FastStartFailoverLagLimit
configuration property.
Adopt fast-start failover best practices discussed in Configure Fast Start Failover.
Assess Time Management Interface Event Alerts to Troubleshoot Role Transition Timings
The Time Management Interface (TMI) event is a low overhead event which adds a line to the alert log whenever certain calls are executed in Oracle.
These entries in the alert log, or tags, delineate the beginning and end of a call. The tables in the topics below depict the delineation of key switchover and failover operations. This method is the most accurate for determining where time is being spent.
Set the database level event 16453 trace name context forever, level 15 on all databases.
There are two methods of enabling this trace, either using the EVENT
database parameter or setting the EVENTS
at the system level. The
difference is that the EVENT
parameter is not dynamic but is persistent
across restarts. set EVENTS
is dynamic but NOT persistent across
database restarts. See the following examples.
ALTER SYSTEM SET EVENT=‘16453 trace name contextforever, level 15’ scope=spfile sid=’*’
ALTER SYSTEM SET EVENTS ‘16453 trace name context forever, level 15’;
Key Switchover Operations and Alert Log Tags
Switchover is broken down into four main steps as follows.
-
Convert to Standby
- kill any existing production sessions, convert the control file into a standby control file, and send a message to the standby to continue the switchover.The
Convert to Standby
- these steps are found in the alert log of the original primary. All remaining steps are found in the original standby alert log. -
Cancel Recovery
- apply remaining redo and stop recovery. -
Convert to Primary
- a two-step close (to the mounted state) of instances (one instance, then all others), clear online redo logs, convert control file to primary control file, and data Guard Broker bookkeeping. -
Open New Primary
- parallel open of all instances.
Table 12-9 Alert Log Tags Defining the Steps with Time Management Interface Event Enabled
Step | Stage | Time Management Interface Event Enabled |
---|---|---|
Convert To Standby(primary alert log) | BEGIN | TMI: dbsdrv switchover to target BEGIN <DATE> <TIMESTAMP> |
Convert To Standby(primary alert log) | END | TMI: kcv_switchover_to_target send 'switchover to primary' msg BEGIN <DATE> <TIMESTAMP> |
Cancel Recovery(standby alert log) | BEGIN | TMI: kcv_commit_to_so_to_primary wait for MRP to die BEGIN <DATE> <TIMESTAMP> |
Cancel Recovery(standby alert log) | END | TMI: kcv_commit_to_so_to_primary wait for MRP to die END <DATE> <TIMESTAMP> |
Convert to Primary (standby alert log) | BEGIN | TMI: kcv_commit_to_so_to_primary BEGIN CTSO to primary <DATE> <TIMESTAMP> |
Convert to Primary (standby alert log) | END | TMI: adbdrv BEGIN 10 <DATE> <TIMESTAMP> |
Open Primary(standby alert log) | BEGIN | TMI: adbdrv BEGIN 10 <DATE> <TIMESTAMP> |
Open Primary(standby alert log) | END | TMI: adbdrv END 10 <DATE> <TIMESTAMP> |
Key Failover Operations and Alert Log Tags
All failover steps are documented in the alert log of the target standby where the failover was performed.
-
Cancel Recovery - Stop recovery and close all instances (to mounted) in parallel.
-
Terminal Recovery - Archive standby redo logs and recover any unapplied redo.
-
Convert to Primary - Clear online redo logs and convert control file to standby control file.
-
Open Primary - Open all instances in parallel.
Table 12-10 Failover Alert Log Tags Defining the Steps with Time Management Interface Event Enabled
Step | Stage | Time Management Interface Event Enabled |
---|---|---|
Cancel Recovery | BEGIN | TMI: adbdrv termRecovery BEGIN <DATE> <TIMESTAMP> |
Cancel Recovery | END | TMI: adbdrv termRecovery END <DATE> <TIMESTAMP> |
Terminal Recovery | BEGIN | TMI: krdsmr full BEGIN Starting media recovery <DATE> <TIMESTAMP> |
Terminal Recovery | END | TMI: krdemr full END end media recovery <DATE> <TIMESTAMP> |
Convert to Primary | BEGIN | TMI: kcv_commit_to_so_to_primary BEGIN CTSO to primary <DATE> <TIMESTAMP> |
Convert to Primary | END | TMI: adbdrv BEGIN 10 <DATE> <TIMESTAMP> |
Open Primary | BEGIN | TMI: adbdrv BEGIN 10 <DATE> <TIMESTAMP> |
Open Primary | END | TMI: adbdrv END 10 <DATE> <TIMESTAMP> |