24 Troubleshooting RMAN Operations
Use RMAN message output and dynamic performance views to troubleshoot RMAN operations.
24.1 Interpreting RMAN Message Output
Recovery Manager provides detailed error messages that can aid in troubleshooting problems.
Also, Oracle Database and the third-party media vendors generate useful debugging output of their own. The following discussion explains how to identify and interpret the different errors that you may encounter.
24.1.1 Identifying Types of RMAN Message Output
Output that is useful for troubleshooting failed or unresponsive RMAN jobs is located in several different places.
The following table provides an overview of where to locate message output that can be used to troubleshoot RMAN backup problems.
Table 24-1 Types of Message Output
Type of Output | Produced By | Location | Description |
---|---|---|---|
RMAN messages | RMAN | Completed job information is in When running RMAN from the command line, you can direct output to the following places: | Contains actions relevant to the RMAN job and error messages generated by RMAN, the database server, and the media vendor. RMAN error messages have an You can execute the following PL/SQL to remove all entries from update node set high_rsr_recid=0 where db_key = our_target_database_db_key ; The preceding function removes all job-related entries. No rows are visible until new backup jobs are shown in |
| | Oracle Database | The | Contains a chronological log of errors, initialization parameter settings, and administration operations. Records values for overwritten control file records. |
Oracle trace file | Oracle Database | The | Contains detailed output generated by Oracle Database processes. This file is created when an |
| | Third-party media management software | The | Contains vendor-specific information written by the media management software. This log does not contain Oracle Database or RMAN errors. |
Media manager log file | Third-party media management software | The file names for any media manager logs other than | Contains information about the functioning of the media management device |
24.1.2 Troubleshooting Long-Running RMAN Operations
RMAN message output provides information about the progress of backup and recovery operations. Use this information to take any required actions to troubleshoot operations that are stuck or awaiting resources.
Certain operations such as backup, restore, recovery, and duplication for large databases typically take a long time to complete. However, it is not always clear whether the operation is progressing or waiting on some resources. Starting with Oracle Database Release 18c, RMAN message output contains additional logging information that indicates whether a job is waiting on resources. Every 10 minutes, RMAN checks whether the number of blocks processed has changed. If there is no change in the blocks processed, then RMAN displays a message with the associated wait event.
The following is an example of the RMAN output for a RESTORE operation:
allocated channel: c1
channel c1: SID=123 device type=SBT_TAPE
channel c1: WARNING: Oracle Test Disk API
Starting restore at 18-JAN-18
channel c1: starting datafile backup set restore
channel c1: specifying datafile(s) to restore from backup set
channel c1: restoring datafile 00002 to /ade/b/2776899351/oracle/dbs/tbs_ax1.f
channel c1: reading from backup piece 01sov1t4_1_1
***** Hang Detected ***** at 2018-01-18 04:11:23 for channel c1, INSTID: 1, SID: 123, serial: 35831
No change in read blocks, thus showing wait event[Total blocks = 192000, Blocks read/recovered = 41530]
Seq_No Event Waiting Time(mirco secs)
602 Backup: MML read backup piece 38094371
***** Hang Detected ***** at 2018-01-18 04:11:33 for channel c1, INSTID: 1, SID: 123, serial: 35831
No change in read blocks, thus showing wait event[Total blocks = 192000, Blocks read/recovered = 41530]
Seq_No Event Waiting Time(mirco secs)
602 Backup: MML read backup piece 48106104
channel c1: piece handle=01sov1t4_1_1 tag=TAG20180118T040804
channel c1: restored backup piece 1
channel c1: restore complete, elapsed time: 00:02:35
Finished restore at 18-JAN-18
released channel: c1
The output indicates that the restore was stuck because of a problem with a media manager read operation. After the read operation completed, the RMAN restore was successful.
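When the RMAN output is saved to a log file, the hang messages can be pulled out with a short script. The following is a minimal sketch, assuming the message format shown in the sample output above. The function name `list_hangs` and the log file are illustrative and are not part of RMAN.

```shell
#!/bin/sh
# list_hangs: hypothetical helper, not an RMAN feature.
# Prints "<channel> <timestamp>" for every hang message in an RMAN log.
# Assumes each message has the form:
#   ***** Hang Detected ***** at <timestamp> for channel <name>, INSTID: ...
list_hangs() {
    sed -n 's/^\*\*\*\*\* Hang Detected \*\*\*\*\* at \(.*\) for channel \([^,]*\),.*$/\2 \1/p' "$1"
}

# Example call (assumes the RMAN output was saved to restore.log):
# list_hangs restore.log
```

Repeated entries for the same channel suggest that the channel made no progress between checks, and the wait event lines that follow each message in the log name the resource involved.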
24.1.3 Recognizing RMAN Error Message Stacks
RMAN reports errors as they occur. If an error cannot be retried, that is, if RMAN cannot perform failover to another channel to complete a particular job step, then RMAN also reports a summary of the errors after all job sets complete. This feature is known as deferred error reporting.
One way to determine whether RMAN encountered an error is to examine its return code. A second way is to search the RMAN output for the string RMAN-00569, which is the message number for the error stack banner. All RMAN errors are preceded by this error message. If you do not see an RMAN-00569 message in the output, then there are no errors.
Example 24-1 RMAN Syntax Error
This example shows an RMAN syntax error. The RMAN-00569 message is followed by other error messages that indicate the reason for the error.
RMAN-00571: ===========================================================
RMAN-00569: =============== ERROR MESSAGE STACK FOLLOWS ===============
RMAN-00571: ===========================================================
RMAN-00558: error encountered while parsing input commands
RMAN-01005: syntax error: found ")": expecting one of: "archivelog, backup, backupset, controlfilecopy, current, database, datafile, datafilecopy, (, plus, ;, tablespace"
RMAN-01007: at line 1 column 18 file: standard input
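The search for the error stack banner is easy to script. The following is a minimal sketch, assuming the RMAN output was written to a log file. The function name `rman_log_has_errors` and the file name in the example are illustrative.

```shell
#!/bin/sh
# rman_log_has_errors: hypothetical helper name.
# Returns 0 (true) if the log contains the RMAN-00569 error stack banner,
# and a nonzero value otherwise.
rman_log_has_errors() {
    grep -q 'RMAN-00569' "$1"
}

# Example call (assumes a log file named backup.log):
# if rman_log_has_errors backup.log; then
#     echo "backup.log contains an RMAN error stack"
# fi
```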
24.1.4 Identifying RMAN Error Codes
You can use the error codes in RMAN message stacks to troubleshoot problems with RMAN commands.
Typically, you find the following types of error codes in RMAN message stacks:
-
Errors prefixed with RMAN-
These are RMAN errors.
-
Errors prefixed with ORA-
Media manager errors use the ORA- prefix.
-
Errors preceded by the line Additional information:
See Also:
-
RMAN Error Message Numbers for the error ranges of RMAN errors
-
ORA-19511: Media Manager Errors for the error ranges of media manager errors
-
Oracle Database Error Messages Reference for explanations of RMAN and ORA error codes
24.1.4.1 RMAN Error Message Numbers
RMAN error messages are prefixed with RMAN-.
The following table indicates the error ranges for common RMAN error messages, all of which are described in Oracle Database Error Messages Reference.
Table 24-2 RMAN Error Message Ranges
Error Range | Cause |
---|---|
0550-0999 | Command-line interpreter |
1000-1999 | Keyword analyzer |
2000-2999 | Syntax analyzer |
3000-3999 | Main layer |
4000-4999 | Services layer |
5000-5499 | Compilation of |
5500-5999 | Compilation of |
6000-6999 | General compilation |
7000-7999 | General execution |
8000-8999 | PL/SQL programs |
9000-9999 | Low-level keyword analyzer |
10000-10999 | Server-side execution |
11000-11999 | Interphase errors between PL/SQL and RMAN |
12000-12999 | Recovery catalog packages |
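The ranges in Table 24-2 can be turned into a small lookup for use in log-processing scripts. The following is a minimal sketch. The function name `rman_error_layer` is illustrative, and the two 5000-5999 rows are reported under one generic label because the table does not name the commands involved.

```shell
#!/bin/sh
# rman_error_layer: hypothetical helper that maps the numeric part of an
# RMAN message (for example, 03002 from RMAN-03002) to the cause in Table 24-2.
rman_error_layer() {
    # Strip leading zeros so the value is compared as a plain decimal number.
    n=$(printf '%s' "$1" | sed 's/^0*//')
    [ -n "$n" ] || n=0
    if   [ "$n" -lt 550 ];   then echo "Not listed in Table 24-2"
    elif [ "$n" -le 999 ];   then echo "Command-line interpreter"
    elif [ "$n" -le 1999 ];  then echo "Keyword analyzer"
    elif [ "$n" -le 2999 ];  then echo "Syntax analyzer"
    elif [ "$n" -le 3999 ];  then echo "Main layer"
    elif [ "$n" -le 4999 ];  then echo "Services layer"
    # Assumption: both compilation rows (5000-5499, 5500-5999) share one label.
    elif [ "$n" -le 5999 ];  then echo "Compilation (command-specific)"
    elif [ "$n" -le 6999 ];  then echo "General compilation"
    elif [ "$n" -le 7999 ];  then echo "General execution"
    elif [ "$n" -le 8999 ];  then echo "PL/SQL programs"
    elif [ "$n" -le 9999 ];  then echo "Low-level keyword analyzer"
    elif [ "$n" -le 10999 ]; then echo "Server-side execution"
    elif [ "$n" -le 11999 ]; then echo "Interphase errors between PL/SQL and RMAN"
    elif [ "$n" -le 12999 ]; then echo "Recovery catalog packages"
    else echo "Not listed in Table 24-2"
    fi
}
```

Message numbers outside the listed ranges are reported as not listed rather than guessed, because the table covers only the common ranges.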
24.1.4.2 ORA-19511: Media Manager Errors
If a media manager error occurs, ORA-19511 is signaled, and the media manager is expected to provide RMAN a descriptive error. RMAN displays the error passed back to it by the media manager.
For example, you might see this:
ORA-19511: Error received from media manager layer, error text: sbtpvt_open_input: file .* does not exist or cannot be accessed, errno = 2
The message from the media manager should provide you with enough information to let you fix the root problem. If it does not, then refer to the documentation for your media manager or contact your media management vendor support representative for further information. ORA-19511 errors originate with the media manager, not with Oracle Database. The database just passes on the message from the media manager. The cause can be addressed only by the media management vendor.
If you are still using an SBT 1.1-compliant media management layer, you may see some additional error message text. Output from an SBT 1.1-compliant media management layer is similar to the following:
ORA-19507: failed to retrieve sequential file, handle="c-140148591-20031014-06", parms=""
ORA-27007: failed to open file
Additional information: 7000
Additional information: 2
ORA-19511: Error received from media manager layer, error text: SBT error = 7000, errno = 0, sbtopen: backup file not found
The "Additional information" provided uses error codes specific to SBT 1.1. The values displayed correspond to the media manager message numbers and error text listed in Table 24-3. RMAN then signals the error again as ORA-19511 Error received from media manager layer, followed by a general error message that corresponds to the error code returned from the media manager and includes the SBT 1.1 error number.
The SBT 1.1 error messages are listed here for your reference. Table 24-3 lists media manager message numbers and their corresponding error text. In the error codes, O/S stands for operating system. The errors marked with an asterisk (*) are internal and are not typically seen during normal operation.
Table 24-3 Media Manager Error Message Ranges
Cause | No. | Message |
---|---|---|
sbtopen | 7000 | Backup file not found (only returned for read) |
sbtopen | 7001 | File exists (only returned for write) |
sbtopen | 7002* | Bad mode specified |
sbtopen | 7003 | Invalid block size specified |
sbtopen | 7004 | No tape device found |
sbtopen | 7005 | Device found, but busy; try again later |
sbtopen | 7006 | Tape volume not found |
sbtopen | 7007 | Tape volume is in-use |
sbtopen | 7008 | I/O Error |
sbtopen | 7009 | Can't connect with Media Manager |
sbtopen | 7010 | Permission denied |
sbtopen | 7011 | O/S error, for example malloc or fork error |
sbtopen | 7012* | Invalid argument(s) to sbtopen |
sbtclose | 7020* | Invalid file handle or file not open |
sbtclose | 7021* | Invalid flags to sbtclose |
sbtclose | 7022 | I/O error |
sbtclose | 7023 | O/S error |
sbtclose | 7024* | Invalid argument(s) to sbtclose |
sbtclose | 7025 | Can't connect with Media Manager |
sbtwrite | 7040* | Invalid file handle or file not open |
sbtwrite | 7041 | End of volume reached |
sbtwrite | 7042 | I/O error |
sbtwrite | 7043 | O/S error |
sbtwrite | 7044* | Invalid argument(s) to sbtwrite |
sbtread | 7060* | Invalid file handle or file not open |
sbtread | 7061 | EOF encountered |
sbtread | 7062 | End of volume reached |
sbtread | 7063 | I/O error |
sbtread | 7064 | O/S error |
sbtread | 7065* | Invalid argument(s) to sbtread |
sbtremove | 7080 | Backup file not found |
sbtremove | 7081 | Backup file in use |
sbtremove | 7082 | I/O Error |
sbtremove | 7083 | Can't connect with Media Manager |
sbtremove | 7084 | Permission denied |
sbtremove | 7085 | O/S error |
sbtremove | 7086* | Invalid argument(s) to sbtremove |
sbtinfo | 7090 | Backup file not found |
sbtinfo | 7091 | I/O Error |
sbtinfo | 7092 | Can't connect with Media Manager |
sbtinfo | 7093 | Permission denied |
sbtinfo | 7094 | O/S error |
sbtinfo | 7095* | Invalid argument(s) to sbtinfo |
sbtinit | 7110* | Invalid argument(s) to sbtinit |
sbtinit | 7111 | O/S error |
24.1.5 Interpreting RMAN Error Stacks
It is important to identify the relevant messages in the RMAN error stack.
Note the following tips and suggestions while interpreting RMAN messages:
-
Read the messages from the bottom up, because this is the order in which RMAN issues the messages. The last one or two errors displayed in the stack are often the most informative.
-
When you are using an SBT 1.1 media management layer and you are presented with SBT 1.1 style error messages containing the "Additional information:" numeric error codes, look for the ORA-19511 message that follows for the text of error messages passed back to RMAN by the media manager. These messages identify the real failure in the media management layer.
-
Look for the RMAN-03002 or RMAN-03009 message (RMAN-03009 equals RMAN-03002 but includes the channel ID), immediately following the error banner. These messages indicate which command failed. Syntax errors generate RMAN-00558.
-
Identify the basic type of error according to the error range chart in Table 24-2 and then refer to the error messages for information about the most important messages.
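The first and third tips can be sketched as two small shell helpers for a saved RMAN log. This is an illustration only, assuming each message starts at the beginning of its own line, as in real RMAN output. The function names `last_stack_messages` and `failed_command` are hypothetical.

```shell
#!/bin/sh
# last_stack_messages: hypothetical helper.
# Prints the last N (default 2) RMAN- or ORA- messages in a log, which are
# often the most informative ones when reading the stack from the bottom up.
last_stack_messages() {
    grep -E '^(RMAN|ORA)-[0-9]+:' "$1" | tail -n "${2:-2}"
}

# failed_command: hypothetical helper.
# Prints the first RMAN-03002 or RMAN-03009 message, which names the
# command that failed.
failed_command() {
    grep -E '^RMAN-0300[29]:' "$1" | head -n 1
}
```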
See Also:
-
Interpreting RMAN Errors: Example and Interpreting Server Errors: Example for examples of RMAN error messages
-
Interpreting SBT 2.0 Media Management Errors: Example and Interpreting SBT 1.1 Media Management Errors: Example for examples of interpreting media management errors
-
Oracle Database Error Messages for information about the error messages
24.1.5.1 Interpreting RMAN Errors: Example
Errors prefixed by RMAN- indicate errors caused by RMAN commands.
You attempt a backup of tablespace users and receive the following message:
Starting backup at 29-AUG-13
using channel ORA_DISK_1
RMAN-00571: ===========================================================
RMAN-00569: =============== ERROR MESSAGE STACK FOLLOWS ===============
RMAN-00571: ===========================================================
RMAN-03002: failure of backup command at 08/29/2013 15:14:03
RMAN-20202: tablespace not found in the recovery catalog
RMAN-06019: could not translate tablespace name "USESR"
The RMAN-03002 error indicates that the BACKUP command failed. You read the last two messages in the stack first and immediately see the problem: no tablespace users appears in the recovery catalog because you mistyped the name as usesr.
24.1.5.2 Interpreting Server Errors: Example
Errors from the server are prefixed with ORA-.
Assume that you attempt to recover a tablespace and receive the following errors:
RMAN> RECOVER TABLESPACE users;
Starting recover at 29-AUG-13
using channel ORA_DISK_1
starting media recovery
media recovery failed
RMAN-00571: ===========================================================
RMAN-00569: =============== ERROR MESSAGE STACK FOLLOWS ===============
RMAN-00571: ===========================================================
RMAN-03002: failure of recover command at 08/29/2013 15:18:43
RMAN-11003: failure during parse/execution of SQL statement: alter database recover if needed tablespace USERS
ORA-00283: recovery session canceled due to errors
ORA-01124: cannot recover data file 8 - file is in use or recovery
ORA-01110: data file 8: '/oracle/oradata/trgt/users01.dbf'
As suggested, you start reading from the bottom up. The ORA-01110 message explains there was a problem with the recovery of data file users01.dbf. The second error indicates that the database cannot recover the data file because it is in use or being recovered. The remaining RMAN errors indicate that the recovery session was canceled due to the server errors. Hence, you conclude that because you were not recovering this data file, the problem must be that the data file is online and you must take it offline and restore a backup.
24.1.5.3 Interpreting SBT 2.0 Media Management Errors: Example
This example shows how to interpret errors caused at the media manager level.
Assume that you use a tape drive and see the following output during a backup job:
RMAN-00571: ===========================================================
RMAN-00569: =============== ERROR MESSAGE STACK FOLLOWS ===============
RMAN-00571: ===========================================================
ORA-19624: operation failed, retry possible
ORA-19507: failed to retrieve sequential file, handle="/tmp/mydir", parms=""
ORA-27029: skgfrtrv: sbtrestore returned error
ORA-19511: Error received from media manager layer, error text: sbtpvt_open_input:file /tmp/mydir does not exist or cannot be accessed, errno=2
The error text displayed following the ORA-19511 error is generated by the media manager and describes the real source of the failure. See the media manager documentation to interpret this error.
24.1.5.4 Interpreting SBT 1.1 Media Management Errors: Example
This example shows the output of a backup job that has media management errors.
Assume that you use a tape drive and see the following output during a backup job:
RMAN-00571: ===========================================================
RMAN-00569: =============== ERROR MESSAGE STACK FOLLOWS ===============
RMAN-00571: ===========================================================
RMAN-03009: failure of backup command on c1 channel at 09/04/2013 13:18:19
ORA-19506: failed to create sequential file, name="07d36ecp_1_1", parms=""
ORA-27007: failed to open file
SVR4 Error: 2: No such file or directory
Additional information: 7005
Additional information: 1
ORA-19511: Error received from media manager layer, error text: SBT error = 7005, errno = 2, sbtopen: system error
The main information of interest returned by SBT 1.1 media managers is the error code in the "Additional information" line:
Additional information: 7005
Referring to Table 24-3, you discover that error 7005 means that the media management device is busy. So, the media management software is not able to write to the device because it is in use or there is a problem with it.
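The lookup described above can be sketched in shell. The example is illustrative: the function names are hypothetical, only the sbtopen rows of Table 24-3 are covered, and the log is assumed to place each "Additional information" entry on its own line.

```shell
#!/bin/sh
# first_sbt11_code: hypothetical helper.
# Prints the first "Additional information" value found in an RMAN log.
first_sbt11_code() {
    sed -n 's/^Additional information: \([0-9][0-9]*\).*$/\1/p' "$1" | head -n 1
}

# sbt11_sbtopen_message: hypothetical helper.
# Maps an SBT 1.1 sbtopen error number to its text in Table 24-3.
sbt11_sbtopen_message() {
    case "$1" in
        7000) echo "Backup file not found (only returned for read)" ;;
        7001) echo "File exists (only returned for write)" ;;
        7002) echo "Bad mode specified" ;;
        7003) echo "Invalid block size specified" ;;
        7004) echo "No tape device found" ;;
        7005) echo "Device found, but busy; try again later" ;;
        7006) echo "Tape volume not found" ;;
        7007) echo "Tape volume is in-use" ;;
        7008) echo "I/O Error" ;;
        7009) echo "Can't connect with Media Manager" ;;
        7010) echo "Permission denied" ;;
        7011) echo "O/S error" ;;
        7012) echo "Invalid argument(s) to sbtopen" ;;
        *)    echo "Not an sbtopen code in Table 24-3" ;;
    esac
}

# Example call (assumes the RMAN output was saved to backup.log):
# sbt11_sbtopen_message "$(first_sbt11_code backup.log)"
```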
Note:
The sbtio.log contains information written by the media management software, not Oracle Database. Thus, you must consult your media vendor documentation to interpret the error codes and messages. If no information is written to the sbtio.log, then contact your media manager support to ask whether they are writing error messages in some other location, or whether there are steps you must take to have the media manager errors appear in sbtio.log.
24.1.6 Identifying RMAN Return Codes
One way to determine whether RMAN encountered an error is to examine its return code or exit status. The RMAN client returns 0 to the shell from which it was invoked if no errors occurred, and a nonzero error value otherwise.
How you access this return code depends upon the environment from which you invoked the RMAN client. For example, if you run UNIX with the C shell, then, when RMAN completes, the return code is placed in a shell variable called $status. The method of returning exit status is a detail specific to the host operating system rather than the RMAN client.
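In Bourne-compatible shells, the exit status of the last command is available in `$?` rather than `$status`. The following is a minimal sketch of a wrapper that runs a command and reports its exit status. The function name `run_and_check` and the file names in the example call are illustrative assumptions.

```shell
#!/bin/sh
# run_and_check: hypothetical wrapper.
# Runs the command given as arguments, reports whether it exited with
# status 0, and returns the same status to the caller.
run_and_check() {
    status=0
    "$@" || status=$?
    if [ "$status" -eq 0 ]; then
        echo "completed without errors"
    else
        echo "failed with exit status $status"
    fi
    return "$status"
}

# Example call (assumes a command file backup.rman and a log file backup.log):
# run_and_check rman target / cmdfile=backup.rman log=backup.log
```

Returning the original status lets a calling script or scheduler react to the failure as well.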
24.2 Using V$ Views for RMAN Troubleshooting
When LIST, REPORT, and SHOW do not provide all the information that you need for RMAN operations, some V$ views can provide useful details.
Sometimes it is useful to identify exactly what a server session performing a backup and recovery job is doing. The views described in the following table are useful for obtaining information about RMAN jobs.
Table 24-4 Useful V$ Views for Troubleshooting
View | Description |
---|---|
| | Identifies currently active processes |
| | Identifies currently active sessions. Use this view to determine which database server sessions correspond to which RMAN allocated channels. |
| | Lists the events or resources for which sessions are waiting |
24.2.1 Monitoring RMAN Interaction with the Media Manager
You can use the event names in the dynamic performance event views to monitor RMAN calls to the media management API. The event names have one-to-one correspondence with SBT functions.
See the following example:
Backup: MML v1 open backup piece
Backup: MML v1 read backup piece
Backup: MML v1 write backup piece
Backup: MML v1 query backup piece
Backup: MML v1 delete backup piece
Backup: MML v1 close backup piece
.
.
.
To obtain the complete list of SBT events, you can use the following query:
SELECT NAME FROM V$EVENT_NAME WHERE NAME LIKE '%MML%';
Before making a call to any of the functions in the media management API, the server adds a row in V$SESSION_WAIT, with the STATE column including the string WAITING. The V$SESSION_WAIT.SECONDS_IN_WAIT column shows the number of seconds that the server has been waiting for this call to return. After an SBT function returns from the media manager, this row disappears.
A row in V$SESSION_WAIT corresponding to an SBT event name does not indicate a problem, because the server updates these rows at run time. The rows appear and disappear as calls are made and returned. However, if the SECONDS_IN_WAIT column is high, then the media manager may be suspended.
To monitor the SBT events, you can run the following SQL query:
COLUMN EVENT FORMAT a17
COLUMN SECONDS_IN_WAIT FORMAT 999
COLUMN STATE FORMAT a15
COLUMN CLIENT_INFO FORMAT a30

SELECT p.SPID, s.EVENT, s.SECONDS_IN_WAIT AS SEC_WAIT,
       sw.STATE, s.CLIENT_INFO
FROM   V$SESSION_WAIT sw, V$SESSION s, V$PROCESS p
WHERE  sw.EVENT LIKE '%MML%'
AND    s.SID=sw.SID
AND    s.PADDR=p.ADDR;
Examine the SQL output to determine which SBT functions are waiting. For example, the following output indicates that RMAN has been waiting for the sbtbackup function to return for 10 minutes:
SPID EVENT             SEC_WAIT   STATE           CLIENT_INFO
---- ----------------- ---------- --------------- ------------------------------
8642 Backup: MML creat        600 WAITING         rman channel=ORA_SBT_TAPE_1
Note:
The V$SESSION_WAIT view shows only database events, not media manager events.
See Also:
Oracle Database Reference for descriptions of the V$SESSION_WAIT view.
24.2.2 Correlating Server Sessions with RMAN Channels
To identify which server sessions correspond to which RMAN channels, you can query V$SESSION and V$PROCESS.
The SPID column of V$PROCESS identifies the operating system ID number for the process or thread. For example, on UNIX the SPID column shows the process ID, whereas on Windows the SPID column shows the thread ID. You have two basic methods for obtaining this information, depending on whether you have multiple RMAN sessions active concurrently.
24.2.2.1 Matching Server Sessions with Channels When One RMAN Session Is Active
When only one RMAN session is active, the easiest method for determining the server session ID for an RMAN channel is to query the target database.
Run the following query on the target database while the RMAN job is executing:
COLUMN CLIENT_INFO FORMAT a30
COLUMN SID FORMAT 999
COLUMN SPID FORMAT 9999

SELECT s.SID, p.SPID, s.CLIENT_INFO
FROM   V$PROCESS p, V$SESSION s
WHERE  p.ADDR = s.PADDR
AND    CLIENT_INFO LIKE 'rman%';
The following shows sample output:
SID  SPID         CLIENT_INFO
---- ------------ ------------------------------
14   8374         rman channel=ORA_SBT_TAPE_1
If you set an ID using the RMAN SET COMMAND ID command instead of using the system-generated default ID, then search for that value in the CLIENT_INFO column instead of 'rman%'.
24.2.2.2 Matching Server Sessions with Channels in Multiple RMAN Sessions
If multiple RMAN sessions are active, then the V$SESSION.CLIENT_INFO column can yield the same information for a channel in each session.
For example:
SID  SPID         CLIENT_INFO
---- ------------ ------------------------------
14   8374         rman channel=ORA_SBT_TAPE_1
9    8642         rman channel=ORA_SBT_TAPE_1
In this case, you have the following methods for determining which channel corresponds to which SID value.
24.2.2.2.1 Obtaining the Channel ID from the RMAN Output
You must first obtain the sid values from the RMAN output and then use these values in your SQL query.
To correlate a process with a channel during a backup:
24.2.2.2.2 Correlating Server Sessions with Channels by Using SET COMMAND ID
You specify a command ID string in the RMAN backup script. You can then query V$SESSION.CLIENT_INFO for this string.
To correlate a process with a channel during a backup:
See Also:
Oracle Database Backup and Recovery Reference for SET COMMAND ID syntax, and Oracle Database Reference for more information about V$SESSION and V$PROCESS
24.3 Testing the Media Management API
On some platforms, Oracle provides a diagnostic tool called sbttest. This utility performs a simple test of the media management software by acting as the Oracle database server and attempting to communicate with the media manager.
24.3.1 Obtaining the sbttest Utility
The default location of the sbttest utility depends on the platform.
On UNIX, the sbttest utility is typically located in $ORACLE_HOME/bin. If for some reason the utility is not included with your platform, then contact Oracle Support Services to obtain the C version of the program. You can compile this version of the program on all UNIX platforms.
On platforms such as Solaris, you do not have to relink when using sbttest. On other platforms, relinking may be necessary.
24.3.2 Obtaining Online Documentation for the sbttest Utility
Use the sbttest command, without arguments, to list the various arguments for this program.
For online documentation of sbttest, issue the following on the command line:
% sbttest
The program displays the list of possible arguments for the program:
Error: backup file name must be specified
Usage: sbttest backup_file_name # this is the only required parameter
       <-dbname database_name>
       <-trace trace_file_name>
       <-remove_before>
       <-no_remove_after>
       <-read_only>
       <-no_regular_backup_restore>
       <-no_proxy_backup>
       <-no_proxy_restore>
       <-file_type n>
       <-copy_number n>
       <-media_pool n>
       <-os_res_size n>
       <-pl_res_size n>
       <-block_size block_size>
       <-block_count block_count>
       <-proxy_file os_file_name bk_file_name [os_res_size pl_res_size block_size block_count]>
       <-libname sbt_library_name>
The display also indicates the meaning of each argument. For example, following is the description for two optional parameters:
Optional parameters:
  -dbname  specifies the database name which will be used by SBT to identify the backup file. The default is "sbtdb"
  -trace   specifies the name of a file where the Media Management software will write diagnostic messages.
24.3.3 Using the sbttest Utility
Use sbttest to perform a quick test of the media manager.
If sbttest returns 0, then the test ran without error, which means that the media manager is correctly installed and can accept a data stream and return the same data when requested. If sbttest returns a nonzero value, then either the media manager is not installed or it is not configured correctly.
To use sbttest:
In some cases, sbttest can work but an RMAN backup does not. The reasons can be the following:
-
The user who starts sbttest is not the owner of the Oracle Database processes.
-
If the database server is not linked with the media management library or cannot load it dynamically when needed, then RMAN backups to the media manager fail, but sbttest may still work.
-
The sbttest program passes all environment parameters from the shell but RMAN does not.
24.4 Terminating an RMAN Command
There are several ways to terminate an RMAN command in the middle of execution.
They include the following:
-
The preferred method is to press Ctrl+C (or the equivalent "attention" key combination for your system) in the RMAN interface. This also terminates allocated channels, unless they are suspended in the media management code, as happens when, for example, they are waiting for a tape to be mounted.
-
You can end the server session corresponding to the RMAN channel by running the SQL ALTER SYSTEM KILL SESSION statement as described in Terminating the Session with ALTER SYSTEM KILL SESSION.
-
You can terminate the server session corresponding to the RMAN channel on the operating system as described in Terminating the Session at the Operating System Level.
24.4.1 Terminating the Session with ALTER SYSTEM KILL SESSION
To terminate an RMAN session by using the ALTER SYSTEM statement, you need the Oracle session ID for the RMAN channel and the serial number. This information is contained in the RMAN log for messages.
Search for messages with the format shown in the following example:
channel ch1: sid=15 devtype=SBT_TAPE
The sid and devtype are displayed for each allocated channel. The Oracle Database sid is different from the operating system process ID. You can end the session using a SQL ALTER SYSTEM KILL SESSION statement.
ALTER SYSTEM KILL SESSION takes two arguments, the sid printed in the RMAN message and a serial number, both of which can be obtained by querying V$SESSION.
For example, run the following statement, where sid_in_rman_output is the number from the RMAN message:
SELECT SERIAL#
FROM V$SESSION
WHERE SID=sid_in_rman_output;
Then, run the following statement, substituting the sid_in_rman_output and serial number obtained from the query:
ALTER SYSTEM KILL SESSION 'sid_in_rman_output,serial#';
This statement has no effect on the session if the session stopped in media manager code.
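The two manual steps, finding the sid in the RMAN log and composing the statement, can be sketched in shell. This is an illustration only: the function names are hypothetical, the serial number must still come from the V$SESSION query, and the script prints the statement for review rather than running it.

```shell
#!/bin/sh
# channel_sids: hypothetical helper.
# Prints "<channel> <sid>" for each channel allocation message in an RMAN log.
# Accepts both "sid=15" and "SID=123" spellings of the message.
channel_sids() {
    sed -n 's/^channel \([^:]*\): [sS][iI][dD]=\([0-9][0-9]*\) .*$/\1 \2/p' "$1"
}

# kill_session_sql: hypothetical helper.
# Builds the ALTER SYSTEM KILL SESSION statement from a sid and a serial#.
# Rejects non-numeric values so that nothing unexpected ends up in the SQL.
kill_session_sql() {
    for v in "$1" "$2"; do
        case "$v" in
            ''|*[!0-9]*) echo "sid and serial# must be numeric" >&2; return 1 ;;
        esac
    done
    printf "ALTER SYSTEM KILL SESSION '%s,%s';\n" "$1" "$2"
}
```

Printing the statement instead of executing it leaves the final decision with the DBA, which is a deliberate choice for an operation that ends a session.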
24.4.2 Terminating the Session at the Operating System Level
Finding and terminating the processes that are associated with the server sessions is operating system-specific. On some platforms, the server sessions are not associated with any processes at all. See your operating system-specific documentation for more information.
24.4.3 Terminating an RMAN Session That Is Not Responding in the Media Manager
You may sometimes need to terminate an RMAN job that is not responding in the media manager. The best way to terminate RMAN when the channel connections are not responding in the media manager is to terminate the session in the media manager.
If this action does not solve the problem, then on some platforms, such as Linux, you may be able to terminate the Oracle Database processes of the connections. (Terminating the Oracle processes may cause problems with the media manager. See your media manager documentation for details.)
24.4.3.1 Components of an RMAN Session
The nature of an RMAN session depends on the operating system.
In UNIX, an RMAN session has the following processes associated with it:
-
The RMAN client process itself
-
The default channel, the initial connection to the target database
-
One target connection to the target database corresponding to each allocated channel
-
The catalog connection to the recovery catalog database, if you use a recovery catalog
-
An auxiliary connection to an auxiliary instance, during DUPLICATE or TSPITR operations
-
A polling connection to the target database, used for monitoring RMAN command execution on the various allocated channels. By default, RMAN makes one polling connection. RMAN makes additional polling connections if you use different connect strings in the ALLOCATE CHANNEL or CONFIGURE CHANNEL commands. One polling connection exists for each distinct connect string used in the ALLOCATE CHANNEL or CONFIGURE CHANNEL command.
24.4.3.2 Process Behavior During a Suspended Job
RMAN usually stops responding because a channel connection is waiting in the media manager code for a tape resource. The catalog connection and the default channel appear to suspend, because they are waiting for RMAN to tell them what to do. Polling connections seem to be in an infinite loop while polling the RPC under the control of the RMAN process.
If you terminate the RMAN process itself, then you also terminate the catalog connection, the auxiliary connection, the default channel, and the polling connections. If target and auxiliary connections are suspended but not while executing media manager code, they also terminate. If either the target connection or any of the auxiliary connections are executing in the media management layer, then they do not terminate until the processes are manually terminated at the operating system level.
Not all media managers can detect the termination of the Oracle Database process. Those which cannot may keep resources busy or continue processing. Consult your media manager documentation for details.
Terminating the catalog connection does not cause the RMAN process to terminate because RMAN is not performing catalog operations while the backup or restore is in progress. Removing default channel and polling connections causes the RMAN process to detect that a channel is no longer present and then to exit. In this case, the connections to the unresponsive channels remain active as described previously.
24.4.3.3 Terminating an RMAN Session: Basic Steps
After the unresponsive channels in the media manager code are terminated, the RMAN process detects this termination and exits, removing all connections except target connections that are still operative in the media management layer.
The warning about the media manager resources still applies in this case.
To terminate an Oracle Database process that is not responding in the media manager: