9 Making Applications Highly Available Using Oracle Clusterware
When an application, process, or server fails in a cluster, you want the disruption to be as short as possible, if not completely unknown to users. For example, when an application fails on a server, that application can be restarted on another server in the cluster, minimizing or negating any disruption in the use of that application. Similarly, if a server in a cluster fails, then all of the applications and processes running on that server must be able to fail over to another server to continue providing service to the users. Using the built-in generic_application resource type or customizable scripts and application agent programs, and resource attributes that you assign to applications and processes, Oracle Clusterware can manage all these entities to ensure high availability.
This chapter explains how to use Oracle Clusterware to start, stop, monitor, restart, and relocate applications. Oracle Clusterware is the underlying cluster solution for Oracle Real Application Clusters (Oracle RAC). The same functionality and principles you use to manage Oracle RAC databases are applied to the management of applications.
Oracle Clusterware Resources and Agents
This section discusses the framework that Oracle Clusterware uses to monitor and manage resources, to ensure high application availability.
Oracle Clusterware Resources
Oracle Clusterware manages applications and processes as resources that you register with Oracle Clusterware.
The number of resources you register with Oracle Clusterware to manage an application depends on the application. Applications that consist of only one process are usually represented by only one resource. For more complex applications that are built on multiple processes or components that may require multiple resources, you can create resource groups.
Each resource is based on a resource type that serves as a template. You can configure how Oracle Clusterware will place an application in the cluster by specifying an explicit list of servers, or by using features such as server pools and policies. Relationships between applications and components are expressed using dependencies. Oracle Clusterware manages the application by performing operations on the resources, and the resource state represents the availability of the application.
When you register an application as a resource in Oracle Clusterware, in addition to actually adding the resource to the system, you define how Oracle Clusterware manages the application using resource attributes you ascribe to the resource. The frequency with which the resource is checked and the number of attempts to restart a resource on the same server after a failure before attempting to start it on another server (failover) are examples of resource attributes. The registration information also includes a path to an action script or application-specific action program that Oracle Clusterware calls to start, stop, check, and clean up the application.
An action script is a shell script (a batch script in Windows) that a generic script agent provided by Oracle Clusterware calls. An application-specific agent is usually a C or C++ program that calls Oracle Clusterware-provided APIs directly.
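As a rough illustration of how such attributes drive recovery, the following Python sketch models a resource profile and the choice between a local restart and a failover. The ResourceProfile class and the recovery_action function are hypothetical names used only for this example, and the counting rule is a simplified assumption rather than the actual Oracle Clusterware implementation.

```python
from dataclasses import dataclass


@dataclass
class ResourceProfile:
    """Hypothetical model of a registered resource (illustration only)."""
    name: str
    check_interval: int    # seconds between checks, modeled on CHECK_INTERVAL
    restart_attempts: int  # local restarts allowed, modeled on RESTART_ATTEMPTS


def recovery_action(profile: ResourceProfile, failures_on_server: int) -> str:
    """Choose the recovery step after the Nth failure on the current server.

    Assumption: the first `restart_attempts` failures are answered with a
    local restart, and any later failure triggers a failover.
    """
    if failures_on_server <= profile.restart_attempts:
        return "restart_local"
    return "failover"


if __name__ == "__main__":
    profile = ResourceProfile("myapp", check_interval=30, restart_attempts=2)
    for failure in (1, 2, 3):
        print(failure, recovery_action(profile, failure))
```

With restart_attempts set to 2 in this model, the first two failures lead to local restarts and the third leads to a failover.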
Critical Resources
Some large enterprise applications modeled as resource groups can comprise multiple resources representing application or infrastructure components. If any resource in the resource group fails, then Oracle Clusterware must fail the entire resource group over to another server in the cluster.
You can mark a resource as critical for its resource group by specifying the name of the resource in the CRITICAL_RESOURCES attribute of the resource group.
Virtual Machine Resources
A virtual machine is an environment created for a running operating system, known as a guest operating system. The virtual machine displays as a window on your computer’s desktop which can be displayed in full-screen mode or remotely on another computer.
A virtual machine is, essentially, a set of parameters that determines its behavior, analogous to computer system hardware. Parameters include hardware settings (such as how much memory the virtual machine has) as well as state information (such as whether the virtual machine is currently running).
Black-box virtual machines are virtual machines whose contents are unknown to the management interface. All that is known about black-box virtual machines is the virtual hardware they contain: the number of CPUs, the amount of RAM, attached disks, and attached network interfaces. The contents of the hardware, however, are unknown. For example, there may be a number of disks attached, but it is not known which operating system is installed on them, nor is it known whether the network cards are configured.
You can manage black-box Oracle virtual machines on physical hardware using Oracle Clusterware, which provides high availability and ease of management of virtual machines.
Note:
This is specific to virtual machines, and does not apply to Oracle VM VirtualBox, or any other Oracle VM product.
As an example, in the following figure, there are two physical computers, each of which has multiple virtual machines running on it. One of the virtual machines on each physical host is an Oracle Grid Infrastructure virtual machine (GIVM).
The GIVMs, themselves, form an Oracle Clusterware cluster, and within this cluster are four black-box virtual machine Oracle Clusterware resources, each monitoring one of the non-GIVM virtual machines. The cluster is not aware of the contents of the virtual machines it is monitoring because they are black-box virtual machines. In this example, if one of the physical hosts goes down, then its GIVM would also go down, causing the GIVM to leave the cluster, which, in turn, causes the resources to fail over to the other GIVM, which starts the black-box virtual machines on the new physical host.
Figure 9-1 Highly Available Virtual Machines in Oracle Database Appliance
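The failover behavior in this example can be sketched as a small Python simulation. It is only an illustration under stated assumptions: the function name, the host and virtual machine names, and the rule of choosing the least loaded surviving host are all hypothetical and are not taken from Oracle Clusterware.

```python
from collections import Counter


def relocate_after_host_failure(placement, hosts, failed_host):
    """Return a new placement after `failed_host` (and its GIVM) goes down.

    placement: dict mapping a black-box VM resource name to its host.
    Assumption: each orphaned VM resource moves to the surviving host that
    currently runs the fewest VM resources.
    """
    survivors = [h for h in hosts if h != failed_host]
    if not survivors:
        raise RuntimeError("no surviving GIVM is left to run the resources")

    # Resources on healthy hosts stay where they are.
    new_placement = {vm: h for vm, h in placement.items() if h != failed_host}
    load = Counter({h: 0 for h in survivors})
    load.update(new_placement.values())

    # Fail over every resource that was running on the failed host.
    for vm in sorted(vm for vm, h in placement.items() if h == failed_host):
        target = min(survivors, key=lambda h: (load[h], h))
        new_placement[vm] = target
        load[target] += 1
    return new_placement


if __name__ == "__main__":
    hosts = ["host1", "host2"]
    placement = {"vm1": "host1", "vm2": "host1", "vm3": "host2", "vm4": "host2"}
    print(relocate_after_host_failure(placement, hosts, "host1"))
```

In the two-host layout of the example, a failure of host1 leaves all four virtual machine resources on host2.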
Virtual Machine Architecture
Oracle virtual machines consist of two parts: the virtual machine server and the virtual machine manager. The virtual machine server is a minimal operating system installed on bare hardware that uses a Xen hypervisor to manage guests. The server has an agent process, the Oracle virtual machine agent, which acts as an intermediary through which the virtual machine manager manipulates domains on the server.
The virtual machine manager is a web-based management console that is used to manage virtual machine servers and their virtual machines. The virtual machine manager requires a database as well as an Oracle WebLogic Server in order to run, and is necessary for Oracle-supported management of the Oracle virtual machine server. The management domain is supposed to be as small as possible and, therefore, the virtual machine manager may not be installed there. You must install the virtual machine manager on another host, which means either having another physical computer or manually creating a temporary virtual machine using the Xen xm commands.
The Oracle virtual machine manager is the sole interface for managing virtual machines. All requests are directed through it, including all APIs and utilities.
The resource type for a virtual machine resource (which is an Oracle Clusterware resource) is similar to the following:
ATTRIBUTE=DESCRIPTION
TYPE=string
DEFAULT_VALUE="Resource type for VM Agents"
ATTRIBUTE=AGENT_FILENAME
TYPE=string
DEFAULT_VALUE=%CRS_HOME%/bin/oraagent%CRS_EXE_SUFFIX%
ATTRIBUTE=CHECK_INTERVAL
DEFAULT_VALUE=1
TYPE=int
ATTRIBUTE=OVMM_VM_ID
TYPE=string
DEFAULT_VALUE=''
ATTRIBUTE=OVMM_VM_NAME
TYPE=string
DEFAULT_VALUE=''
ATTRIBUTE=VM
TYPE=strings
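To make the structure of such a listing concrete, the following Python sketch parses ATTRIBUTE, TYPE, and DEFAULT_VALUE lines into a dictionary. The parser and its function name are hypothetical helpers written for this illustration. They are not part of Oracle Clusterware, and the sketch assumes that every line has the form KEY=value.

```python
SAMPLE = """\
ATTRIBUTE=DESCRIPTION
TYPE=string
DEFAULT_VALUE="Resource type for VM Agents"
ATTRIBUTE=CHECK_INTERVAL
DEFAULT_VALUE=1
TYPE=int
"""


def parse_type_definition(text):
    """Group KEY=value lines under the most recent ATTRIBUTE line."""
    attributes = {}
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        key, sep, value = line.partition("=")
        if not sep:
            raise ValueError(f"expected KEY=value, got: {line!r}")
        if key == "ATTRIBUTE":
            current = attributes.setdefault(value, {})
        elif current is None:
            raise ValueError(f"{key} appears before any ATTRIBUTE line")
        else:
            # Drop surrounding quotes, so '' becomes an empty string.
            current[key] = value.strip("'\"")
    return attributes


if __name__ == "__main__":
    for name, fields in parse_type_definition(SAMPLE).items():
        print(name, fields)
```

The order of TYPE and DEFAULT_VALUE within an attribute does not matter to this parser, which matches the CHECK_INTERVAL entry in the listing.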
Resource Groups
A resource group is a container for a logically related group of resources.
An application is modeled as a resource group that contains the application resource and related application resources (such as WebServer), and infrastructure resources (such as disk groups and VIPs). A resource group provides a logical and intuitive entity for high availability modeling of all classes of applications.
You create resource groups using CRSCTL, and then add resources to the resource group. A resource group provides a set of attributes that cover naming, description, and common placement and failover parameter values for the resources that are members of the resource group.
Resource Group Principles
-
You create a resource group based on a resource group type.
-
A resource can be a member of only one resource group. You can specify a resource group for a resource when you create the resource.
If you do not specify a resource group when you create a resource, then the resource becomes a member of an automatic resource group created for that resource. You can later add the resource to a different resource group.
-
Resource groups are aware of critical resources, and the state of the resource group is solely determined by the state of its critical resources.
You can remove a non-critical resource from a resource group (subject to dependency checks) and, at a later time, add it to another resource group.
-
Resource groups have cardinality to specify the number of instances of the resource group that can simultaneously run in the cluster.
-
All member resources of a running resource group instance are located on the same server.
-
Oracle Clusterware restarts a resource group in the event of failure and then relocates the resource group to another server in the event of local restart failures.
Automatic Resource Groups
If you create a resource without specifying a resource group, then Oracle Clusterware implicitly and automatically adds the resource to a resource group with the same name as the resource.
An automatic resource group is created for each resource that is not explicitly added to a resource group. You can create resources without using resource groups and work with Oracle Clusterware without disruption. Using resource groups, however, enables you to define relationships to infrastructure and application resources (through automatic resource groups) created by SRVM or other existing utilities.
An automatic resource group is solely described by the resource that it has been created for, and cannot be modified by an administrator. Resources that you create without specifying a resource group can be added to a resource group at a later time. Oracle Clusterware deletes the automatic resource group to which the resource belongs when the resource has been explicitly added to a resource group.
Resource Group Management
-
You can add a resource to a resource group when you create the resource.
-
You can explicitly add a resource that belongs to its automatic resource group to another resource group. The resource must be OFFLINE when you add the resource to a group that is either ONLINE or OFFLINE.
-
You can remove a non-critical resource from a resource group (OFFLINE or ONLINE) as long as no other resource in the group depends on it. The resource you remove then becomes a member of its automatic resource group. At a later time, you can add this resource to another group.
-
You can delete a non-critical resource, thereby removing the resource from the resource group and deleting it from Oracle Clusterware. You cannot delete a critical resource of a resource group, unless you first update the critical resources list of the resource group to unmark the resource as critical.
-
A resource group is empty when it is initially created and also becomes empty when each resource in the group has been removed. An empty resource group cannot be started and its state will always be OFFLINE.
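The management rules above can be modeled with a short Python sketch. The ResourceGroup class below is a hypothetical in-memory model written for illustration. It is not a CRSCTL interface, and it tracks only the rules stated in this list.

```python
class ResourceGroup:
    """Toy model of resource group membership rules (illustration only)."""

    def __init__(self, name):
        self.name = name
        self.members = {}      # resource name -> names of members it depends on
        self.critical = set()  # names of resources marked as critical

    def add(self, resource, state="OFFLINE", depends_on=()):
        # A resource must be OFFLINE when it is added to a group.
        if state != "OFFLINE":
            raise ValueError("resource must be OFFLINE when added to a group")
        self.members[resource] = set(depends_on)

    def remove(self, resource):
        """Remove a non-critical resource and return the name of the
        automatic resource group it falls back to (same name as the resource)."""
        if resource in self.critical:
            raise ValueError("unmark the resource as critical first")
        dependents = sorted(
            r for r, deps in self.members.items() if resource in deps
        )
        if dependents:
            raise ValueError(f"{resource} is still required by {dependents}")
        del self.members[resource]
        return resource

    def is_empty(self):
        # An empty group cannot be started and is always OFFLINE.
        return not self.members


if __name__ == "__main__":
    group = ResourceGroup("appgroup")
    group.add("vip")
    group.add("web", depends_on=["vip"])
    print(group.remove("web"), group.remove("vip"), group.is_empty())
```

Removing vip before web fails in this model because web still depends on it, which mirrors the dependency check described above.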
Shared Resources
In various Oracle Clusterware deployments, there are components, such as file systems, that multiple applications share. A single Oracle ACFS resource, for example, cannot be a member of multiple resource groups that make use of the filesystem because, by definition, a resource can be a member of only one resource group.
For these types of resources, Oracle recommends that you put them in their own individual resource groups, either explicit or automatic, and configure appropriate dependencies from the application resource groups to these shared resource groups. In this manner, multiple applications can share components.
Critical Resources
You can have a large-enterprise type application that is modeled as a resource group that contains multiple resources corresponding to application or infrastructure components. If any of the resources in such a resource group fails, then Oracle Clusterware fails the entire resource group over to another server in the cluster. Some resources in the resource group, however, are not critical to the application and would not necessarily require failing over the entire resource group, which would cause an unnecessary disruption in the running of the instance.
You can define certain resources within a resource group as critical (by specifying the name of the resource in the CRITICAL_RESOURCES list attribute of the resource group) and, should any of those resources fail, then Oracle Clusterware will fail the resource group over to another server in the cluster.
Further, the state of a resource group is determined by the state of its critical resources. Non-critical resources do not affect the state of the resource group nor can they trigger failover of the resource group. A resource group must have at least one critical resource before the resource group can be started and brought online.
You can specify individual resources in a resource group as critical or you can specify a resource type as critical, which would make all resources of that particular type critical. For example:
CRITICAL_RESOURCES="r1 r2 r3"
The preceding example lists three, space-delimited resources marked as critical.
CRITICAL_RESOURCES="appvip type:ora.export.type"
The preceding example lists a particular resource type as critical, thereby making any resource of this type a critical resource.
When you create a resource group or remove its members, it is empty and, consequently, there are no critical resources. The first resource that you add to the resource group is automatically marked as critical by Oracle Clusterware, provided that you have not already specified a resource type in the CRITICAL_RESOURCES attribute of the resource group. Oracle Clusterware always checks for the presence of a critical resource before attempting to start a resource group.
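The following Python sketch shows one way to interpret a CRITICAL_RESOURCES value such as the ones above and to derive a group state from it. The function names and the resource type name app.vip.type are hypothetical. The rule that the group is ONLINE only when every critical resource is ONLINE is a simplifying assumption for illustration, and only the ONLINE and OFFLINE states are modeled.

```python
def parse_critical_resources(value):
    """Split a space-delimited CRITICAL_RESOURCES value into resource names
    and resource type names (entries that carry the type: prefix)."""
    names, types = set(), set()
    for token in value.split():
        if token.startswith("type:"):
            types.add(token[len("type:"):])
        else:
            names.add(token)
    return names, types


def is_critical(resource_name, resource_type, critical_value):
    names, types = parse_critical_resources(critical_value)
    return resource_name in names or resource_type in types


def group_state(critical_value, resources):
    """resources maps a resource name to a (type, state) tuple.

    Non-critical resources are ignored. A group with no critical
    resources is reported as OFFLINE because it cannot be started.
    """
    critical_states = [
        state
        for name, (rtype, state) in resources.items()
        if is_critical(name, rtype, critical_value)
    ]
    if not critical_states:
        return "OFFLINE"
    if all(state == "ONLINE" for state in critical_states):
        return "ONLINE"
    return "OFFLINE"


if __name__ == "__main__":
    value = "appvip type:ora.export.type"
    resources = {
        "appvip": ("app.vip.type", "ONLINE"),
        "export1": ("ora.export.type", "ONLINE"),
        "logger": ("other.type", "OFFLINE"),  # non-critical, ignored
    }
    print(group_state(value, resources))
```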
Resource Group Privileges
You can create resource groups and resource group types, and then create and add resources to those groups. You can also define privileges for modifying and executing operations on a resource group using the ACL attribute of the resource group. The resource group owner can assign privileges to other operating system users and groups by appropriately setting the ACL attribute of the resource group. A resource within a resource group can maintain its own privilege specification within its ACL attribute. Specifically:
-
A user with write privilege on a resource group and write privilege on a resource can add the resource to the group.
-
The owner of the resource group must at all times have execute privileges on all resources in the group. Any user or group granted execute privileges on the group must have execute privileges on all resources in the group.
For example, in cases where certain infrastructure resources in a resource group must be managed by root, the owner of the resource must be specified as root and execute permissions on the resource granted to the group owner. This must be done explicitly by the root user.
-
The local administrative user (root on Unix or an Administrators group user on Windows) can modify, delete, start, and stop any resource group.
Resource Group Dependencies
You can set dependencies among resource groups, providing a means to express relationships between applications and components. Oracle Clusterware provides modifiers to specify different ordering, location, and enforcement level of dependencies amongst resource groups. Some things to consider about resource group dependencies:
-
A resource group can have a dependency relationship to another resource group and not to individual resources.
-
An explicitly created resource group can have a dependency relationship to an automatic resource group.
-
A resource in a group can have a dependency relationship to another resource in the same group.
-
Resources created without specifying a resource group (thus belonging to an automatic resource group) can have a dependency relationship to another resource group.
-
A resource cannot have a dependency relationship to a resource group nor to a resource in a different resource group.
All available Oracle Clusterware resource dependencies are also available to use with resource groups. You configure the START_DEPENDENCIES and STOP_DEPENDENCIES attributes of a resource group to specify dependencies for resource groups.
Table 9-1 Resource Group Dependency Types and Modifiers
Dependency Type | Description |
---|---|
hard start | Specifies the requirement that specific other resource groups must be online (anywhere in the cluster) before this resource group can be started. If the start of any dependent resource group fails, then Oracle Clusterware aborts the start of this resource group. |
weak start | Specifies the requirement that an attempt must be made to start specific other resource groups before starting this resource group. If the attempt fails to start the specific other resource groups, then Oracle Clusterware starts this resource group, regardless. |
pullup | Use this dependency when this resource group must be automatically started when a dependent resource group starts. Oracle recommends that you use this dependency when a stop dependency exists between the resource groups. |
hard stop | This dependency specifies the mandatory requirement of stopping this resource group when another specific resource group stops running. |
attraction | Specifies a co-location preference with specific other resource groups. Oracle Clusterware will attempt to start this resource group on the same server where a specific other resource group is already online. |
dispersion | Specifies preference to not be co-located with specific other resource groups. Oracle Clusterware will attempt to start this resource group on a server with the least number of online resource groups with dispersion dependency. |
exclusion | Specifies a mandatory requirement that this resource group not run on the same server as specific other resource groups. Oracle Clusterware will either reject the start of this resource group or stop the dependent resource groups and restart them on another server. |
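As a hedged illustration of how the dependency types in the table might be combined into attribute values, the following Python sketch assembles START_DEPENDENCIES and STOP_DEPENDENCIES strings. The kind(target,...) notation with space-separated entries is assumed here from the way Oracle Clusterware resource dependencies are commonly written. The helper functions and the group name dbgroup are hypothetical, so verify the exact syntax against the attribute reference before using it.

```python
START_KINDS = ("hard", "weak", "pullup", "attraction", "dispersion", "exclusion")


def start_dependencies(**targets_by_kind):
    """Build a START_DEPENDENCIES value, for example hard(g1) pullup(g1).

    Each keyword names a dependency type from the table and maps to a list
    of resource group names. The output format is an assumption.
    """
    unknown = sorted(set(targets_by_kind) - set(START_KINDS))
    if unknown:
        raise ValueError(f"unknown start dependency types: {unknown}")
    parts = []
    for kind in START_KINDS:
        targets = targets_by_kind.get(kind)
        if targets:
            parts.append(f"{kind}({','.join(targets)})")
    return " ".join(parts)


def stop_dependencies(targets):
    """Build a STOP_DEPENDENCIES value for a hard stop dependency."""
    return f"hard({','.join(targets)})"


if __name__ == "__main__":
    # Pair pullup with the stop dependency, as the table recommends.
    print("START_DEPENDENCIES=" + start_dependencies(hard=["dbgroup"], pullup=["dbgroup"]))
    print("STOP_DEPENDENCIES=" + stop_dependencies(["dbgroup"]))
```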
Resource Group Failure and Recovery
As previously discussed, critical resources determine resource group state and failover.
Failure and Recovery of Critical Resources
-
When a critical resource of a resource group fails, the resource group immediately transitions to the OFFLINE state.
-
Oracle Clusterware attempts local restart of the failed critical resource according to the RESTART_ATTEMPTS and UPTIME_THRESHOLD resource attributes.
-
Oracle Clusterware initiates immediate check actions on other resources in the same group that have a stop dependency on the failed resource.
-
Oracle Clusterware initiates immediate check actions on other resource groups dependent on this resource group.
-
If the resource restarts successfully, then the resource group transitions to ONLINE state and Oracle Clusterware performs pullup dependency evaluation within and across resource groups.
-
If Oracle Clusterware exhausts all local restart attempts of the resource, then Oracle Clusterware stops the entire resource group. Oracle Clusterware also immediately stops other resource groups with a stop dependency on the resource group. Oracle Clusterware attempts local restart of the resource group, if configured to do so. On exhausting all restart attempts, the resource group will fail over to another server in the cluster.
Failure and Recovery of Non-Critical Resources
-
When a non-critical resource in a resource group fails, Oracle Clusterware attempts local restart of the failed resource according to the values of the RESTART_ATTEMPTS and UPTIME_THRESHOLD resource attributes. There is no impact on the state of the resource group when a non-critical resource fails.
-
Oracle Clusterware initiates immediate check actions on other resources in the same group that have a stop dependency on the failed resource.
-
If the resource restarts successfully, Oracle Clusterware performs pullup dependency evaluation and corresponding startup actions.
-
If Oracle Clusterware exhausts all local restart attempts of the resource, then there is no impact on the state of the resource group. You must then explicitly start the non-critical resource after fixing the cause of the failure.
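The two recovery sequences above can be summarized in a small Python sketch. The recover function and its event strings are hypothetical and exist only to contrast the critical and non-critical cases. The sketch does not model UPTIME_THRESHOLD, check actions on dependent resources, or the actual relocation.

```python
def recover(critical, restart_attempts, succeeds_on_attempt=None):
    """Trace the recovery of one failed resource.

    Returns (resource_group_state, events). `succeeds_on_attempt` is the
    1-based local restart attempt that works, or None if none does.
    """
    events = []
    group_state = "ONLINE"
    if critical:
        # Only a critical resource failure changes the group state.
        group_state = "OFFLINE"
        events.append("group OFFLINE")

    for attempt in range(1, restart_attempts + 1):
        events.append(f"local restart attempt {attempt}")
        if succeeds_on_attempt == attempt:
            if critical:
                group_state = "ONLINE"
                events.append("group ONLINE")
            events.append("evaluate pullup dependencies")
            return group_state, events

    if critical:
        events.append("stop resource group, then restart locally or fail over")
    else:
        events.append("resource stays OFFLINE until started manually")
    return group_state, events


if __name__ == "__main__":
    print(recover(critical=True, restart_attempts=2, succeeds_on_attempt=2))
    print(recover(critical=False, restart_attempts=1))
```

In this model a non-critical failure never changes the group state, while a critical failure keeps the group OFFLINE until a restart succeeds.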
Resource Group Types
In Oracle Clusterware, a resource type is a template for a class of resources.
Resource group types provide a commonly applicable set of attributes to all resource groups. When you create a resource group, you must specify a resource group type. Oracle Clusterware provides two base resource group types: local_resourcegroup and cluster_resourcegroup. The base resource types have attributes similar to resources, some of which you can configure.
Local Resource Group Type
Use the local_resourcegroup type to create a resource group that contains only local resources. Instances of a resource group of this type can run on each node in the cluster. Local resource group type attributes include:
- NAME
- DESCRIPTION
- ACL
- AUTO_START
- CRITICAL_RESOURCES
- DEBUG
- ENABLED
- INTERNAL_STATE
- RESOURCE_LIST
- RESTART_ATTEMPTS
- SERVER_CATEGORY
- START_DEPENDENCIES
- STOP_DEPENDENCIES
- STATE
- STATE_DETAILS
- UPTIME_THRESHOLD
Cluster Resource Group Type
A resource group of type cluster_resourcegroup can have one or more instances running on a static or dynamic set of servers in the cluster. Such a resource group can fail over to another server in the cluster according to the placement policy of the group. Cluster resource group type attributes include:
- NAME
- DESCRIPTION
- ACL
- ACTIVE_PLACEMENT
- AUTO_START
- CARDINALITY
- CRITICAL_RESOURCES
- DEBUG
- ENABLED
- FAILURE_INTERVAL
- FAILURE_THRESHOLD
- HOSTING_MEMBERS
- INTERNAL_STATE
- PLACEMENT
- RESOURCE_LIST
- RESTART_ATTEMPTS
- SERVER_CATEGORY
- SERVER_POOLS
- START_DEPENDENCIES
- STOP_DEPENDENCIES
- STATE
- STATE_DETAILS
- UPTIME_THRESHOLD
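Comparing the two lists shows that the cluster resource group type adds placement and failover attributes to the set that both types share. The following Python sketch encodes the lists and flags attributes that do not appear for a given type. The helper name is hypothetical, and because the lists are introduced with "include", they may not be exhaustive.

```python
COMMON_ATTRIBUTES = {
    "NAME", "DESCRIPTION", "ACL", "AUTO_START", "CRITICAL_RESOURCES", "DEBUG",
    "ENABLED", "INTERNAL_STATE", "RESOURCE_LIST", "RESTART_ATTEMPTS",
    "SERVER_CATEGORY", "START_DEPENDENCIES", "STOP_DEPENDENCIES", "STATE",
    "STATE_DETAILS", "UPTIME_THRESHOLD",
}

# Attributes listed only for cluster_resourcegroup.
CLUSTER_ONLY_ATTRIBUTES = {
    "ACTIVE_PLACEMENT", "CARDINALITY", "FAILURE_INTERVAL", "FAILURE_THRESHOLD",
    "HOSTING_MEMBERS", "PLACEMENT", "SERVER_POOLS",
}

GROUP_TYPE_ATTRIBUTES = {
    "local_resourcegroup": COMMON_ATTRIBUTES,
    "cluster_resourcegroup": COMMON_ATTRIBUTES | CLUSTER_ONLY_ATTRIBUTES,
}


def unsupported_attributes(group_type, attributes):
    """Return the attribute names not listed for the given group type."""
    allowed = GROUP_TYPE_ATTRIBUTES[group_type]
    return sorted(set(attributes) - allowed)


if __name__ == "__main__":
    requested = {"NAME": "appgroup", "CARDINALITY": "2"}
    print(unsupported_attributes("local_resourcegroup", requested))
    print(unsupported_attributes("cluster_resourcegroup", requested))
```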
Using Resource Groups
Use CRSCTL to create resource groups, resource group types, and to add resources to resource groups.
Oracle Clusterware Resource Types
Generally, all resources are unique but some resources may have common attributes. Oracle Clusterware uses resource types to organize these similar resources. Benefits that resource types provide are:
-
Manage only necessary resource attributes
-
Manage all resources based on the resource type
Every resource that you register in Oracle Clusterware must have a certain resource type. In addition to the resource types included in Oracle Clusterware, you can define custom resource types using the Oracle Clusterware Control (CRSCTL) utility. The included resource types are:
-
Local resource: Instances of local resources (type name local_resource) run on each server of the cluster (the default) or you can limit them to run on servers belonging to a particular server category. When a server joins the cluster, Oracle Clusterware automatically extends local resources to have instances tied to the new server. When a server leaves the cluster, Oracle Clusterware automatically sheds the instances of local resources that ran on the departing server. Instances of local resources are pinned to their servers; they do not fail over from one server to another.
-
Cluster resource: Cluster-aware resource types (type name cluster_resource) are aware of the cluster environment and are subject to cardinality and cross-server switchover and failover.
-
Generic application: You can use this resource type (type name generic_application) to protect any generic applications without requiring additional scripts. High availability for an application is achieved by defining a resource with the generic_application resource type and providing the values for key attributes of the resource. The generic_application resource type is derived from the cluster_resource resource type and, therefore, all resources of the generic_application resource type are cluster-aware resources. Attributes include:
-
START_PROGRAM: A complete path to the executable that starts the application, with all appropriate arguments. The executable must exist on every server where Oracle Grid Infrastructure is configured to run the application. This attribute is required. For example: /opt/my_app -start
The executable must also ensure that the application starts and return an exit status value of zero (0) to indicate that the application started successfully and is online. If the executable fails to start the application, then the executable exits with a non-zero status code.
-
STOP_PROGRAM: A complete path to the executable that stops the application, with all appropriate arguments. The executable must exist on every server where Oracle Grid Infrastructure is configured to run the application. If you do not specify this attribute value, then Oracle Clusterware uses an operating system-equivalent of the kill command. For example: /opt/my_app -stop
The executable must also ensure that the application stops and return an exit status value of zero (0) to indicate that the application stopped successfully. If the executable fails to stop the application, then the executable exits with a non-zero status code and Oracle Clusterware initiates a clean of the resource.
-
CLEAN_PROGRAM: A complete path to the executable that cleans the program, with all appropriate arguments. The executable must exist on every server where Oracle Grid Infrastructure is configured to run the application. If you do not specify a value for this attribute, then Oracle Clusterware uses an operating system-equivalent of the kill -9 command. For example: /opt/my_app -clean
Note:
The difference between STOP_PROGRAM and CLEAN_PROGRAM is that CLEAN_PROGRAM is a forced stop that stops an application ungracefully, and must always be able to stop an application or the application becomes unmanageable.
-
PID_FILES: A comma-delimited list of complete paths to files that will be written by the application and contain a process ID (PID) to monitor. Failure of a single process is treated as a complete resource failure. For example: /opt/app.pid
Note:
The files that you specify in the PID_FILES attribute are read immediately after the START action completes and monitoring commences for the PIDs listed in the files.
-
EXECUTABLE_NAMES: A comma-delimited list of names of executables that are created when the application starts. The state of these executables is subsequently monitored. Failure of a single executable is treated as a complete resource failure. For example: my_app
Note:
You need to specify only the complete name of each executable. This attribute does not accept the path of the executable or wild cards. The PIDs matching the executable names are cached immediately after the START action completes.
-
CHECK_PROGRAMS: A list of complete paths to the executables that determine the state of the application. If any of these executables reports a non-running state, then this is treated as a failure of the entire resource. For example: /opt/my_app -check
-
ENVIRONMENT_FILE: A complete path to the file containing environment variables to source when starting the application. The file must be a text file containing name=value pairs, one per line. For example: /opt/my_app.env
-
ENVIRONMENT_VARS: A comma-delimited list of name=value pairs to be included into the environment when starting an application. For example: USE_FILES=No, AUTO_START=Yes
-
SEND_OUTPUT_ALWAYS: This attribute controls whether the application output that is sent to STDOUT is displayed. A value of 0 does not display any application output unless an action fails. When an action fails, whatever application output has been saved by the agent is displayed. Any value greater than 0 displays all application output. The default value is 0. For example: SEND_OUTPUT_ALWAYS=1
Note:
If you do not specify the STOP_PROGRAM, CHECK_PROGRAMS, and CLEAN_PROGRAM attributes, then you must specify either PID_FILES or EXECUTABLE_NAMES, or Oracle Clusterware will not allow you to register a resource of this type.
If you specify all the attributes, then the following rules apply:
-
When stopping a resource, if you specified STOP_PROGRAM, then Oracle Clusterware calls STOP_PROGRAM. Otherwise, Oracle Clusterware uses an operating system-equivalent of the kill -9 command on the PID obtained from either the PID_FILES or the EXECUTABLE_NAMES attribute.
-
When you need to establish the current state of an application, if you specified CHECK_PROGRAMS, then Oracle Clusterware calls CHECK_PROGRAMS. Otherwise, Oracle Clusterware uses an operating system-equivalent of the ps -p command with the PID obtained from either the PID_FILES or EXECUTABLE_NAMES attribute.
-
When cleaning a resource, if you specified CLEAN_PROGRAM, then Oracle Clusterware calls CLEAN_PROGRAM. Otherwise, Oracle Clusterware uses an operating system-equivalent of the kill -9 command on the PID obtained from either the PID_FILES or the EXECUTABLE_NAMES attribute.
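The registration requirement and the fallback rules for the generic_application resource type can be sketched in Python as follows. The function names are hypothetical and the code only models the decisions described in this section. One assumption is labeled in the code: the sketch requires a PID source whenever any one of STOP_PROGRAM, CHECK_PROGRAMS, or CLEAN_PROGRAM is missing, which is a strict reading of the note above.

```python
PROGRAM_ATTRIBUTE = {
    "stop": "STOP_PROGRAM",
    "check": "CHECK_PROGRAMS",
    "clean": "CLEAN_PROGRAM",
}

# Operating system equivalents named in the rules for each action.
OS_FALLBACK = {"stop": "kill -9", "check": "ps -p", "clean": "kill -9"}


def validate_generic_application(attrs):
    """Raise ValueError if the attribute set could not be registered."""
    if not attrs.get("START_PROGRAM"):
        raise ValueError("START_PROGRAM is required")
    # Assumption: a PID source is needed if any of the three programs is missing.
    missing = [a for a in PROGRAM_ATTRIBUTE.values() if not attrs.get(a)]
    has_pid_source = bool(attrs.get("PID_FILES") or attrs.get("EXECUTABLE_NAMES"))
    if missing and not has_pid_source:
        raise ValueError(
            f"specify PID_FILES or EXECUTABLE_NAMES when {missing} are not set"
        )


def plan_action(attrs, action):
    """Return ("run", program) or ("os", command) for stop, check, or clean."""
    program = attrs.get(PROGRAM_ATTRIBUTE[action])
    if program:
        return ("run", program)
    return ("os", OS_FALLBACK[action])


if __name__ == "__main__":
    attrs = {"START_PROGRAM": "/opt/my_app -start", "PID_FILES": "/opt/app.pid"}
    validate_generic_application(attrs)
    for action in ("stop", "check", "clean"):
        print(action, plan_action(attrs, action))
```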
Agents in Oracle Clusterware
Oracle Clusterware runs all resource-specific commands through an entity called an agent.
Note:
To increase security and further separate administrative duties, Oracle Clusterware agents run with the SYSRAC administrative privilege, and no longer require the SYSDBA administrative privilege. The SYSRAC administrative privilege is the default mode of connecting to the database by the Oracle Clusterware agent on behalf of Oracle RAC utilities, such as SRVCTL, so that no SYSDBA connections to the database are necessary for everyday administration of Oracle RAC database clusters.
An agent is a process that contains the agent framework and user code to manage resources. The agent framework is a library that enables you to plug in your application-specific code to manage customized applications. You program all of the actual application management functions, such as starting, stopping and checking the health of an application, into the agent. These functions are referred to as entry points.
The agent framework is responsible for invoking these entry point functions on behalf of Oracle Clusterware. Agent developers can use these entry points to plug in the required functionality for a specific resource regarding how to start, stop, and monitor a resource. Agents are capable of managing multiple resources.
Agent developers can set the following entry points as callbacks to their code:
-
ABORT: If any of the other entry points hang, the agent framework calls the ABORT entry point to stop the ongoing action. If the agent developer does not supply a stop function, then the agent framework exits the agent program.
-
ACTION: The ACTION entry point is invoked when a custom action is requested using the clscrs_request_action API or the crsctl request action command.
-
CHECK: The CHECK (monitor) entry point acts to monitor the health of a resource. The agent framework periodically calls this entry point. If it notices any state change during this action, then the agent framework notifies Oracle Clusterware about the change in the state of the specific resource.
-
CLEAN: The CLEAN entry point acts whenever there is a need to clean up a resource. It is a non-graceful operation that is invoked when users must forcefully terminate a resource. This command cleans up the resource-specific environment so that the resource can be restarted.
-
DELETE: The DELETE entry point is invoked on every node where a resource can run when the resource is unregistered.
-
MODIFY: The MODIFY entry point is invoked on every node where a resource can run when the resource profile is modified.
-
START: The START entry point acts to bring a resource online. The agent framework calls this entry point whenever it receives the start command from Oracle Clusterware.
-
STOP: The STOP entry point acts to gracefully bring down a resource. The agent framework calls this entry point whenever it receives the stop command from Oracle Clusterware.
START, STOP, CHECK, and CLEAN are mandatory entry points and the agent developer must provide these entry points when building an agent. Agent developers have several options to implement these entry points, including using C, C++, or scripts. It is also possible to develop agents that use both C or C++ and script-type entry points. When initializing the agent framework, if any of the mandatory entry points are not provided, then the agent framework invokes a script pointed to by the ACTION_SCRIPT
resource attribute.
At any given time, the agent framework invokes only one entry point per application. If that entry point hangs, then the agent framework calls the ABORT entry point to end the current operation. The agent framework periodically invokes the CHECK entry point to determine the state of the resource. This entry point must return one of the following states as the resource state:
-
CLSAGFW_ONLINE: The CHECK entry point returns ONLINE if the resource was brought up successfully and is currently in a functioning state. The agent framework continues to monitor the resource when it is in this state. This state has a numeric value of 0 for the scriptagent.
-
CLSAGFW_UNPLANNED_OFFLINE and CLSAGFW_PLANNED_OFFLINE: The OFFLINE state indicates that the resource is not currently running. These two states have numeric values of 1 and 2, respectively, for the scriptagent.
Two distinct categories exist to describe a resource's offline state: planned and unplanned.
When the state of the resource transitions to OFFLINE through Oracle Clusterware, then it is assumed that the intent for this resource is to be offline (TARGET=OFFLINE), regardless of which value is returned from the CHECK entry point. However, when an agent detects that the state of a resource has changed independent of Oracle Clusterware (such as somebody stopping the resource through a non-Oracle interface), then the intent must be carried over from the agent to the Cluster Ready Services daemon (CRSD). The intent then becomes the determining factor for the following:
    -
    Whether to keep or to change the value of the resource's TARGET resource attribute. PLANNED_OFFLINE indicates that the TARGET resource attribute must be changed to OFFLINE only if the resource was running before. If the resource was not running (STATE=OFFLINE, TARGET=OFFLINE) and a request comes in to start it, then the value of the TARGET resource attribute changes to ONLINE. The start request then goes to the agent and the agent reports back to Oracle Clusterware a PLANNED_OFFLINE resource state, and the value of the TARGET resource attribute remains ONLINE. UNPLANNED_OFFLINE does not change the TARGET attribute.
    -
    Whether to leave the resource's state as UNPLANNED_OFFLINE or attempt to recover the resource by restarting it locally or failing it over to another server in the cluster. The PLANNED_OFFLINE state makes CRSD leave the resource as is, whereas the UNPLANNED_OFFLINE state prompts resource recovery.
-
CLSAGFW_UNKNOWN: The CHECK entry point returns UNKNOWN if the current state of the resource cannot be determined. In response to this state, Oracle Clusterware does not attempt to fail over or to restart the resource. The agent framework continues to monitor the resource if the previous state of the resource was either ONLINE or PARTIAL. This state has a numeric value of 3 for the scriptagent.
-
CLSAGFW_PARTIAL: The CHECK entry point returns PARTIAL when it knows that a resource is partially ONLINE and some of its services are available. Oracle Clusterware considers this state as partially ONLINE and does not attempt to fail over or to restart the resource. The agent framework continues to monitor the resource in this state. This state has a numeric value of 4 for the scriptagent.
-
CLSAGFW_FAILED: The CHECK entry point returns FAILED whenever it detects that a resource is not in a functioning state, some of its components have failed, and some cleanup is required to restart the resource. In response to this state, Oracle Clusterware calls the CLEAN action to clean up the resource. After the CLEAN action finishes, the state of the resource is expected to be OFFLINE. Next, depending on the policy of the resource, Oracle Clusterware may attempt to fail over or restart the resource. Under no circumstances does the agent framework monitor failed resources. This state has a numeric value of 5 for the scriptagent.
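The numeric values listed above are what a script-based CHECK action reports through its exit status. The following is a minimal sketch for illustration only: describe_check_state is a hypothetical helper, not part of Oracle Clusterware, and the short reaction summaries are paraphrased from the descriptions in this section.

```shell
#!/bin/sh
# Hypothetical helper (not an Oracle Clusterware API): translate the numeric
# value returned by a script-agent CHECK action into the state name and a
# short summary of the reaction described in the text above.
describe_check_state() {
    case "$1" in
        0) echo "ONLINE: keep monitoring" ;;
        1) echo "UNPLANNED_OFFLINE: attempt recovery, TARGET unchanged" ;;
        2) echo "PLANNED_OFFLINE: leave as is" ;;
        3) echo "UNKNOWN: no restart or failover" ;;
        4) echo "PARTIAL: keep monitoring, no restart or failover" ;;
        5) echo "FAILED: run CLEAN, then apply the resource policy" ;;
        *) echo "unrecognized value: $1"; return 1 ;;
    esac
}

# Example: print the meaning of every documented value.
for code in 0 1 2 3 4 5; do
    describe_check_state "$code"
done
```

A lookup of this kind can be handy when reading agent logs, but the authoritative behavior is the one described for each state above.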
The agent framework implicitly monitors resources in the states listed in Table 9-2 at regular intervals, as specified by the CHECK_INTERVAL
or OFFLINE_CHECK_INTERVAL
resource attributes.
Table 9-2 Agent Framework Monitoring Characteristics
State | Condition | Frequency |
---|---|---|
ONLINE | Always | CHECK_INTERVAL |
PARTIAL | Always | CHECK_INTERVAL |
OFFLINE | Only if the value of the OFFLINE_CHECK_INTERVAL resource attribute is greater than 0. | OFFLINE_CHECK_INTERVAL |
UNKNOWN | Only monitored if the resource was previously being monitored because of any one of the previously mentioned conditions. | If the state becomes UNKNOWN after being ONLINE, then the value of CHECK_INTERVAL is used. |
Whenever an agent starts, the state of all the resources it monitors is set to UNKNOWN. After receiving an initial probe request from Oracle Clusterware, the agent framework executes the CHECK entry point for all of the resources to determine their current states.
Once the CHECK action successfully completes for a resource, the state of the resource transitions to one of the previously mentioned states. The agent framework then starts resources based on commands issued from Oracle Clusterware. After the completion of every action, the agent framework invokes the CHECK action to determine the current resource state. If the resource is in one of the monitored states listed in Table 9-2, then the agent framework periodically executes the CHECK entry point to check for changes in resource state.
By default, the agent framework does not monitor resources that are offline. However, if the value of the OFFLINE_CHECK_INTERVAL
attribute is greater than 0, then the agent framework monitors offline resources.
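As a rough illustration of the monitoring rule for offline resources, the sketch below models which interval applies to a resource in a given state. The function name monitoring_interval is hypothetical, the mapping of ONLINE and PARTIAL to CHECK_INTERVAL is an assumption drawn from the attribute names mentioned in this section, and the UNKNOWN state is deliberately not modeled.

```shell
#!/bin/sh
# Hypothetical helper: report which interval (in seconds) drives the CHECK
# action for a resource in a given state. UNKNOWN is not modeled here.
# Arguments: state, CHECK_INTERVAL value, OFFLINE_CHECK_INTERVAL value
monitoring_interval() {
    state=$1
    check_interval=$2
    offline_check_interval=$3
    case "$state" in
        ONLINE|PARTIAL)
            # Assumption: running resources are polled every CHECK_INTERVAL.
            echo "$check_interval" ;;
        OFFLINE)
            # Offline resources are polled only when the offline interval is > 0.
            if [ "$offline_check_interval" -gt 0 ]; then
                echo "$offline_check_interval"
            else
                echo "not monitored"
            fi ;;
        *)
            echo "not modeled" ;;
    esac
}

monitoring_interval ONLINE 30 0     # prints 30
monitoring_interval OFFLINE 30 0    # prints "not monitored"
monitoring_interval OFFLINE 30 120  # prints 120
```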
Related Topics
Oracle Clusterware Built-in Agents
Oracle Clusterware uses agent programs (agents) to manage resources and includes the following built-in agents to protect applications:
-
appagent: This agent (appagent.exe in Windows) automatically protects resources of the generic_application resource type and any resources in previous versions of Oracle Clusterware of the application resource type.
Note:
Oracle recommends that you not use the deprecated application resource type, which is only provided to support pre-Oracle Clusterware 11g release 2 (11.2) resources.
-
scriptagent: Use this agent (scriptagent.exe in Windows) when using shell or batch scripts to protect an application. Both the cluster_resource and local_resource resource types are configured to use this agent, and any resources of these types automatically take advantage of this agent.
Additionally, you can create your own agents to manage your resources in any manner you want.
Related Topics
Action Scripts
An action script defines one or more actions to start, stop, check, or clean resources.
The agent framework invokes these actions when the corresponding C/C++ entry points are not provided. Using action scripts, you can build an agent that combines C/C++ entry points with script entry points. If all of the actions are defined in the action script, then you can use the script agent to invoke the actions defined in any action scripts.
Before invoking the action defined in the action script, the agent framework exports all the necessary attributes from the resource profile to the environment. Action scripts can log messages to stdout/stderr, and the agent framework prints those messages in the agent logs. However, action scripts can use special tags to send progress, warning, or error messages to the crs* client tools by prefixing one of the following tags to the messages printed to stdout/stderr:
CRS_WARNING:
CRS_ERROR:
CRS_PROGRESS:
The agent framework strips out the prefixed tag when it sends the final message to the crs*
clients.
Resource attributes can be accessed from within an action script as environment variables prefixed with _CRS_
. For example, the START_TIMEOUT
attribute becomes an environment variable named _CRS_START_TIMEOUT
.
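To make the pieces above concrete, here is a minimal sketch of an action script for a resource that counts as online while a marker file exists. It is not Oracle sample code: RESOURCE_FILE and its default path are hypothetical, and the sketch assumes that the script agent passes the action name (start, stop, check, or clean) as the first argument. The CHECK exit values 0 and 1 follow the scriptagent values for ONLINE and UNPLANNED_OFFLINE given earlier.

```shell
#!/bin/sh
# Sketch of an action script for a resource that is "online" while a marker
# file exists. RESOURCE_FILE is a hypothetical setting for this example; a
# real script would read its inputs from _CRS_-prefixed environment variables.
RESOURCE_FILE="${RESOURCE_FILE:-/tmp/demo_resource.flag}"

do_start() {
    # Tagged messages are forwarded to the crs* client tools.
    echo "CRS_PROGRESS: creating $RESOURCE_FILE"
    touch "$RESOURCE_FILE" || { echo "CRS_ERROR: cannot create $RESOURCE_FILE"; return 1; }
}

do_stop()  { rm -f "$RESOURCE_FILE"; }   # graceful stop
do_clean() { rm -f "$RESOURCE_FILE"; }   # forced cleanup

do_check() {
    # 0 = ONLINE, 1 = UNPLANNED_OFFLINE (scriptagent numeric values)
    if [ -e "$RESOURCE_FILE" ]; then return 0; else return 1; fi
}

run_action() {
    case "$1" in
        start) do_start ;;
        stop)  do_stop ;;
        check) do_check ;;
        clean) do_clean ;;
        *) echo "CRS_WARNING: unsupported action '$1'"; return 1 ;;
    esac
}

# Dispatch only when an action name is passed on the command line.
if [ "$#" -gt 0 ]; then
    run_action "$1"
    exit $?
fi
```

Keeping each action in its own function makes it straightforward to exercise the script by hand (for example, running it with the check argument and inspecting the exit status) before registering it with Oracle Clusterware.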
Related Topics
Building an Agent
Building an agent for a specific application involves the following steps:
- Implement the agent framework entry points either in scripts, C, or C++.
- Build the agent executable (for C and C++ agents).
- Collect all the parameters needed by the entry points and define a new resource type. Set the
AGENT_FILENAME
attribute to the absolute path of the newly built executable.
Building and Deploying C and C++ Agents
Example C and C++ agents are included with Oracle Clusterware that demonstrate using the agent framework to implement high availability agents for applications. Appendix F describes an example of an agent called demoagent1.cpp
. This agent manages a simple resource that represents a file on disk and performs the following tasks:
-
On start: Creates the file
-
On stop: Gracefully deletes the file
-
On check: Detects whether the file is present
-
On clean: Forcefully deletes the file
To describe this particular resource to Oracle Clusterware, you must first create a resource type that contains all the characteristic attributes for this resource class. In this case, the only attribute to be described is the name of the file to be managed. The following steps demonstrate how to set up the resource and its agent and test the functionality of the resource:
Registering a Resource in Oracle Clusterware
Register resources in Oracle Clusterware using the crsctl add
resource
command.
To register an application as a resource:
$ crsctl add resource resource_name -type resource_type [-group group_name]
[-file file_path] | [-attr "attribute_name='attribute_value', attribute_name='attribute_value', ..."]
Choose a name for the resource based on the application for which it is being created. For example, if you create a resource for an Apache Web server, then you might name the resource myApache
. Specify the name of an existing resource type after the -type
option. Optionally, you can add the resource to an existing resource group.
You can specify resource attributes either in a text file specified with the -file option or in a comma-delimited list of resource attribute-value pairs enclosed in double quotation marks ("") following the -attr option. Enclose in single quotation marks ('') any attribute name or value that contains spaces or commas, and any value that is enclosed in parentheses.
The following is an example of an attribute file:
PLACEMENT=favored
HOSTING_MEMBERS=node1 node2 node3
RESTART_ATTEMPTS@CARDINALITYID(1)=0
RESTART_ATTEMPTS@CARDINALITYID(2)=0
FAILURE_THRESHOLD@CARDINALITYID(1)=2
FAILURE_THRESHOLD@CARDINALITYID(2)=4
FAILURE_INTERVAL@CARDINALITYID(1)=300
FAILURE_INTERVAL@CARDINALITYID(2)=500
CHECK_INTERVAL=2
CARDINALITY=2
The following is an example of using the -attr
option:
$ crsctl add resource resource_name -type resource_type [-attr "PLACEMENT='favored', HOSTING_MEMBERS='node1 node2 node3', ..."]
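The two ways of supplying attributes carry the same information, so one form can be derived from the other. The sketch below is a hypothetical helper, attr_file_to_list, that converts NAME=value lines into the quoted, comma-delimited list used with -attr and then prints (but does not run) the resulting command. It handles simple attribute names only, and quoting every value is an assumption made for uniformity; the resource name myApache and the cluster_resource type are taken from examples in this chapter.

```shell
#!/bin/sh
# Hypothetical helper: turn NAME=value lines (the attribute-file format shown
# above) into the comma-delimited NAME='value' list expected after -attr.
attr_file_to_list() {
    awk -F= -v q="'" '
        NF >= 2 {
            name = $1
            # Everything after the first "=" is the value (it may contain "=").
            value = substr($0, length(name) + 2)
            printf "%s%s=%s%s%s", sep, name, q, value, q
            sep = ", "
        }
        END { print "" }
    ' "$1"
}

# Dry run: write a small attribute file and print the equivalent command.
attrs_file=$(mktemp "${TMPDIR:-/tmp}/attrs.XXXXXX")
printf '%s\n' 'PLACEMENT=favored' 'HOSTING_MEMBERS=node1 node2 node3' > "$attrs_file"
echo "crsctl add resource myApache -type cluster_resource -attr \"$(attr_file_to_list "$attrs_file")\""
rm -f "$attrs_file"
```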
Overview of Using Oracle Clusterware to Enable High Availability
Oracle Clusterware manages resources and resource groups based on how you configure them to increase their availability.
You can configure your resources and resource groups so that Oracle Clusterware:
-
Starts resources and resource groups during cluster or server start
-
Restarts resources and resource groups when failures occur
-
Relocates resources and resource groups to other servers, if the servers are available
To manage your applications with Oracle Clusterware:
-
Use the generic_application resource type, write a custom script for the script agent, or develop a new agent.
-
Register your applications as resources with Oracle Clusterware.
If a single application requires that you register multiple resources, then you can create a resource group that Oracle Clusterware manages like a single resource. You may be required to define relevant dependencies between the resources within the resource group.
-
Assign the appropriate privileges to the resource or resource group.
-
Start or stop your resources and resource groups.
When a resource fails, Oracle Clusterware attempts to restart the resource based on attribute values that you provide when you register an application or process as a resource. If the failed resource is a non-critical resource member of a resource group, then the resource group remains in an ONLINE state. If a server in a cluster fails, then you can configure your resources and resource groups so that processes that were assigned to run on the failed server restart on another server. Based on various resource attributes, Oracle Clusterware supports a variety of configurable scenarios.
When you register a resource or create a resource group in Oracle Clusterware, the relevant information about the application and the resource is stored in the Oracle Cluster Registry (OCR). This information includes:
-
Path to the action script or application-specific agent: This is the absolute path to the script or application-specific agent that defines the start, stop, check, and clean actions that Oracle Clusterware performs on the application.
See Also:
"Agents in Oracle Clusterware" for more information about these actions
-
Privileges: Oracle Clusterware has the necessary privileges to control all of the components of your application for high availability operations, including the right to start processes that are owned by other user identities. Oracle Clusterware must run as a privileged user to control applications with the correct start and stop processes.
-
Resource Dependencies: You can create relationships among resources and resource groups that imply an operational ordering or that affect the placement of resources on servers in the cluster. For example, Oracle Clusterware can only start a resource that has a hard start dependency on another resource if the other resource is running. Oracle Clusterware prevents stopping a resource if other resources that depend on it are running. However, you can force a resource to stop using the
crsctl stop resource -f
command, which first stops all resources that depend on the resource being stopped.
Resource Attributes
Resource attributes define how Oracle Clusterware manages resources of a specific resource type. Each resource type has a unique set of attributes. Some resource attributes are specified when you register resources, while others are internally managed by Oracle Clusterware.
Note:
Where you can define new resource attributes, you can only use US-7 ASCII characters.
Related Topics
Resource States
Every resource in a cluster is in a particular state at any time. Certain actions or events can cause that state to change.
Table 9-3 lists and describes the possible resource states.
Table 9-3 Possible Resource States
State | Description |
---|---|
ONLINE |
The resource is running. |
OFFLINE |
The resource is not running. |
UNKNOWN |
An attempt to stop the resource has failed. Oracle Clusterware does not actively monitor resources that are in this state. You must perform an application-specific action to ensure that the resource is offline, such as stop a process, and then run the |
INTERMEDIATE |
A resource can be in the INTERMEDIATE state because of one of two events:
Oracle Clusterware actively monitors resources that are in the INTERMEDIATE state and, typically, you are not required to intervene. If the resource is in the INTERMEDIATE state due to the preceding reason 1, then as soon as the state of the resource is established, Oracle Clusterware transitions the resource out of the INTERMEDIATE state. If the resource is in the INTERMEDIATE state due to the preceding reason 2, then it stays in this state if it remains partially online. For example, the home server of the VIP must rejoin the cluster so the VIP can switch over to it. A database administrator must issue a command to open the database instance. In either case, however, Oracle Clusterware transitions the resource out of the INTERMEDIATE state automatically as soon as it is appropriate. Use the |
Resource Dependencies
You can configure resources to be dependent on other resources, so that the dependent resources can only start or stop when certain conditions of the resources on which they depend are met. For example, when Oracle Clusterware attempts to start a resource, it is necessary for any resources on which the initial resource depends to be running and in the same location. If Oracle Clusterware cannot bring the resources online, then the initial (dependent) resource cannot be brought online, either. If Oracle Clusterware stops a resource or a resource fails, then any dependent resource is also stopped.
Some resources require more time to start than others. Some resources must start whenever a server starts, while other resources require a manual start action. These and many other examples of resource-specific behavior imply that each resource must be described in terms of how it is expected to behave and how it relates to other resources (resource dependencies).
You can configure resources so that they depend on Oracle resources. When creating resources, however, do not use an ora prefix in the resource name. This prefix is reserved for Oracle use only.
Previous versions of Oracle Clusterware included only two dependency specifications: the REQUIRED_RESOURCES
resource attribute and the OPTIONAL_RESOURCES
resource attribute. The REQUIRED_RESOURCES
resource attribute applied to both start and stop resource dependencies.
Note:
The REQUIRED_RESOURCES
and OPTIONAL_RESOURCES
resource attributes are still available only for resources of
application
type. Their use to define resource dependencies is
deprecated in Oracle Clusterware 12c and later releases.
Resource dependencies are separated into start and stop categories. This separation improves and expands the start and stop dependencies between resources and resource types.
This section includes the following topics:
Start Dependencies
Oracle Clusterware considers start dependencies contained in the profile of a resource when the start effort evaluation for that resource begins. You specify start dependencies for resources using the START_DEPENDENCIES
resource attribute. You can use modifiers on each dependency to further configure the dependency.
This section includes descriptions of the following START dependencies:
Related Topics
attraction
If resource A has an attraction
dependency on resource B, then Oracle Clusterware prefers to place resource A on servers hosting resource B. Dependent resources, such as resource A in this case, are more likely to run on servers on which resources to which they have attraction
dependencies are running. Oracle Clusterware places dependent resources on servers with resources to which they are attracted.
You can configure the attraction
start dependency with the following constraints:
-
START_DEPENDENCIES=attraction(intermediate:resourceB)
Use the intermediate modifier to specify whether the resource is attracted to resources that are in the INTERMEDIATE state.
-
START_DEPENDENCIES=attraction(type:resourceB.type)
Use the type modifier to specify whether the dependency acts on a particular resource type. The dependent resource is attracted to the server hosting the greatest number of resources of a particular type.
Note:
Previous versions of Oracle Clusterware used the now deprecated OPTIONAL_RESOURCES
attribute to express attraction dependency.
dispersion
If you specify the dispersion
start dependency for a resource, then Oracle Clusterware starts this resource on a server that has the fewest number of resources to which this resource has dispersion. Resources with dispersion may still end up running on the same server if there are not enough servers to which to disperse them.
You can configure the dispersion
start dependency with the following modifiers:
-
START_DEPENDENCIES=dispersion(intermediate:resourceB)
Use the intermediate modifier to specify that Oracle Clusterware disperses resource A whether resource B is either in the ONLINE or INTERMEDIATE state.
-
START_DEPENDENCIES=dispersion:active(resourceB)
Typically, dispersion is only applied when starting resources. If at the time of starting, resources that disperse each other start on the same server (because there are not enough servers at the time the resources start), then Oracle Clusterware leaves the resources alone once they are running, even when more servers join the cluster. If you specify the active modifier, then Oracle Clusterware reapplies dispersion on resources later when new servers join the cluster.
-
START_DEPENDENCIES=dispersion(pool:resourceB)
Use the pool modifier to specify that Oracle Clusterware disperses the resource to a different server pool rather than to a different server.
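The basic dispersion rule (prefer the server that hosts the fewest resources from which the new resource disperses) can be sketched as a simple selection over per-server counts. This is only a model of the rule as described here, not of CRSD's actual implementation; pick_dispersed_server and its input format are hypothetical, and ties are broken arbitrarily by the sort order.

```shell
#!/bin/sh
# Hypothetical model of the dispersion preference. Reads "server_name count"
# pairs on stdin, where count is the number of running resources from which
# the new resource disperses, and prints the server with the lowest count.
pick_dispersed_server() {
    sort -k2,2n | head -n 1 | awk '{ print $1 }'
}

# Example: node2 hosts none of the dispersed resources, so it is chosen.
printf '%s\n' 'node1 2' 'node2 0' 'node3 1' | pick_dispersed_server
```

When every server already hosts such a resource, the lowest count is still greater than zero, which mirrors the case where dispersed resources end up on the same server.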
exclusion
The exclusion
start dependency contains a clause that defines the exclusive relationship between resources while starting. Resources that have the exclusion
start dependency cannot run on the same node. For example, if resource A has an exclusion
start dependency on resource B, then the CRSD policy provides the following options when resource B is already running on the server where resource A needs to start:
-
Deny the start of resource A if resource B is already running.
-
Start resource A by preempting resource B. There are two variations to the preempt operation:
    -
    Resource B is stopped and, if possible, restarted on another node. Resource A is subsequently started.
    -
    Resource A is started first. Subsequently, resource B is stopped and, if possible, restarted on another node.
You can configure the exclusion
start dependency with the following modifiers:
-
START_DEPENDENCIES=exclusion([[preempt_pre: | preempt_post:] target_resource_name | type:target_resource_type]*)
All modifiers specified are per resource or resource type. Oracle Clusterware permits only one exclusion dependency per resource dependency tree. Without any preempt modifier, CRSD will only attempt to start the resource if all of its target resources are offline.
    -
    preempt_pre: If you choose this preempt modifier, then CRSD stops the specified target resource or resources defined by a specific resource type before starting the source resource. If restarting the stopped resources is possible, then CRSD can do this concurrently while starting the preempting resource.
    -
    preempt_post: If you choose this preempt modifier, then, after starting the source resource, CRSD stops and relocates, if possible, the specified target resource or resources defined by a specific resource type.
If CRSD cannot stop the target resources successfully, or cannot start the source resource, then the entire operation fails. Oracle Clusterware then attempts to return the affected resources to their original state, if possible.
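The choices CRSD has under an exclusion dependency can be summarized as a small decision table. The sketch below is a hypothetical model that only prints the order of operations for resource A (the source) and resource B (the target); the function name and argument values are illustrative and nothing here calls Oracle Clusterware.

```shell
#!/bin/sh
# Hypothetical model of the exclusion choices described above.
# Arguments: modifier  (none | preempt_pre | preempt_post)
#            b_running (yes | no): is resource B online on the target server?
exclusion_plan() {
    modifier=$1
    b_running=$2
    if [ "$b_running" = "no" ]; then
        echo "start A"
        return 0
    fi
    case "$modifier" in
        preempt_pre)
            echo "stop B, then start A while relocating B if possible" ;;
        preempt_post)
            echo "start A, then stop B and relocate it if possible" ;;
        *)
            # Without a preempt modifier, the start of A is denied.
            echo "deny start of A"
            return 1 ;;
    esac
}

exclusion_plan preempt_pre yes
```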
hard
Define a hard
start dependency for a resource if another resource must be running before the dependent resource can start. For example, if resource A has a hard
start dependency on resource B, then resource B must be running before resource A can start. Similarly, if both resources (A and B) are initially offline, then resource B is started first to satisfy resource A's dependency.
Note:
Oracle recommends that resources with hard
start dependencies also have pullup
start dependencies.
You can configure the hard
start dependency with the following constraints:
-
START_DEPENDENCIES=hard(global:resourceB)
By default, resources A and B must be located on the same server (collocated). Use the global modifier to specify that resources need not be collocated. For example, if resource A has a hard(global:resourceB) start dependency on resource B, then, if resource B is running on any node in the cluster, resource A can start.
-
START_DEPENDENCIES=hard(intermediate:resourceB)
Use the intermediate modifier to specify that the dependent resource can start if a resource on which it depends is in either the ONLINE or INTERMEDIATE state.
-
START_DEPENDENCIES=hard(type:resourceB.type)
Use the type modifier to specify whether the hard start dependency acts on a particular resource or a resource type. For example, if you specify that resource A has a hard start dependency on the resourceB.type type, then if any resource of the resourceB.type type is running, resource A can start.
-
START_DEPENDENCIES=hard(uniform:resourceB)
Use the uniform modifier to attempt to start all instances of resource B, but at least one instance must start to satisfy the dependency.
-
START_DEPENDENCIES=hard(resourceB, intermediate:resourceC, intermediate:global:type:resourceC.type)
You can combine modifiers and specify multiple resources in the START_DEPENDENCIES resource attribute.
Note:
Separate modifier clauses with commas. The type modifier clause must always be the last modifier clause in the list and the type modifier must always directly precede the type.
pullup
Use the pullup
start dependency if resource A must automatically start whenever resource B starts. This dependency only affects resource A if it is not running. As is the case for other dependencies, pullup
may cause the dependent resource to start on any server. Use the pullup
dependency whenever there is a hard stop dependency, so that if resource A depends on resource B and resource B fails and then recovers, then resource A is restarted.
Note:
Oracle recommends that resources with hard
start dependencies also have pullup
start dependencies.
You can configure the pullup
start dependency with the following constraints:
-
START_DEPENDENCIES=pullup(intermediate:resourceB)
Use the intermediate modifier to specify whether resource B can be either in the ONLINE or INTERMEDIATE state to start resource A.
If resource A has a pullup dependency on multiple resources, then resource A starts only when all resources upon which it depends start.
-
START_DEPENDENCIES=pullup:always(resourceB)
Use the always modifier to specify whether Oracle Clusterware starts resource A despite the value of its TARGET attribute, whether it is ONLINE or OFFLINE. By default, without using the always modifier, pullup only starts resources if the value of the TARGET attribute of the dependent resource is ONLINE.
-
START_DEPENDENCIES=pullup(type:resourceB.type)
Use the type modifier to specify that the dependency acts on a particular resource type.
weak
If resource A has a weak
start dependency on resource B, then an attempt to start resource A attempts to start resource B, if resource B is not running. The result of the attempt to start resource B is, however, of no consequence to the result of starting resource A.
You can configure the weak
start dependency with the following constraints:
-
START_DEPENDENCIES=weak(global:resourceB)
By default, resources A and B must be collocated. Use the global modifier to specify that resources need not be collocated. For example, if resource A has a weak(global:resourceB) start dependency on resource B, then, if resource B is running on any node in the cluster, resource A can start.
-
START_DEPENDENCIES=weak(concurrent:resourceB)
Use the concurrent modifier to specify that resource A and resource B can start concurrently.
-
START_DEPENDENCIES=weak(type:resourceB.type)
Use the type modifier to specify that the dependency acts on a resource of a particular resource type, such as resourceB.type.
-
START_DEPENDENCIES=weak(uniform:resourceB)
Use the uniform modifier to attempt to start all instances of resource B.
Stop Dependencies
Oracle Clusterware considers stop dependencies between resources whenever a resource is stopped (the resource state changes from ONLINE
to any other state).
hard
If resource A has a hard
stop dependency on resource B, then resource A must be stopped when B stops running. The two resources may attempt to start or relocate to another server, depending upon how they are configured. Oracle recommends that resources with hard
stop dependencies also have hard
start dependencies.
You can configure the hard
stop dependency with the following modifiers:
-
STOP_DEPENDENCIES=hard(intermediate:resourceB)
Use the intermediate modifier to specify whether resource B must be in either the ONLINE or INTERMEDIATE state for resource A to stay online.
-
STOP_DEPENDENCIES=hard(global:resourceB)
Use the global modifier to specify whether resource A requires that resource B be present on the same server or on any server in the cluster to remain online. If this constraint is not specified, then resources A and B must be running on the same server. Oracle Clusterware stops resource A when that condition is no longer met.
-
STOP_DEPENDENCIES=hard(shutdown:resourceB)
Use the shutdown modifier to stop the resource only when you shut down the Oracle Clusterware stack using either the crsctl stop crs or crsctl stop cluster commands.
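Oracle's recommendations in this section (pair a hard start dependency with pullup, and pair a hard stop dependency with a hard start dependency) can be combined in a single registration. The sketch below only builds and prints such a command as a dry run. The resource names, the action script path, and the helper build_add_command are hypothetical, and the space-separated form of multiple clauses inside START_DEPENDENCIES is an assumption based on how these attributes are commonly displayed.

```shell
#!/bin/sh
# Dry run: build and print a registration command; nothing is sent to the
# cluster. The resource names and the script path are hypothetical.
# Arguments: new resource name, name of the resource it depends on
build_add_command() {
    name=$1
    dep=$2
    start_deps="hard($dep) pullup($dep)"   # hard start plus pullup
    stop_deps="hard($dep)"                 # matching hard stop dependency
    printf '%s\n' "crsctl add resource $name -type cluster_resource -attr \"ACTION_SCRIPT='/opt/cluster/scripts/$name.scr', START_DEPENDENCIES='$start_deps', STOP_DEPENDENCIES='$stop_deps'\""
}

build_add_command myApp resourceB
```

Printing the command first makes it easy to review the quoting before running it on a cluster node.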
Related Topics
Effect of Resource Dependencies on Resource State Recovery
When a resource goes from a running to a non-running state, while the intent to have it running remains unchanged, this transition is called a resource failure. At this point, Oracle Clusterware applies a resource state recovery procedure that may try to restart the resource locally, relocate it to another server, or just stop the dependent resources, depending on the high availability policy for resources and the state of entities at the time.
When two or more resources depend on each other, a failure of one of them may end up causing the other to fail, as well. In most cases, it is difficult to control or even predict the order in which these failures are detected. For example, even if resource A depends on resource B, Oracle Clusterware may detect the failure of resource B after the failure of resource A.
This lack of failure order predictability can cause Oracle Clusterware to attempt to restart dependent resources in parallel, which, ultimately, leads to the failure to restart some resources, because the resources upon which they depend are being restarted out of order.
In this case, Oracle Clusterware reattempts to restart the dependent resources locally if either or both the hard
stop and pullup
dependencies are used. For example, if resource A has either a hard
stop dependency or pullup
dependency, or both, on resource B, and resource A fails because resource B failed, then Oracle Clusterware may end up trying to restart both resources at the same time. If the attempt to restart resource A fails, then as soon as resource B successfully restarts, Oracle Clusterware reattempts to restart resource A.
Resource Placement
As part of the start effort evaluation, the first decision that Oracle Clusterware must make is where to start (or place) the resource. Making such a decision is easy when the caller specifies the target server by name. If a target server is not specified, however, then Oracle Clusterware attempts to locate the best possible server for placement given the resource's configuration and the current state of the cluster.
Oracle Clusterware considers a resource's placement policy first and filters out servers that do not fit with that policy. Oracle Clusterware sorts the remaining servers in a particular order depending on the value of the PLACEMENT resource attribute of the resource.
The result of this consideration is a maximum of two lists of candidate servers on which Oracle Clusterware can start the resource. One list contains preferred servers and the other contains possible servers. The list of preferred servers will be empty if the value of the PLACEMENT resource attribute for the resource is set to balanced or restricted. The placement policy of the resource determines on which server the resource wants to run. Oracle Clusterware considers preferred servers over possible servers, if there are servers in the preferred list.
Oracle Clusterware then considers the resource's dependencies to determine where to place the resource, if any exist. The attraction and dispersion start dependencies affect the resource placement decision, as do some dependency modifiers. Oracle Clusterware applies these placement hints to further order the servers in the two previously mentioned lists. Note that Oracle Clusterware processes each list of servers independently, so that the effect of the resource's placement policy is not confused by that of dependencies.
Finally, Oracle Clusterware chooses the first server from the list of preferred servers, if any servers are listed. If there are no servers on the list of preferred servers, then Oracle Clusterware chooses the first server from the list of possible servers, if any servers are listed. When no servers exist in either list, Oracle Clusterware generates a resource placement error.
Note:
Neither the placement policies nor the dependencies of the resources related to the resource Oracle Clusterware is attempting to start affect the placement decision.
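The placement steps described in this section can be sketched in Python. This is a simplified model, not Oracle Clusterware code: the function names and arguments are hypothetical, the allowed argument stands in for the result of the placement policy filter, and the reordering driven by attraction and dispersion dependencies is omitted. The sketch also assumes that a server appearing in the preferred list is left out of the possible list.

```python
def candidate_lists(servers, placement, hosting_members=(), allowed=None):
    """Return (preferred, possible) candidate lists for one resource.

    servers: online server names, in cluster order.
    placement: "balanced", "favored", or "restricted".
    hosting_members: servers the resource names as a preference.
    allowed: set of servers that pass the policy filter (None = all).
    """
    eligible = [s for s in servers if allowed is None or s in allowed]
    if placement in ("balanced", "restricted"):
        # The preferred list is empty for these two policies.
        return [], eligible
    if placement == "favored":
        preferred = [s for s in eligible if s in hosting_members]
        possible = [s for s in eligible if s not in hosting_members]
        return preferred, possible
    raise ValueError(f"unknown PLACEMENT value: {placement}")


def choose_server(preferred, possible):
    """Pick the first preferred server, else the first possible server."""
    if preferred:
        return preferred[0]
    if possible:
        return possible[0]
    raise LookupError("resource placement error: no candidate servers")


preferred, possible = candidate_lists(
    ["node1", "node2", "node3"], "favored", hosting_members=["node2"]
)
target = choose_server(preferred, possible)
```

Here the favored policy puts node2 in the preferred list, so it is chosen ahead of node1 and node3. When both lists are empty, the sketch raises an error, mirroring the resource placement error described above.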
Registering an Application as a Resource
This section presents examples of the procedures for registering an application as a resource in Oracle Clusterware.
The procedures instruct you how to add an Apache Web server (as an example) as a resource to Oracle Clusterware. The examples in this section assume that the Oracle Clusterware administrator has full administrative privileges over Oracle Clusterware and the user or group that owns the application that Oracle Clusterware is going to manage. Once the registration process is complete, Oracle Clusterware can start any application on behalf of any operating system user.
Oracle Clusterware distinguishes between an owner of a registered resource and a user. The owner of a resource is the operating system user under which the agent runs. The ACL resource attribute of the resource defines permissions for the users and the owner. Only root can modify any resource.
Note:
- Oracle Clusterware commands prefixed with crs_ are desupported with this release and can no longer be used. CRSCTL commands replace those commands. See "Oracle Clusterware Control (CRSCTL) Utility Reference" for a list of CRSCTL commands and their corresponding crs_ commands.
- Do not use CRSCTL commands on any resources that have names prefixed with ora (because these are Oracle resources), unless My Oracle Support directs you to do so. To configure Oracle resources, use the server control utility, SRVCTL, which provides you with all configurable options.
Creating an Application VIP Managed by Oracle Clusterware
An application VIP is a cluster resource that Oracle Clusterware manages (Oracle Clusterware provides a standard VIP agent for application VIPs).
If clients of an application access the application through a network, and the placement policy for the application allows it to fail over to another node, then you must register a virtual internet protocol address (VIP) on which the application depends. You should base any new application VIPs on this VIP type to ensure that your system experiences consistent behavior among all of the VIPs that you deploy in your cluster.
While you can add a VIP in the same way that you can add any other resource that Oracle Clusterware manages, Oracle recommends using the Grid_home/bin/appvipcfg command-line utility to create or delete an application VIP on the default network for which the ora.net1.network resource is created by default.
To create an application VIP, use the following syntax:
appvipcfg create -network=network_number -ip=ip_address -vipname=vip_name
-user=user_name [-group=group_name] [-failback=0 | 1]
Note:
You can modify the VIP name while the resource remains online, without restarting the resource.
When you create an application VIP on a default network, set -network=1.
To create an application VIP on a non-default network, you may have to first create the network using the srvctl add network command. Then you can create the application VIP, setting -network=non-default_network_number.
In an Oracle Flex Cluster, you can also use the srvctl add network command to add a non-Hub Node network resource for application VIPs, so that applications can run on non-Hub Nodes, as follows:
srvctl add network -netnum=network_number -subnet subnet/netmask[/if1[|if2|...]]
To delete an application VIP, use the following syntax:
appvipcfg delete -vipname=vip_name
In the preceding syntax examples, network_number is the number of the network, ip_address is the IP address, vip_name is the name of the VIP, user_name is the name of the user who installed Oracle Database, and group_name is the name of the group. The default value of the -failback option is 0. If you set the option to 1, then the VIP (and therefore any resources that depend on the VIP) fails back to the original node when that node becomes available again.
Note:
The -ip=ip_address parameter is required, but if Grid Plug and Play and GNS with DHCP have been configured, the parameter always takes the IP address from the DHCP server and ignores the IP address specified in the command. The value for the -vipname=vip_name parameter is also ignored with DHCP.
For example, as root, run the following command:
# Grid_home/bin/appvipcfg create -network=1 -ip=148.87.58.196 -vipname=appsVIP -user=root
The script only requires a network number, the IP address, and a name for the VIP resource, in addition to the user that owns the application VIP resource. A VIP resource is typically owned by root because VIP-related operations require root privileges.
To delete an application VIP, use the same script with the delete option. This option accepts the VIP name as a parameter. For example:
# Grid_home/bin/appvipcfg delete -vipname=appsVIP
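When the create command is generated from a provisioning script, the argument list can be assembled programmatically. The following Python sketch is illustrative only: the appvipcfg_create_args function is hypothetical and uses only the options shown in the syntax above. It assumes that -failback can be omitted when the default value of 0 is wanted, and it builds the argument list without running the utility.

```python
def appvipcfg_create_args(network, ip, vipname, user, group=None, failback=0):
    """Build the argument list for an 'appvipcfg create' invocation.

    Only options shown in the documented syntax are emitted. The
    optional -group and -failback options are added when requested.
    """
    if failback not in (0, 1):
        raise ValueError("-failback accepts only 0 or 1")
    args = [
        "appvipcfg",
        "create",
        f"-network={network}",
        f"-ip={ip}",
        f"-vipname={vipname}",
        f"-user={user}",
    ]
    if group is not None:
        args.append(f"-group={group}")
    if failback == 1:
        # 0 is the documented default, so it is left implicit here.
        args.append("-failback=1")
    return args


# Mirrors the example command shown earlier in this section.
args = appvipcfg_create_args(1, "148.87.58.196", "appsVIP", "root")
```

A list of arguments, rather than a single command string, avoids shell quoting problems if the list is later passed to a process launcher.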
After you have created the application VIP using this configuration script, you can view the VIP profile using the following command:
$ Grid_home/bin/crsctl status res appsVIP -p
Verify and, if required, modify the following parameters using the Grid_home/bin/crsctl modify res command.
The appvipcfg script requires that you specify the -network option, even if -network=1.
As the Oracle Database installation owner, start the VIP resource:
$ crsctl start resource appsVIP
Adding an Application VIP with Oracle Enterprise Manager
- Log into Oracle Enterprise Manager Cloud Control.
- Select the cluster target that you want to modify.
- From the cluster target menu, select Administration > Resources > Manage.
- Enter a cluster administrator user name and password to display the Manage Resources page.
- Click Add Application VIP.
- Enter a name for the VIP in the Name field.
- Enter a network number in the Network Number field.
- Enter an IP address for the VIP in the Internet Protocol Address field.
- Enter root in the Primary User field. Oracle Enterprise Manager defaults to whatever user name you are logged in as.
- Select Start the resource after creation if you want the VIP to start immediately.
- Click Continue to display the Confirmation: Add VIP Resource page.
- Enter root and the root password as the cluster credentials.
- Click Continue to create the application VIP.
Adding User-Defined Resources
You can add resources to Oracle Clusterware at any time.
However, if you add a resource that depends on another resource, then you must first add the resource upon which it is dependent.
In the examples in this section, assume that an action script, myapache.scr, resides in the /opt/cluster/scripts directory on each node to facilitate adding the resource to the cluster. Assume also that a server pool has been created to host an application. This server pool is not a sub-pool of Generic, but instead it is used to host the application in a top-level server pool.
Note:
Oracle recommends that you use shared storage, such as Oracle Automatic Storage Management Cluster File System (Oracle ACFS), to store action scripts to decrease script maintenance.
Deciding on a Deployment Scheme
You must decide whether to use administrator or policy management for the application. Use administrator management for smaller, two-node configurations, where your cluster configuration is not likely to change. Use policy management for more dynamic configurations when your cluster consists of more than two nodes. For example, if a resource only runs on node 1 and node 2 because only those nodes have the necessary files, then administrator management is probably more appropriate.
Oracle Clusterware supports the deployment of applications in access-controlled server pools made up of anonymous servers and strictly based on the desired pool size. Cluster policies defined by the administrator can and must be used in this case to govern the server assignment with desired sizes and levels of importance. Alternatively, a strict or preferred server assignment can be used, in which resources run on specifically named servers. This represents the pre-existing model available in earlier releases of Oracle Clusterware now known as administrator management.
Conceptually, a cluster hosting applications developed and deployed in both of the deployment schemes can be viewed as two logically separated groups of servers. One server group is used for server pools, enabling role separation and server capacity control. The other server group assumes a fixed assignment based on named servers in the cluster.
To manage an application using either deployment scheme, you must create a server pool before adding the resource to the cluster. A built-in server pool named Generic always owns the servers used by applications of administrator-based management. The Generic server pool is a logical division and can be used to separate the two parts of the cluster using different management schemes.
For third party developers to use the model to deploy applications, server pools must be used. To take advantage of the pre-existing application development and deployment model based on named servers, sub-pools of Generic (server pools that have Generic as their parent pool, defined by the server pool attribute PARENT_POOLS) must be used. By creating sub-pools that use Generic as their parent and enumerating servers by name in the sub-pool definitions, applications ensure that named servers are in Generic and are used exclusively for applications using the named servers model.
Adding a Resource to a Specified Server Pool
Use the crsctl add resource command to add a resource to a server pool.
To add the Apache Web server to a specific server pool as a resource using the policy-based deployment scheme, run the following command as the user that is supposed to run the Apache Server (for an Apache Server this is typically the root user):
$ crsctl add resource myApache -type cluster_resource -attr
"ACTION_SCRIPT=/opt/cluster/scripts/myapache.scr,
PLACEMENT=restricted,
SERVER_POOLS=server_pool_list,
CHECK_INTERVAL=30,
RESTART_ATTEMPTS=2,
START_DEPENDENCIES=hard(appsvip),
STOP_DEPENDENCIES=hard(appsvip)"
In the preceding example, myApache is the name of the resource added to the cluster.
Note:
- You must enclose comma or space-delimited attribute values in single quotation marks (' ') to avoid errors. If you enclose single attribute values in single quotation marks, the quotation marks are ignored and no errors ensue.
- A resource name cannot begin with a period nor with the character string ora.
The resource is configured, as follows:
- The resource is a cluster_resource type.
- ACTION_SCRIPT=/opt/cluster/scripts/myapache.scr: The path to the required action script.
- PLACEMENT=restricted: Oracle Clusterware only considers servers that belong to the server pools listed in the SERVER_POOLS attribute when it places this resource.
- SERVER_POOLS=server_pool_list: This resource can only run in the server pools specified in a space-delimited list.
- CHECK_INTERVAL=30: Oracle Clusterware checks this resource every 30 seconds to determine its status.
- RESTART_ATTEMPTS=2: Oracle Clusterware attempts to restart this resource twice before failing it over to another node.
- START_DEPENDENCIES=hard(appsvip): This resource has a hard START dependency on the appsvip resource. The appsvip resource must be online in order for myApache to start.
- STOP_DEPENDENCIES=hard(appsvip): This resource has a hard STOP dependency on the appsvip resource. The myApache resource stops if the appsvip resource goes offline.
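The quoting rule in the preceding note can be applied mechanically when the value of the -attr option is generated by a script. The following Python sketch is illustrative only: the format_attr_string function is hypothetical, and it simply wraps any value that contains a comma or a space in single quotation marks before joining the attributes with commas.

```python
def format_attr_string(attrs):
    """Join attribute names and values into a single -attr string.

    Values that contain a comma or a space are enclosed in single
    quotation marks, as the note on delimited values requires.
    """
    parts = []
    for name, value in attrs.items():
        text = str(value)
        if "'" in text:
            # Nested quotation marks are outside the scope of this sketch.
            raise ValueError(f"value for {name} contains a quotation mark")
        if "," in text or " " in text:
            text = f"'{text}'"
        parts.append(f"{name}={text}")
    return ",".join(parts)


# sp1 and sp2 are hypothetical server pool names.
attr_string = format_attr_string(
    {
        "PLACEMENT": "restricted",
        "SERVER_POOLS": "sp1 sp2",
        "CHECK_INTERVAL": 30,
    }
)
```

Only the space-delimited SERVER_POOLS value is quoted in this example. The single-valued attributes are left bare, which the note above permits.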
Adding a Resource Using a Server-Specific Deployment
To add the Apache Web server as a resource that uses a named server deployment, assume that you add the resource to a server pool that is, by definition, a sub-pool of the Generic server pool. You create server pools that are sub-pools of Generic using the crsctl add serverpool command. These server pools define the Generic server pool as their parent in the server pool attribute PARENT_POOLS. In addition, they include a list of server names in the SERVER_NAMES parameter to specify the servers that should be assigned to the respective pool. For example:
$ crsctl add serverpool myApache_sp -attr
"PARENT_POOLS=Generic, SERVER_NAMES=host36 host37"
After you create the sub-pool, add the Apache Web server resource, as follows:
$ crsctl add resource myApache -type cluster_resource -attr
"ACTION_SCRIPT=/opt/cluster/scripts/myapache.scr,
PLACEMENT='restricted',
SERVER_POOLS=myApache_sp,
CHECK_INTERVAL='30',
RESTART_ATTEMPTS='2',
START_DEPENDENCIES='hard(appsvip)',
STOP_DEPENDENCIES='hard(appsvip)'"
Note:
A resource name cannot begin with a period nor with the character string ora.
In addition, note that when adding a resource using a server-specific deployment, the server pools listed in the SERVER_POOLS resource parameter must be sub-pools under Generic.
Creating Resources that Use the generic_application Resource Type
Use the crsctl add resource command to create resources using the generic_application resource type to model any type of application requiring high availability without having to create any action scripts.
This section includes two examples for Linux/UNIX platforms of creating resources that use the generic_application resource type.
In the following command example, a Samba server resource is created for high availability:
$ crsctl add resource samba1 -type generic_application -attr
"START_PROGRAM='/etc/init.d/smb start',
STOP_PROGRAM='/etc/init.d/smb stop',
CLEAN_PROGRAM='/etc/init.d/smb stop',
PID_FILES='/var/run/smbd.pid,/var/run/nmbd.pid'"
In the preceding example, the attributes that define the resource are configured, as follows:
- START_PROGRAM='/etc/init.d/smb start': This attribute contains the complete path and arguments to the script that starts the Samba server.
- STOP_PROGRAM='/etc/init.d/smb stop': This attribute contains the complete path and arguments to the script that stops the Samba server.
- CLEAN_PROGRAM='/etc/init.d/smb stop': This attribute contains the complete path and arguments to the script that forcefully terminates and cleans up the Samba server in case there is any failure in starting or stopping the server.
- PID_FILES='/var/run/smbd.pid,/var/run/nmbd.pid': This attribute contains the paths to the text files listing the process IDs (PIDs) that must be monitored to ensure that the Samba server and all its components are running.
Note:
- If script-based monitoring is required for this Samba server configuration, then you can use the CHECK_PROGRAMS attribute instead of the PID_FILES attribute, as follows: CHECK_PROGRAMS='/etc/init.d/smb status'
- You can specify standard Oracle Clusterware placement and cardinality properties by configuring the HOSTING_MEMBERS, SERVER_POOLS, PLACEMENT, and CARDINALITY attributes of the Samba server resource.
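To make the idea of PID-based monitoring concrete, the following Python sketch shows one plausible way to test whether every process listed in a set of PID files is still running. It is not the Oracle Clusterware agent implementation: the function names are hypothetical, the PID files are assumed to contain whitespace-separated integers, and the signal-0 probe applies to Linux/UNIX platforms only.

```python
import os


def read_pids(pid_file):
    """Return the process IDs listed in a PID file."""
    with open(pid_file) as handle:
        return [int(token) for token in handle.read().split()]


def process_alive(pid):
    """Probe a process with signal 0, which checks existence only."""
    try:
        os.kill(pid, 0)
    except ProcessLookupError:
        return False
    except PermissionError:
        # The process exists but belongs to another user.
        return True
    return True


def all_pids_running(pid_files):
    """Check a comma-delimited list of PID files, as used by PID_FILES.

    Returns False if a file is missing or empty, or if any listed
    process is no longer running.
    """
    for pid_file in pid_files.split(","):
        try:
            pids = read_pids(pid_file)
        except FileNotFoundError:
            return False
        if not pids or not all(process_alive(pid) for pid in pids):
            return False
    return True
```

A script of this kind could also serve as the basis for a check program when PID files alone do not capture the health of the application.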
In the second command example, a database file server (DBFS) resource is created for high availability. The DBFS provides a Filesystem in Userspace (FUSE) file system to access data stored in an Oracle Database.
You can use the generic_application resource type to define a resource that corresponds to the DBFS file system. You can use this DBFS resource to start, stop, monitor, and fail over the DBFS file system mount point. The command syntax to create this resource is as follows:
$ crsctl add resource dbfs1 -type generic_application -attr
"START_PROGRAM='/app/oracle/12.2/bin/dbfs_client -o wallet
/@inst1 /scratch/mjk/data/dbfs_mount',
STOP_PROGRAM='/bin/fusermount -u /scratch/mjk/data/dbfs_mount',
CHECK_PROGRAMS='ls /scratch/mjk/data/dbfs_mount/dbfsdata1',
ENVIRONMENT_VARS='ORACLE_HOME=/app/oracle/12.2,
LD_LIBRARY_PATH=/app/oracle/12.2/lib:/app/oracle/12.2/rdbms/lib,
TNS_ADMIN=/app/oracle/12.2/network/admin',
CLEAN_PROGRAM='/bin/fusermount -u -z /scratch/mjk/data/dbfs_mount',
START_DEPENDENCIES='hard(ora.inst1_srv.svc)',
STOP_DEPENDENCIES='hard(ora.inst1_srv.svc)'"
In addition to the mandatory START_PROGRAM, STOP_PROGRAM, CHECK_PROGRAMS, and CLEAN_PROGRAM attributes, the above example also includes the following:
- The ENVIRONMENT_VARS attribute specifies custom environment variables that are passed when starting or stopping the program.
- The START_DEPENDENCIES and STOP_DEPENDENCIES dependency attributes create a start and stop dependency on the database service that is the underlying database store of the DBFS file system.
You can create dependencies on the DBFS resource for higher-level application resources based on the application requirements of the DBFS file system.
Note:
- The ORACLE_HOME directory shown in the preceding syntax is an example.
- You can specify standard Oracle Clusterware placement and cardinality properties by configuring the HOSTING_MEMBERS, SERVER_POOLS, PLACEMENT, and CARDINALITY attributes of the DBFS file system resource.
Adding Resources Using Oracle Enterprise Manager
Use Enterprise Manager to add resources.
To add resources to Oracle Clusterware using Oracle Enterprise Manager:
Changing Resource Permissions
Oracle Clusterware manages resources based on the permissions of the user who added the resource. The user who first added the resource owns the resource and the resource runs as the resource owner. Certain resources must be managed as root. If a user other than root adds a resource that must be run as root, then the permissions must be changed as root so that root manages the resource, as follows:
Application Placement Policies
A resource can be started on any server, subject to the placement policies, the resource start dependencies, and the availability of the action script on that server.
The PLACEMENT resource attribute determines how Oracle Clusterware selects a server on which to start a resource and where to relocate the resource after a server failure. The HOSTING_MEMBERS and SERVER_POOLS attributes determine eligible servers to host a resource and the PLACEMENT attribute further refines the placement of resources.
The value of the PLACEMENT resource attribute determines how Oracle Clusterware places resources when they are added to the cluster or when a server fails. Together with either the HOSTING_MEMBERS or SERVER_POOLS attributes, you can configure how Oracle Clusterware places the resources in a cluster. When the value of the PLACEMENT attribute is:
- balanced: Oracle Clusterware uses any online server pool for placement. Less loaded servers are preferred to servers with greater loads. To measure how loaded a server is, Oracle Clusterware uses the LOAD resource attribute of the resources that are in an ONLINE state on the server. Oracle Clusterware uses the sum total of the LOAD values to measure the current server load.
- favored: If a value is assigned to any of the HOSTING_MEMBERS, SERVER_POOLS, or SERVER_CATEGORY resource attributes, then that value expresses a preference. If HOSTING_MEMBERS is populated and either SERVER_POOLS or SERVER_CATEGORY is set, then HOSTING_MEMBERS indicates placement preference and SERVER_POOLS or SERVER_CATEGORY indicates a restriction. For example, the ora.cluster.vip resource has a policy that sets the value of PLACEMENT to favored, SERVER_CATEGORY is set to Hub, and HOSTING_MEMBERS is set to server_name1. In this case, Oracle Clusterware restricts the placement of ora.cluster.vip to the servers in the Hub category and then it prefers the server known as server_name1.
- restricted: Oracle Clusterware only considers servers that belong to server pools listed in the SERVER_POOLS resource attribute, servers of a particular category as configured in the SERVER_CATEGORY resource attribute, or servers listed in the HOSTING_MEMBERS resource attribute for resource placement. Only one of these resource attributes can have a value, otherwise it results in an error.
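Two of the rules in the preceding list lend themselves to a short illustration: the load calculation used by the balanced policy, and the one-attribute rule of the restricted policy. The following Python sketch is a simplified model, not Oracle Clusterware code. All function names are hypothetical, and breaking ties in favor of the first server listed is an assumption.

```python
def server_loads(online_resources):
    """Sum the LOAD values of ONLINE resources for each server.

    online_resources: iterable of (server, load) pairs.
    """
    totals = {}
    for server, load in online_resources:
        totals[server] = totals.get(server, 0) + load
    return totals


def least_loaded(servers, online_resources):
    """Return the candidate server with the smallest total load."""
    totals = server_loads(online_resources)
    # min() keeps the first server in the list when totals are equal.
    return min(servers, key=lambda server: totals.get(server, 0))


def check_restricted(hosting_members="", server_pools="", server_category=""):
    """Reject a restricted policy that sets more than one attribute."""
    values = (hosting_members, server_pools, server_category)
    if sum(1 for value in values if value) > 1:
        raise ValueError(
            "only one of HOSTING_MEMBERS, SERVER_POOLS, and "
            "SERVER_CATEGORY can have a value when PLACEMENT=restricted"
        )


# node1 carries two resources (total load 3); node2 and node3 carry one.
online = [("node1", 2), ("node2", 1), ("node1", 1), ("node3", 1)]
target = least_loaded(["node1", "node2", "node3"], online)
```

In this example, node2 and node3 both have a total load of 1, and the sketch selects node2 because it appears first in the candidate list.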
Unregistering Applications and Application Resources
To unregister a resource, use the crsctl delete resource command. You cannot unregister an application or resource that is ONLINE or required by another resource, unless you use the -force option. The following example unregisters the Apache Web server application:
$ crsctl delete resource myApache
Run the crsctl delete resource command as a clean-up step when a resource is no longer managed by Oracle Clusterware. Oracle recommends that you unregister any unnecessary resources.
Managing Resources
Registering Application Resources
Each application that you manage with Oracle Clusterware is registered and stored as a resource in OCR.
Use the crsctl add resource command to register applications in OCR. For example, enter the following command to register the Apache Web server application from the previous example:
$ crsctl add resource myApache -type cluster_resource
-group group_name -attr "ACTION_SCRIPT=/opt/cluster/scripts/myapache.scr, PLACEMENT=restricted,
SERVER_POOLS=server_pool_list,CHECK_INTERVAL=30,RESTART_ATTEMPTS=2,
START_DEPENDENCIES=hard(appsvip),STOP_DEPENDENCIES=hard(appsvip)"
In the preceding example, you can assign the resource to a resource group by specifying the -group parameter.
If you modify a resource, then update OCR by running the crsctl modify resource command.
Starting Application Resources
Start resources with the crsctl start resource command.
Manually starting or stopping resources outside of Oracle Clusterware can invalidate the resource status. In addition, Oracle Clusterware may attempt to restart a resource on which you perform a manual stop operation.
To start an application resource that is registered with Oracle Clusterware, use the crsctl start resource command. For example:
$ crsctl start resource myApache
The command waits to receive a notification of success or failure from the action program each time the action program is called. Oracle Clusterware can start application resources if they have stopped due to exceeding their failure threshold values. You must register a resource using crsctl add resource before you can start it.
Running the crsctl start resource command on a resource sets the resource TARGET value to ONLINE. Oracle Clusterware attempts to change the state to match the TARGET by running the action program with the start action.
If a cluster server fails while you are starting a resource on that server, then check the state of the resource on the cluster by using the crsctl status resource command.
Relocating Applications and Application Resources
Use the crsctl relocate resource command to relocate applications and application resources.
For example, to relocate the Apache Web server application to a server named rac2, run the following command:
# crsctl relocate resource myApache -n rac2
Each time that the action program is called, the crsctl relocate resource command waits for the duration specified by the value of the SCRIPT_TIMEOUT resource attribute to receive notification of success or failure from the action program. A relocation attempt fails if:
- The application has required resources that run on the initial server
- Applications that require the specified resource run on the initial server
To relocate an application and its required resources, use the -f option with the crsctl relocate resource command. Oracle Clusterware relocates or starts all resources that are required by the application regardless of their state.
You can also relocate a resource group using the crsctl relocate resourcegroup command, which first stops the resources in the resource group before relocating the resource group to the destination server.
Online Relocation
With online relocation (RELOCATE_KIND=online), Oracle Clusterware starts a new resource instance (or several instances for resources belonging to a resource group) on the destination server before stopping it on the original server when you run either the crsctl relocate resource or crsctl relocate resourcegroup command.
Note:
Before using online relocation, ensure that the resource can manage the extra resource instances that are started during online relocation.
Stopping Applications and Application Resources
Stop application resources with the crsctl stop resource command.
The command sets the resource TARGET value to OFFLINE. Because Oracle Clusterware always attempts to match the state of a resource to its target, the Oracle Clusterware subsystem stops the application. The following example stops the Apache Web server:
# crsctl stop resource myApache
You cannot stop a resource if another resource has a hard stop dependency on it, unless you use the force (-f) option. If you use the crsctl stop resource resource_name -f command on a resource upon which other resources depend, and if those resources are running, then Oracle Clusterware stops the resource and all of the resources that depend on the resource that you are stopping.
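The effect of the force option can be pictured as a walk over the dependency graph. The following Python sketch is a simplified model, not Oracle Clusterware code: the function names are hypothetical, and the sketch assumes that the cascade is transitive, so that resources depending on a stopped dependent are stopped as well.

```python
from collections import deque


def can_stop(target, depends_on, force=False):
    """Return True if the stop request is accepted.

    depends_on: maps a resource to the set of resources on which it
    has a hard stop dependency.
    """
    has_dependents = any(target in deps for deps in depends_on.values())
    return force or not has_dependents


def stopped_by_force(target, depends_on, online):
    """Return the set of resources stopped by a forced stop of target.

    Only dependents that are currently online are included.
    """
    stopped = {target}
    queue = deque([target])
    while queue:
        current = queue.popleft()
        for name, deps in depends_on.items():
            if current in deps and name in online and name not in stopped:
                stopped.add(name)
                queue.append(name)
    return stopped


# myApache has a hard stop dependency on appsvip, as in the earlier example.
depends_on = {"myApache": {"appsvip"}}
affected = stopped_by_force("appsvip", depends_on, online={"appsvip", "myApache"})
```

In this example, a plain stop of appsvip is refused because myApache depends on it, while a forced stop takes both resources offline.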
Displaying Clusterware Application and Application Resource Status Information
Use the crsctl status resource command to display status information about applications and resources that are on cluster servers.
The following example displays the status information for the Apache Web server application:
# crsctl status resource myApache
NAME=myApache
TYPE=cluster_resource
TARGET=ONLINE
STATE=ONLINE on server010
Other information this command returns includes the following:
- How many times the resource has been restarted
- How many times the resource has failed within the failure interval
- The maximum number of times that a resource can restart or fail
- The target state of the resource and the normal status information
Use the -f option with the crsctl status resource resource_name command to view full information of a specific resource.
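The NAME=VALUE output shown in the earlier status example is straightforward to process in a monitoring script. The following Python sketch is illustrative only: the parse_status function is hypothetical, and it assumes that each resource block begins with a NAME line and that every line has the form KEY=VALUE. It does not handle the tabular output produced by the -t option.

```python
SAMPLE = """\
NAME=myApache
TYPE=cluster_resource
TARGET=ONLINE
STATE=ONLINE on server010
"""


def parse_status(text):
    """Parse KEY=VALUE status output into a list of dictionaries."""
    resources = []
    current = {}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        key, separator, value = line.partition("=")
        if not separator:
            # Blank or unrecognized line: close the current block, if any.
            if current:
                resources.append(current)
                current = {}
            continue
        if key == "NAME" and current:
            # A new NAME line starts the next resource block.
            resources.append(current)
            current = {}
        current[key] = value
    if current:
        resources.append(current)
    return resources


status = parse_status(SAMPLE)[0]
is_online = status["STATE"].startswith("ONLINE")
```

The STATE value keeps the server name, so a caller can split on " on " to recover the server on which the resource is running.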
Enter the following command to view information about all applications and resources in tabular format:
# crsctl status resource -t
Managing Automatic Restart of Oracle Clusterware Resources
You can prevent Oracle Clusterware from automatically restarting a resource by setting several resource attributes. You can also control how Oracle Clusterware manages the restart counters for your resources. In addition, you can customize the timeout values for the start, stop, and check actions that Oracle Clusterware performs on resources.
Preventing Automatic Restarts of Oracle Clusterware Resources
To manage automatic restarts, use the AUTO_START resource attribute to specify whether Oracle Clusterware should automatically start a resource when a server restarts.
When a server restarts, Oracle Clusterware attempts to start the resources that run on the server as soon as the server starts. Resource startup might fail, however, if system components on which a resource depends, such as a volume manager or a file system, are not running. This is especially true if Oracle Clusterware does not manage the system components on which a resource depends.
Note:
Regardless of the value of the AUTO_START resource attribute for a resource, the resource can start if another resource has a hard or weak start dependency on it or if the resource has a pullup start dependency on another resource.
Automatically Manage Restart Attempts Counter for Oracle Clusterware Resources
When a resource fails, Oracle Clusterware attempts to restart the resource the number of times specified in the RESTART_ATTEMPTS resource attribute. Note that this attribute does not specify the number of attempts to restart a failed resource (always one attempt), but rather the number of times the resource fails locally, before Oracle Clusterware attempts to fail it over. The CRSD process maintains an internal counter to track how often Oracle Clusterware restarts a resource. The number of times Oracle Clusterware has attempted to restart a resource is reflected in the RESTART_COUNT resource attribute. Oracle Clusterware can automatically manage the restart attempts counter based on the stability of a resource. The UPTIME_THRESHOLD resource attribute determines the time period that a resource must remain online, after which the RESTART_COUNT attribute gets reset to 0. In addition, the RESTART_COUNT resource attribute gets reset to 0 if the resource is relocated or restarted by the user, or the resource fails over to another server.
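The interaction between RESTART_ATTEMPTS, RESTART_COUNT, and UPTIME_THRESHOLD can be modeled with a small counter. The following Python sketch is a simplified model, not the CRSD implementation: the RestartTracker class is hypothetical, the threshold is expressed in seconds for convenience, and the reset for a stable resource is applied when the next failure is processed rather than by a timer.

```python
class RestartTracker:
    """Track local restarts of one resource (illustrative model only)."""

    def __init__(self, restart_attempts, uptime_threshold):
        self.restart_attempts = restart_attempts  # models RESTART_ATTEMPTS
        self.uptime_threshold = uptime_threshold  # seconds, an assumption
        self.restart_count = 0                    # models RESTART_COUNT

    def on_failure(self, uptime_seconds):
        """Decide what happens after a failure.

        uptime_seconds: how long the resource was online before failing.
        Returns "restart" for a local restart or "failover".
        """
        if uptime_seconds >= self.uptime_threshold:
            # The resource was stable long enough to clear the counter.
            self.restart_count = 0
        if self.restart_count < self.restart_attempts:
            self.restart_count += 1
            return "restart"
        # Failing over to another server also resets the counter.
        self.restart_count = 0
        return "failover"

    def on_user_action(self):
        """A user relocation or restart resets the counter."""
        self.restart_count = 0


tracker = RestartTracker(restart_attempts=2, uptime_threshold=3600)
decisions = [tracker.on_failure(uptime_seconds=10) for _ in range(3)]
```

With RESTART_ATTEMPTS set to 2, the first two quick failures lead to local restarts and the third leads to a failover, after which the counter starts again from 0.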