13 Troubleshooting Oracle Sharding

You can enable tracing, locate log and trace files, and troubleshooting common issues.

The following topics describe Oracle Sharding troubleshooting in detail:

Oracle Sharding Tracing and Debug Information

The following topics explain how to enable tracing and find the logs.

Enabling Tracing for Oracle Sharding

Enable PL/SQL tracing to track down issues in the sharded database.

To get full tracing, set the GWM_TRACE level as shown here. The following statement provides immediate tracing, but the trace is disabled after a database restart.

ALTER SYSTEM SET EVENTS 'immediate trace name GWM_TRACE level 7';

The following statement enables tracing that continues in perpetuity, but only after restarting the database.

ALTER SYSTEM SET EVENT='10798 trace name context forever, level 7' SCOPE=spfile;

It is recommended that you set both of the above traces to be thorough.

To trace everything in the Oracle Sharding environment, you must enable tracing on the shard catalog and all of the shards. The traces are written to the RDBMS session trace file for either the GDSCTL session on the shard catalog, or the session(s) created by the shard director (a.k.a. GSM) on the individual shards.

Where to Find Oracle Sharding Alert Logs and Trace Files

There are several places to look for trace and alert logs in the Oracle Sharding environment.

Standard RDBMS trace files located in diag/rdbms/.. will contain trace output.

Output from ‘deploy’ will go to job queue trace files db_unique_name_jXXX_PID.trc.

Output from other GDSCTL commands will go to either a shared server trace file db_unique_name_sXXX_PID.trc or dedicated trace file db_unique_name_ora_PID.trc depending on connect strings used.

Shared servers are typically used for many of the connections to the catalog and shards, so the tracing is in a shared server trace file named SID_s00*.trc.

GDSCTL has several commands that can display status and error information.

Use GDSCTL STATUS GSM to view locations for shard director (GSM) trace and log files.

GDSCTL> status
Alias                     SHARDDIRECTOR1
Version                   18.0.0.0.0
Start Date                25-FEB-2016 07:27:39
Trace Level               support
Listener Log File         /u01/app/oracle/diag/gsm/slc05abw/sharddirector1/alert/log.xml
Listener Trace File       /u01/app/oracle/diag/gsm/slc05abw/sharddirector1/trace/
ora_10516_139939557888352.trc
Endpoint summary          (ADDRESS=(HOST=shard0)(PORT=1571)(PROTOCOL=tcp))
GSMOCI Version            2.2.1
Mastership                N
Connected to GDS catalog  Y
Process Id                10535
Number of reconnections   0
Pending tasks.     Total  0
Tasks in  process. Total  0
Regional Mastership       TRUE
Total messages published  71702
Time Zone                 +00:00
Orphaned Buddy Regions:   None
GDS region                region1
Network metrics:
   Region: region2 Network factor:0

The non-XML version of the alert.log file can be found in the /trace directory as shown here.

/u01/app/oracle/diag/gsm/shard-director-node/sharddirector1/trace/alert*.log

To decrypt log output in GSM use the following command.

GDSCTL> set _event 17 -config_only

Master shard director (GSM) trace/alert files include status and errors on any and all asynchronous commands or background tasks (move chunk, split chunk, deploy, shard registration, Data Guard configuration, shard DDL execution, etc.)

To find pending AQ requests for the shard director, including error status, use GDSCTL CONFIG.

To see ongoing and scheduled chunk movement, use GDSCTL CONFIG CHUNKS -show_reshard

To see shards with failed DDLs, use GDSCTL SHOW DDL -failed_only

To see the DDL error information for a given shard, use GDSCTL CONFIG SHARD -shard shard_name

Common Error Patterns and Resolutions for Sharded Databases

See the following topics for information about troubleshooting common errors in Oracle Sharding.

Shard Director Fails to Start

If you encounter issues starting the shard director, try the following:

To start Scheduler you must be inside ORACLE_HOME on each shard server.

GDSCTL>start gsm -gsm shardDGdirector
GSM-45054: GSM error
GSM-40070: GSM is not able to establish connection to GDS catalog

GSM alert log, /u01/app/oracle/diag/gsm/shard1/sharddgdirector/trace/alert_gds.log
GSM-40112: OCI error. Code (-1). See GSMOCI trace for details.
GSM-40122: OCI Catalog Error. Code: 12514. Message: ORA-12514: TNS:listener does not 
currently know of service requested in connect descriptor
GSM-40112: OCI error. Code (-1). See GSMOCI trace for details.
2017-04-20T22:50:22.496362+05:30
Process 1 in GSM instance is down
GSM shutdown is successful
GSM shutdown is in progress
NOTE : if not message displayed in the GSM log then enable GSM trace level to 16 
while adding GSM itself.
  1. Remove the newly created shard director (GSM) that failed to start.

    GDSCTL> remove gsm -gsm shardDGdirector
    
  2. Add the shard director using trace level 16.

    GDSCTL> add gsm -gsm shardDGdirector -listener port_num -pwd gsmcatuser_password
     -catalog hostname:port_num:shard_catalog_name
     -region region1 -trace_level 16
  3. If the shard catalog database is running on a non-default port (other than 1521), set the remote listener.

    SQL> alter system set local_listener='(DESCRIPTION=(ADDRESS=(PROTOCOL=tcp)
    (HOST=hostname)(PORT=port_num)))';

Issues Using Deploy Command

GDSCTL> deploy
GSM-45029: SQL error
ORA-29273: HTTP request failed
ORA-06512: at "SYS.DBMS_ISCHED", line 3715
ORA-06512: at "SYS.UTL_HTTP", line 1267
ORA-29276: transfer timeout
ORA-06512: at "SYS.UTL_HTTP", line 651
ORA-06512: at "SYS.UTL_HTTP", line 1257
ORA-06512: at "SYS.DBMS_ISCHED", line 3708
ORA-06512: at "SYS.DBMS_SCHEDULER", line 2609
ORA-06512: at "GSMADMIN_INTERNAL.DBMS_GSM_POOLADMIN", line 14284
ORA-06512: at line 1

Solution : Check the $ORACLE_HOME/data/pendingjobs for the exact error. ORA-1017 is thrown if any issues on wallet.

  1. On problematic Shard host stop the remote scheduler agent.

    schagent -stop
  2. rename wallet direcotry on Database home

    mv $ORACLE_HOME/data/wallet $ORACLE_HOME/data/wallet.old
  3. start the remote scheduler agent and it will create new wallet directory

    schagent -start 
    schagent -status 
    echo welcome | schagent -registerdatabase 10.10.10.10 8080

Issues Moving Chunks

If you encounter issues with MOVE CHUNK, try the following:

Issue: Initialization parameter remote_dependencies_mode has a default value of timestamp; therefore, because prvtgwmut.plb is run and DBMS_GSM_UTILITY recompiled durning upgrade, GDSCTL MOVE CHUNK runs into ORA-04062 errors similar to the following.

GSM Errors:
server:ORA-03749: Chunk move cannot be performed at this time.
ORA-06512: at "SYS.DBMS_SYS_ERROR", line 79
ORA-06512: at "GSMADMIN_INTERNAL.DBMS_GSM_DBADMIN", line 5497
ORA-04062: timestamp of package "GSMADMIN_INTERNAL.DBMS_GSM_UTILITY" has been
changed
ORA-06512: at line 1
ORA-06512: at "GSMADMIN_INTERNAL.DBMS_GSM_DBADMIN", line 5366
ORA-06512: at line 1 (ngsmoci_execute)

Workaround 1: Restart the source and target shards after upgrade.

Workaround 2: ALTER SYSTEM SET remote_dependencies_mode=signature on both source and target.