Combine Existing Non-Sharded Databases into a Federated Sharded Database
If you have several database installations in different locations that run the same application, and you want to to include the data from all of them, to run data analytics queries for example, you can combine the independent databases into a sharded database to take advantage of Oracle Sharding multi-shard queries.
- Overview
- Creating and Deploying a Federated Sharding Configuration
To deploy a federated sharding environment using existing databases, you define the database layout just as you would for user-defined sharding, usingGDSCTL
commands. - Federated Sharding Reference
Parent topic: Oracle Sharding Solutions
Overview
- About Federated Sharding
Learn what a federated sharding configuration is, why you need it, and how it works. - Federated Sharding Schema Requirements
You can convert existing databases running the same application into a federated sharding configuration, without modifying the database schemas or the application. - Sharded and Duplicated Tables in a Federated Sharding Configuration
Tables that have different sets of data on each of the federated databases are equivalent to the sharded tables in a traditional sharded database. Tables with the same content on all of the federated databases are equivalent to the duplicated tables in a traditional sharded database. - Limitations to Federated Sharding
There are some limitations to creating a federated sharding configuration. - Federated Sharding Security
About Federated Sharding
Learn what a federated sharding configuration is, why you need it, and how it works.
Federated sharding is an Oracle Sharding configuration where the shards consist of independent databases with similar schemas.
Creating a sharded database from independent databases reduces the need to import tons of data into a single location for data analytics.
Consider the following benefits to this approach.
- Create a sharding environment using existing, geographically distributed databases; there is no need to provision new systems.
- Run multi-shard queries; access data from many locations in a single query.
Oracle Sharding, in a federated sharding configuration, treats each independent database as a shard, and as such can issue multi-shard queries on those shards.
You can create a federated sharding configuration with minor version mismatches between the shards. For example, one region could be on Oracle Database 21.1 and another could be on Oracle Database 21.3. All database shards and the shard catalog must be on Oracle Database 21c or later.
Parent topic: Overview
Federated Sharding Schema Requirements
You can convert existing databases running the same application into a federated sharding configuration, without modifying the database schemas or the application.
However, the databases must have the same schema or minor differences. For example, a table can have an extra column in one of the databases.
An application upgrade can trigger changes in the schema, such as when you add a new table, new column, new check constraint, or/and modify a column data type. When part of an overall federated sharding configuration, Oracle Sharding handles the schema differences caused by an application upgrade, as long as the overall schema structure stays the same.
Parent topic: Overview
Sharded and Duplicated Tables in a Federated Sharding Configuration
Tables that have different sets of data on each of the federated databases are equivalent to the sharded tables in a traditional sharded database. Tables with the same content on all of the federated databases are equivalent to the duplicated tables in a traditional sharded database.
When you create the federated sharding configuration, the system assumes that all of the tables are sharded, so you must explicitly mark the tables that must be considered duplicated by the multi-shard query coordinator.
Parent topic: Overview
Limitations to Federated Sharding
There are some limitations to creating a federated sharding configuration.
- There is no concept of chunk in a federated sharding configuration, so
the
GDSCTL MOVE CHUNK
command is not supported. - Application sharding key-based routing is not supported.
- The existing databases, before being added to a federated sharding configuration, must be upgraded to Oracle Database 21c or later.
- DDLs, cross-shard insert, update, and delete are not supported from the shard
catalog in a federated sharding architecture under
ENABLE SHARD DDL
.
Parent topic: Overview
Federated Sharding Security
The database users do not need to exist on all of the federated databases, but the schema owners should exist on all of the databases. The privileges and the passwords of these schema owners can be different. Only common privileges are imported for security.
Parent topic: Overview
Creating and Deploying a Federated Sharding Configuration
To deploy a federated sharding environment using existing databases, you
define the database layout just as you would for user-defined sharding, using
GDSCTL
commands.
The following is a high level description of the process for creating and deploying a federated sharding configuration.
-
Run the
GDSCTL CREATE SHARDCATALOG
command with theFOR_FEDERATED_DATABASE
option to create the federated sharding configuration -
Add shard directors to the configuration.
-
Add a shardspace to the configuration. A shardspace is defined as an existing database and its replica.
-
Add a shard by adding the existing database to the shardspace, then run
DEPLOY
. -
Run
GDSCTL SYNC SCHEMA
to compare the schemas in the federated sharding configuration and retrieve the common shared schemas. UseSYNC SCHEMA
to inspect and apply the DDLs. -
Use
SQL ALTER TABLE
on the shard catalog to convert tables containing the same data across the federated shards to duplicated tables. -
Prepare the shards in the federated sharding configuration for multi-shard queries.
The following topics describe the federated sharding-specific tasks in detail.
- Create the Federated Sharding Configuration
TheGDSCTL
commandCREATE SHARDCATALOG
is used to create the federated sharding configuration, with theFOR_FEDERATED_DATABASE
option used instead of selecting a sharding method in theSHARDING
parameter. - Retrieve, Inspect, and Apply the DDLs
Run theGDSCTL SYNC SCHEMA
command in phases to create the schema objects common to the existing databases in the shard catalog. - Convert Tables to Duplicated Tables
UseALTER TABLE table_name externally duplicated
to mark tables as duplicated in a federated sharding configuration. - Prepare the Shards For Multi-Shard Queries
Create all shard users and use theORA_SHARDSPACE_NAME
pseudo-column to perform queries on specific shards.
Create the Federated Sharding Configuration
The GDSCTL
command CREATE SHARDCATALOG
is
used to create the federated sharding configuration, with the
FOR_FEDERATED_DATABASE
option used instead of selecting a sharding
method in the SHARDING
parameter.
The usage for the GDSCTL
command CREATE SHARDCATALOG
in
creating a federated sharding configuration is similar to how it is used to create the
shard catalog in user-defined sharding, except that instead of specifying a sharding
method in the SHARDING
parameter, you use the
FOR_FEDERATED_DATABASE
option. That is, the
FOR_FEDERATED_DATABASE
option is mutually exclusive with the
SHARDING
option.
CREATE SHARDCATALOG -DATABASE connect_identifier
[-USER username[/password]]
[-REGION region_name_list]
[-CONFIGNAME config_name]
[-AUTOVNCR ON/OFF]
[-FORCE]
[-SDB sdb_name]
[-SHARDSPACE shardspace_name_list]
-FOR_FEDERATED_DATABASE
The CREATE SHARDCATALOG
syntax statement above shows which
parameters are supported. The parameters not shown are not supported when used with the
FOR_FEDERATED_DATABASE
sharding method, for example,
–AGENT_PASSWORD
, REPFACTOR
, and the Oracle Data
Guard protection mode PROTECTMODE
.
Note:
Only Oracle Data Guard replication is supported for federated sharding
configurations. Oracle Sharding doesn't handle the creation and management of the
Data Guard configuration, but you can use Data Guard parameters with the ADD
SHARD
command so that you can add the primary and standbys to see the
status in GDSCTL
.
See Also:
The GDSCTL create shardcatalog topic in Oracle Database Global Data Services Concepts and Administration Guide for usage notes and command options.Retrieve, Inspect, and Apply the DDLs
Run the GDSCTL SYNC SCHEMA
command in phases to create the
schema objects common to the existing databases in the shard catalog.
The GDSCTL SYNC SCHEMA
syntax shown here illustrates the
three phases of the opertion.
sync[hronize] schema
[-schema [schemalist | all] [-retrieve_only] [-restart [-force]]
| -apply [-skip_first]
| -show [[-ddl ddlnum] [-count n] | [-failed_only]]]
SYNC SCHEMA
should be run in phases, as described here.
-
Retrieve Phase
Run
SYNC SCHEMA
with the-retrieve_only
option to inspect and verify the DDLs before they are run on the shard catalog.sync schema -schema schemalist –retrieve_only
When
SYNC SCHEMA
is run without-retrieve_only
, the DDL is retrieved and applied at the same time. -
Inspection Phase
You can examine the DDL statements and their execution status with the
-show
option. The-ddl ddlnum
option shows the specified DDL, and the-count n
option specifies the maximum number of entries to show.sync schema –show -ddl ddlnum -count n
Or you can use the
-failed_only
option to examine only the errored out statements.sync schema –show -failed_only
-
Apply Phase
In the final phase, you run the DDLs on the shard catalog to create the schemas and their objects.
sync schema –apply
If you get an error in the apply phase, there are a couple of ways to work around it:
-
If you can fix the cause of the error, fix and then retry
SYNC SCHEMA -apply
, which retries the failed DDL. -
If the DDL cannot be fixed or it is not required, you can run
SYNC SCHEMA –apply -skip_first
, which resumes the apply phase from the point of the DDL failure.
For security reasons, Oracle Sharding doesn't offer a way to edit the DDLs.
-
-
Import Incremental Changes
If there are changes in the schema at a later point, the previous phases can be run again to import incremental changes. For example, when new objects are added, or a new column is added to a table, which will generate an
ALTER TABLE ADD
statement.
See Also:
The sync schema (synchronize schema) topic in
Oracle Database Global
Data Services Concepts and Administration
Guide for more SYNC
SCHEMA
usage notes and option details.
SYNC SCHEMA Operations for information about the tasks performed by SYNC SCHEMA
Convert Tables to Duplicated Tables
Use ALTER TABLE table_name externally
duplicated
to mark tables as duplicated in a federated sharding
configuration.
Any table created by SYNC SCHEMA
is considered by the
multi-shard query layer as an externally sharded table. If the table contains the
same data on all of the shards, you can alter the table to externally duplicated,
so that the multi-shard query retrieves the data from one shard only, even if it is a
query on a table with no filter predicates on ORA_SHARDSPACE_NAME
.
ALTER TABLE table_name [externally duplicated | externally sharded]
Prepare the Shards For Multi-Shard Queries
Create all shard users and use the
ORA_SHARDSPACE_NAME
pseudo-column to perform queries on specific
shards.
All Shard Users
Before running multi-shard queries from the shard catalog, you must create
all shard users and grant them access to the sharded and duplicated tables.
These users and their privileges should be created in the shard catalog under
shard DDL enabled
.
Create Shardspace-Specific Queries
A shardspace in federated sharding is a set consisting of a primary shard and zero
or more standby shards. To filter query results for a particular shard[space], a
pseudo-column called ORA_SHARDSPACE_NAME
is added to every
externally sharded
table. The value of this pseudo column in the
tables is the name of the shardspace.
Depending on the value of MULTISHARD_QUERY_DATA_CONSISTENCY
, the rows
can be fetched from the primary or from any of the standbys in the shardspace. To run a
multi-shard query on a given shard, you can filter the query with the predicate
ORA_SHARDSPACE_NAME = shardspace_name_shard_belongs_to
.
A query like SELECT CUST_NAME, CUST_ID FROM CUSTOMER
, where
the table CUSTOMER is marked as externally sharded
, runs on all of the
shards.
A query like SELECT CUST_NAME, CUST_ID FROM CUSTOMER WHERE
ora_shardspace_name = ‘EUROPE’
runs on the shards belonging to the
shardspace_name
Europe. Depending on the
MULTISHARD_QUERY_DATA_CONSISTENCY
parameter value, the query is run
on either the primary shard of the shardspace Europe or on its standbys.
You can join sharded tables from different shardspaces. For example, to find the customers from shardspace Europe with orders in shardspace NA, write a query similar to the following.
SELECT order_id, customer_name FROM customers c , orders o WHERE c.cust_id = o.cust_id and
c.ora_shardspace_name = ‘Europe’ and o.ora_shardspace_name = ‘NA’
Querying an externally duplicated
table, with or without
the ORA_SHARDSPACE_NAME
predicate, should go to only one of the
shardspaces. The MULTISHARD_QUERY_DATA_CONSISTENCY
parameter value
determines whether to query a primary shard in the shardspace or its replicas.
Federated Sharding Reference
- SYNC SCHEMA Operations
- Troubleshooting Federated Sharding
Solve common federated sharding issues with these troubleshooting tips.
SYNC SCHEMA Operations
- DDL Synchronization
DDL synchronization is an operation thatSYNC SCHEMA
runs just after the deployment of the shards in a federated sharding configuration. - Import Users
A user or schema is a candidate for import bySYNC SCHEMA
if it exists on all of the shards and owns importable schema objects. - Grant User Roles and Priviledges
For the imported users,SYNC SCHEMA
compares users' privileges. - Import Object Definitions
The objects compared and imported bySYNC SCHEMA
to the shard catalog are the objects that will be referenced in multi-shard queries or used by multi-shard query processing. - Schema Object Comparison
The objects, from one shard to another, can have different definitions.SYNC SCHEMA
compares the different definitions and creates a common definition to enable multi-shard queries against imported objects.
Parent topic: Federated Sharding Reference
DDL Synchronization
DDL synchronization is an operation that SYNC SCHEMA
runs
just after the deployment of the shards in a federated sharding
configuration.
The goal of this operation is to import the object definitions from all of the shards, compare the definitions across the shards, and generate DDLs for the objects that exist on all of the shards (common objects). Once the DDLs are run and the objects are created, you can reference these objects in multi-shard queries.
Parent topic: SYNC SCHEMA Operations
Import Users
A user or schema is a candidate for import by SYNC SCHEMA
if it exists on all of the shards and owns importable schema objects.
You can narrow the list of users to be imported by passing a list of users
in the -SCHEMA
parameter. For example,
gdsctl> sync schema -schema scott
gdsctl> sync schema -schema scott,myschema
For case-sensitive schemas use quoted identifiers.
gdsctl> sync schema -schema "O'Brien",scott
To include all non-Oracle schemas, use the value ALL
in the
SCHEMA
parameter.
gdsctl> sync schema -schema all
Before importing the users, SYNC SCHEMA
verifies that any
discovered users exist on all shards, and no user already exists on the shard catalog
with the same name. The users are then created on the shard catalog as local users and
they are locked. Because these are local users, they only share the same name with
shards and are essentially the same as any other user that may have the same name across
different databases. Note that these users are not able to login and issue queries
because they are not all shard users. To issue multi-shard queries, an all shard
user must be created.
Note:
Only users local to a PDB are imported. Common CDB users are not imported.Parent topic: SYNC SCHEMA Operations
Grant User Roles and Priviledges
For the imported users, SYNC SCHEMA
compares users'
privileges.
SYNC SCHEMA
grants only the privileges that are granted on
all of the shards (common grants). A user A who has a DBA role on
shard1, but does not have DBA role on shard2, is not granted the DBA role in
the shard catalog.
Parent topic: SYNC SCHEMA Operations
Import Object Definitions
The objects compared and imported by SYNC SCHEMA
to the
shard catalog are the objects that will be referenced in multi-shard queries or used by
multi-shard query processing.
These objects are:
- Tables
- Views and Materialized Views (exported as tables)
- Check Constraints
- Object Types
- Synonyms
Running SYNC SCHEMA
does not import objects related to
storage, or objects that have no impact on multi-shard query processing, such as
tablespaces, indexes, indextypes, directories, or zone maps.
Parent topic: SYNC SCHEMA Operations
Schema Object Comparison
The objects, from one shard to another, can have different definitions.
SYNC SCHEMA
compares the different definitions and creates a common
definition to enable multi-shard queries against imported objects.
SYNC SCHEMA
detects the objects' differences at two levels:
number of objects, and object definitions.
First, SYNC SCHEMA
considers the number of objects. It is
likely that, during an application upgrade, some objects are added to the schemas. Only
objects that are on all of the shards will be imported into the shard catalog.
Second, the object definitions from one shard to another can have different
attributes. For the objects that SYNC SCHEMA
imports, the following
differences are noted:
Parent topic: SYNC SCHEMA Operations
Differences in Tables
When comparing objects in a federated sharding configuration, some differences in tables have an impact on multi-shard queries and some do not.
Column Differences
Only column differences have an impact on multi-shard queries. SYNC
SCHEMA
addresses only this difference.
-
The number of columns can be different.
-
The data type of a given column can be different.
-
The default value of a given column can be different.
-
The expression of a virtual column can be different
When a table has a different numbers of columns, SYNC
SCHEMA
will opt for the creation of a table that contains the union of all
of the columns. Taking the union of all of the columns, compared to just taking the
intersection, will spare you from re-writing multi-shard queries in case of an
incremental deploy, when the added shard has fewer columns than indicated in the shard
catalog.
When a column has different data types, SYNC SCHEMA
defines
it as the highest (largest) datatype.
When a column has different data types, and one of the columns is a user-defined object type, then that column is not imported into the shard catalog.
When a column has different default values, SYNC SCHEMA
sets NULL
as the default value.
Nested table columns are not imported into the shard catalog.
Example: a Customer table is defined on shard1 and shard2 as shown here.
On shard1:
Customer( Cust_id number, Name varchar(30),
Address varchar(50),Zip_code number)
On shard2:
Customer( Cust_id varchar(20), Name varchar(30),
Address varchar(50),Zip_code number,
Country_code number)
Note that the column Cust_id
is a number on shard1 and
a varchar(20) on shard2. Also, note that Country_code
exists on
shard2 but does not exist on shard1.
The Customer table created by SYNC SCHEMA
in the shard
catalog has all of the columns, including Country_code
, and the
Cust_id
type is varchar(20).
Customer( Cust_id varchar(20), Name varchar(30),
Address varchar(50),Zip_code number,
Country_code number)
SYNC SCHEMA
keeps track of these differences between
schemas in the shard catalog. A query issued on the catalog database that accesses
these heterogeneous columns is rewritten to address the differences before it is
sent to the shards. On the shard, if there is a data type mismatch, the data is CAST
into the "superior" data type as created on the catalog. If the column is missing on
the shard, the default value is returned as set on the catalog.
Partition Scheme Differences
Note that this difference has no impact on multi-shard queries, and is ignored.
- Partitioning column can be different.
- Partition type can be different.
- Number of partitions can be different.
Storage Attribute Differences
Note that this difference has no impact on multi-shard queries, and is ignored.
- Tablespaces, on which the table is created, are different.
- The encryption can be different.
- The
INMEMORY
attribute can be different.
Parent topic: Schema Object Comparison
Differences in Views
Views on shards are created and handled as tables in the shard catalog. The same restrictions that apply to tables also apply to views.
Parent topic: Schema Object Comparison
Differences in Constraints
Only CHECK constraints are created in the shard catalog. The CHECK constraint condition should be same on all of the shards.
Parent topic: Schema Object Comparison
Differences in Object Types
Object types and type bodies are only created if they have the same definition on all of the shards.
Parent topic: Schema Object Comparison
Troubleshooting Federated Sharding
Solve common federated sharding issues with these troubleshooting tips.
ORA-03851: Operation not supported in federated database
ORA-03701: Invalid parameter combination: federated database and ...
Some of the operations and command options that apply to a traditional sharded database are not applicable to a federated database. This is because:
-
There is no concept of a chunk in a federated database. Any chunk-related operation is invalid, for example
SPLIT CHUNK
andMOVE CHUNK
. -
The Data Guard broker configuration is not set up or managed by the system in federated database, because the existing shards may already have been set up with their own high availability configurations. Operations such as
SET DATAGUARD_PROPERTY
orMODIFY SHARDSPACE
are not supported. -
Oracle GoldenGate configuration is not supported.
-
The
CREATE SHARD
command is not supported.
ORA-03885: Some primary shards are undeployed or unavailable
The SYNC SCHEMA
operation requires that all primary
shards be available. Check the output of the CONFIG
SHARD
command, and check the status of all primary
shards. Fix any issues and retry the operations when the shards
become available.
ORA-03871: Some DDL statements are not applied to the catalog
The SYNC SCHEMA
operation cannot import object
definitions from the shards when some statements from the previous
execution are still not applied on the shard catalog. Run
SYNC SCHEMA
with the
-apply
option to run these statements.
Handling Errors During Multi-Shard Queries
If a multi-shard query fails with this error due to a mismatch of the
object definition on the shard and the catalog, make sure that the
shard catalog has the latest schema changes imported. Any time there
are schema changes in the federated database, you must run
SYNC SCHEMA
to import any changes in the
schemas on the shards.
Note that subsequent runs of SYNC SCHEMA
will not drop and recreate the object, but will generate
ALTER
statements to incorporate the
definition changes. This ensures that if there are queries already
running during the SYNC SCHEMA
operation, they
won't fail with invalid object errors.
Handling Errors During DDL Execution Phase
If DDL execution fails on the shard catalog, the status of each DDL can
be examined with the SYNC SCHEMA
-show
option.
gdsctl> sync schema -show
Note: The SYNC SCHEMA -show
command is different
from the command SHOW DDL
. SHOW
DDL
lists DDL statements run by an all-shard user
that are first run on the catalog and then propagated to the shards,
whereas SYNC SCHEMA -show
DDL statements are
generated from the objects imported from shards.
By default, SYNC SCHEMA -show
lists a fixed number of
the latest DDLs. The -count
and
-ddl
options can be used to inspect
specific range of DDLs. For example,
gdsctl> sync schema -show -count 20
gdsctl> sync schema -show -count 20 -ddl 5
To check the complete DDL text and error message, if any, use the
-ddl
option.
gdsctl> sync schema -show -ddl 5
To list only the failed DDL statements, use the
-failed_only
option.
gdsctl> sync schema –failed_only
Based on the error message of the failed DDL, fix the cause of the error and perform the apply phase.
gdsctl> sync schema -apply
The SYNC SCHEMA
command also has a
-restart
option to perform the complete
operation from the beginning as if it were run for the first time.
This option will DROP
all existing schemas imported
during all previous executions of SYNC SCHEMA
and
any related metadata. Be aware that this will cause any running
queries on these objects to fail.
gdsctl> sync schema -restart
Parent topic: Federated Sharding Reference