11 Sharding JSON Document Collections

Learn how to shard tables of JSON documents using Oracle Sharding with SODA.

Overview of Sharding JSON Documents

Oracle Sharding allows JSON documents to scale to massive data and transactions volume, provide fault isolation, and support data sovereignty. Oracle Database has support for native JSON objects. Applications can interact with the sharded database using the SODA (Simple Oracle Document Access) API, which allows you to access data using JSON document attributes.

In Oracle Database, JSON documents can be stored in a database table. The database tables act as JSON collections, and each row is a JSON document. JSON documents are stored in the database as type BLOB OSON. OSON is Oracle's optimized binary JSON format for query and update in both Oracle Database server and Oracle Database clients.

Although Oracle provides support for JSON operators to create, work with, and retrieve JSON documents, the SODA interface is also supported. SODA provides a more intuitive interface for working with JSON documents.

A SODA document collection consists of an underlying database table and underlying metadata mappings to the JSON documents. In an Oracle Sharding environment these mappings are created manually, because using SODA API commands to create a collection does not shard the collection.

Working with JSON documents in an Oracle Sharding environment introduces the notion of a sharding key. JSON documents are distributed to the individual database table shards according to the sharding key. The sharding key is stored within the JSON document.

For further reading about JSON and SODA, see JSON in Oracle Database and Overview of SODA.

In the topics that follow, we provide details about how to shard JSON objects in Oracle Database. The high level steps are:

  • Deploy a sharded database
  • Identify a sharding key that the application can use to fetch data
  • Define a data store for JSON in Oracle Database by creating sharded tables
  • Map the sharded table with SODA

Then we show you how to:

  • Add documents to the sharded JSON collection in the application
  • Fetch document data from the sharded JSON collection in the application

Prepare the Oracle Sharding Environment

Before you begin configuring Oracle Sharding with SODA, deploy a sharding configuration and start the global services.

A Oracle Sharding sharded database configuration, including shard directors, shard catalog, and shard databases, and any replicas must be deployed. After deploying the sharded database, you must create and start global database services on the shards to service incoming connection requests from your application.

See Sharded Database Deployment for information about creating and deploying a sharded database configuration.

Create an All-Shards User with SODA Privileges

Create a user on the shard catalog that has the privileges to create schema objects in the sharded database, and also has the necessary execute privileges on the DBMS_SODA PL/SQL package.

For the purposes of this document, the user is referred to as the Sharding/SODA user, and the user name is app_schema in the examples.

To create the Sharding/SODA user:

  1. Connect to the shard catalog database (for example, as SYSDBA).
  2. Enable SHARD DDL.
  3. Run CREATE USER command, granting the permissions shown in the example below.

    Note that the Sharding/SODA user is created on the PDB, not the CDB.

The following is an example Sharding/SODA user creation script.

alter session set container=SDBPDB;
alter session enable shard ddl;
create user app_schema identified by password;
 
grant connect, resource, alter session to app_schema;
grant execute on dbms_crypto to app_schema;
grant create table, create procedure, create tablespace, create
materialized view to app_schema;
grant unlimited tablespace to app_schema;
grant select_catalog_role to app_schema;
grant all privileges to app_schema;
grant gsmadmin_role to app_schema;
grant execute on dbms_soda_admin to app_schema;
grant execute on dbms_soda to app_schema;
grant execute on exec_shard_plsql to app_schema;
grant gsm_pooladmin_role to app_schema;

Note the standard database schema privileges and the standard SODA privileges granted to the user. The exec_shard_plsql grant, which gives the user the ability to run PL/SQL procedures on a sharded database, is a sharding-specific privilege required for the Sharding/SODA user.

For more information about Oracle Sharding schema design, including sharding user creation and PL/SQL execution, see Sharded Database Schema Design.

Choose a Sharding Key

Each SODA document has its own ID column in a sharded table, which can be used as the sharding key. You can also choose a JSON field to be the sharding key.

The choice of sharding key is application dependent.

The advantages and disadvantages of each sharding key choice are listed below.

Table 11-1 Sharding Key Options and Their Advantages and Disadvantages

Sharding Key Advantages Disadvantages
SODA ID
  • No triggers required
  • Less extra DDL to manage or clean up
  • Sharding key must be unique
  • Sharding key (SODA ID) must be passed with each document creation
  • SODA filter specifications (also known as, query-by-example (QBE)) must be executed as multi-shard queries on the sharded database
JSON Field
  • More flexibility on sharding key choice; the key does not have to be unique
  • SODA commands need not be concerned with the sharding key value
  • QBEs can work within a single shard, reducing multi-shard queries
  • You must create and manage the trigger and extra column

After choosing which type of sharding key to use, refer to the following use cases to see examples of how to create a sharded table for the JSON collection, and how to interact with the sharded table from an application.

Using SODA ID as the Sharding Key

You can designate the SODA ID as the sharding key when creating the sharded database schema.

The following examples show you how to create a sharded table for the JSON collection, create the SODA mapping, and access the sharded table from an application with Python code samples.

Create a Sharded Table for the JSON Collection

To create a sharded table that uses the SODA ID as the sharding key:

  1. Connect to the shard catalog as the Sharding/SODA user.
  2. Enable SHARD DDL.
  3. Create a tablespace set.
  4. Run CREATE SHARDED TABLE, as shown in the example below.

The following example creates a sharded table (Customers) for a JSON collection of customer profile documents (CUSTPROFILE).

A column for the SODA ID (ID) identifies the JSON entries, and is also used as the primary key and sharding key. When creating a JSON entry in the table with SODA, the application populates the ID column with a unique value.

The other columns are the default column names given when SODA creates a table to hold an underlying collection. You can see this for yourself when creating a SODA collection and then examining the created table.

/* Enable shard DDL */
ALTER SESSION ENABLE SHARD DDL;

/* Create a tablespace set */
CREATE TABLESPACE SET TSP_SET_1 USING TEMPLATE
 (datafile size 100m autoextend on next 10M maxsize unlimited
 extent management local segment space management auto); 

/* Create the sharded table */
CREATE SHARDED TABLE Customers
(
"ID" VARCHAR2(255) NOT NULL,
"CREATED_ON" timestamp default sys_extract_utc(SYSTIMESTAMP) NOT NULL,
"LAST_MODIFIED" timestamp default sys_extract_utc(SYSTIMESTAMP) NOT NULL,
"VERSION" varchar2(255) NOT NULL,
"CUSTPROFILE" JSON,
PRIMARY KEY (ID),
CHECK ("CUSTPROFILE" is json format oson strict))
TABLESPACE SET TSP_SET_1
PARTITION BY CONSISTENT HASH (ID) PARTITIONS AUTO;

Create a SODA Map Across All Shards

Create a map to let SODA know which columns to use when working with the sharded table, and add the sharded table to the list of collections.

You can run a procedure to create the map, but this procedure also must be executed on ALL of the shards in the sharded database. The procedure also needs to be executed on any shards added in the future. You can accomplish both of these requirements using a sharding-specific PL/SQL procedure, sys.exec_shard_plsql().

To create a SODA map across all shards:

As the Sharding/SODA user and with SHARD DDL enabled, run the following commands on the shard catalog. The shard catalog propagates the procedure to all of the shards to be executed automatically.

create or replace procedure COLLECTION_PROC_CUSTOMERS AS
METADATA varchar2(8000);
COL SODA_COLLECTION_T;
begin
METADATA := '{"tableName":"CUSTOMERS",
"keyColumn":{"name":"ID","assignmentMethod" : "CLIENT"},
"contentColumn":{"name":"CUSTPROFILE","sqlType":"JSON"},
"versionColumn":{"name":"VERSION","method":"UUID"},
"lastModifiedColumn":{"name":"LAST_MODIFIED"},
"creationTimeColumn":{"name":"CREATED_ON"},
"readOnly":false}';
 -- Create a collection using "map" mode, based on
 -- the table you've created above and specified in 
 -- the custom metadata under "tableName" field.
COL := dbms_soda.create_collection('CUSTOMERS',METADATA,DBMS_SODA.CREATE_MODE_MAP);
end ;
/
 
exec sys.exec_shard_plsql('app_schema.collection_proc_customers()',4+1);

Note that the keyColumn is ID, the key used by SODA to insert and retrieve collections.

At this point, a new collection has been created just as if you had run a CREATE COLLECTION command.

You can run some PL/SQL to list out the collections. On the shard catalog, run the following commands, and verify that the output lists the Customers table.

SET SERVEROUTPUT ON
DECLARE
l_coll_list SODA_COLLNAME_LIST_T;
BEGIN
l_coll_list := DBMS_SODA.list_collection_names;
 
IF l_coll_list.COUNT > 0 THEN
FOR i IN 1 .. l_coll_list.COUNT LOOP
DBMS_OUTPUT.put_line(i || ' : ' || l_coll_list(i));
END LOOP;
END IF;
END;
/
1 : CUSTOMERS
 
PL/SQL procedure successfully completed.
 
SQL>

Python Code Sample

Here we provide a code sample in Python that shows how you can actually work with JSON objects using SODA in a sharded database environment.

To use this sample code in your environment, follow the instructions to install the cx_Oracle module for Python: https://cx-oracle.readthedocs.io/en/latest/user_guide/installation.html

In this example, we connect to a shard using the sharding key and insert a new document. Note that when using SODA in a sharded database environment, new documents should be created by connecting to specific shards and not using the shard catalog.

# import the cx_Oracle module for Python
import cx_Oracle

# Create a connection pool that will be used for connecting to all shards
# The components of the dsn are hostname (shard director),
# port (usually 1522), global service (created with GDSCTL)
dsn=cx_Oracle.makedsn("shard_director_host",1522,service_name="service_name")
pool=cx_Oracle.SessionPool("app_schema","password",dsn)

# Connect to a specific shard by using the sharding key, which in this example is
# set explicitly with "sodaid", but this might be passed in or part of a loop 
# You must know beforehand if you are creating or working with a document for a specific Customer
# 
sodaid="2468"
connection=pool.acquire(shardingkey=[sodaid])

# Set autocommit and open the CUSTOMERS collection
connection.autocommit = True
soda = connection.getSodaDatabase()
collection = soda.openCollection("CUSTOMERS")

# Insert a document
# Because you are specifying the shard key, you must pass that in with the document (key=custid)

content = {'name': 'Matilda', 'State': 'CA', 'ZIP':'94065'}
soda.createDocument(content, key=sodaid)

# Fetch the document back by key
doc = collection.find().key(sodaid).getOne()
content = doc.getContent()
print('Retrieved SODA document dictionary is:')
print(content)

# After you have finished, release this connection back into the pool
pool.release(connection)
  
# If you want to add or work with more customers, start with another connection
# For example: connection=pool.acquire(shardingkey=["123"]) and so on.
  
#When you are completely finished working with customers you can shut down the pool
pool.close()

In this next code sample, we run a multi-shard query to return all customer names whose names begin with an "M"

import cx_Oracle
  
# Create an unpooled connection to the shard catalog
# The connect string connects to the shard director, but uses the catalog service, e.g. GD$catalog.oradbcloud
connection = cx_Oracle.connect("app_schema","password","db_connect_string")
  
# Open the CUSTOMERS collection
connection.autocommit = True
soda = connection.getSodaDatabase()
collection = soda.openCollection("CUSTOMERS")
  
# Now query the collection
# It is important to note that this is a query across ALL shards
# In other words, you will get ALL users whose names start with M
documents = collection.find().filter({'name': {'$like': 'M%'}}).getDocuments()
for d in documents:
    content = d.getContent()
    print(content["name"])
  
# Close the connection
connection.close()

Using a JSON Field as a Sharding Key

You can designate a JSON field to be the sharding key when creating your sharded database schema.

The following examples show you how to create a sharded table for the JSON collection, create the SODA mapping, trigger the sharding key column population, and access the sharded table from an application with Python code samples.

Create a Sharded Table for the JSON Collection

To create a sharded table that uses a sharding key other than the SODA ID:

  1. Connect to the shard catalog as the Sharding/SODA user.
  2. Enable SHARD DDL.
  3. Create a tablespace set.
  4. Run CREATE SHARDED TABLE, as shown in the example below.

The following example creates a sharded table (Customers) for a JSON collection of customer profile documents (CUSTPROFILE).

A column for the SODA ID (ID) identifies the JSON entries. When creating a JSON entry in the table with SODA, the application populates the ID column with a unique value.

A sharding key column (ZIP) is the ZIP code value extracted from the JSON document.

The other columns are the default column names given when SODA creates a table to hold an underlying collection. You can see this for yourself when creating a SODA collection and then examining the created table.

/* Enable shard DDL */
ALTER SESSION ENABLE SHARD DDL;

/* Create a tablespace set */
CREATE TABLESPACE SET TSP_SET_1 USING TEMPLATE
 (datafile size 100m autoextend on next 10M maxsize unlimited
 extent management local segment space management auto); 

/* Create the sharded table */
CREATE SHARDED TABLE Customers
(
"ID" VARCHAR2(255) NOT NULL,
"CREATED_ON" timestamp default sys_extract_utc(SYSTIMESTAMP) NOT NULL,
"LAST_MODIFIED" timestamp default sys_extract_utc(SYSTIMESTAMP) NOT NULL,
"VERSION" varchar2(255) NOT NULL,
"ZIP" VARCHAR2(60) NOT NULL,
"CUSTPROFILE" JSON,
PRIMARY KEY (ZIP,ID),
CHECK ("CUSTPROFILE" is json format oson strict))
TABLESPACE SET TSP_SET_1
PARTITION BY CONSISTENT HASH (ZIP) PARTITIONS AUTO;

Note that the ID column by itself cannot be the primary key. The PK must be or must include the sharding key, in this case ZIP. In the application examples, both ID and ZIP are used to work with the sharded data. In the example above the PK consists of the sharding key and the SODA ID (ZIP, ID), because ZIP will not be a unique value by itself.

Choosing a good sharding key depends on the usage and application requirements. You can use a unique sharding key, for example a Customer ID, but in that case you could also use the SODA ID to store the sharding key.

Create a SODA Map Across All Shards

Create a map to let SODA know which columns to use when working with the sharded table, and add the sharded table to the list of collections.

You can run a procedure to create the map, but this procedure also must be executed on ALL of the shards in the sharded database. The procedure also needs to be executed on any shards added in the future. You can accomplish both of these requirements using a sharding-specific PL/SQL procedure, sys.exec_shard_plsql().

To create a SODA map across all shards:

As the Sharding/SODA user and with SHARD DDL enabled, run the following commands on the shard catalog. The shard catalog propagates the procedure to all of the shards to be executed automatically.

create or replace procedure COLLECTION_PROC_CUSTOMERS AS
METADATA varchar2(8000);
COL SODA_COLLECTION_T;
begin
METADATA := '{"tableName":"CUSTOMERS",
"keyColumn":{"name":"ID"},
"contentColumn":{"name":"CUSTPROFILE","sqlType":"JSON"},
"versionColumn":{"name":"VERSION","method":"UUID"},
"lastModifiedColumn":{"name":"LAST_MODIFIED"},
"creationTimeColumn":{"name":"CREATED_ON"},
"readOnly":false}';
 -- Create a collection using "map" mode, based on
 -- the table you've created above and specified in 
 -- the custom metadata under "tableName" field.
COL := dbms_soda.create_collection('CUSTOMERS',METADATA,DBMS_SODA.CREATE_MODE_MAP);
end ;
/
 
exec sys.exec_shard_plsql('app_schema.collection_proc_customers()',4+1);

Note that the keyColumn is ID, the key used by SODA to insert and retrieve collections. There is no reference to the ZIP column because it is not used by SODA in the mapping.

At this point, a new collection has been created just as if you had run a CREATE COLLECTION command.

You can run some PL/SQL to list out the collections. On the shard catalog, run the following command, and verify that the output lists the Customers table.

SET SERVEROUTPUT ON
DECLARE
l_coll_list SODA_COLLNAME_LIST_T;
BEGIN
l_coll_list := DBMS_SODA.list_collection_names;
 
IF l_coll_list.COUNT > 0 THEN
FOR i IN 1 .. l_coll_list.COUNT LOOP
DBMS_OUTPUT.put_line(i || ' : ' || l_coll_list(i));
END LOOP;
END IF;
END;
/
1 : CUSTOMERS
 
PL/SQL procedure successfully completed.
 
SQL>

Create a Trigger to Populate the Sharding Key

When SODA inserts a document, it will populate the JSON field and the ID field, and the other fields in the SODA map. However, you also need to populate the ZIP field, and the value must come from within the new JSON document. This is accomplished using a trigger.

Note that this is a BEFORE trigger, which allows you to populate a column even when that column is the primary key.

Run the following statements as the application schema user.

Note:

The following statements must be run on the shard catalog and on EVERY shard in the sharded database.
alter session disable shard ddl;
 
alter session enable shard operations;
 
create or replace procedure COLLECTION_BF_ZIP_CUSTOMERS AS
begin
EXECUTE IMMEDIATE 'alter session disable shard ddl';
EXECUTE IMMEDIATE 'alter session enable shard operations';
EXECUTE IMMEDIATE q'%
Create or Replace TRIGGER CUST_BF_TRIG
BEFORE INSERT or UPDATE on CUSTOMERS
FOR EACH ROW
begin
:new.ZIP := JSON_VALUE(:NEW.CUSTPROFILE, '$.ZIP' error on error error on empty);
end;
%';
end;
/

exec sys.exec_shard_plsql('app_schema.collection_bf_zip_customers()',4+1+2);

In the example above, ZIP is assumed to be a top-level field in the JSON document. If the value is in a nested field, for example under an ADDRESS field, you must include the field hierarchy, for example '$.ADDRESS.ZIP'.

Python Code Sample

Here we provide a code sample in Python that shows how you can actually work with JSON objects using SODA in a sharded database environment.

To use this sample code in your environment, follow the instructions to install the cx_Oracle module for Python: https://cx-oracle.readthedocs.io/en/latest/user_guide/installation.html

In this example, we connect to a shard using the sharding key and insert a new document. Note that when using SODA in a sharded database environment, new documents should be created by connecting to specific shards and not using the shard catalog.

# import the cx_Oracle module for Python
import cx_Oracle
 
# Create a connection pool that will be used for connecting to all shards
# The components of the dsn are hostname (shard director),
# port (usually 1522), global service (created using GDSCTL)
dsn=cx_Oracle.makedsn("shard_director_host",1522,service_name="service_name")
pool=cx_Oracle.SessionPool("app_schema","password",dsn)

# Connect to a specific shard by using the shard key, a ZIP code. which in this
# example is set explicitly as '94065', but this might be passed in or part of a loop 
# You must know beforehand whether you are creating or working with a document
# with a specific ZIP code value.
connection=pool.acquire(shardingkey=["94065"])
 
# set autocommit and open the CUSTOMERS collection
connection.autocommit = True
soda = connection.getSodaDatabase()
collection = soda.openCollection("CUSTOMERS")
 
# Insert a document
# A system generated SODA key is created by default.
content = {'name': 'Matilda', 'STATE': 'CA', 'ZIP': '94065'}
doc = collection.insertOneAndGet(content)
 
# The SODA key can now be used to work with this document directly
key = doc.key
print('The key of the new SODA document is: ', key)
 
# Fetch the document back by SODA key
doc = collection.find().key(key).getOne()
content = doc.getContent()
print('Retrieved SODA document dictionary is:')
print(content)
 
# Now query to return all customers whose names start with 'M'
# First, add another customer. We are in the shard containing 94065, so add a customer with ZIP code '94065'
content = {'name': 'Mildred', 'STATE': 'CA', 'ZIP: '94065'}
doc = collection.insertOneAndGet(content)
 
# Now do a query.
# It is important to note that this query is ONLY executed within this one shard,
# the shard which contains the part of the sharded table with 94065 ZIP codes.
# In other words, the actual query has the additional bound of customers whose names start with 'M' in 94065
documents = collection.find().filter({'name': {'$like': 'M%'}}).getDocuments()
for d in documents:
    content = d.getContent()
    print(content["name"])
 
# After you have finished, release this connection back into the pool
pool.release(connection)
 
# If you want to add or work with more customers with a different
# shard key start with another connection
# For example: connection=pool.acquire(shardingkey=["NY"]) and so on.
 
# When you are completely finished working with customers, shut down the pool.
pool.close()

In this next code sample, we run a multi-shard query to return all customer names in all shards whose names begin with an "M".

import cx_Oracle
 
# Create an unpooled connection to the shard catalog
# The connect string connects to the shard director, but uses the catalog service, e.g. GD$catalog.oradbcloud
connection = cx_Oracle.connect("app_schema","password","db_connect_string")

# Open the CUSTOMERS collection
connection.autocommit = True
soda = connection.getSodaDatabase()
collection = soda.openCollection("CUSTOMERS")
  
# Now query the collection
# It is important to note that this is a query across ALL shards
# In other words, you will get ALL users whose name starts with M across ALL Zip codes
documents = collection.find().filter({'name': {'$like': 'M%'}}).getDocuments()
for d in documents:
    content = d.getContent()
    print(content["name"])
  
#Close the connection
connection.close()

Additional Information About Sharding with SODA

Scaling Out Shards

When adding a new shard:

  • The SODA map does NOT need to be run again. The mapping is updated automatically when a new shard is added to the configuration.
  • The Trigger command DOES need to be run in the new shard as the application schema user.

Local Indexes

Indexes can be created using SQL on each shard for better performance.