Using External Tables to Load Data into a Sharded Database
Using the examples and guidelines in the following topics, you can load data into a sharded database by creating external tables and then loading the data from the external tables into sharded or duplicated tables.
This data loading method is useful when the data to be loaded resides in external files, for example in CSV files.
External tables can be defined using the ORGANIZATION EXTERNAL
keyword
in the CREATE TABLE
statement. This table must be local to each shard
and not sharded or duplicated. Loading the data into the sharded or duplicated table
involves a simple INSERT … SELECT
statement from an external table,
with a condition to filter only a subset of data for sharded tables.
You may choose to keep the files on different hosts based on the access time and size of the files. For example, copy the files for duplicated tables on the shard catalog host and keep files for sharded tables on a network share that is accessible to all of the shards. It is also possible to keep a copy of the sharded table files on each shard for faster loading.
For more information about external tables, see External Tables in Oracle Database Utilities.
- Loading Data into Duplicated Tables
Data for the duplicated tables resides on the shard catalog, so loading the data into the duplicated tables is also done on the shard catalog. The data is then automatically replicated to shards after loading is complete. - Loading Data into Sharded Tables
Loading data into a sharded table needs to be performed on individual shards because data for a sharded table is partitioned across shards. The load can be done concurrently on all the shards, even if the source data file is shared.
Parent topic: Migrating to a Sharded Database
Loading Data into Duplicated Tables
Data for the duplicated tables resides on the shard catalog, so loading the data into the duplicated tables is also done on the shard catalog. The data is then automatically replicated to shards after loading is complete.
Consider the following table defined as a duplicated table.
CREATE DUPLICATED TABLE StockItems (
StockNo NUMBER(4) PRIMARY KEY,
Description VARCHAR2(20),
Price NUMBER(6,2)
);
Loading data into the table StockItems involves the following steps.
Loading Data into Sharded Tables
Loading data into a sharded table needs to be performed on individual shards because data for a sharded table is partitioned across shards. The load can be done concurrently on all the shards, even if the source data file is shared.
The process of loading is similar to the loading of duplicated tables, with an
additional filter in the INSERT … SELECT
statement to filter out
the rows that do not belong to the current shard.
As an example, consider the sharded table created as follows.
CREATE SHARDED TABLE Customers (
CustNo NUMBER(3) NOT NULL,
CusName VARCHAR2(30) NOT NULL,
Street VARCHAR2(20) NOT NULL,
City VARCHAR2(20) NOT NULL,
State CHAR(2) NOT NULL,
Zip VARCHAR2(10) NOT NULL,
Phone VARCHAR2(12),
CONSTRAINT RootPK PRIMARY KEY (CustNo)
)
PARTITION BY CONSISTENT HASH (CustNo)
PARTITIONS AUTO
TABLESPACE SET ts1
;
Loading data into this table involves doing the following steps on each shard.
Repeat the above steps for each sharded table, starting with the root table and descending down the table family hierarchy to maintain any foreign key constraints.