14 External Tables Concepts
The external tables feature is a complement to existing SQL*Loader functionality. It enables you to access data in external sources as if it were in a table in the database.
- How Are External Tables Created?
External tables are created using the SQLCREATE
TABLE...ORGANIZATION EXTERNAL
statement. - Location of Data Files and Output Files
Data files and output files must be located on the server. You must have a directory object that specifies the location from which to read and write files. - Access Parameters for External Tables
To modify the default behavior of the access driver for external tables, specify access parameters. - Data Type Conversion During External Table Use
Conversion errors can occur when external tables are read from and when they are written to.
Parent topic: External Tables
14.1 How Are External Tables Created?
External tables are created using the SQL CREATE
TABLE...ORGANIZATION EXTERNAL
statement.
Note that SQL*Loader may be the better choice in data loading situations that require additional indexing of the staging table. See Behavior Differences Between SQL*Loader and External Tables for more information about how load behavior differs between SQL*Loader and external tables.
As of Oracle Database 12c Release 2 (12.2.0.1), you can partition data contained in external tables, which allows you to take advantage of the same performance improvements provided when you partition tables stored in a database (for example, partition pruning).
See Also:
Oracle Database Administrator’s Guide for additional information about creating and managing external tables, and about partitioning them.
Note:
External tables can be used as inline external tables in SQL statements, thus eliminating the need to create an external table as a persistent database object in the data dictionary. For additional information, see Oracle Database SQL Language Reference.
When you create an external table, you specify the following attributes:
-
TYPE
— specifies the type of external table. Each type of external table is supported by its own access driver.-
ORACLE_LOADER
— this is the default access driver. It loads data from external tables to internal tables. The data must come from text data files. (TheORACLE_LOADER
access driver cannot perform unloads; that is, it cannot move data from an internal table to an external table.) -
ORACLE_DATAPUMP
— this access driver can perform both loads and unloads. The data must come from binary dump files. Loads to internal tables from external tables are done by fetching from the binary dump files. Unloads from internal tables to external tables are done by populating the binary dump files of the external table. TheORACLE_DATAPUMP
access driver can write dump files only as part of creating an external table with the SQLCREATE TABLE AS SELECT
statement. Once the dump file is created, it can be read any number of times, but it cannot be modified (that is, no DML operations can be performed). -
ORACLE_HDFS
— extracts data stored in a Hadoop Distributed File System (HDFS). -
ORACLE_HIVE
— extracts data stored in Apache HIVE.
-
-
DEFAULT
DIRECTORY
— specifies the default directory to use for all input and output files that do not explicitly name a directory object. The location is specified with a directory object, not a directory path. You must create the directory object before you create the external table; otherwise, an error is generated. See Location of Data Files and Output Files for more information. -
ACCESS
PARAMETERS
— describe the external data source and implement the type of external table that was specified. Each type of external table has its own access driver that provides access parameters unique to that type of external table. Access parameters are optional. See Access Parameters. -
LOCATION
— specifies the data files for the external table.-
For
ORACLE_LOADER
andORACLE_DATAPUMP
, the files are named in the formdirectory:file
. Thedirectory
portion is optional. If it is missing, then the default directory is used as the directory for the file. If you are using theORACLE_LOADER
access driver, then you can use wildcards in the file name: an asterisk (*) signifies multiple characters, a question mark (?) signifies a single character. -
For
ORACLE_HDFS
, theLOCATION
clause is a list of Uniform Resource Identifiers (URIs) for a directory or for a file. There is no directory object associated with a URI. -
For
ORACLE_HIVE
, theLOCATION
clause is not used. Instead, the Hadoop HCatalog table is read to obtain information about the location of the data source (which could be a file or another database).
-
The following examples briefly show the use of attributes for each of the access drivers.
Example 14-1 Specifying Attributes for the ORACLE_LOADER Access Driver
The following example uses the ORACLE_LOADER
access driver to show the use of each of these attributes (it assumes that the default directory def_dir1
already exists):
SQL> CREATE TABLE emp_load
2 (employee_number CHAR(5),
3 employee_dob CHAR(20),
4 employee_last_name CHAR(20),
5 employee_first_name CHAR(15),
6 employee_middle_name CHAR(15),
7 employee_hire_date DATE)
8 ORGANIZATION EXTERNAL
9 (TYPE ORACLE_LOADER
10 DEFAULT DIRECTORY def_dir1
11 ACCESS PARAMETERS
12 (RECORDS DELIMITED BY NEWLINE
13 FIELDS (employee_number CHAR(2),
14 employee_dob CHAR(20),
15 employee_last_name CHAR(18),
16 employee_first_name CHAR(11),
17 employee_middle_name CHAR(11),
18 employee_hire_date CHAR(10) date_format DATE mask "mm/dd/yyyy"
19 )
20 )
21 LOCATION ('info.dat')
22 );
Table created.
The information you provide through the access driver ensures that data from the data source is processed so that it matches the definition of the external table. The fields listed after CREATE
TABLE
emp_load
are actually defining the metadata for the data in the info
.dat
source file.
Example 14-2 Specifying Attributes for the ORACLE_DATAPUMP Access Driver
This example creates an external table named inventories_xt
and populates the dump file for the external table with the data from table inventories
in the oe
sample schema.
SQL> CREATE TABLE inventories_xt
2 ORGANIZATION EXTERNAL
3 (
4 TYPE ORACLE_DATAPUMP
5 DEFAULT DIRECTORY def_dir1
6 LOCATION ('inv_xt.dmp')
7 )
8 AS SELECT * FROM inventories;
Table created.
Example 14-3 Specifying Attributes for the ORACLE_HDFS Access Driver
CREATE TABLE sales_external
( time_id DATE NOT NULL, …
amount_sold NUMBER(10,2)
)
ORGANIZATION EXTERNAL
( TYPE ORACLE_HDFS
ACCESS PARAMETERS (com.oracle.bigdata.cluster=hadoop1)
LOCATION (“hdfs:/usr/sales_1.csv”, “hdfs:/usr/my_sales_*.csv”)
)
Example 14-4 Specifying Attributes for the ORACLE_HIVE Access Driver
CREATE TABLE sales_external
( time_id DATE NOT NULL, …
amount_sold NUMBER(10,2)
)
ORGANIZATION EXTERNAL
( TYPE ORACLE_HIVE
ACCESS PARAMETERS (com.oracle.bigdata.cluster=hadoop1
com.oracle.bigdata.tablename=default.ratings_hive_table)
);
Parent topic: External Tables Concepts
14.2 Location of Data Files and Output Files
Data files and output files must be located on the server. You must have a directory object that specifies the location from which to read and write files.
Note:
The information in this section about directory objects does not apply to data files for theORACLE_HDFS
access driver or ORACLE_HIVE
access driver. With the ORACLE_HDFS
driver, the location of data is specified with a list of URIs for a directory or for a file, and there is no directory object associated with a URI. The ORACLE_HIVE
driver does not specify a data source location; it reads the Hive metastore table to get that information, so no directory object is needed.
The access driver runs inside the database server. This behavior is different from SQL*Loader, which is a client program that sends the data to be loaded over to the server. This difference has the following implications:
-
The server requires access to files that the access driver can load.
-
The server must create and write the output files created by the access driver: the log file, bad file, discard file, and also any dump files created by the
ORACLE_DATAPUMP
access driver.
To specify the location from which to read and write files, the access driver requires that you use a directory object. A directory object maps a name to a directory name on the file system. For example, the following statement creates a directory object named ext_tab_dir
that is mapped to a directory located at /usr/apps/datafiles.
CREATE DIRECTORY ext_tab_dir AS '/usr/apps/datafiles';
DBAs or any user can create directory objects with the CREATE
ANY
DIRECTORY
privilege.
Note:
To use external tables in an Oracle Real Applications Cluster (Oracle RAC) configuration, you must ensure that the directory object path is on a cluster-wide file system.
After a directory is created, the user creating the directory object must grant READ
and WRITE
privileges on the directory to other users. These privileges must be explicitly granted, rather than assigned by using roles. For example, to allow the server to read files on behalf of user scott
in the directory named by ext_tab_dir
, the user who created the directory object must execute the following command:
GRANT READ ON DIRECTORY ext_tab_dir TO scott;
The SYS
user is the only user that can own directory objects, but the SYS
user can grant other users the privilege to create directory objects.READ
or WRITE
permission to a directory object means only that Oracle Database reads or writes that file on your behalf. You are not given direct access to those files outside of the Oracle database unless you have the appropriate operating system privileges. Similarly, the Oracle database requires permission from the operating system to read and write files in the directories.
Parent topic: External Tables Concepts
14.3 Access Parameters for External Tables
To modify the default behavior of the access driver for external tables, specify access parameters.
When you create an external table of a particular type, you can specify access parameters to modify the default behavior of the access driver. Each access driver has its own syntax for access parameters. Oracle provides the following access drivers for use with external tables: ORACLE_LOADER
, ORACLE_DATAPUMP
, ORACLE_HDFS
, and ORACLE_HIVE
.
Note:
These access parameters are collectively referred to as the
opaque_format_spec
in the SQL CREATE
TABLE...ORGANIZATION EXTERNAL
statement. The ACCESS
parameter clause allows SQL comments.
See Also:
-
Oracle Database SQL Language Reference for information about specifying
opaque_format_spec
when using the SQLCREATE TABLE
statement
Parent topic: External Tables Concepts
14.4 Data Type Conversion During External Table Use
Conversion errors can occur when external tables are read from and when they are written to.
Conversion Errors When Reading External Tables
When you select rows from an external table, the access driver performs any transformations necessary to make the data from the data source match the data type of the corresponding column in the external table. Depending on the data and the types of transformations required, the transformation can encounter errors.
To illustrate the types of data conversion problems that can occur when reading from an external table, suppose you create the following external table KV_TAB_XT
with two columns: KEY
whose data type is VARCHAR2(4)
and VAL
whose data type is NUMBER
.
SQL> CREATE TABLE KV_TAB_XT (KEY, VARCHAR2(4), VAL NUMBER)
2 ORGANIZATION EXTERNAL
3 (DEFAULT DIRECTORY DEF_DIR1 LOCATION (‘key_val.csv’));
The external table KV_TAB_XT
uses default values for the access parameters, which means:
-
Records are delimited by new lines.
-
The data file and the database have the same character set.
-
The fields in the data file have the same name and are in the same order as the columns in the external table.
-
The data type of the field is
CHAR(255)
. -
Data for each field is terminated by a comma.
The records in the data file for the KV_TAB_XT
external table should be:
-
A string, up to 4 bytes long. If the string is empty, then the value for the field is
NULL
. -
A terminating comma.
-
A string of numeric characters. If the string is empty, then the value for this field is
NULL
. -
An optional terminating comma.
When the access driver reads a record from the data file, it verifies that the length of the value of the KEY
field in the data file is less than or equal to 4, and it attempts to convert the value of the VAL
field in the data file to an Oracle number.
If the length of the value of the KEY
field is greater than 4 or if there is a non-numeric character in the value for VAL
, then the ORACLE_LOADER
access driver rejects the row which results in a copy of the row being written to the bad file and an error message being written to the log file.
All access drivers have to handle conversion from the data type of fields in the source for the external table and the data type for the columns of the external tables. The following are some examples of the types of conversions and checks that access drivers perform:
-
Convert character data from character set used by the source data to the character set used by the database.
-
Convert from character data to numeric data.
-
Convert from numeric data to character data.
-
Convert from character data to a date or timestamp.
-
Convert from a date or timestamp to character data.
-
Convert from character data to an interval data type.
-
Convert from an interval data type to a character data.
-
Verify that the length of data value for a character column does not exceed the length limits of that column.
When the access driver encounters an error doing the required conversion or verification, it can decide how to handle the error. When the ORACLE_LOADER
and ORACLE_DATAPUMP
access drivers encounter errors, they reject the record and write an error message to the log file. It is as if that record were not in the data source. When the ORACLE_HDFS
and ORACLE_HIVE
access drivers encounter errors, the value of the field in which the error is encountered is set to NULL. This is in keeping with the behavior of how Hive handles errors in Hadoop.
Even after the access driver has converted the data from the data source to match the data type of the external table columns, the SQL statement that is accessing the external table could require additional data type conversions. If any of these additional conversions encounter an error, then the entire statement fails. (The exception to this is if you use the DML error logging feature in the SQL statement to handle these errors.) These conversions are the same as any that might normally be required when executing a SQL statement. For example, suppose you change the definition of the KV_TAB_XT
external table to only have columns with character data types, and then you execute an INSERT
statement to load data from the external table into another table that has a NUMBER
data type for column VAL
:
SQL> CREATE TABLE KV_TAB_XT (KEY VARCHAR2(20), VAL VARCHAR2(20))
2 ORGANIZATION EXTERNAL
3 (DEFAULT DIRECTORY DEF_DIR1 LOCATION (‘key_val.csv’));
4 CREATE TABLE KV_TAB (KEY VARCHAR2(4), VAL NUMBER);
5 INSERT INTO KV_TAB SELECT * FROM KV_TAB_XT;
In this example, the access driver will not reject a record if the data for VAL
contains a non-numeric character because the data type of VAL
in the external table is now VARCHAR2
(instead of NUMBER
). However, SQL execution now needs to handle the conversion from character data type in KV_TAB_XT
to number data type in KV_TAB
. If there is a non-numeric character in the value for VAL
in the external table, then SQL raises a conversion error and rolls back any rows that were inserted. To avoid conversion errors in SQL execution, try to make the data types of the columns in the external table match the data types expected by other tables or functions that will be using the values of those columns.
Conversion Errors When Writing to External Tables
The ORACLE_DATAPUMP
access driver allows you to use a CREATE TABLE AS SELECT
statement to unload data into an external table. Data conversion occurs if the data type of a column in the SELECT
expression does not match the data type of the column in the external table. If SQL encounters an error converting the data type, then SQL aborts the statement and the data file will not be readable.
To avoid problems with conversion errors that cause the operation to fail, the data type of the column in the external table should match the data type of the column in the source table or expression used to write to the external table. This is not always possible because external tables do not support all data types. In these cases, the unsupported data types in the source table must be converted into a data type that the external table can support. The following CREATE TABLE
statement shows an example of this:
CREATE TABLE LONG_TAB_XT (LONG_COL CLOB)
ORGANIZATION EXTERNAL...SELECT TO_LOB(LONG_COL) FROM LONG_TAB;
The source table named LONG_TAB
has a LONG
column, therefore the corresponding column in the external table being created, LONG_TAB_XT
, must be a CLOB
and the SELECT
subquery that is used to populate the external table must use the TO_LOB
operator to load the column.
Parent topic: External Tables Concepts