18 ORACLE_BIGDATA Access Driver
With the ORACLE_BIGDATA
access driver, you can access data
stored in object stores as if that data was stored in tables in an Oracle
Database.
ORACLE_BIGDATA
currently supports access to Oracle
Object Store, Amazon S3, and Azure Blob Storage. You can also use this
driver to query local data, which is useful for testing and smaller data
sets.
- Using the ORACLE_BIGDATA Access Driver
You can use theORACLE_BIGDATA
driver to access data located in external object stores. - How to Create a Credential for Object Stores
Credential objects enable you to access an external object store. - Object Store Access Parameters
You can use these access parameters to specify properties about the files residing in object stores.
Related Topics
Parent topic: External Tables
18.1 Using the ORACLE_BIGDATA Access Driver
You can use the ORACLE_BIGDATA
driver to access data
located in external object stores.
There are two steps required to access data in an object store:
- Create a credential object (not required for public buckets).
A credential object stores object store credentials in an encrypted format. The identity specified by the credential must have access to the underlying data in the object store.
- In-line external tables are supported. These external tables are simply
expressed as part of a query.
Create an external table or query using an in-line external table. The access driver type must be
ORACLE_BIGDATA
. TheCREATE TABLE
statement must reference the credential object, which provides authentication to access the object store. The table you create also requires aLOCATION
clause, which provides the URI to the files within the object store.For public buckets, the
CREDENTIAL
is not required.
Parent topic: ORACLE_BIGDATA Access Driver
18.2 How to Create a Credential for Object Stores
Credential objects enable you to access an external object store.
To create your credential object, use either the
DBMS_CREDENTIAL.CREATE_CREDENTIAL
or
DBMS_CLOUD.CREATE_CREDENTIAL
. This object contains
the username and password information needed to access the object store.
This credential password must match the authentication token (auth token) created for the username in your
cloud service.
Note:
You must have theDBMS_CLOUD
package installed.
- Creating the Credential Object with DBMS_CREDENTIAL.CREATE_CREDENTIAL
TheDBMS_CLOUD
subprogramDBMS_CREDENTIAL.CREATE_CREDENTIAL
enables you to authenticate access to an external object store. - Creating the Credential Object with DBMS_CLOUD.CREATE_CREDENTIAL
TheDBMS_CLOUD
subprogramDBMS_CLOUD.CREATE_CREDENTIAL
enables you to authenticate access to an external object store. - How to Define the Location Clause for Object Storage
Use these examples to see how you can specify the object store URI, depending on its source. - Understanding ORACLE_BIGDATA Access Parameters
To useORACLE_BIGDATA
, you provide information in an access parameter to indicate how to access and parse the data.
Related Topics
Parent topic: ORACLE_BIGDATA Access Driver
18.2.1 Creating the Credential Object with DBMS_CREDENTIAL.CREATE_CREDENTIAL
The DBMS_CLOUD
subprogram
DBMS_CREDENTIAL.CREATE_CREDENTIAL
enables you to authenticate access to
an external object store.
These examples show how to use
DBMS_CREDENTIAL.CREATE_CREDENTIAL
.
Example 18-1 Cloud Service Credentials
In the following example, my_credential
is the Oracle Cloud Infrastructure user
name, username
is the account
username, password
is the Oracle
Cloud Infrastructure auth token:
execute dbms_credential.create_credential(
credential_name => 'my_credential',
username => 'username',
password => 'password'
);
Example 18-2 Native Oracle Cloud Infrastructure Credentials
In the following example, my_credential
is the Oracle Cloud Infrastructure user
name, user_ocid
is the Oracle Cloud
Identifier (OCID), tenancy_ocid
is
the Oracle Cloud tenancy identifier, private_key
is the SSH private key, and fingerprint
is the public key fingerprint:
execute dbms_credential.create_credential(
credential_name => 'my_credential',
username => 'user_ocid',
password => '',
key => '{"tenancy_ocid":"tenancy_ocid","private_key":"private_key","fingerprint":"fingerprint"}');
After you create the credential, specify the credential object name in the parameter com.oracle.bigdata.credential.name, At the time of this release, the credential must be in the same schema as the table
Related Topics
Parent topic: How to Create a Credential for Object Stores
18.2.2 Creating the Credential Object with DBMS_CLOUD.CREATE_CREDENTIAL
The DBMS_CLOUD
subprogram
DBMS_CLOUD.CREATE_CREDENTIAL
enables you to authenticate access to an
external object store.
These examples show how to use
DBMS_CLOUD.CREATE_CREDENTIAL
.
Example 18-3 Native Oracle Cloud Infrastructure Credentials
In the following example, my_credential
is the Oracle Cloud Infrastructure user
name, password
is the Oracle Cloud
Infrastructure auth token, user_ocid
is the Oracle Cloud Identifier (OCID),
tenancy_ocid
is the Oracle
Cloud tenancy identifier, private_key
is the SSH private key, and fingerprint
is the public key fingerprint:
execute dbms_cloud.create_credential DBMS_CLOUD.CREATE_CREDENTIAL(
credential_name => 'my_credential',
username => 'user_ocid@example.com',
password => 'password'
key => '{"tenancy_ocid":"tenancy_ocid","private_key":"private_key","fingerprint":"fingerprint"}');
After you create the credential, specify the credential object name in the parameter com.oracle.bigdata.credential.name. At the time of this release, the credential must be in the same schema as the table.
Related Topics
Parent topic: How to Create a Credential for Object Stores
18.2.3 How to Define the Location Clause for Object Storage
Use these examples to see how you can specify the object store URI, depending on its source.
LOCATION
is a URI pointing to data in the object store.
Currently supported object stores are Oracle Object Store, Amazon S3
and Azure Blob Storage.
In the examples, the following variables are used:
region
– tenancy regionhost
– a server host nameport
– a port number assigned to the service, listening on a hostcontainer
– name of a container resourcenamespace
– namespace in a regionbucket
– a globally unique name for a resourceobjectname
– a unique identifier for an object in a bucketfilename
– object store filename
Note the following prerequisites for defining the location:
-
The credential object is required for private object store access. If the credential parameter is omitted, then the object must be in a public bucket.
The user ID associated with this credential must have access to read the data from object storage.
-
If you are testing access for data in object storage using local storage, then you must specify an Oracle directory object in the location, similar to what you do for
ORACLE_LOADER
data sources.
Example 18-4 Native Oracle Cloud Infrastructure Object Storage
location ('https://objectstorage.region.oraclecloud.com/n/namespace/b/bucket/o/objectname')
Example 18-5 Oracle Cloud Infrastructure Object Storage
location ('https://swiftobjectstorage.region.oraclecloud.com/v1/namespace/bucket/filename'
Example 18-6 Hosted-Style URI format
location ('https://bucket.host/objectname')
Example 18-7 Path-style URI Format
location ('https://host/bucket/objectname')
For example, an Amazon path style URI can take the following format:
location ('https://s3-us-west-2.amazonaws.com/adwc/filename')
Example 18-8 Azure BLOB Storage Location Format
location ('https://host:port/container/blob')
For example, an Azure path style URI can take the following format:
location ('https://exampleacount.blob.core.windows.net/examplecontainer/exampleblob')
Parent topic: How to Create a Credential for Object Stores
18.2.4 Understanding ORACLE_BIGDATA Access Parameters
To use ORACLE_BIGDATA
, you provide information in an
access parameter to indicate how to access and parse the data.
csv
, textfile
, avro
,
parquet
, or
orc
:com.oracle.bigdata.fileformat=[csv|textfile|avro|parquet|orc]
You can also use ORACLE_BIGDATA
to access local files for testing, or
for simple querying. In this case, the LOCATION
field value is the same
as what you would use for ORACLE_LOADER
. You can use an Oracle
directory object followed by the name of the file in the LOCATION
field. For local files, a credential object is not required. However, you must have
privileges over on the directory object in order to access the file.
Parent topic: How to Create a Credential for Object Stores
18.3 Object Store Access Parameters
You can use these access parameters to specify properties about the files residing in object stores.
- Syntax Rules for Specifying Properties
The properties are set using keyword-value pairs in the SQLCREATE TABLE
ACCESS PARAMETERS
clause and in the configuration files. - com.oracle.bigdata.fileformat
Describes the row format of the data source, based on theROW FORMAT
clause for a Hive table generated byORACLE_HDFS
. - ORACLE_BIGDATA Access Parameters
There is a set of access parameters that are common to all file formats. There are also parameters that are unique to a specific file format. - GATHER_EXTERNAL_TABLE_STATS
This is the PL/SQL interface for manually gathering statistics on external tables (ORACLE_HDFS
,ORACLE_HIVE
,ORACLE_BIGDATA
).
Parent topic: ORACLE_BIGDATA Access Driver
18.3.1 Syntax Rules for Specifying Properties
The properties are set using keyword-value pairs in the SQL CREATE TABLE
ACCESS PARAMETERS
clause and in the configuration files.
The syntax must obey these rules:
-
The format of each keyword-value pair is a keyword, a colon or equal sign, and a value. The following are valid keyword-value pairs:
keyword=value keyword:value
The value is everything from the first non-whitespace character after the separator to the end of the line. Whitespace between the separator and the value is ignored. Trailing whitespace for the value is retained.
-
A property definition can be on one line or multiple lines.
-
A line terminator is a line feed, a carriage return, or a carriage return followed by line feeds.
-
When a property definition spans multiple lines, then precede the line terminators with a backslash (escape character), except on the last line. In this example, the value of the
Keyword1
property isValue part 1 Value part 2 Value part 3
.Keyword1= Value part 1 \ Value part 2 \ Value part 3
-
You can create a logical line by stripping each physical line of leading whitespace and concatenating the lines. The parser extracts the property names and values from the logical line.
-
You can embed special characters in a property name or property value by preceding a character with a backslash (escape character), indicating the substitution. The following table describes the special characters.
Table 18-1 Special Characters in Properties
Escape Sequence | Character |
---|---|
|
Backspace ( |
|
Horizontal tab ( |
|
Line feed ( |
|
Form feed ( |
|
Carriage return ( |
|
Double quote ( |
|
Single quote ( |
|
Backslash ( When multiple backslashes are at the end of the line, the parser continues the value to the next line only for an odd number of backslashes. |
|
2-byte, big-endian, Unicode code point. When a character requires two code points (4 bytes), the parser expects |
Parent topic: Object Store Access Parameters
18.3.2 com.oracle.bigdata.fileformat
Describes the row format of the data source, based on the ROW
FORMAT
clause for a Hive table generated by
ORACLE_HDFS
.
Default Value
TEXTFILE
Syntax
A JSON document with the keyword-value pairs is shown in the following diagram.
fileformat::=
Semantics
ORC
Optimized row columnar file format
PARQUET
Column-oriented, binary file format
RCFILE
Record columnar file format
SEQUENCEFILE
Compressed file format
TEXTFILE
Plain text file format
INPUTFORMAT
Identifies a Java class that can extract records from the data file.
OUTPUTFORMAT
Identifies a Java class that can format the output records in the desired format
Parent topic: Object Store Access Parameters
18.3.3 ORACLE_BIGDATA Access Parameters
There is a set of access parameters that are common to all file formats. There are also parameters that are unique to a specific file format.
Common Access Parameters
The following table lists parameters that are common to all file formats
accessed through ORACLE_BIGDATA
. The first column identifies each
access parameter common to all data file types. The second column describes each
parameter.
Table 18-2 Common Access Parameters
Common Access Parameter | Description |
---|---|
|
Specifies the credential object to use when accessing data files in an object store. This access parameter is required for object store access. It is not needed for access to files through a directory object or for data stored in public buckets. The name specified for the credential must be the
name of a credential object in the same schema as the owner of
the table. Granting a user Use
DBMS_CREDENTIAL.CREATE_CREDENTIAL in the
DBMS_CREDENTIAL PL/SQL package to create
the credential
object:
In the
|
|
Specifies the format of the file. The value of this parameter identifies the reader that will process the file. Each reader can support additional access parameters that may or may not be supported by other readers. Valid values:
Default:
|
|
Specifies whether log messages should be written to
a log file. When Valid values:
Default:
|
|
Specifies the name of the log file created by the
parallel query coordinator. This parameter is used only when
|
|
Specifies the name of the log file created during
query execution. This value is used (and is required) only when
Valid values:
Default:
|
Avro-Specific Access Parameters
In addition to common access parameters, there are some that are only valid for the Avro file format. The first column in this table identifies the access parameters specific to the Avro file format and the second column describes the parameter. There is only one Avro-specific parameter at this time.
Table 18-3 Avro-Specific Access Parameters
Avro-Specific Parameter | Description |
---|---|
|
Specifies the representation of a decimal stored in the byte array. Valid values:
Default: If this parameter is not used, an
Avro decimal column is read assuming byte arrays store the
numerical representation of the values (that is default to
|
Parquet-Specific Access Parameters
Some access parameters are only valid for the Parquet file format. The first column in this table identifies the access parameters specific to the Parquet file format and the second column describes the parameter.
Table 18-4 Parquet-Specific Access Parameters
Parquet-Specific Access Parameter | Description |
---|---|
|
This is a Boolean property that specifies if the binary is stored as a string. Valid values:
Default:
|
|
This is a Boolean property that specifies if
Valid values:
Default:
|
Textfile and CSV-Specific Access Parameters
The text file and comma-separated value (csv
) file
formats are similar to the hive text file format. It reads text and
csv
data from delimited files. ORACLE_BIGDATA
automatically detects the line terminator (either \n
,
\r
, or \r\n
). By default, it assumes the
fields in the file are separated by commas, and the order of the fields in the file
match the order of the columns in the external table.
Example 18-9 CSV Data File
This is a simple csv
example. The data file has
comma-separated values, with optional enclosing quotes.
–----Source csv data in t.dat
t.dat:
1,"abc",
2,xyx,
–---------Create an external table over the csv source data in t.dat
CREATE TABLE t
(
c0 number,
c1 varchar2(20)
)
ORGANIZATION external
(
TYPE oracle_bigdata
DEFAULT DIRECTORY DMPDIR
ACCESS PARAMETERS
(
com.oracle.bigdata.fileformat=csv
)
location
(
't.dat'
)
)REJECT LIMIT 1
;
–------Select data from external table
select c0, c1 from t;
C0 C1
---- -----
1 abc
2 xyz
Example 18-10 CSV Data File
This example shows how to create an external table over a
csv
data source, which has '|
' as the field
separator, the data file compressed with gzip
, blanks as null, and
a date format.
–----The source csv data in t.dat
t.dat:
1| |
2|Apr-99-30|
–------Create an external table over the csv data source in t.dat
CREATE TABLE t(
c0 number,
c1 date
)
ORGANIZATION external
(
TYPE oracle_bigdata
DEFAULT DIRECTORY DMPDIR
ACCESS PARAMETERS
(
com.oracle.bigdata.fileformat=textfile
com.oracle.bigdata.compressiontype=gzip
com.oracle.bigdata.csv.rowformat.separatorcharacter='|'
com.oracle.bigdata.blankasnull=true
com.oracle.bigdata.dateformat="MON-RR-DD HH:MI:SS"
)
location
(
't.dat.gz'
)
)REJECT LIMIT 1
;
--Select csv data from external table
QL> select c0, c1 from t;
C0 C1
------ ---------
1
2 30-APR-99
Example 18-11 JSON Data File
{"id":"72","name":"George","lat":40.76727216,"lon":-73.99392888,"segments":["wealthy","middle-aged"],"age":45}
{"id":"79","name":"Lucy","lat":40.71911552,"lon":-74.00666661,"segments":["married","yes"],"age":33}
— Create the external table over Json source
CREATE TABLE people (
doc varchar2(4000)
)
ORGANIZATION EXTERNAL (
TYPE ORACLE_BIGDATA
DEFAULT DIRECTORY DEFAULT_DIR
ACCESS PARAMETERS(
com.oracle.bigdata.credential.name=MY_CRED
com.oracle.bigdata.fileformat=textfile
com.oracle.bigdata.csv.rowformat.fields.terminator='\n’ <— notice the
delimiter is a new line (that is, not a comma). So, it will read to the end of the line and get
the whole document.
)
LOCATION ('https://swftobjectstorage.us-phoenix-1.oraclecloud.com/v1/mybucket/people.json')
)
REJECT LIMIT UNLIMITED;
–---Select data from external table
select s.doc.id,
s.doc.name,
s.doc.segments[0] –--notice we’re getting the first item in the array of possible individuals
from the table people;
id Name segments[0]
–--- –---- –-----------------------------------
72 George wealthy
79 Lucy married
Textfile and CSV-Specific Access Parameters
Table 18-5 Textfile and CSV-Specific Access Parameters
Textfile-Specific Access Parameter | Description |
---|---|
|
Specifies the size of the input/output (I/O) buffer used for reading the file. The value is the size of the buffer in kilobytes. Note that the buffer size is also the largest size that a record can be. If a format reader encounters a record larger than this value, then it will return an error. Default:
|
|
When set to Valid values: Default: Example:
|
|
Specifies the character set of source files. Valid values: Default: Example:
|
|
If this parameter is specified, then the code tries
to decompress the data according to the compression scheme
specified.
Valid values:
Default: no compression If
|
|
If a row has data type conversion errors,
then the related columns are stored as null, or the row is
rejected. Valid values:
Default:
Example:
|
|
Specifies the character used to indicate the value
of a field is |
|
Specifies the character used to separate the field
values. The character value must be wrapped in single-quotes.
Example: Default:
|
|
Specifies the character used to escape any embedded
field terminators or line terminators in the value for fields.
The character value must be wrapped in single quotes. Example:
|
|
Specifies the date format in the source file. The
format option
Default: Example:
|
|
Specifies the order of fields in the data file. The
values are the same as for |
|
Blank lines are ignored when set to true. Valid values: Default: Example:
|
|
Missing columns are stored as null. Valid values: Default: Example:
|
|
Specifies the quote character for the fields. The quote characters are removed during loading when specified. Valid values: character Default: Null, meaning no quote Example:
|
|
The operation errors out after specified number of rows are rejected. This only applies when rejecting records due to conversion errors. Valid values: number Default: Example: |
com.oracle.bigdata.removequotes |
Removes any quotes that are around any field in the source file. Valid values: Default: Example: |
|
Specifies how many rows should be skipped from the start of the files. Valid values: number Default: Example:
|
|
Specifies the timestamp format in the source file.
The format option
Valid values: Default: Example:
|
|
Specifies the timestamp with local timezone format in
the source file. The format option
Valid values: Default: Example:
|
|
Specifies the timestamp with timezone format in the
source file. The format option
Valid values: Default: Example:
|
|
Specifies how the leading and trailing spaces of the fields are trimmed. Valid values: Default: Example:
|
|
If the data in the file is too long for a field, then this option truncates the value of the field rather than rejecting the row or setting the field to NULL. Valid values: Default: Example:
|
Parent topic: Object Store Access Parameters
18.3.4 GATHER_EXTERNAL_TABLE_STATS
This is the PL/SQL interface for manually gathering statistics on external
tables (ORACLE_HDFS
, ORACLE_HIVE
,
ORACLE_BIGDATA
).
The behavior and parameters are identical to that of
dbms_stats.gather_table_stats
, with the exception
that the owner of the table must be the session user running the procedure,
and the stats gathered using this procedure persist after a restart. This
procedure cannot be used on external tables that are automatically synced
from Hive.
See GATHER_TABLE_STATS Procedure
Syntax
PROCEDURE gather_external_table_stats( tabname varchar2, partname
varchar2 default null, estimate_percent number default
dbms_stats.DEFAULT_ESTIMATE_PERCENT, block_sample boolean
default FALSE, method_opt varchar2 default
dbms_stats.DEFAULT_METHOD_OPT, degree number default
dbms_stats.DEFAULT_DEGREE_VALUE, granularity varchar2
default dbms_stats.DEFAULT_GRANULARITY, cascade boolean
default dbms_stats.DEFAULT_CASCADE, stattab varchar2 default
null, statid varchar2 default null, statown varchar2 default
null, no_invalidate boolean default
dbms_stats.to_no_invalidate_type(dbms_stats.get_param('NO_INVALIDATE')),
stattype varchar2 default 'DATA', force boolean default
FALSE, options varchar2 default dbms_stats.DEFAULT_OPTIONS
)
PROCEDURE gather_external_table_stats(
tabname varchar2,
partname varchar2 default null,
estimate_percent number default dbms_stats.DEFAULT_ESTIMATE_PERCENT,
block_sample boolean default FALSE,
method_opt varchar2 default dbms_stats.DEFAULT_METHOD_OPT,
degree number default dbms_stats.DEFAULT_DEGREE_VALUE,
granularity varchar2 default dbms_stats.DEFAULT_GRANULARITY,
cascade boolean default dbms_stats.DEFAULT_CASCADE,
stattab varchar2 default null,
statid varchar2 default null,
statown varchar2 default null,
no_invalidate boolean default dbms_stats.to_no_invalidate_type(dbms_stats.get_param('NO_INVALIDATE')),
stattype varchar2 default 'DATA',
force boolean default FALSE,
options varchar2 default dbms_stats.DEFAULT_OPTIONS
)
Parent topic: Object Store Access Parameters