5.2 Detailed Semantics of SQL Operations on LOBs
This section explains semantics of SQL operations on LOBs in details.
5.2.1 Return Datatype for SQL Operations on LOBs
The return data type of SQL functions on LOBs is dependent on the input parameters.
The return type of a function or operator that takes a LOB or
VARCHAR2
is the same as the data type of the argument passed to
the function or operator. Functions that take more than one argument, such as
CONCAT
, return a LOB data type if one or more arguments is a
LOB.
Example 5-1 CONCAT
function
returning CLOB
CONCAT(CLOB, VARCHAR2)
returns a CLOB
.
Any LOB instance returned by a SQL function is a temporary LOB instance. LOB
instances in tables (persistent LOBs) are not modified by SQL functions, even when
the function is used in the SELECT
list of a query.
5.2.2 NULL vs EMPTY LOB: Semantic Difference between LOBs and VARCHAR2
For the VARCHAR2
data type, a string of length zero is
indistinguishable from a NULL
value for the column.
LOB
data type, there are three
possible states:
NULL
: This means the column has no LOB locator.- Zero-length value: This can be achieved by inserting an
EMPTY LOB
into the column, or by using an API such asDBMS_LOB.TRIM()
to trim the length to zero. In either case, there is a valid LOB locator in the column, but the LOB value length is zero. - Non-zero length value.
LENGTH
function differs
depending on whether the argument passed is a LOB
or a character
string:
- For a character string of length zero, the
LENGTH
function returnsNULL
. - For a
CLOB
of length zero, or an empty locator such as that returned byEMPTY_CLOB()
, theLENGTH
andDBMS_LOB.GETLENGTH
functions return 0.
IS NULL
and
IS NOT NULL
operators determine whether a LOB locator is stored
in the row:
- When you pass an initialized LOB of length zero to the
IS NULL
function,FALSE
is returned. These semantics are compliant with the SQL 92 standard. - When you pass a
VARCHAR2
of length zero to theIS NULL
function,TRUE
is returned.
5.2.3 WHERE Clause Usage with LOBs
SQL functions with LOBs as arguments, except functions that compare LOB values, are
allowed in predicates of the WHERE
clause.
The LENGTH
function, for example, can be included in the predicate of the WHERE
clause:
CREATE TABLE t (n NUMBER, c CLOB); INSERT INTO t VALUES (1, 'abc'); SELECT * FROM t WHERE c IS NOT NULL; SELECT * FROM t WHERE LENGTH(c) > 0; SELECT * FROM t WHERE c LIKE '%a%'; SELECT * FROM t WHERE SUBSTR(c, 1, 2) LIKE '%b%'; SELECT * FROM t WHERE INSTR(c, 'b') = 2;
5.2.4 CLOBs and NCLOBs Do Not Follow Session Collation Settings
Learn about various operators on CLOB
s and NCLOB
s
and compare the operations on VARCHAR2
and NVARCHAR2
variables with respect to LOBs in this section.
Standard operators that operate on CLOB
s and
NCLOB
s without first converting them to VARCHAR2
or
NVARCHAR2
, are marked as 'Yes' in the SQL or PL/SQL columns of
Table 7-1. These operators do not behave
linguistically, except for REGEXP
functions. Binary comparison of the
character data is performed irrespective of the NLS_COMP
and
NLS_SORT
parameter settings.
These REGEXP
functions are the exceptions, where, if CLOB
or NCLOB
data is passed in, the linguistic comparison is similar to the comparison of VARCHAR2
and NVARCHAR2
values.
-
REGEXP_LIKE
-
REGEXP_REPLACE
-
REGEXP_INSTR
-
REGEXP_SUBSTR
-
REGEXP_COUNT
Note:
CLOBs and NCLOBs support the default USING NLS_COMP option.See Also:
Oracle Database Reference for more information about NLS_COMP
5.2.5 Codepoint Semantics
Codepoint semantics of the INSTR
, SUBSTR
, LENGTH
, and LIKE
functions differ depending on the data type of the argument passed to the function.
These functions use different codepoint semantics depending on whether the argument is a VARCHAR2
or a CLOB
type as follows:
-
When the argument is a
CLOB
, UCS2 codepoint semantics are used for all character sets. -
When the argument is a character type, such as
VARCHAR2
, the default codepoint semantics are used for the given character set:-
UCS2 codepoint semantics are used for AL16UTF16 and UTF8 character sets.
-
UCS4 codepoint semantics are used for all other character sets, such as AL32UTF8.
-
-
If you are storing character data in a
CLOB
orNCLOB
, then note that the amount and offset parameters for any APIs that read or write data to theCLOB
orNCLOB
are specified in UCS2 codepoints. In some character sets, a full character consists one or more UCS2 codepoints called a surrogate pair. In this scenario, you must ensure that the amount or offset you specify does not cut into a full character. This avoids reading or writing a partial character. -
Oracle Database helps to detect half surrogate pair on read or write boundaries in case of SQL functions and in case of read/write through LOB APIs. The behavior is as follows:
-
If the starting offset is in the middle of a surrogate pair, an error is raised for both read and write operations.
-
If the read amount reads only a partial character, increment or decrement the amount by 1 to read complete characters.
Note:
The output amount may vary from the input amount.
-
If the write amount overwrites a partial character, an error is raised to prevent the corruption of existing data caused by overwriting of a partial character in the destination
CLOB
orNCLOB
.Note:
This check only applies to the existing data in the
CLOB
orNCLOB
. You must make sure that the incoming buffer for the write operation starts and ends in complete characters.
-