5.2 Detailed Semantics of SQL Operations on LOBs

This section explains semantics of SQL operations on LOBs in details.

5.2.1 Return Datatype for SQL Operations on LOBs

The return data type of SQL functions on LOBs is dependent on the input parameters.

The return type of a function or operator that takes a LOB or VARCHAR2 is the same as the data type of the argument passed to the function or operator. Functions that take more than one argument, such as CONCAT, return a LOB data type if one or more arguments is a LOB.

Example 5-1 CONCAT function returning CLOB

CONCAT(CLOB, VARCHAR2) returns a CLOB.

Any LOB instance returned by a SQL function is a temporary LOB instance. LOB instances in tables (persistent LOBs) are not modified by SQL functions, even when the function is used in the SELECT list of a query.

5.2.2 NULL vs EMPTY LOB: Semantic Difference between LOBs and VARCHAR2

For the VARCHAR2 data type, a string of length zero is indistinguishable from a NULL value for the column.

For the column of a LOB data type, there are three possible states:
  1. NULL: This means the column has no LOB locator.
  2. Zero-length value: This can be achieved by inserting an EMPTY LOB into the column, or by using an API such as DBMS_LOB.TRIM() to trim the length to zero. In either case, there is a valid LOB locator in the column, but the LOB value length is zero.
  3. Non-zero length value.
Due to this difference, the LENGTH function differs depending on whether the argument passed is a LOB or a character string:
  • For a character string of length zero, the LENGTH function returns NULL.
  • For a CLOB of length zero, or an empty locator such as that returned by EMPTY_CLOB(), the LENGTH and DBMS_LOB.GETLENGTH functions return 0.
Similarly, when used with LOBs, the IS NULL and IS NOT NULL operators determine whether a LOB locator is stored in the row:
  • When you pass an initialized LOB of length zero to the IS NULL function, FALSE is returned. These semantics are compliant with the SQL 92 standard.
  • When you pass a VARCHAR2 of length zero to the IS NULL function, TRUE is returned.

5.2.3 WHERE Clause Usage with LOBs

SQL functions with LOBs as arguments, except functions that compare LOB values, are allowed in predicates of the WHERE clause.

The LENGTH function, for example, can be included in the predicate of the WHERE clause:

CREATE TABLE t (n NUMBER, c CLOB);
INSERT INTO t VALUES (1, 'abc');

SELECT * FROM t WHERE c IS NOT NULL;
SELECT * FROM t WHERE LENGTH(c) > 0;
SELECT * FROM t WHERE c LIKE '%a%';
SELECT * FROM t WHERE SUBSTR(c, 1, 2) LIKE '%b%';
SELECT * FROM t WHERE INSTR(c, 'b') = 2;

5.2.4 CLOBs and NCLOBs Do Not Follow Session Collation Settings

Learn about various operators on CLOBs and NCLOBs and compare the operations on VARCHAR2 and NVARCHAR2 variables with respect to LOBs in this section.

Standard operators that operate on CLOBs and NCLOBs without first converting them to VARCHAR2 or NVARCHAR2, are marked as 'Yes' in the SQL or PL/SQL columns of Table 7-1. These operators do not behave linguistically, except for REGEXP functions. Binary comparison of the character data is performed irrespective of the NLS_COMP and NLS_SORT parameter settings.

These REGEXP functions are the exceptions, where, if CLOB or NCLOB data is passed in, the linguistic comparison is similar to the comparison of VARCHAR2 and NVARCHAR2 values.

  • REGEXP_LIKE

  • REGEXP_REPLACE

  • REGEXP_INSTR

  • REGEXP_SUBSTR

  • REGEXP_COUNT

Note:

CLOBs and NCLOBs support the default USING NLS_COMP option.

See Also:

Oracle Database Reference for more information about NLS_COMP

5.2.5 Codepoint Semantics

Codepoint semantics of the INSTR, SUBSTR, LENGTH, and LIKE functions differ depending on the data type of the argument passed to the function.

These functions use different codepoint semantics depending on whether the argument is a VARCHAR2 or a CLOB type as follows:

  • When the argument is a CLOB, UCS2 codepoint semantics are used for all character sets.

  • When the argument is a character type, such as VARCHAR2, the default codepoint semantics are used for the given character set:

    • UCS2 codepoint semantics are used for AL16UTF16 and UTF8 character sets.

    • UCS4 codepoint semantics are used for all other character sets, such as AL32UTF8.

  • If you are storing character data in a CLOB or NCLOB, then note that the amount and offset parameters for any APIs that read or write data to the CLOB or NCLOB are specified in UCS2 codepoints. In some character sets, a full character consists one or more UCS2 codepoints called a surrogate pair. In this scenario, you must ensure that the amount or offset you specify does not cut into a full character. This avoids reading or writing a partial character.

  • Oracle Database helps to detect half surrogate pair on read or write boundaries in case of SQL functions and in case of read/write through LOB APIs. The behavior is as follows:
    • If the starting offset is in the middle of a surrogate pair, an error is raised for both read and write operations.

    • If the read amount reads only a partial character, increment or decrement the amount by 1 to read complete characters.

      Note:

      The output amount may vary from the input amount.

    • If the write amount overwrites a partial character, an error is raised to prevent the corruption of existing data caused by overwriting of a partial character in the destination CLOB or NCLOB.

      Note:

      This check only applies to the existing data in the CLOB or NCLOB. You must make sure that the incoming buffer for the write operation starts and ends in complete characters.