C Collation Derivation and Determination Rules for SQL Operations
This appendix describes collation derivation and determination rules for SQL operations. This appendix contains the following topics:
C.1 Collation Derivation
The process of determining the collation of a character result of an SQL operation is called collation derivation. Such operation may be an operator, column reference, character literal, bind variable reference, function call, CASE expression, or a query clause.
Each character value in an SQL expression has a derived collation and a derived coercibility level.
The derived collation and coercibility level of the basic expressions is described in the following table.
Table C-1 Derived Collation and Derived Coercibility Level of Various Expression Types
Type of Expression | Derived Collation | Derived Coercibility Level |
---|---|---|
Result of the |
The named collation or the pseudo-collation specified in the |
0 |
Data container reference such as table, view, or materialized view column reference |
The declared named collation or the pseudo-collation of the data container |
2 |
Result of a PL/SQL function call or a user-defined operator |
|
2 |
Character literal |
|
4 |
Character bind variable reference when the |
|
4 |
Character bind variable reference when the |
Collation with ID passed as the attribute value |
0 |
Note:
-
Coercibility level 1 corresponds to no collation assigned
-
Coercibility level 3 is reserved for future use
The derived collation and coercibility level of an operation’s result is based on the collations and coercibility levels of the operation's arguments. A derivation-relevant character argument of an operation is an argument used to derive the collation of the operator’s result. An operator may have zero or more derivation-relevant character arguments, and zero or more other character arguments, such as flags or other control information not directly interacting with the derivation-relevant arguments. An argument is considered derivation-relevant, if its value is included in the result, either after some transformation or without undergoing any transformation.
An argument that is a format model, a pattern, a flag string, or a key into a virtual table of system information is not considered a derivation-relevant argument. For example, the built-in function TO_CHAR(arg1,arg2,arg3)
has no derivation-relevant arguments, as the main argument arg1
is not of a character data type. The two character arguments arg2
and arg3
are not derivation-relevant arguments as they only define the format and parameters for the conversion of the main argument arg1
.
The derived collation and coercibility level of the result of an operation without derivation-relevant arguments are the same as when a character literal would have been put in that expression in the place of the operation.
The following are the collation derivation rules for operations that return character values and have derivation-relevant arguments. These rules are applied recursively in an expression tree. These rules are based on the SQL standard version ISO/IEC 9075-2:1999.
The derived collation of a result of an operation with derivation-relevant character arguments arg1, arg2, …, argn is:
-
If at least one argument has the coercibility level 0, then all the arguments with coercibility level 0 must have the same collation, which is the derived collation of the result. The coercibility level of the result is 0. If two arguments with coercibility level 0 have different collations, then an error is reported.
-
Otherwise, if at least one argument has the coercibility level 1, then the expression result has the coercibility level 1 and no collation is assigned to it.
-
Otherwise, if LCL is the numerically lowest coercibility level of the arguments, then:
-
If all the arguments with LCL have the same collation, then this collation is the derived collation of the result, and the coercibility level of the result is LCL.
-
Otherwise, the result of the expression has the coercibility level 1 and no collation is assigned to it.
-
Note:
Set operators have arguments that are expression lists. For set operators, collation derivation is performed separately on corresponding elements of each of the arguments of the expression list. For example, in the query:
SELECT expr1, expr2 FROM t1
UNION
SELECT expr3, expr4 FROM t2
the collation is derived separately for the first and the second column of the result set. For the first column, the collation derivation rules are applied to expr1
and expr3
. For the second column, the rules are applied to expr2
and expr4
.
Collation Derivation for Bind Variable References
In OCI, you can pass a collation for a bind variable in a query or a DML statement using the value of the OCI_ATTR_COLLATION_ID
attribute. The OCI_ATTR_COLLATION_ID
attribute can be set on a bind variable handle to any of the supported collation IDs using the OCIAttrSet()
function. The IDs of both named collations and pseudo-collations are allowed. In this case, the derived coercibility level of the bind variable reference is 0.
When the OCI_ATTR_COLLATION_ID
attribute value is set to OCI_COLLATION_NONE
(the default value) on a bind variable handle, the collation of the bind variable is USING_NLS_COMP
and the derived coercibility level of the bind variable reference is 4.
OCI does not check whether a collation is valid for a given data type of a bind variable. If the OCI_ATTR_COLLATION_ID
attribute value is set for a non-character data type variable, it is ignored by the database server.
Collation of bind variables is currently ignored in PL/SQL expressions. For forward compatibility reasons, the OCI_ATTR_COLLATION_ID
attribute should not be set for bind variables passed to an anonymous PL/SQL block, unless the variables are referenced exclusively in SQL statements.
See Also:
Oracle Call Interface Programmer's Guide for more information about the OCI_ATTR_COLLATION_ID
attribute.
C.2 Collation Determination
Collation determination is the process of selecting the right collation to apply during the execution of a collation-sensitive operation. A collation-sensitive operation can be an SQL operator, condition, built-in function call, CASE expression or a query clause.
For Oracle Database releases earlier to 12.2, collation to be applied by an operation is determined by only the NLS_SORT
and NLS_COMP
session parameters.
Note:
The optional second parameters to NLS_UPPER, NLS_LOWER, NLS_INITCAP,
and NLSSORT
are exceptions.
Starting from Oracle Database 12.2, collation to be applied by an operation is determined by the derived data-bound collations of its arguments. Once a pseudo-collation is determined as the collation to use, NLS_SORT
and NLS_COMP
session parameters are checked to provide the actual named collation to apply.
Note:
The collation determination does not have to apply to the same operation to which collation derivation applies. For example, TO_CHAR
function is not collation-sensitive, so it does not need collation determination. But, TO_CHAR
function returns a character result that needs a collation declaration, hence collation derivation applies to it. Conversely, INSTR
function needs to match characters and needs a collation determined for this match operation. However, the result of INSTR
function is a number, hence no collation derivation is required for it.
The determination-relevant character argument of an operation is an argument used to determine the collation to be used by the operation. A collation-sensitive operation may have one or more determination-relevant character arguments and zero or more other character arguments, such as flags or other control information not directly interacting with the determination-relevant arguments.
An argument is considered determination-relevant, if its value is compared during the evaluation of an operation. An argument that is a format model, a flag string, or a key into a virtual table of system information is not considered a determination-relevant argument. However, a pattern argument can be a determination-relevant argument. For example, two of the three arguments of the LIKE
predicate – argument and pattern – are determination-relevant arguments. The third argument – the escape character – is not considered determination-relevant argument. Another example is the built-in function REGEXP_COUNT
, which has four arguments – source_char, pattern, position, and match_param. The determination-relevant arguments are source_char and pattern, which contain the strings to be compared. The non-determination-relevant character argument are position, which is numeric, and match_param, which provides parameters for the matching operation.
The following are the collation determination rules to determine the collation to use for an operation with determination-relevant character arguments arg1, arg2, …, argn. These rules are based on the SQL standard version ISO/IEC 9075-2:1999.
-
If operation is the equality condition and is used to enforce a foreign key constraint, then the collation to be used is the declared collation of the primary or unique key column being referenced. This declared collation must be the same as the declared collation of the foreign key column.
-
Otherwise, if at least one argument has the derived coercibility level 0, then all the arguments with coercibility level 0 must have the same collation, and this collation is used by the operation. If two arguments with coercibility level 0 have different collations, then an error is reported.
-
Otherwise, if at least one argument has the derived coercibility level 1, then an error is reported.
-
Otherwise, if LCL is the numerically lowest coercibility level of the arguments, then:
-
If all arguments with LCL have the same collation, then that collation is used by the operation.
-
Otherwise, an error is reported.
-
When the determined collation is a pseudo-collation, then the affected operation must refer to the session or database settings NLS_SORT
or NLS_COMP
or both to determine the actual named collation to apply. The database settings are used for expressions in virtual columns, CHECK
constraints, and fine grained auditing (FGA) rules.
The collation determination rules for an operation involving a CLOB
or an NCLOB
data type value must result in the pseudo-collation USING_NLS_COMP
, otherwise an error is reported.
Note:
Some conditions, set operators, and query clauses have arguments which are expression lists. In this case, collation determination is performed on the corresponding compared elements of each of the arguments in the expression list. For example, in the condition:
(expr1, expr2) IN (SELECT expr3, expr4 FROM t1)
the collation is determined separately for the pairs of compared elements. First, the collation determination rules are applied to expr1
and expr3
. Then, the rules are applied to expr2
and expr4
. When the condition is evaluated, values of expr1
are compared to values of expr3
using the first determined collation and values of expr2
are compared to values of expr4
using the second determined collation. Similarly, in the query:
SELECT expr1, expr2 FROM t1
MINUS
SELECT expr3, expr4 FROM t2
the collation determination rules are first applied to expr1
and expr3
, then to expr2
and expr4
. When the MINUS
operator is evaluated, values of expr1
are compared to values of expr3
using the first determined collation and values of expr2
are compared to values of expr4
using the second determined collation.
In the query:
SELECT * FROM t1 ORDER BY expr1, expr2, expr3
rows are sorted first on values of expr1
using the derived collation expr1
, then ties are broken by sorting on values of expr2
using the derived collation expr2
, and then on values of expr3
using the derived collation expr3
. Each position in the ORDER BY
list is treated like a separate comparison operator for row values.
C.3 SQL Operations and Their Derivation- and Determination-relevant Arguments
The following table lists all the SQL operations that return a character value or are collation-sensitive or both. For each operation returning a character value, the table lists operation’s derivation-relevant arguments. If the operation has no such arguments, the fixed collation of the operation's result is shown instead. The term Literal Collation means that the collation derived for the operation's result is the collation of a character literal put in place of the operation in an expression; this is either USING_NLS_COMP
for top-level SQL statements or the default collation of a view, materialized view, or a PL/SQL stored unit containing the expression in its source. For each collation-sensitive operation, the following table lists the operation’s determination-relevant arguments.
Table C-2 Derivation- and Determination-relevant Arguments for SQL Operations
Operation Type | Operation Name | Operation Description | Derivation-relevant Arguments or Fixed Collation | Determination-relevant Arguments |
---|---|---|---|---|
Pseudo-column |
|
Operation type in a flashback version query |
Literal collation |
— |
Pseudo-column |
|
Value of nested table element of character data type |
USING_NLS_COMP |
— |
Operator |
|
Character Value Concatenation |
a1, a2 |
— |
Operator |
|
Hierarchical query parent value |
a1 |
— |
Operator |
|
Hierarchical query root value |
a1 |
— |
Operator |
|
Non-distinct union of two row sets |
Collation for each column of the resulting row set is derived separately by combining collations of columns from each of the two argument row sets. Special case: if an argument |
— |
Operator |
|
Distinct union of two row sets |
Collation for each column of the resulting row set is derived separately by combining collations of columns from each of the two argument row sets. |
Collation for comparison of each column of the argument row set is determined separately by combining collations of columns from each of the two argument row sets. |
Operator |
|
Distinct intersection of two row sets |
Collation for each column of the resulting row set is derived separately by combining collations of columns from each of the two argument row sets. |
Collation for comparison of each column of the argument row set is determined separately by combining collations of columns from each of the two argument row sets. |
Operator |
|
Distinct subtraction of row sets |
Collation for each column of the resulting row set is derived separately by combining collations of columns from each of the two argument row sets. |
Collation for comparison of each column of the argument row set is determined separately by combining collations of columns from each of the two argument row sets. |
Expression |
|
Searched case expression |
|
Each condition |
Expression |
|
Simple case expression; equivalent to:
|
|
If collation of
v does not dominate over collations of: then simple case is transformed to searched case internally. |
Expression |
Object Access Expression |
Reference to an object method |
USING_NLS_COMP |
— |
Expression |
:name |
Bind variable reference |
Literal collation |
— |
Expression |
|
Expression list |
Each list element has its collation derived separately and independently. |
When two lists are compared, the collation determination is performed separately and independently for each of the two character data type elements at the same index in both the lists. |
Condition |
|
Simple comparison conditions |
— |
If |
Condition |
( |
List comparison condition; equivalent to:
|
— |
Collations are determined separately for each pair. If |
Condition |
( |
Query comparison conditions |
— |
a1, a2 If |
Condition |
|
Check if pattern |
— |
|
Condition |
|
Check if regular expression |
— |
|
Condition |
|
Range comparison; equivalent to:
|
— |
Collation is determined separately for each comparison. |
Condition |
|
Membership comparison; equivalent to:
|
— |
See =ANY above
|
Function |
|
Approximate count of distinct values of |
— |
a1 |
Function |
|
Escape non-ASCII characters in |
a1 |
— |
Function |
|
Cast value |
|
— |
Function |
|
Convert numeric code |
Literal collation |
— |
Function |
|
First non-null value among:
|
|
— |
Function |
|
Return name of derived collation of |
Literal collation |
— |
Function |
|
Aggregate into a nested table |
— |
|
Function |
|
Normalize |
a1 |
— |
Function |
|
Concatenate strings |
a1, a2 |
— |
Function |
|
Convert character set of |
a1 |
— |
Function |
|
Count distinct values of |
— |
a1 |
Function |
|
Kendall's tau-b correlation coefficient |
— |
Collation is determined independently for each argument. |
Function |
|
Spearman's rho correlation coefficient |
— |
Collation is determined independently for each argument. |
Function |
|
OLAP cube or hierarchy to relational table |
Literal collation (for each character data type column in the generated table) |
— |
Function |
|
Current dimension value in a model clause |
Collation of the dimension column to which CV() call corresponds, |
— |
Function |
|
Database time zone as string |
Literal collation |
— |
Function |
|
Value selection |
|
|
Function |
|
Unicode normalization (NFD, NFKD); |
a1 |
— |
Function |
|
Dense rank of a value in a group of values |
— |
Ranking is based on collation of the elements in function’s |
Function |
|
Debugging dump of |
Literal collation |
— |
Function |
EMPTY_CLOB |
Empty |
USING_NLS_COMP |
— |
Function |
|
Extract time zone information from the datetime value |
Literal collation |
— |
Function |
|
Extract element value from |
Literal collation |
— |
Function |
|
First value of |
a1 |
— |
Function |
|
Largest value among |
|
|
Function |
|
Capitalize initial letters of |
a1 |
— |
Function |
|
Position of |
— |
|
Function |
|
Retrieve fragment of the JSON object |
Literal collation |
— |
Function |
|
Present fragment of the JSON object |
Literal collation (for each character data type column in the generated table) |
— |
Function |
|
Retrieve a scalar value from the JSON object |
Literal collation |
— |
Function |
|
Value of |
a1 |
— |
Function |
|
Last value of |
a1 |
— |
Function |
|
Value of |
a1 |
— |
Function |
|
Smallest value among |
|
|
Function |
|
Aggregate values of |
|
— |
Function |
|
Lowercase |
a1 |
— |
Function |
|
Pad string |
a1 |
— |
Function |
|
Remove characters from the beginning of |
a1 |
a1 |
Function |
|
Maximum value of |
a1 |
|
Function |
|
Minimum value of |
a1 |
a1 |
Function |
|
Convert numeric code |
Literal collation |
— |
Function |
|
Name of the character set with ID |
Literal collation |
— |
Function |
|
Name of the collation with ID |
Literal collation |
— |
Function |
|
Capitalize initial letters of |
a1 |
Collation specified with
|
Function |
|
Lowercase |
a1 |
Collation specified with
|
Function |
|
Capitalize |
a1 |
Collation specified with
|
Function |
|
Generate collation key for |
— |
Collation specified with |
Function |
|
The n-th value of |
a1 |
— |
Function |
|
This is equivalent to:
|
a1 |
|
Function |
|
|
a1, a2 |
— |
Function |
|
|
a2, a3 |
— |
Function |
|
Invoking user name |
Literal collation |
— |
Function |
|
Path to a resource |
Literal collation |
— |
Function |
|
Percent rank of a value in a group of values |
— |
Ranking is based on collation of the elements in function’s |
Function |
|
Data mining prediction |
Literal collation |
— |
Function |
|
If the cell reference |
a2, a3 |
— |
Function |
|
If the cell reference |
a2, a3 |
— |
Function |
|
Value of the cell reference a1 at the beginning of an iteration in a model clause |
a1 |
— |
Function |
|
Rank of a value in a group of values |
— |
Ranking is based on collation of the elements in function’s |
Function |
|
Convert the RAW value |
Literal collation |
— |
Function |
|
Convert the |
Literal collation |
— |
Function |
|
Number of times regular expression |
— |
|
Function |
|
Minimal position in |
— |
|
Function |
|
Replace with string |
a1 |
|
Function |
|
Return the |
a1 |
|
Function |
|
|
a1 |
|
Function |
|
Convert the rowid |
Literal collation |
— |
Function |
|
Convert the rowid |
Literal collation |
— |
Function |
|
Pad string |
a1 |
— |
Function |
|
Remove characters from the end of |
a1 |
|
Function |
|
Database time zone as string |
Literal collation |
— |
Function |
|
Soundex representation of |
a1 |
— |
Function |
|
Exact probability test of dichotomous variables |
— |
|
Function |
|
Crosstab analysis of |
— |
Collation is determined independently for each argument. |
Function |
|
Variance analysis of |
— |
|
Function |
|
Kolmogorov-Smirnov function |
— |
Collation is determined independently for each argument. |
Function |
|
Most frequent value of |
a1 |
|
Function |
|
Mann Whitney test |
— |
Collation is determined independently for each argument. |
Function |
|
One-way analysis of variance |
— |
|
Function |
|
T-test of independent groups with same variance |
— |
|
Function |
|
T-test of independent groups with unequal variance |
— |
|
Function |
|
Substring of |
a1 |
— |
Function |
|
Path of value |
a1 |
— |
Function |
|
Context parameter |
Literal collation |
— |
Function |
/*character*/ |
Convert |
a1 |
— |
Function |
/*datetime*/ |
Convert |
Literal collation |
— |
Function |
/*number*/ |
Convert |
Literal collation |
— |
Function |
|
Convert |
(must yield |
— |
Function |
/*long*/ |
Convert |
(must yield |
— |
Function |
|
Map normal-width characters in |
a1 |
— |
Function |
/*character*/ |
Convert |
a1 |
— |
Function |
/*datetime*/ |
Convert |
Literal collation |
— |
Function |
/*number*/ |
Convert |
Literal collation |
— |
Function |
|
Convert |
a1 (must yield |
— |
Function |
|
Map full-width characters in |
a1 |
— |
Function |
|
Transform |
a1 |
|
Function |
|
Convert (roughly equivalent to: |
a1 |
— |
Function |
|
Remove all occurrences of character |
a2 |
|
Function |
|
Offset for the time zone |
Literal collation |
— |
Function |
|
Transform string |
a1 |
— |
Function |
|
Capitalize string |
a1 |
— |
Function |
|
Login user name |
Literal collation |
— |
Function |
|
|
Literal collation |
— |
Function |
|
Cast result of |
Literal collation |
— |
Function |
|
Serialize XML document |
Literal collation |
— |
Function |
|
Present content of an XML object as a virtual relational table |
Literal collation (for each character data type column in the generated table) |
— |
Clause |
|
Analytic clause partitioning |
— |
Collation is determined separately for each character argument in the clause. |
Clause |
|
Analytic clause ordering |
— |
Collation is determined separately for each character argument in the clause. |
Clause |
|
Aggregate function ordering |
— |
Collation is determined separately for each character argument in the clause. |
Clause |
|
Query result ordering |
— |
Collation is determined separately for each character argument in the clause. |
Clause |
|
Query row grouping |
— |
Collation is determined separately for each character argument in the clause. |