6 Character Sets and Character Encoding for JSON Data
JSON data always uses the Unicode character set. In this respect, JSON data is simpler to use than XML data. This is an important part of the JSON Data Interchange Format (RFC 8259). For JSON data processed by Oracle Database, any needed character-set conversions are performed automatically.
Oracle Database uses UTF-8 internally when it processes JSON data (parsing, querying). If the data that is input to such processing, or the data that is output from it, must be in a different character set from UTF-8, then character-set conversion is carried out accordingly.
Character-set conversion can affect performance, and in some cases it can be lossy. Conversion of input data to UTF-8 is lossless, but conversion of output data can lose information if some characters cannot be represented in the output character set.
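As a sketch of this behavior, the standard SQL function CONVERT can demonstrate a lossy conversion: a character that exists in AL32UTF8 but not in the target character set is replaced rather than preserved (the literal shown here is only illustrative).

```sql
-- The euro sign exists in AL32UTF8 but not in WE8ISO8859P1 (ISO 8859-1),
-- so converting to that character set replaces it with a substitute character.
SELECT CONVERT('price: €5', 'WE8ISO8859P1', 'AL32UTF8') AS converted
  FROM DUAL;
```

Converting the result back to AL32UTF8 does not recover the original character, which is what "lossy" means in this context.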
If your JSON data is stored in the database as Unicode then no character-set conversion is needed for storage or retrieval. This is the case if any of these conditions apply:
- Your JSON data is stored as JSON type or BLOB instances.
- The database character set is AL32UTF8 (Unicode UTF-8).
- Your JSON data is stored as CLOB instances that have character set AL16UTF16.
Oracle recommends that you store JSON data using JSON type, and that you use AL32UTF8 as the database character set if at all possible.
Regardless of the database character set, JSON data that is stored using data type JSON or BLOB never undergoes character-set conversion for storage or retrieval. JSON data can be stored using data type BLOB as AL32UTF8, AL16UTF16, or AL16UTF16LE.
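A minimal sketch of the recommended storage, using a hypothetical table (the table and column names are illustrative, not part of any Oracle schema):

```sql
-- JSON type stores the data in a binary Unicode form; no character-set
-- conversion occurs on storage or retrieval, whatever the database character set.
CREATE TABLE orders_tab (
  id        NUMBER PRIMARY KEY,
  order_doc JSON
);

-- Non-ASCII characters in the document are preserved as-is.
INSERT INTO orders_tab
  VALUES (1, JSON('{"customer":"Héloïse","total":42}'));
```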
If you transform JSON data using SQL/JSON functions or PL/SQL methods and you return the result of the transformation using data type BLOB, then the result is encoded as AL32UTF8. This is true even if the input BLOB data uses another Unicode encoding.

For example, if you use function json_query to extract some JSON data from BLOB input and return the result as BLOB, it is returned using AL32UTF8.
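This json_query case might look as follows, assuming a table j_purchaseorder with a BLOB column po_document holding JSON data (the names are illustrative):

```sql
-- The extracted fragment is returned as AL32UTF8-encoded BLOB data,
-- even if po_document itself is stored as AL16UTF16 or AL16UTF16LE.
SELECT json_query(po_document, '$.ShippingInstructions'
                  RETURNING BLOB)
  FROM j_purchaseorder;
```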
Lossy character-set conversion can occur if application of a SQL/JSON function or a PL/SQL method specifies a return data type of VARCHAR2 or CLOB and the database character set is not AL32UTF8. If input JSON data was stored in a BLOB or JSON type instance, then even if the data is ultimately written again as BLOB or JSON type, any intermediate conversion to VARCHAR2 or CLOB can make the resulting data suffer lossy conversion. This can happen, for example, if you use json_serialize.
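As a sketch of where such loss can enter (again assuming the illustrative j_purchaseorder table with BLOB column po_document):

```sql
-- If the database character set is not AL32UTF8, serializing to VARCHAR2
-- converts the data to the database character set. Any character of the
-- stored JSON data that this character set cannot represent is lost,
-- even if the result is later stored again as BLOB or JSON type.
SELECT json_serialize(po_document RETURNING VARCHAR2)
  FROM j_purchaseorder;
```

Serializing with RETURNING BLOB instead avoids this risk, because the result stays in AL32UTF8.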
See Also:

- Unicode.org for information about Unicode
- ECMA 404 and IETF RFC 8259 for the definition of the JSON Data Interchange Format
- Oracle Database Migration Assistant for Unicode Guide for information about using different character sets with the database
- Oracle Database Globalization Support Guide for information about character-set conversion in the database