37.6 Configure a Text Attribute
Identify the columns that contain unstructured text as text attributes, and provide transformation instructions for them, by explicitly specifying how each column is to be treated.
As shown in Table 37-2, you can identify columns of CHAR, shorter VARCHAR2 (<=4000), BFILE, and BLOB as text attributes. If CHAR and shorter VARCHAR2 columns are not explicitly identified as unstructured text, then CREATE_MODEL processes them as categorical attributes. If BFILE and BLOB columns are not explicitly identified as unstructured text, then CREATE_MODEL returns an error.
To identify a column as a text attribute, supply the keyword TEXT in an attribute specification. The attribute specification is a field (attribute_spec) in a transformation record (transform_rec). Transformation records are components of transformation lists (xform_list) that can be passed to CREATE_MODEL or CREATE_MODEL2.
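The following is a minimal sketch of how a column might be identified as text and passed to CREATE_MODEL. It uses DBMS_DATA_MINING_TRANSFORM.SET_TRANSFORM to add a transformation record whose attribute specification is TEXT. The model name, table name, column names, and settings table (my_model, mining_build_text, comments, cust_id, affinity_card, my_settings) are hypothetical placeholders. The sketch assumes that the settings table already exists and contains whatever text-related settings your configuration requires.

```sql
DECLARE
  -- Transformation list that will carry the attribute specification
  xformlist dbms_data_mining_transform.TRANSFORM_LIST;
BEGIN
  -- Add one transformation record for the (hypothetical) COMMENTS column.
  -- No expression or reverse expression is supplied; only the
  -- attribute_spec field is set, marking the column as unstructured text.
  dbms_data_mining_transform.SET_TRANSFORM(
    xform_list         => xformlist,
    attribute_name     => 'comments',
    attribute_subname  => NULL,
    expression         => 'comments',
    reverse_expression => NULL,
    attribute_spec     => 'TEXT');

  -- Pass the transformation list to CREATE_MODEL.
  -- All object names below are assumptions for illustration.
  DBMS_DATA_MINING.CREATE_MODEL(
    model_name          => 'my_model',
    mining_function     => dbms_data_mining.classification,
    data_table_name     => 'mining_build_text',
    case_id_column_name => 'cust_id',
    target_column_name  => 'affinity_card',
    settings_table_name => 'my_settings',
    xform_list          => xformlist);
END;
/
```

If the column were a CHAR or shorter VARCHAR2 column and the TEXT keyword were omitted, the same call would treat it as a categorical attribute.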
Note:
An attribute specification can also include information that is not related to text. Instructions for constructing an attribute specification are in "Embedding Transformations in a Model".
You can provide transformation instructions for any text attribute by qualifying the TEXT keyword in the attribute specification with the subsettings described in the following table.
Table 37-5 Attribute-Specific Text Transformation Instructions
Subsetting Name | Description | Example |
---|---|---|
| A sequence of two adjacent elements from a string of tokens, which are typically letters, syllables, or words. Here, | ( |
POLICY_NAME | Name of an Oracle Text policy object created with | (POLICY_NAME:my_policy) |
| Here, | ( |
| Oracle Machine Learning for SQL supports synonyms. The following is an optional parameter: < | ( ( |
TOKEN_TYPE | The following values are supported: | (TOKEN_TYPE:THEME) |
MAX_FEATURES | Maximum number of features to use from the attribute. | (MAX_FEATURES:3000) |
Note:
For CLOB and longer VARCHAR2 (>4000) columns, the TEXT keyword is required only when you specify transformation instructions. For CHAR, shorter VARCHAR2, BFILE, and BLOB columns, the TEXT keyword is always required, whether or not you specify transformation instructions.
Tip:
You can view attribute specifications in the data dictionary view ALL_MINING_MODEL_ATTRIBUTES, which is described in Oracle Database Reference.
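As an illustration, a query along the following lines could be used to inspect the attribute specifications stored for a model. The model name MY_MODEL is a hypothetical placeholder. The sketch assumes that the model is visible to the current user and that its name is stored in uppercase.

```sql
-- List each attribute of the (hypothetical) model MY_MODEL together
-- with its attribute specification, if any.
SELECT attribute_name,
       attribute_type,
       data_type,
       attribute_spec
FROM   all_mining_model_attributes
WHERE  model_name = 'MY_MODEL'
ORDER  BY attribute_name;
```

Text attributes are expected to show their TEXT specification, including any subsettings, in the ATTRIBUTE_SPEC column.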
Token Types in an Attribute Specification
When stems or themes are specified as the token type, the lexer preference for the text policy must support these types of tokens.
The following example adds themes and English stems to BASIC_LEXER.
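BEGIN
  -- Create a lexer preference based on BASIC_LEXER
  CTX_DDL.CREATE_PREFERENCE('my_lexer', 'BASIC_LEXER');
  -- Enable English stem tokens
  CTX_DDL.SET_ATTRIBUTE('my_lexer', 'index_stems', 'ENGLISH');
  -- Enable theme tokens
  CTX_DDL.SET_ATTRIBUTE('my_lexer', 'index_themes', 'YES');
END;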
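A lexer preference takes effect only when a text policy uses it. The following sketch shows one way a policy might be created with the my_lexer preference from the preceding example. The policy name my_policy is an assumption chosen for illustration. The other CTX_DDL.CREATE_POLICY parameters (filter, section group, stoplist, wordlist) are left at their defaults.

```sql
BEGIN
  -- Create an Oracle Text policy that uses the lexer preference
  -- defined earlier. The name 'my_policy' is a placeholder.
  CTX_DDL.CREATE_POLICY(
    policy_name => 'my_policy',
    lexer       => 'my_lexer');
END;
/
```

An attribute specification can then refer to this policy by name through the POLICY_NAME subsetting.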
Example 37-1 A Sample Attribute Specification for Text
This expression specifies that text transformation for the attribute must use the text policy named my_policy. The token type is THEME, and the maximum number of features is 3000.
"TEXT(POLICY_NAME:my_policy)(TOKEN_TYPE:THEME)(MAX_FEATURES:3000)"