37.4 Create a Model that Includes Machine Learning Operations on Text
Learn how to create a model that includes machine learning operations on text.
Oracle Machine Learning for SQL supports unstructured text within columns of VARCHAR2
, CHAR
, CLOB
, BLOB
, and BFILE
, as described in the following table:
Table 37-2 Column Data Types That May Contain Unstructured Text
Data Type | Description |
---|---|
|
Oracle Machine Learning for SQL interprets |
|
OML4SQL interprets |
|
OML4SQL interprets |
|
OML4SQL interprets OML4SQL interprets |
The settings described in the following table control the term extraction process for text attributes in a model. Instructions for specifying model settings are in "Specifying Model Settings".
Table 37-3 Model Settings for Text
Setting Name | Data Type | Setting Value | Description |
---|---|---|---|
|
|
Name of an Oracle Text policy object created with |
Affects how individual tokens are extracted from unstructured text. |
|
|
1 <= value <= 100000 |
Maximum number of features to use from the document set (across all documents of each text column) passed to Default is 3000. |
A model can include one or more text attributes. A model with text attributes can also include categorical and numerical attributes.
To create a model that includes text attributes:
-
Create an Oracle Text policy object.
-
Specify the model configuration settings that are described in "Table 37-3".
-
Specify which columns must be treated as text and, optionally, provide text transformation instructions for individual attributes.
-
Pass the model settings and text transformation instructions to
DBMS_DATA_MINING.CREATE_MODEL2
orDBMS_DATA_MINING.CREATE_MODEL
.Note:
All algorithms except O-Cluster can support columns of unstructured text.
The use of unstructured text is not recommended for association rules (Apriori).
In the following example, an SVM model is used to predict customers that are most likely to be positive responders to an Affinity Card loyalty program. The data comes with a text column that contains user generated comments. By creating an Oracle Text policy and specifying model settings, the algorithm automatically uses the text column and builds the model on both the structured data and unstructured text.
This example uses a view called mining_data
which is created from SH.SALES
table. A training data set called mining_train_text
is also created.
The following queries show you how to create an Oracle Text policy followed by building a model using CREATE_MODEL2
procedure.
%script BEGIN EXECUTE ctx_ddl.create_policy('dmdemo_svm_policy'); PL/SQL procedure successfully completed. --------------------------- PL/SQL procedure successfully completed.
%script BEGIN DBMS_DATA_MINING.DROP_MODEL('T_SVM_Clas_sample'); EXCEPTION WHEN OTHERS THEN NULL; END; / DECLARE v_setlst DBMS_DATA_MINING.SETTING_LIST; xformlist dbms_data_mining_transform.TRANSFORM_LIST; BEGIN v_setlst(dbms_data_mining.algo_name) := dbms_data_mining.algo_support_vector_machines; v_setlst(dbms_data_mining.prep_auto) := dbms_data_mining.prep_auto_on; v_setlst(dbms_data_mining.svms_kernel_function) := dbms_data_mining.svms_linear; v_setlst(dbms_data_mining.svms_complexity_factor) := '100'; v_setlst(dbms_data_mining.odms_text_policy_name) := 'DMDEMO_SVM_POLICY'; v_setlst(dbms_data_mining.svms_solver) := dbms_data_mining.svms_solver_sgd; dbms_data_mining_transform.SET_TRANSFORM( xformlist, 'comments', null, 'comments', null, 'TEXT'); DBMS_DATA_MINING.CREATE_MODEL2( model_name => 'T_SVM_Clas_sample', mining_function => dbms_data_mining.classification, data_query => 'select * from mining_train_text', set_list => v_setlst, case_id_column_name => 'cust_id', target_column_name => 'affinity_card', xform_list => xformlist); END; / PL/SQL procedure successfully completed. --------------------------- PL/SQL procedure successfully completed. ---------------------------