37.4 Create a Model that Includes Machine Learning Operations on Text

Learn how to create a model that includes machine learning operations on text.

Oracle Machine Learning for SQL supports unstructured text within columns of VARCHAR2, CHAR, CLOB, BLOB, and BFILE, as described in the following table:

Table 37-2 Column Data Types That May Contain Unstructured Text

Data Type Description

BFILE and BLOB

Oracle Machine Learning for SQL interprets BLOB and BFILE as text only if you identify the columns as text when you create the model. If you do not identify the columns as text, then CREATE_MODEL returns an error.

CLOB

OML4SQL interprets CLOB as text.

CHAR

OML4SQL interprets CHAR as categorical by default. You can identify columns of CHAR as text when you create the model.

VARCHAR2

OML4SQL interprets VARCHAR2 with data length > 4000 as text.

OML4SQL interprets VARCHAR2 with data length <= 4000 as categorical by default. You can identify these columns as text when you create the model.

Note:

Text is not supported in nested columns or as a target in supervised machine learning.

The settings described in the following table control the term extraction process for text attributes in a model. Instructions for specifying model settings are in "Specifying Model Settings".

Table 37-3 Model Settings for Text

Setting Name Data Type Setting Value Description

ODMS_TEXT_POLICY_NAME

VARCHAR2(4000)

Name of an Oracle Text policy object created with CTX_DDL.CREATE_POLICY

Affects how individual tokens are extracted from unstructured text.

ODMS_TEXT_MAX_FEATURES

INTEGER

1 <= value <= 100000

Maximum number of features to use from the document set (across all documents of each text column) passed to CREATE_MODEL.

Default is 3000.

A model can include one or more text attributes. A model with text attributes can also include categorical and numerical attributes.

To create a model that includes text attributes:

  1. Create an Oracle Text policy object.

  2. Specify the model configuration settings that are described in "Table 37-3".

  3. Specify which columns must be treated as text and, optionally, provide text transformation instructions for individual attributes.

  4. Pass the model settings and text transformation instructions to DBMS_DATA_MINING.CREATE_MODEL2 or DBMS_DATA_MINING.CREATE_MODEL.

    Note:

    All algorithms except O-Cluster can support columns of unstructured text.

    The use of unstructured text is not recommended for association rules (Apriori).

In the following example, an SVM model is used to predict customers that are most likely to be positive responders to an Affinity Card loyalty program. The data comes with a text column that contains user generated comments. By creating an Oracle Text policy and specifying model settings, the algorithm automatically uses the text column and builds the model on both the structured data and unstructured text.

This example uses a view called mining_data which is created from SH.SALES table. A training data set called mining_train_text is also created.

The following queries show you how to create an Oracle Text policy followed by building a model using CREATE_MODEL2 procedure.

%script

BEGIN

EXECUTE ctx_ddl.create_policy('dmdemo_svm_policy');

PL/SQL procedure successfully completed.


---------------------------

PL/SQL procedure successfully completed.
%script

BEGIN DBMS_DATA_MINING.DROP_MODEL('T_SVM_Clas_sample');
EXCEPTION WHEN OTHERS THEN NULL; END;
/
DECLARE
    v_setlst DBMS_DATA_MINING.SETTING_LIST;
    xformlist dbms_data_mining_transform.TRANSFORM_LIST;

BEGIN
   
    v_setlst(dbms_data_mining.algo_name) := dbms_data_mining.algo_support_vector_machines;
    v_setlst(dbms_data_mining.prep_auto) :=  dbms_data_mining.prep_auto_on;
    v_setlst(dbms_data_mining.svms_kernel_function) := dbms_data_mining.svms_linear;
    v_setlst(dbms_data_mining.svms_complexity_factor) := '100';
    v_setlst(dbms_data_mining.odms_text_policy_name) := 'DMDEMO_SVM_POLICY';
  
    v_setlst(dbms_data_mining.svms_solver) :=  dbms_data_mining.svms_solver_sgd;
    dbms_data_mining_transform.SET_TRANSFORM(
        xformlist, 'comments', null, 'comments', null, 'TEXT');
    DBMS_DATA_MINING.CREATE_MODEL2(
        model_name          => 'T_SVM_Clas_sample',
        mining_function     => dbms_data_mining.classification,
        data_query          => 'select * from mining_train_text',
        set_list            => v_setlst,
        case_id_column_name => 'cust_id',
        target_column_name  => 'affinity_card',
        xform_list => xformlist);
END;
/ 

PL/SQL procedure successfully completed.


---------------------------

PL/SQL procedure successfully completed.


---------------------------