35.5 The CREATE_MODEL Procedure
Shows the settings in the CREATE_MODEL
procedure.
The CREATE_MODEL
procedure in the DBMS_DATA_MINING
package uses the specified data to create a machine learning model with the specified name and machine learning function. The model can be created with configuration settings and user-specified transformations.
PROCEDURE CREATE_MODEL( model_name IN VARCHAR2, mining_function IN VARCHAR2, data_table_name IN VARCHAR2, case_id_column_name IN VARCHAR2, target_column_name IN VARCHAR2 DEFAULT NULL, settings_table_name IN VARCHAR2 DEFAULT NULL, data_schema_name IN VARCHAR2 DEFAULT NULL, settings_schema_name IN VARCHAR2 DEFAULT NULL, xform_list IN TRANSFORM_LIST DEFAULT NULL);
The following example builds a classification model using the Support Vector Machine algorithm.
Create the settings table CREATE TABLE svm_model_settings ( setting_name VARCHAR2(30), setting_value VARCHAR2(30)); -- Populate the settings table -- Specify SVM. By default, Naive Bayes is used for classification. -- Specify ADP. By default, ADP is not used. BEGIN INSERT INTO svm_model_settings (setting_name, setting_value) VALUES (dbms_data_mining.algo_name, dbms_data_mining.algo_support_vector_machines); INSERT INTO svm_model_settings (setting_name, setting_value) VALUES (dbms_data_mining.prep_auto,dbms_data_mining.prep_auto_on); COMMIT; END; / -- Create the model using the specified settings BEGIN DBMS_DATA_MINING.CREATE_MODEL( model_name => 'svm_model', mining_function => dbms_data_mining.classification, data_table_name => 'mining_data_build_v', case_id_column_name => 'cust_id', target_column_name => 'affinity_card', settings_table_name => 'svm_model_settings'); END; /
Related Topics
35.5.1 Choose the Machine Learning Function
Describes providing an Oracle Machine Learning for SQL machine learning function for the CREATE_MODEL
and CREATE_MODEL2
procedure.
An OML4SQL machine learning function specifies a class of problems that can be modeled and solved. You specify a machine learning with the mining_function
argument of the CREATE_MODEL
and CREATE_MODEL2
procedure.
OML4SQL machine learning functions implement either supervised or unsupervised learning. Supervised learning uses a set of independent attributes to predict the value of a dependent attribute or target. Unsupervised learning does not distinguish between dependent and independent attributes. Supervised functions are predictive. Unsupervised functions are descriptive.
Note:
In OML4SQL terminology, a function is a general type of problem to be solved by a given approach to machine learning. In SQL language terminology, a function is an operation that returns a result.
In OML4SQL documentation, the term function, or machine learning function refers to an OML4SQL machine learning function; the term SQL function or SQL machine learning function refers to a SQL function for scoring (applying machine learning models).
You can specify any of the values in the following table for the mining_function
parameter to the CREATE_MODEL
and CREATE_MODEL2
procedure.
Table 35-7 Oracle Machine Learning mining_function Values
mining_function Value | Description |
---|---|
|
Association is a descriptive machine learning function. An association model identifies relationships and the probability of their occurrence within a data set (association rules). Association models use the Apriori algorithm. |
|
Attribute importance is a predictive machine learning function. An attribute importance model identifies the relative importance of attributes in predicting a given outcome. Attribute importance models use the Minimum Description Length algorithm and CUR Matrix Decomposition. |
|
Classification is a predictive machine learning function. A classification model uses historical data to predict a categorical target. Classification models can use Naive Bayes, Neural Network, Decision Tree, logistic regression, Random Forest, Support Vector Machine, Explicit Semantic Analysis, or XGBoost. The default is Naive Bayes. You can also specify the classification machine learning function for anomaly detection for a One-Class SVM model and a Multivariate State Estimation Technique - Sequential Probability Ratio Test model. |
|
Clustering is a descriptive machine learning function. A clustering model identifies natural groupings within a data set. Clustering models can use k-Means, O-Cluster, or Expectation Maximization. The default is k-Means. |
|
Feature extraction is a descriptive machine learning function. A feature extraction model creates a set of optimized attributes. Feature extraction models can use Non-Negative Matrix Factorization, Singular Value Decomposition (which can also be used for Principal Component Analysis) or Explicit Semantic Analysis. The default is Non-Negative Matrix Factorization. |
|
Regression is a predictive machine learning function. A regression model uses historical data to predict a numerical target. Regression models can use Support Vector Machine, GLM regression, or XGBoost. The default is Support Vector Machine. |
|
Time series is a predictive machine learning function. A time series model forecasts the future values of a time-ordered series of historical numeric data over a user-specified time window. Time series models use the Exponential Smoothing algorithm. The default is Exponential Smoothing. |
Related Topics
35.5.2 Choose the Algorithm
Learn about providing the algorithm settings for a model.
The ALGO_NAME
setting specifies the algorithm for a model. If you use the default algorithm for the machine learning function, or if there is only one algorithm available for the machine learning function, then you do not need to specify the ALGO_NAME
setting.
Table 35-8 Oracle Machine Learning Algorithms
ALGO_NAME Value | Algorithm | Default? | Machine Learning Model Function |
---|---|---|---|
|
Minimum Description Length |
— |
Attribute importance |
|
Apriori |
— |
Association |
|
CUR Matrix Decomposition |
— |
Attribute importance |
|
Decision Tree |
— |
Classification |
|
Expectation Maximization |
— |
Clustering |
|
Explicit Semantic Analysis |
— |
Feature extraction and classification |
|
Exponential Smoothing |
— |
Time series |
ALGO_EXTENSIBLE_LANG |
Language used for an extensible algorithm |
— |
All machine learning functions are supported |
|
— |
Classification and regression |
|
|
k-Means |
yes |
Clustering |
|
Multivariate State Estimation Technique - Sequential Probability Ratio Test |
— |
Anomaly detection (classification with no target) |
|
Naive Bayes |
yes |
Classification |
|
Neural Network |
— |
Classification |
|
Non-Negative Matrix Factorization |
yes |
Feature extraction |
|
O-Cluster |
— |
Clustering |
|
Random Forest |
— |
Classification |
|
Singular Value Decomposition (can also be used for Principal Component Analysis) |
— |
Feature extraction |
|
Support Vector Machine |
yes |
Default regression algorithm; regression, classification, and anomaly detection (classification with no target) |
|
XGBoost |
— |
Classification and regression |
Related Topics
35.5.3 Supply Transformations
Use xform_list
to specify transformations in the model creation procedures.
35.5.3.1 Create a Transformation List
You can create a transformation list using the DBMS_DATA_MINING_TRANSFORM
package.
The following are the ways to create a transformation list:
-
The
STACK
interface inDBMS_DATA_MINING_TRANSFORM
.The
STACK
interface offers a set of pre-defined transformations that you can apply to an attribute or to a group of attributes. For example, you can specify supervised binning for all categorical attributes. -
The
SET_TRANSFORM
procedure inDBMS_DATA_MINING_TRANSFORM
.The
SET_TRANSFORM
procedure applies a specified SQL expression to a specified attribute. For example, the following statement appends a transformation instruction forcountry_id
to a list of transformations calledmy_xforms
. The transformation instruction dividescountry_id
by 10 before algorithmic processing begins. The reverse transformation multipliescountry_id
by 10.dbms_data_mining_transform.SET_TRANSFORM (my_xforms, 'country_id', NULL, 'country_id/10', 'country_id*10');
The reverse transformation is applied in the model details. If
country_id
is the target of a supervised model, the reverse transformation is also applied to the scored target.
35.5.3.2 Transformation List and Automatic Data Preparation
You can provide transformation list and Automatic Data Preparation (ADP) to customize the data transformation.
The transformation list argument to CREATE_MODEL2
and CREATE_MODEL
interacts with the PREP_AUTO
setting, which controls ADP:
-
When ADP is on and you specify a transformation list, your transformations are applied with the automatic transformations and embedded in the model. The transformations that you specify are processed before the automatic transformations.
-
When ADP is off and you specify a transformation list, your transformations are applied and embedded in the model, but no system-generated transformations are performed.
-
When ADP is on and you do not specify a transformation list, the system-generated transformations are applied and embedded in the model.
-
When ADP is off and you do not specify a transformation list, no transformations are embedded in the model; you must separately prepare the data sets you use for building, testing, and scoring the model.