29 Random Forest
Learn how to use Random Forest as a classification algorithm.
29.1 About Random Forest
Random Forest is a classification algorithm that builds an ensemble (also called a forest) of trees.
The algorithm builds a number of Decision Tree models and predicts using the ensemble. Each individual decision tree is built from a random sample of the training data set. At each node of the tree, only a random sample of predictors is considered when computing the split point. This introduces variation in the data used by the different trees in the forest. The RFOR_SAMPLING_RATIO setting specifies the fraction of the training data sampled for each tree, and the RFOR_MTRY setting specifies the number of predictors chosen at each node. Users can set ODMS_RANDOM_SEED to fix the random seed value before running the algorithm.
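The two sources of randomness described above can be illustrated with a minimal Python sketch. This is not the OML4SQL implementation: the function names and the sample predictor names are hypothetical, and the arguments `sampling_ratio`, `mtry`, and the seed only mirror the roles of RFOR_SAMPLING_RATIO, RFOR_MTRY, and ODMS_RANDOM_SEED.

```python
import random


def sample_rows(n_rows, sampling_ratio, rng):
    """Pick the row indices used to build one tree (no row appears twice)."""
    k = max(1, int(n_rows * sampling_ratio))
    return sorted(rng.sample(range(n_rows), k))


def sample_predictors(predictors, mtry, rng):
    """Pick the candidate predictors examined at one tree node."""
    return rng.sample(predictors, min(mtry, len(predictors)))


# A fixed seed makes the random choices repeatable between runs.
rng = random.Random(42)

# Hypothetical predictor names, used only for illustration.
predictors = ["age", "income", "tenure", "region", "plan"]

rows_for_tree = sample_rows(n_rows=10, sampling_ratio=0.5, rng=rng)
candidates_at_node = sample_predictors(predictors, mtry=2, rng=rng)

print(rows_for_tree)       # 5 distinct row indices out of 10
print(candidates_at_node)  # 2 of the 5 predictors
```

Each tree would draw its own row sample, and each node its own predictor subset, which is why the trees in the forest differ from one another.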
29.2 Building a Random Forest
Random Forest is built on the existing infrastructure and Application Programming Interfaces (APIs) of Oracle Machine Learning for SQL (OML4SQL).
Random Forest models provide an attribute importance ranking of the predictors. A model is built by specifying settings in the existing APIs, and scoring uses the same SQL queries and APIs as the existing classification algorithms. OML4SQL implements a variant of the classical Random Forest algorithm that supports big data sets. The implementation differs from the classical algorithm in the following ways:
- OML4SQL does not support bagging and instead provides sampling without replacement.
- Users can specify the depth of the tree. Trees are not built to maximum depth.
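To make the first difference concrete, the following Python sketch contrasts classical bagging (a bootstrap sample drawn with replacement) with sampling without replacement. It is a conceptual illustration only, with hypothetical function names, and does not reflect how OML4SQL performs sampling internally.

```python
import random


def bootstrap_sample(n_rows, rng):
    """Classical bagging: draw n_rows indices with replacement.

    The same row can be drawn more than once.
    """
    return [rng.randrange(n_rows) for _ in range(n_rows)]


def sample_without_replacement(n_rows, ratio, rng):
    """Draw a fraction of the rows; every selected row appears exactly once."""
    k = max(1, int(n_rows * ratio))
    return rng.sample(range(n_rows), k)


rng = random.Random(7)
bagged = bootstrap_sample(100, rng)
subsampled = sample_without_replacement(100, 0.5, rng)

print(len(bagged), len(set(bagged)))          # 100 draws, usually fewer distinct rows
print(len(subsampled), len(set(subsampled)))  # the two counts are always equal
```

With bagging, the sample has the same size as the training set but contains repeated rows. Without replacement, the sample is smaller than the training set and contains no repeats.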
Note:
The term hyperparameter is also used interchangeably with model setting.
29.3 Scoring with Random Forest
Learn to score with the Random Forest algorithm.
Scoring with Random Forest is the same as with any other classification algorithm. The following functions are supported: PREDICTION, PREDICTION_PROBABILITY, PREDICTION_COST, PREDICTION_SET, and PREDICTION_DETAILS.