28 Random Forest
Learn how to use Random Forest as a classification algorithm.
28.1 About Random Forest
Random Forest is a classification algorithm that builds an ensemble (also called a forest) of trees.

The algorithm builds a number of Decision Tree models and predicts using the ensemble. An individual decision tree is built by choosing a random sample from the training data set as the input. At each node of the tree, only a random sample of predictors is chosen for computing the split point. This introduces variation in the data used by the different trees in the forest. The parameters RFOR_SAMPLING_RATIO and RFOR_MTRY specify the sample size and the number of predictors chosen at each node, respectively. You can use ODMS_RANDOM_SEED to set the random seed value before running the algorithm.
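The three settings mentioned above are passed to the model as a DBMS_DATA_MINING.SETTING_LIST. The following sketch shows one way to populate such a list; the specific values are illustrative choices, not documented defaults:

```sql
DECLARE
  v_setlist DBMS_DATA_MINING.SETTING_LIST;
BEGIN
  -- Select the Random Forest algorithm for a classification model
  v_setlist('ALGO_NAME')           := 'ALGO_RANDOM_FOREST';
  -- Fraction of the training rows sampled (without replacement) for each tree
  v_setlist('RFOR_SAMPLING_RATIO') := '0.6';
  -- Number of predictors randomly considered at each split point
  v_setlist('RFOR_MTRY')           := '5';
  -- Fix the random seed so repeated builds produce the same forest
  v_setlist('ODMS_RANDOM_SEED')    := '7';
END;
/
```

The list is then supplied to the model-creation procedure described in the next section.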
28.2 Building a Random Forest
Random Forest is built upon the existing infrastructure and Application Programming Interfaces (APIs) of Oracle Machine Learning for SQL.

Random Forest models provide attribute importance ranking of predictors. The model is built by specifying parameters in the existing APIs, and scoring is performed using the same SQL queries and APIs as the existing classification algorithms. OML4SQL implements a variant of the classical Random Forest algorithm that supports big data sets. The implementation differs from the classical algorithm in the following ways:
- OML4SQL does not support bagging; instead, it provides sampling without replacement.
- You can specify the maximum depth of the tree. Trees are not built to maximum depth.
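A build call can be sketched with DBMS_DATA_MINING.CREATE_MODEL2 as follows. The model name, data query, and column names are hypothetical placeholders (patterned after Oracle's sample schema), and the setting values are illustrative:

```sql
DECLARE
  v_setlist DBMS_DATA_MINING.SETTING_LIST;
BEGIN
  v_setlist('ALGO_NAME')           := 'ALGO_RANDOM_FOREST';
  v_setlist('RFOR_NUM_TREES')      := '25';  -- illustrative forest size
  v_setlist('TREE_TERM_MAX_DEPTH') := '10';  -- user-specified tree depth

  DBMS_DATA_MINING.CREATE_MODEL2(
    model_name          => 'RF_DEMO_MODEL',                    -- hypothetical name
    mining_function     => 'CLASSIFICATION',
    data_query          => 'SELECT * FROM mining_data_build_v',-- hypothetical view
    set_list            => v_setlist,
    case_id_column_name => 'CUST_ID',
    target_column_name  => 'AFFINITY_CARD');
END;
/
```

Because the build goes through the same API as other classification algorithms, only the settings in the SETTING_LIST distinguish a Random Forest build from, say, a single Decision Tree build.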
Note:
The term hyperparameter is used interchangeably with model setting.
28.3 Scoring with Random Forest
Learn to score with the Random Forest algorithm.
Scoring with Random Forest is the same as with any other classification algorithm. The following functions are supported: PREDICTION, PREDICTION_PROBABILITY, PREDICTION_COST, PREDICTION_SET, and PREDICTION_DETAILS.
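A scoring query might look like the following sketch. The model name and apply view are hypothetical placeholders carried over from the build example:

```sql
SELECT cust_id,
       -- Most likely class predicted by the forest
       PREDICTION(RF_DEMO_MODEL USING *)             AS pred_class,
       -- Probability of that predicted class
       PREDICTION_PROBABILITY(RF_DEMO_MODEL USING *) AS pred_prob
  FROM mining_data_apply_v
 WHERE ROWNUM <= 10;
```

The USING * clause passes all columns of the row to the model as predictors; the same query shape works for any OML4SQL classification model.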