31 XGBoost
XGBoost is a highly efficient, scalable machine learning algorithm for regression and classification that makes the XGBoost gradient boosting open source package available.
31.1 About XGBoost
Oracle Machine Learning for SQL XGBoost prepares training data, invokes XGBoost, builds and persists a model, and applies the model for prediction.
OML4SQL XGBoost is a scalable gradient tree boosting system that supports both classification and regression. It makes the open source XGBoost gradient boosting framework available through OML4SQL.
You can use XGBoost as a stand-alone predictor or incorporate it into production pipelines for a wide range of problems, such as ad click-through rate prediction, hazard risk prediction, and web text classification.
The OML4SQL XGBoost algorithm takes three types of parameters: general parameters, booster parameters, and task parameters. You set the parameters through the model settings table. The algorithm supports most of the settings of the open source project.
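As a rough illustration of the three parameter categories, the following Python sketch groups a few parameter names from the open source XGBoost project and flattens them into name/value rows, which is the general shape of a settings table. The helper `to_setting_rows`, the group keys, and the example values are assumptions made for illustration only. The setting names that OML4SQL accepts are listed in Oracle Database PL/SQL Packages and Types Reference.

```python
# Illustrative grouping of open source XGBoost parameter names by category.
# The values are arbitrary examples, not recommended defaults.
PARAMS = {
    "general": {"booster": "gbtree"},
    "booster": {"eta": 0.3, "max_depth": 6},
    "task": {"objective": "binary:logistic", "eval_metric": "auc"},
}


def to_setting_rows(params):
    """Flatten grouped parameters into (setting_name, setting_value) rows.

    Values are converted to strings because a settings table stores
    each value as text. This helper is hypothetical.
    """
    rows = []
    for group in ("general", "booster", "task"):
        for name, value in params.get(group, {}).items():
            rows.append((name, str(value)))
    return rows


rows = to_setting_rows(PARAMS)
for name, value in rows:
    print(f"{name} = {value}")
```

The category of a parameter does not change how it is supplied: every parameter becomes one row in the settings table.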
Through XGBoost, OML4SQL supports a number of different classification and regression specifications, ranking models, and survival models. Binary and multiclass models are supported under the classification machine learning function while regression, ranking, count, and survival are supported under the regression machine learning function.
XGBoost also supports partitioned models and internalizes the data preparation.
31.2 Ranking Methods
Oracle Machine Learning supports pairwise and listwise ranking methods through XGBoost.
The training data consists of a number of sets, and each set consists of objects and labels that represent their ranking. A ranking function is constructed by minimizing a loss function on the training data. The ranking function is then applied to test data to produce a ranked list of objects. Ranking is enabled for XGBoost through the regression function.
Pairwise ranking: This approach regards a pair of objects as the learning instance. The pairs and lists are defined by supplying the same case_id value. Given a pair of objects, this approach gives an optimal ordering for that pair. Pairwise losses are defined by the order of the two objects. In OML4SQL, the algorithm uses LambdaMART to perform pairwise ranking with the goal of minimizing the average number of inversions in ranking.
Listwise ranking: This approach takes multiple lists of ranked objects as the learning instance. The items in a list must have the same case_id. The algorithm uses LambdaMART to perform listwise ranking.
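To make the notion of inversions concrete, the following Python sketch groups scored objects by case_id and counts the pairs that the scores place in the wrong order relative to their labels. It is not LambdaMART and not the OML4SQL implementation. It only computes the quantity that pairwise ranking aims to reduce. The function name, the tuple layout, and the sample data are assumptions made for illustration.

```python
from collections import defaultdict
from itertools import combinations


def average_inversions(rows):
    """Return the average number of inversions per case_id group.

    rows: iterable of (case_id, label, score) tuples, where a higher label
    means the object should rank higher and score is the model output.
    A pair is an inversion when the two objects have different labels and
    the scores order them the opposite way. Tied scores are not counted.
    """
    groups = defaultdict(list)
    for case_id, label, score in rows:
        groups[case_id].append((label, score))

    if not groups:
        return 0.0

    total = 0
    for items in groups.values():
        for (label_a, score_a), (label_b, score_b) in combinations(items, 2):
            if (label_a - label_b) * (score_a - score_b) < 0:
                total += 1
    return total / len(groups)


# Hypothetical scored data: two groups identified by case_id.
rows = [
    ("q1", 3, 0.9), ("q1", 2, 0.4), ("q1", 1, 0.6),
    ("q2", 2, 0.2), ("q2", 1, 0.7),
]
print(average_inversions(rows))
```

Objects are compared only with other objects that share the same case_id, so the grouping column determines which pairs exist.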
See Also:
- "Ranking Measures and Loss Functions in Learning to Rank" a research paper presentation at https://www.researchgate.net/
- Oracle Database PL/SQL Packages and Types Reference for a listing and explanation of the available model settings for XGBoost.
Note:
The term hyperparameter is also used interchangeably with model setting.
31.3 Scoring with XGBoost
Learn how to score with XGBoost.
The SQL scoring functions supported for a classification XGBoost model are PREDICTION, PREDICTION_COST, PREDICTION_DETAILS, PREDICTION_PROBABILITY, and PREDICTION_SET.
The scoring functions supported for a regression XGBoost model are PREDICTION and PREDICTION_DETAILS.
The prediction functions return the following information:
- PREDICTION returns the predicted value.
- PREDICTION_COST returns a measure of cost for a given prediction as an Oracle NUMBER. (classification only)
- PREDICTION_DETAILS returns the SHAP (SHapley Additive exPlanation) contributions.
- PREDICTION_PROBABILITY returns the probability for a given prediction. (classification only)
- PREDICTION_SET returns the prediction and the corresponding prediction probability for each observation. (classification only)
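To give a sense of what SHAP contributions represent, the following Python sketch computes exact Shapley contributions for a toy model by enumerating every feature order. It is a conceptual illustration only. It does not reproduce how XGBoost computes contributions for tree models or the output format of PREDICTION_DETAILS. The toy model, the feature names, and the zero baseline are assumptions.

```python
from itertools import permutations


def shapley_contributions(predict, x, baseline):
    """Exact Shapley contributions for one row by enumerating feature orders.

    predict:  function taking a dict of feature values and returning a number
    x:        dict of feature values for the row being explained
    baseline: dict of reference values used while a feature is "absent"
    """
    names = list(x)
    contrib = {name: 0.0 for name in names}
    orders = list(permutations(names))
    for order in orders:
        current = dict(baseline)
        previous = predict(current)
        for name in order:
            current[name] = x[name]      # switch this feature to its real value
            value = predict(current)
            contrib[name] += value - previous
            previous = value
    return {name: total / len(orders) for name, total in contrib.items()}


# Hypothetical toy model with an interaction between the two features.
def toy_model(row):
    return 2.0 * row["age"] + row["income"] * row["age"]


x = {"age": 3.0, "income": 2.0}
baseline = {"age": 0.0, "income": 0.0}
contrib = shapley_contributions(toy_model, x, baseline)
print(contrib)
```

The contributions are additive: they sum to the difference between the prediction for the row and the prediction at the baseline, which is why they can be read as a per-feature breakdown of a single prediction.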