33.5 Oracle Machine Learning for SQL Statistical Functions

Various SQL statistical functions are available in Oracle Database to explore and analyze data.

A variety of scalable statistical functions are accessible through SQL in Oracle Database. These statistical functions are implemented as SQL functions. The SQL statistical functions can be used to compute standard univariate statistics such as MEAN, MAX, MIN, MEDIAN, MODE, and standard deviation on the data. Users can also perform various other statistical functions such as t-test, f-test, aggregate functions, analytic functions, or ANOVA. The functions listed in the following table are available from SQL.

Table 33-5 SQL Statistical Functions Supported by OML4SQL

Function Description
APPROX_COUNT Returns approximate count of an expression
APPROX_SUM Returns approximate sum of an expression
APPROX_RANK Returns approximate value in a group of values
CORR Retuns the coefficient of correlation of a set of number pairs
CORR_S Calculates the Spearman's rho correlation coefficient
CORR_K Calculates the Kendall's tau-b correlation coefficient
COVAR_POP Returns the population covariance of a set of number pairs
COVAR_SAMP Returns the sample covariance of a set of number pairs.
LAG LAG is an analytic function. It provides access to more than one row of a table at the same time without a self join.
LEAD LEAD is an analytic function. It provides access to more than one row of a table at the same time without a self join.
STATS_BINOMIAL_TEST STATS_BINOMIAL_TEST is an exact probability test used for dichotomous variables, where only two possible values exist.
STATS_CROSSTAB STATS_CROSSTAB is a method used to analyze two nominal variables.
STATS_F_TEST STATS_F_TEST tests whether two variances are significantly different.
STATS_KS_TEST STATS_KS_TEST is a Kolmogorov-Smirnov function that compares two samples to test whether they are from the same population or from populations that have the same distribution.
STATS_MODE Takes as its argument a set of values and returns the value that occurs with the greatest frequency
STATS_MW_TEST A Mann Whitney test compares two independent samples to test the null hypothesis that two populations have the same distribution function against the alternative hypothesis that the two distribution functions are different.
STATS_ONE_WAY_ANOVA Tests differences in means (for groups or variables) for statistical significance by comparing two different estimates of variance
STATS_T_TEST_* The t-test measures the significance of a difference of means
STATS_T_TEST_ONE A one-sample t-test
STATS_T_TEST_PAIRED A two-sample, paired t-test (also known as a crossed t-test)
STATS_T_TEST_INDEP and STATS_T_TEST_INDEPU A t-test of two independent groups with the same variance (pooled variances)

A t-test of two independent groups with unequal variance (unpooled variances)

STDDEV returns the sample standard deviation of a set of numbers
STDDEV_POP Computes the population standard deviation and returns the square root of the population variance
STDDEV_SAMP Computes the cumulative sample standard deviation and returns the square root of the sample variance
SUM Returns the sum of values

DBMS_STAT_FUNCS PL/SQL package is also available for users.