10 Optimizer Statistics Concepts
Oracle Database optimizer statistics describe details about the database and its objects.
10.1 Introduction to Optimizer Statistics
The optimizer cost model relies on statistics collected about the objects involved in a query, and the database and host where the query runs.
The optimizer uses statistics to get an estimate of the number of rows (and number of bytes) retrieved from a table, partition, or index. The optimizer estimates the cost for the access, determines the cost for possible plans, and then picks the execution plan with the lowest cost.
Optimizer statistics include the following:
-
Table statistics
-
Number of rows
-
Number of blocks
-
Average row length
-
-
Column statistics
-
Number of distinct values (NDV) in a column
-
Number of nulls in a column
-
Data distribution (histogram)
-
Extended statistics
-
-
Index statistics
-
Number of leaf blocks
-
Number of levels
-
Index clustering factor
-
-
System statistics
-
I/O performance and utilization
-
CPU performance and utilization
-
As shown in Figure 10-1, the database stores optimizer statistics for tables, columns, indexes, and the system in the data dictionary. You can access these statistics using data dictionary views.
Note:
The optimizer statistics are different from the performance statistics visible through V$
views.
10.2 About Optimizer Statistics Types
The optimizer collects statistics on different types of database objects and characteristics of the database environment.
10.2.1 Table Statistics
Table statistics contain metadata that the optimizer uses when developing an execution plan.
10.2.1.1 Permanent Table Statistics
In Oracle Database, table statistics include information about rows and blocks.
The optimizer uses these statistics to determine the cost of table scans and table joins. The database tracks all relevant statistics about permanent tables. For example, table statistics stored in DBA_TAB_STATISTICS
track the following:
-
Number of rows
The database uses the row count stored in
DBA_TAB_STATISTICS
when determining cardinality. - Average row length
-
Number of data blocks
The optimizer uses the number of data blocks with the
DB_FILE_MULTIBLOCK_READ_COUNT
initialization parameter to determine the base table access cost. -
Number of empty data blocks
DBMS_STATS.GATHER_TABLE_STATS
commits before gathering statistics on permanent tables.
Example 10-1 Table Statistics
This example queries table statistics for the sh.customers
table.
SELECT NUM_ROWS, AVG_ROW_LEN, BLOCKS,
EMPTY_BLOCKS, LAST_ANALYZED
FROM DBA_TAB_STATISTICS
WHERE OWNER='SH'
AND TABLE_NAME='CUSTOMERS';
Sample output appears as follows:
NUM_ROWS AVG_ROW_LEN BLOCKS EMPTY_BLOCKS LAST_ANAL
---------- ----------- ---------- ------------ ---------
55500 189 1517 0 25-MAY-17
See Also:
-
Oracle Database Reference for a description of the
DBA_TAB_STATISTICS
view and theDB_FILE_MULTIBLOCK_READ_COUNT
initialization parameter
10.2.1.2 Temporary Table Statistics
DBMS_STATS
can gather statistics for both permanent and global temporary tables, but additional considerations apply to the latter.
10.2.1.2.1 Types of Temporary Tables
Temporary tables are classified as global, private, or cursor-duration.
In all types of temporary table, the data is only visible to the session that inserts it. The tables differ as follows:
-
A global temporary table is an explicitly created persistent object that stores intermediate session-private data for a specific duration.
The table is global because the definition is visible to all sessions. The
ON COMMIT
clause ofCREATE GLOBAL TEMPORARY TABLE
indicates whether the table is transaction-specific (DELETE ROWS
) or session-specific (PRESERVE ROWS
). Optimizer statistics for global temporary tables can be shared or session-specific. -
A private temporary table is an explicitly created object, defined by private memory-only metadata, that stores intermediate session-private data for a specific duration.
The table is private because the definition is visible only to the session that created the table. The
ON COMMIT
clause ofCREATE PRIVATE TEMPORARY TABLE
indicates whether the table is transaction-specific (DROP DEFINITION
) or session-specific (PRESERVE DEFINITION
). -
A cursor-duration temporary table is an implicitly created memory-only object that is associated with a cursor.
Unlike global and private temporary tables,
DBMS_STATS
cannot gather statistics for cursor-duration temporary tables.
The tables differ in where they store data, how they are created and dropped, and in the duration and visibility of metadata. Note that the database allocates storage space when a session first inserts data into a global temporary table, not at table creation.
Table 10-1 Important Characteristics of Temporary Tables
Characteristic | Global Temporary Table | Private Temporary Table | Cursor-Duration Temporary Table |
---|---|---|---|
Visibility of Data | Session inserting data | Session inserting data | Session inserting data |
Storage of Data | Persistent | Memory or tempfiles, but only for the duration of the session or transaction | Only in memory |
Visibility of Metadata | All sessions | Session that created table (in USER_PRIVATE_TEMP_TABLES view, which is based on a V$ view)
|
Session executing cursor |
Duration of Metadata | Until table is explicitly dropped | Until table is explicitly dropped, or end of session (PRESERVE DEFINITION ) or transaction (DROP DEFINITION )
|
Until cursor ages out of shared pool |
Creation of Table | CREATE GLOBAL TEMPORARY TABLE (supports AS SELECT )
|
CREATE PRIVATE TEMPORARY TABLE (supports AS SELECT )
|
Implicitly created when optimizer considers it useful |
Effect of Creation on Existing Transactions | No implicit commit | No implicit commit | No implicit commit |
Naming Rules | Same as for permanent tables | Must begin with ORA$PTT_ |
Internally generated unique name |
Dropping of Table | DROP GLOBAL TEMPORARY TABLE |
DROP PRIVATE TEMPORARY TABLE , or implicitly dropped at end of session (PRESERVE DEFINITION ) or transaction (DROP DEFINITION )
|
Implicitly dropped at end of session |
See Also:
-
Oracle Database Administrator’s Guide to learn how to manage temporary tables
10.2.1.2.2 Statistics for Global Temporary Tables
DBMS_STATS
collects the same types of statistics for global temporary tables as for permanent tables.
Note:
You cannot collect statistics for private temporary tables.
The following table shows how global temporary tables differ in how they gather and store optimizer statistics, depending on whether the tables are scoped to a transaction or session.
Table 10-2 Optimizer Statistics for Global Temporary Tables
Characteristic | Transaction-Specific | Session-Specific |
---|---|---|
Effect of DBMS_STATS collection
|
Does not commit | Commits |
Storage of statistics | Memory only | Dictionary tables |
Histogram creation | Not supported | Supported |
The following procedures do not commit for transaction-specific temporary tables, so that rows in these tables are not deleted:
-
GATHER_TABLE_STATS
-
DELETE_obj_STATS
, whereobj
isTABLE
,COLUMN
, orINDEX
-
SET_obj_STATS
, whereobj
isTABLE
,COLUMN
, orINDEX
-
GET_obj_STATS
, whereobj
isTABLE
,COLUMN
, orINDEX
The preceding program units observe the GLOBAL_TEMP_TABLE_STATS
statistics preference. For example, if the table preference is set to SESSION
, then SET_TABLE_STATS
sets the session statistics, and GATHER_TABLE_STATS
preserves all rows in a transaction-specific temporary table. If the table preference is set to SHARED
, however, then SET_TABLE_STATS
sets the shared statistics, and GATHER_TABLE_STATS
deletes all rows from a transaction-specific temporary table.
See Also:
-
Oracle Database Concepts to learn about global temporary tables
-
Oracle Database PL/SQL Packages and Types Reference to learn about the
DBMS_STATS.GATHER_TABLE_STATS
procedure
10.2.1.2.3 Shared and Session-Specific Statistics for Global Temporary Tables
Starting in Oracle Database 12c, you can set the table-level preference GLOBAL_TEMP_TABLE_STATS
to make statistics on a global temporary table shared (SHARED
) or session-specific (SESSION
).
When GLOBAL_TEMP_TABLE_STATS
is SESSION
, you can gather optimizer statistics for a global temporary table in one session, and then use the statistics for this session only. Meanwhile, users can continue to maintain a shared version of the statistics. During optimization, the optimizer first checks whether a global temporary table has session-specific statistics. If yes, then the optimizer uses them. Otherwise, the optimizer uses shared statistics if they exist.
Note:
In releases before Oracle Database 12c, the database did not maintain optimizer statistics for global temporary tables and non-global temporary tables differently. The database maintained one version of the statistics shared by all sessions, even though data in different sessions could differ.
Session-specific optimizer statistics have the following characteristics:
-
Dictionary views that track statistics show both the shared statistics and the session-specific statistics in the current session.
The views are
DBA_TAB_STATISTICS
,DBA_IND_STATISTICS
,DBA_TAB_HISTOGRAMS
, andDBA_TAB_COL_STATISTICS
(each view has a correspondingUSER_
andALL_
version). TheSCOPE
column shows whether statistics are session-specific or shared. Session-specific statistics must be stored in the data dictionary so that multiple processes can access them in Oracle RAC. -
CREATE ... AS SELECT
automatically gathers optimizer statistics. WhenGLOBAL_TEMP_TABLE_STATS
is set toSHARED
, however, you must gather statistics manually usingDBMS_STATS
. -
Pending statistics are not supported.
-
Other sessions do not share a cursor that uses the session-specific statistics.
Different sessions can share a cursor that uses shared statistics, as in releases earlier than Oracle Database 12c. The same session can share a cursor that uses session-specific statistics.
-
By default,
GATHER_TABLE_STATS
for the temporary table immediately invalidates previous cursors compiled in the same session. However, this procedure does not invalidate cursors compiled in other sessions.
See Also:
-
Oracle Database PL/SQL Packages and Types Reference to learn about the
GLOBAL_TEMP_TABLE_STATS
preference -
Oracle Database Reference for a description of the
DBA_TAB_STATISTICS
view
10.2.2 Column Statistics
Column statistics track information about column values and data distribution.
The optimizer uses column statistics to generate accurate cardinality estimates and make better decisions about index usage, join orders, join methods, and so on. For example, statistics in DBA_TAB_COL_STATISTICS
track the following:
-
Number of distinct values
-
Number of nulls
-
High and low values
-
Histogram-related information
The optimizer can use extended statistics, which are a special type of column statistics. These statistics are useful for informing the optimizer of logical relationships among columns.
See Also:
-
Oracle Database Reference for a description of the
DBA_TAB_COL_STATISTICS
view
10.2.3 Index Statistics
The index statistics include information about the number of index levels, the number of index blocks, and the relationship between the index and the data blocks. The optimizer uses these statistics to determine the cost of index scans.
10.2.3.1 Types of Index Statistics
The DBA_IND_STATISTICS
view tracks index statistics.
Statistics include the following:
-
Levels
The
BLEVEL
column shows the number of blocks required to go from the root block to a leaf block. A B-tree index has two types of blocks: branch blocks for searching and leaf blocks that store values. See Oracle Database Concepts for a conceptual overview of B-tree indexes. -
Distinct keys
This columns tracks the number of distinct indexed values. If a unique constraint is defined, and if no
NOT NULL
constraint is defined, then this value equals the number of non-null values. -
Average number of leaf blocks for each distinct indexed key
-
Average number of data blocks pointed to by each distinct indexed key
See Also:
Oracle Database Reference for a description of the DBA_IND_STATISTICS
view
Example 10-2 Index Statistics
This example queries some index statistics for the cust_lname_ix
and customers_pk
indexes on the sh.customers
table (sample output included):
SELECT INDEX_NAME, BLEVEL, LEAF_BLOCKS AS "LEAFBLK", DISTINCT_KEYS AS "DIST_KEY",
AVG_LEAF_BLOCKS_PER_KEY AS "LEAFBLK_PER_KEY",
AVG_DATA_BLOCKS_PER_KEY AS "DATABLK_PER_KEY"
FROM DBA_IND_STATISTICS
WHERE OWNER = 'SH'
AND INDEX_NAME IN ('CUST_LNAME_IX','CUSTOMERS_PK');
INDEX_NAME BLEVEL LEAFBLK DIST_KEY LEAFBLK_PER_KEY DATABLK_PER_KEY
-------------- ------ ------- -------- --------------- ---------------
CUSTOMERS_PK 1 115 55500 1 1
CUST_LNAME_IX 1 141 908 1 10
10.2.3.2 Index Clustering Factor
For a B-tree index, the index clustering factor measures the physical grouping of rows in relation to an index value, such as last name.
The index clustering factor helps the optimizer decide whether an index scan or full table scan is more efficient for certain queries). A low clustering factor indicates an efficient index scan.
A clustering factor that is close to the number of blocks in a table indicates that the rows are physically ordered in the table blocks by the index key. If the database performs a full table scan, then the database tends to retrieve the rows as they are stored on disk sorted by the index key. A clustering factor that is close to the number of rows indicates that the rows are scattered randomly across the database blocks in relation to the index key. If the database performs a full table scan, then the database would not retrieve rows in any sorted order by this index key.
The clustering factor is a property of a specific index, not a table. If multiple indexes exist on a table, then the clustering factor for one index might be small while the factor for another index is large. An attempt to reorganize the table to improve the clustering factor for one index may degrade the clustering factor of the other index.
Example 10-3 Index Clustering Factor
This example shows how the optimizer uses the index clustering factor to determine whether using an index is more effective than a full table scan.
-
Start SQL*Plus and connect to a database as
sh
, and then query the number of rows and blocks in thesh.customers
table (sample output included):SELECT table_name, num_rows, blocks FROM user_tables WHERE table_name='CUSTOMERS'; TABLE_NAME NUM_ROWS BLOCKS ------------------------------ ---------- ---------- CUSTOMERS 55500 1486
-
Create an index on the
customers.cust_last_name
column.For example, execute the following statement:
CREATE INDEX CUSTOMERS_LAST_NAME_IDX ON customers(cust_last_name);
-
Query the index clustering factor of the newly created index.
The following query shows that the
customers_last_name_idx
index has a high clustering factor because the clustering factor is significantly more than the number of blocks in the table:SELECT index_name, blevel, leaf_blocks, clustering_factor FROM user_indexes WHERE table_name='CUSTOMERS' AND index_name= 'CUSTOMERS_LAST_NAME_IDX'; INDEX_NAME BLEVEL LEAF_BLOCKS CLUSTERING_FACTOR ------------------------------ ---------- ----------- ----------------- CUSTOMERS_LAST_NAME_IDX 1 141 9859
-
Create a new copy of the
customers
table, with rows ordered bycust_last_name
.For example, execute the following statements:
DROP TABLE customers3 PURGE; CREATE TABLE customers3 AS SELECT * FROM customers ORDER BY cust_last_name;
-
Gather statistics on the
customers3
table.For example, execute the
GATHER_TABLE_STATS
procedure as follows:EXEC DBMS_STATS.GATHER_TABLE_STATS(null,'CUSTOMERS3');
-
Query the number of rows and blocks in the
customers3
table .For example, enter the following query (sample output included):
SELECT TABLE_NAME, NUM_ROWS, BLOCKS FROM USER_TABLES WHERE TABLE_NAME='CUSTOMERS3'; TABLE_NAME NUM_ROWS BLOCKS ------------------------------ ---------- ---------- CUSTOMERS3 55500 1485
-
Create an index on the
cust_last_name
column ofcustomers3
.For example, execute the following statement:
CREATE INDEX CUSTOMERS3_LAST_NAME_IDX ON customers3(cust_last_name);
-
Query the index clustering factor of the
customers3_last_name_idx
index.The following query shows that the
customers3_last_name_idx
index has a lower clustering factor:SELECT INDEX_NAME, BLEVEL, LEAF_BLOCKS, CLUSTERING_FACTOR FROM USER_INDEXES WHERE TABLE_NAME = 'CUSTOMERS3' AND INDEX_NAME = 'CUSTOMERS3_LAST_NAME_IDX'; INDEX_NAME BLEVEL LEAF_BLOCKS CLUSTERING_FACTOR ------------------------------ ---------- ----------- ----------------- CUSTOMERS3_LAST_NAME_IDX 1 141 1455
The table
customers3
has the same data as the originalcustomers
table, but the index oncustomers3
has a much lower clustering factor because the data in the table is ordered by thecust_last_name
. The clustering factor is now about 10 times the number of blocks instead of 70 times. -
Query the
customers
table.For example, execute the following query (sample output included):
SELECT cust_first_name, cust_last_name FROM customers WHERE cust_last_name BETWEEN 'Puleo' AND 'Quinn'; CUST_FIRST_NAME CUST_LAST_NAME -------------------- ---------------------------------------- Vida Puleo Harriett Quinlan Madeleine Quinn Caresse Puleo
-
Display the cursor for the query.
For example, execute the following query (partial sample output included):
SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY_CURSOR()); ------------------------------------------------------------------------------- | Id | Operation | Name | Rows |Bytes|Cost (%CPU)| Time | ------------------------------------------------------------------------------- | 0| SELECT STATEMENT | | | | 405 (100)| | |* 1| TABLE ACCESS STORAGE FULL| CUSTOMERS | 2335|35025| 405 (1)|00:00:01| -------------------------------------------------------------------------------
The preceding plan shows that the optimizer did not use the index on the original
customers
tables. -
Query the
customers3
table.For example, execute the following query (sample output included):
SELECT cust_first_name, cust_last_name FROM customers3 WHERE cust_last_name BETWEEN 'Puleo' AND 'Quinn'; CUST_FIRST_NAME CUST_LAST_NAME -------------------- ---------------------------------------- Vida Puleo Harriett Quinlan Madeleine Quinn Caresse Puleo
-
Display the cursor for the query.
For example, execute the following query (partial sample output included):
SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY_CURSOR()); --------------------------------------------------------------------------------------- |Id| Operation | Name |Rows|Bytes|Cost(%CPU)| Time| --------------------------------------------------------------------------------------- | 0| SELECT STATEMENT | | | |69(100)| | | 1| TABLE ACCESS BY INDEX ROWID|CUSTOMERS3 |2335|35025|69(0) |00:00:01| |*2| INDEX RANGE SCAN |CUSTOMERS3_LAST_NAME_IDX|2335| |7(0) |00:00:01| ---------------------------------------------------------------------------------------
The result set is the same, but the optimizer chooses the index. The plan cost is much less than the cost of the plan used on the original
customers
table. -
Query
customers
with a hint that forces the optimizer to use the index.For example, execute the following query (partial sample output included):
SELECT /*+ index (Customers CUSTOMERS_LAST_NAME_IDX) */ cust_first_name, cust_last_name FROM customers WHERE cust_last_name BETWEEN 'Puleo' and 'Quinn'; CUST_FIRST_NAME CUST_LAST_NAME -------------------- ---------------------------------------- Vida Puleo Caresse Puleo Harriett Quinlan Madeleine Quinn
-
Display the cursor for the query.
For example, execute the following query (partial sample output included):
SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY_CURSOR()); --------------------------------------------------------------------------------------- | Id | Operation | Name |Rows|Bytes|Cost(%CPU)|Time | --------------------------------------------------------------------------------------- | 0| SELECT STATEMENT | | | |422(100)| | | 1| TABLE ACCESS BY INDEX ROWID|CUSTOMERS |335 |35025|422(0) |00:00:01| |*2| INDEX RANGE SCAN |CUSTOMERS_LAST_NAME_IDX|2335| |7(0) |00:00:01| ---------------------------------------------------------------------------------------
The preceding plan shows that the cost of using the index on
customers
is higher than the cost of a full table scan. Thus, using an index does not necessarily improve performance. The index clustering factor is a measure of whether an index scan is more effective than a full table scan.
10.2.3.3 Effect of Index Clustering Factor on Cost: Example
This example illustrates how the index clustering factor can influence the cost of table access.
Consider the following scenario:
-
A table contains 9 rows that are stored in 3 data blocks.
-
The
col1
column currently stores the valuesA
,B
, andC
. -
A nonunique index named
col1_idx
exists oncol1
for this table.
Example 10-4 Collocated Data
Assume that the rows are stored in the data blocks as follows:
Block 1 Block 2 Block 3
------- ------- -------
A A A B B B C C C
In this example, the index clustering factor for col1_idx
is low. The rows that have the same indexed column values for col1
are in the same data blocks in the table. Thus, the cost of using an index range scan to return all rows with value A
is low because only one block in the table must be read.
Example 10-5 Scattered Data
Assume that the same rows are scattered across the data blocks as follows:
Block 1 Block 2 Block 3
------- ------- -------
A B C A C B B A C
In this example, the index clustering factor for col1_idx
is higher. The database must read all three blocks in the table to retrieve all rows with the value A
in col1
.
See Also:
Oracle Database Reference for a description of the DBA_INDEXES
view
10.2.4 System Statistics
The system statistics describe hardware characteristics such as I/O and CPU performance and utilization.
System statistics enable the query optimizer to more accurately estimate I/O and CPU costs when choosing execution plans. The database does not invalidate previously parsed SQL statements when updating system statistics. The database parses all new SQL statements using new statistics.
10.2.5 User-Defined Optimizer Statistics
The extensible optimizer enables authors of user-defined functions and indexes to create statistics collection, selectivity, and cost functions.
The optimizer cost model is extended to integrate information supplied by the user to assess CPU and the I/O cost. Statistics types act as interfaces for user-defined functions that influence the choice of execution plan. However, to use a statistics type, the optimizer requires a mechanism to bind the type to a database object such as a column, standalone function, object type, index, indextype, or package. The SQL statement ASSOCIATE STATISTICS
allows this binding to occur.
Functions for user-defined statistics are relevant for columns that use both standard SQL data types and object types, and for domain indexes. When you associate a statistics type with a column or domain index, the database calls the statistics collection method in the statistics type whenever DBMS_STATS
gathers statistics.
See Also:
10.3 How the Database Gathers Optimizer Statistics
Oracle Database provides several mechanisms to gather statistics.
10.3.1 DBMS_STATS Package
The DBMS_STATS
PL/SQL package collects and manages optimizer statistics.
This package enables you to control what and how statistics are collected, including the degree of parallelism, sampling methods, and granularity of statistics collection in partitioned tables.
Note:
Do not use the COMPUTE
and ESTIMATE
clauses of the ANALYZE
statement to collect optimizer statistics. These clauses have been deprecated. Instead, use DBMS_STATS
.
Statistics gathered with the DBMS_STATS
package are required for the creation of accurate execution plans. For example, table statistics gathered by DBMS_STATS
include the number of rows, number of blocks, and average row length.
By default, Oracle Database uses automatic optimizer statistics collection. In this case, the database automatically runs DBMS_STATS
to collect optimizer statistics for all schema objects for which statistics are missing or stale. The process eliminates many manual tasks associated with managing the optimizer, and significantly reduces the risks of generating suboptimal execution plans because of missing or stale statistics. You can also update and manage optimizer statistics by manually executing DBMS_STATS
.
Oracle Database 19c introduces high-frequency automatic optimizer statistics collection. This lightweight task periodically gathers statistics for stale objects. The default interval is 15 minutes. In contrast to the automated statistics collection job, the high-frequency task does not perform actions such as purging statistics for non-existent objects or invoking Optimizer Statistics Advisor. You can set preferences for the high-frequency task using the DBMS_STATS.SET_GLOBAL_PREFS
procedure, and view metadata using DBA_AUTO_STAT_EXECUTIONS
.
See Also:
-
Oracle Database Administrator’s Guide to learn more about automated maintenance tasks
-
Oracle Database PL/SQL Packages and Types Reference to learn about
DBMS_STATS
10.3.2 Supplemental Dynamic Statistics
By default, when optimizer statistics are missing, stale, or insufficient, the database automatically gathers dynamic statistics during a parse. The database uses recursive SQL to scan a small random sample of table blocks.
Note:
Dynamic statistics augment statistics rather than providing an alternative to them.
Dynamic statistics supplement optimizer statistics such as table and index block counts, table and join cardinalities (estimated number of rows), join column statistics, and GROUP BY
statistics. This information helps the optimizer improve plans by making better estimates for predicate cardinality.
Dynamic statistics are beneficial in the following situations:
-
An execution plan is suboptimal because of complex predicates.
-
The sampling time is a small fraction of total execution time for the query.
-
The query executes many times so that the sampling time is amortized.
10.3.3 Online Statistics Gathering
In some circumstances, DDL and DML operations automatically trigger online statistics gathering.
10.3.3.1 Online Statistics Gathering for Bulk Loads
The database can gather table statistics automatically during the following types of bulk loads: INSERT INTO ... SELECT
using a direct path insert, and CREATE TABLE AS SELECT
.
By default, a parallel insert uses a direct path insert. You can force a direct path insert by using the /*+APPEND*/
hint.
See Also:
Oracle Database Data Warehousing Guide to learn more about bulk loads
10.3.3.1.1 Purpose of Online Statistics Gathering for Bulk Loads
Data warehouse applications typically load large amounts of data into the database. For example, a sales data warehouse might load data every day, week, or month.
In releases earlier than Oracle Database 12c, the best practice was to gather statistics manually after a bulk load. However, many applications did not gather statistics after the load because of negligence or because they waited for the maintenance window to initiate collection. Missing statistics are the leading cause of suboptimal execution plans.
Automatic statistics gathering during bulk loads has the following benefits:
-
Improved performance
Gathering statistics during the load avoids an additional table scan to gather table statistics.
-
Improved manageability
No user intervention is required to gather statistics after a bulk load.
10.3.3.1.2 Global Statistics During Inserts into Partitioned Tables
When inserting rows into a partitioned table, the database gathers global statistics during the insert.
For example, if sales
is a partitioned table, and if you run INSERT INTO sales SELECT
, then the database gathers global statistics. However, the database does not gather partition-level statistics.
Assume a different case in which you use partition-extended syntax to insert rows into a specific partition or subpartition. The database gathers statistics on the partition during the insert. However, the database does not gather global statistics.
Assume that you run INSERT INTO sales PARTITION (sales_q4_2000) SELECT
. The database gathers statistics during the insert. If the INCREMENTAL
preference is enabled for sales
, then the database also gathers a synopsis for sales_q4_2000
. Statistics are immediately available after the insert. However, if you roll back the transaction, then the database automatically deletes statistics gathered during the bulk load.
See Also:
-
Oracle Database SQL Language Reference for
INSERT
syntax and semantics
10.3.3.1.3 Histogram Creation After Bulk Loads
After gathering online statistics, the database does not automatically create histograms.
If histograms are required, then after the bulk load Oracle recommends running DBMS_STATS.GATHER_TABLE_STATS
with options=>GATHER AUTO
. For example, the following program gathers statistics for the myt
table:
EXEC DBMS_STATS.GATHER_TABLE_STATS(user, 'MYT', options=>'GATHER AUTO');
The preceding PL/SQL program only gathers missing or stale statistics. The database does not gather table and basic column statistics collected during the bulk load.
Note:
You can set the table preference options
to GATHER AUTO
on the tables that you plan to bulk load. In this way, you need not explicitly set the options
parameter when running GATHER_TABLE_STATS
.
See Also:
-
Oracle Database Data Warehousing Guide to learn more about bulk loads
10.3.3.1.4 Restrictions for Online Statistics Gathering for Bulk Loads
In certain cases, bulk loads do not automatically gather optimizer statistics.
Specifically, bulk loads do not gather statistics automatically when any of the following conditions applies to the target table, partition, or subpartition:
-
The object contains data. Bulk loads only gather online statistics automatically when the object is empty.
-
It is in an Oracle-owned schema such as
SYS
. -
It is one of the following types of tables: nested table, index-organized table (IOT), external table, or global temporary table defined as
ON COMMIT DELETE ROWS
.Note:
The database does gather online statistics automatically for the internal partitions of a hybrid partitioned table.
-
It has a
PUBLISH
preference set toFALSE
. -
Its statistics are locked.
-
It is loaded using a multitable
INSERT
statement.
See Also:
-
Oracle Database PL/SQL Packages and Types Reference to learn more about
DBMS_STATS.SET_TABLE_PREFS
10.3.3.1.5 User Interface for Online Statistics Gathering for Bulk Loads
By default, the database gathers statistics during bulk loads.
You can enable the feature at the statement level by using the GATHER_OPTIMIZER_STATISTICS
hint. You can disable the feature at the statement level by using the NO_GATHER_OPTIMIZER_STATISTICS
hint. For example, the following statement disables online statistics gathering for bulk loads:
CREATE TABLE employees2 AS
SELECT /*+NO_GATHER_OPTIMIZER_STATISTICS*/ * FROM employees
See Also:
Oracle Database SQL Language Reference to learn about the GATHER_OPTIMIZER_STATISTICS
and NO_GATHER_OPTIMIZER_STATISTICS
hints
10.3.3.2 Online Statistics Gathering for Partition Maintenance Operations
Oracle Database provides analogous support for online statistics during specific partition maintenance operations.
For MOVE
, COALESCE
, and MERGE
, the database maintains global and partition-level statistics as follows:
-
If the partition uses either incremental or non-incremental statistics, then the database makes a direct update to the
BLOCKS
value in the global table statistics. Note that this update is not a statistics gathering operation. -
The database generates fresh statistics for the resulting partition. If incremental statistics are enabled, then the database maintains partition synopses.
For TRUNCATE
or DROP PARTITION
, the database updates the BLOCKS
and NUM_ROWS
values in the global table statistics. The update does not require a gathering statistics operation. The statistics update occurs when either incremental or non-incremental statistics are used.
Note:
The database does not maintain partition-level statistics for maintenance operations that have multiple destination segments.
See Also:
Oracle Database VLDB and Partitioning Guide to learn more about partition maintenance operations
10.3.3.3 Real-Time Statistics
Oracle Database can automatically gather real-time statistics during conventional DML operations.
See Also:
Oracle Database Licensing Information User Manual for details on which features are supported for different editions and services
10.3.3.3.1 Purpose of Real-Time Statistics
Online statistics, whether for bulk loads or conventional DML, aim to reduce the possibility of the optimizer being misled by stale statistics.
Oracle Database 12c introduced online statistics gathering for CREATE TABLE AS SELECT
statements and direct-path inserts. Oracle Database 19c introduces real-time statistics, which extend online support to conventional DML statements. Because statistics can go stale between DBMS_STATS
jobs, real-time statistics help the optimizer generate more optimal plans.
Whereas bulk load operations gather all necessary statistics, real-time statistics augment rather than replace traditional statistics. For this reason, you must continue to gather statistics regularly using DBMS_STATS
, preferably using the AutoTask job.
10.3.3.3.2 How Real-Time Statistics Work
When a DML operation is currently modifying a table, Oracle Database dynamically computes values for the most essential statistics.
Consider a scenario in which a transaction is currently adding tens of thousands of rows to the oe.orders
table. Real-time statistics keep track of the increasing row count as rows are being inserted. If the optimizer performs a hard parse of a new query, then the optimizer can use the real-time statistics to obtain a more accurate cost estimate.
10.3.3.3.3 User Interface for Real-Time Statistics
You can use manage and access real-time statistics through PL/SQL packages, data dictionary views, and hints.
OPTIMIZER_REAL_TIME_STATISTICS Initialization Parameter
When the OPTIMIZER_REAL_TIME_STATISTICS
initialization parameter is set to TRUE
, Oracle Database automatically gathers real-time statistics during conventional DML operations. The default setting is FALSE
, which means real-time statistics are disabled.
DBMS_STATS
By default, DBMS_STATS
subprograms include real-time statistics. You can also specify parameters to include only these statistics.
Table 10-3 Subprograms for Real-Time Statistics
Subprogram | Description |
---|---|
|
These subprograms enable you to export statistics. By default, the |
|
These subprograms enable you to import statistics. By default, the |
|
These subprograms enable you to delete statistics. By default, the |
|
This function compares table statistics from two sources. The statistics always include real-time statistics. |
|
This function compares statistics for a table as of two specified timestamps. The statistics always include real-time statistics. |
Views
If real-time statistics are available, then you can access them using the views in the following table. Note that partition-level statistics are not supported, so the only table-level views show real-time statistics. The DBA_*
views have ALL_*
and USER_*
versions.
Table 10-4 Views for Real-Time Statistics
View | Description |
---|---|
|
This view displays column statistics and histogram information extracted from |
|
This view displays optimizer statistics for the tables accessible to the current user. Real-time statistics are indicated by |
Hints
The NO_GATHER_OPTIMIZER_STATISTICS
hint prevents the collection of real-time statistics.
See Also:
-
Oracle Database PL/SQL Packages and Types Reference to learn about
DBMS_STATS
subprograms -
Oracle Database Reference to learn about the
ALL_TAB_COL_STATISTICS
view -
Oracle Database Licensing Information User Manual for details on whether the real-time statistics feature is supported for different editions and services
10.3.3.3.4 Real-Time Statistics: Example
In this example, a conventional INSERT
statement triggers the collection of real-time statistics.
This example assumes that the sh
user has been granted the DBA role, and you have logged in to the database as sh
. You perform the following steps:
-
Gather statistics for the
sales
table:BEGIN DBMS_STATS.GATHER_TABLE_STATS('SH', 'SALES', METHOD_OPT=>'FOR ALL COLUMNS SIZE 2'); END; /
-
Query the column-level statistics for
sales
:SET PAGESIZE 5000 SET LINESIZE 200 COL COLUMN_NAME FORMAT a13 COL LOW_VALUE FORMAT a14 COL HIGH_VALUE FORMAT a14 COL NOTES FORMAT a5 COL PARTITION_NAME FORMAT a13 SELECT COLUMN_NAME, LOW_VALUE, HIGH_VALUE, SAMPLE_SIZE, NOTES FROM USER_TAB_COL_STATISTICS WHERE TABLE_NAME = 'SALES' ORDER BY 1, 5;
The
Notes
fields are blank, meaning that real-time statistics have not been gathered:COLUMN_NAME LOW_VALUE HIGH_VALUE SAMPLE_SIZE NOTES ------------- -------------- -------------- ----------- ----- AMOUNT_SOLD C10729 C2125349 5594 CHANNEL_ID C103 C10A 918843 CUST_ID C103 C30B0B 5595 PROD_ID C10E C20231 5593 PROMO_ID C122 C20A64 918843 QUANTITY_SOLD C102 C102 5593 TIME_ID 77C60101010101 78650C1F010101 5593 7 rows selected.
-
Query the table-level statistics for
sales
:SELECT NVL(PARTITION_NAME, 'GLOBAL') PARTITION_NAME, NUM_ROWS, BLOCKS, NOTES FROM USER_TAB_STATISTICS WHERE TABLE_NAME = 'SALES' ORDER BY 1, 4;
The
Notes
fields are blank, meaning that real-time statistics have not been gathered:PARTITION_NAM NUM_ROWS BLOCKS NOTES ------------- ---------- ---------- ----- GLOBAL 918843 3315 SALES_1995 0 0 SALES_1996 0 0 SALES_H1_1997 0 0 SALES_H2_1997 0 0 SALES_Q1_1998 43687 162 SALES_Q1_1999 64186 227 SALES_Q1_2000 62197 222 SALES_Q1_2001 60608 222 SALES_Q1_2002 0 0 SALES_Q1_2003 0 0 SALES_Q2_1998 35758 132 SALES_Q2_1999 54233 187 SALES_Q2_2000 55515 197 SALES_Q2_2001 63292 227 SALES_Q2_2002 0 0 SALES_Q2_2003 0 0 SALES_Q3_1998 50515 182 SALES_Q3_1999 67138 232 SALES_Q3_2000 58950 212 SALES_Q3_2001 65769 242 SALES_Q3_2002 0 0 SALES_Q3_2003 0 0 SALES_Q4_1998 48874 192 SALES_Q4_1999 62388 217 SALES_Q4_2000 55984 202 SALES_Q4_2001 69749 260 SALES_Q4_2002 0 0 SALES_Q4_2003 0 0 29 rows selected.
-
Load 918,843 rows into
sales
by using a conventionalINSERT
statement:INSERT INTO sales(prod_id, cust_id, time_id, channel_id, promo_id, quantity_sold, amount_sold) SELECT prod_id, cust_id, time_id, channel_id, promo_id, quantity_sold * 2, amount_sold * 2 FROM sales; COMMIT;
-
Obtain the execution plan from the cursor:
SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY_CURSOR(format=>'TYPICAL'));
The plan shows
LOAD TABLE CONVENTIONAL
in Step 1 andOPTIMIZER STATISTICS GATHERING
in Step 2, which means that the database gathered real-time statistics during the conventional insert:--------------------------------------------------------------------------------------- |Id| Operation | Name|Rows|Bytes|Cost (%CPU)|Time| Pstart|Pstop| --------------------------------------------------------------------------------------- | 0| INSERT STATEMENT | | | |910 (100)| | | | | 1| LOAD TABLE CONVENTIONAL |SALES| | | | | | | | 2| OPTIMIZER STATISTICS GATHERING | |918K| 25M|910 (2)|00:00:01| | | | 3| PARTITION RANGE ALL | |918K| 25M|910 (2)|00:00:01| 1 | 28 | | 4| TABLE ACCESS FULL |SALES|918K| 25M|910 (2)|00:00:01| 1 | 28 | ---------------------------------------------------------------------------------------
-
Query the column-level statistics for
sales
.SET PAGESIZE 5000 SET LINESIZE 200 COL COLUMN_NAME FORMAT a13 COL LOW_VALUE FORMAT a14 COL HIGH_VALUE FORMAT a14 COL NOTES FORMAT a25 COL PARTITION_NAME FORMAT a13 SELECT COLUMN_NAME, LOW_VALUE, HIGH_VALUE, SAMPLE_SIZE, NOTES FROM USER_TAB_COL_STATISTICS WHERE TABLE_NAME = 'SALES' ORDER BY 1, 5;
Now the
Notes
fields showSTATS_ON_CONVENTIONAL_DML
, meaning that the database gathered real-time statistics during the insert:COLUMN_NAME LOW_VALUE HIGH_VALUE SAMPLE_SIZE NOTES ------------- -------------- -------------- ----------- ------------------------- AMOUNT_SOLD C10729 C224422D 9073 STATS_ON_CONVENTIONAL_DML AMOUNT_SOLD C10729 C2125349 5702 CHANNEL_ID C103 C10A 9073 STATS_ON_CONVENTIONAL_DML CHANNEL_ID C103 C10A 918843 CUST_ID C103 C30B0B 9073 STATS_ON_CONVENTIONAL_DML CUST_ID C103 C30B0B 5702 PROD_ID C10E C20231 9073 STATS_ON_CONVENTIONAL_DML PROD_ID C10E C20231 5701 PROMO_ID C122 C20A64 9073 STATS_ON_CONVENTIONAL_DML PROMO_ID C122 C20A64 918843 QUANTITY_SOLD C102 C103 9073 STATS_ON_CONVENTIONAL_DML QUANTITY_SOLD C102 C102 5701 TIME_ID 77C60101010101 78650C1F010101 9073 STATS_ON_CONVENTIONAL_DML TIME_ID 77C60101010101 78650C1F010101 5701
The sample size is 9073, which is roughly 1% of the 918,843 rows inserted. In
QUANTITY_SOLD
andAMOUNT_SOLD
, the high and low values combine the manually gathered statistics and the real-time statistics. -
Force the database to write optimizer statistics to the data dictionary.
EXEC DBMS_STATS.FLUSH_DATABASE_MONITORING_INFO;
-
Query the table-level statistics for
sales
.SELECT NVL(PARTITION_NAME, 'GLOBAL') PARTITION_NAME, NUM_ROWS, BLOCKS, NOTES FROM USER_TAB_STATISTICS WHERE TABLE_NAME = 'SALES' ORDER BY 1, 4;
The
Notes
field shows that real-time statistics have been gathered at the global level, showing the number of rows as 1,837,686:PARTITION_NAM NUM_ROWS BLOCKS NOTES ------------- ---------- ---------- ------------------------- GLOBAL 1837686 3315 STATS_ON_CONVENTIONAL_DML GLOBAL 918843 3315 SALES_1995 0 0 SALES_1996 0 0 SALES_H1_1997 0 0 SALES_H2_1997 0 0 SALES_Q1_1998 43687 162 SALES_Q1_1999 64186 227 SALES_Q1_2000 62197 222 SALES_Q1_2001 60608 222 SALES_Q1_2002 0 0 SALES_Q1_2003 0 0 SALES_Q2_1998 35758 132 SALES_Q2_1999 54233 187 SALES_Q2_2000 55515 197 SALES_Q2_2001 63292 227 SALES_Q2_2002 0 0 SALES_Q2_2003 0 0 SALES_Q3_1998 50515 182 SALES_Q3_1999 67138 232 SALES_Q3_2000 58950 212 SALES_Q3_2001 65769 242 SALES_Q3_2002 0 0 SALES_Q3_2003 0 0 SALES_Q4_1998 48874 192 SALES_Q4_1999 62388 217 SALES_Q4_2000 55984 202 SALES_Q4_2001 69749 260 SALES_Q4_2002 0 0 SALES_Q4_2003 0 0
-
Query the
quantity_sold
column:SELECT COUNT(*) FROM sales WHERE quantity_sold > 50;
-
Obtain the execution plan from the cursor:
SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY_CURSOR(format=>'TYPICAL'));
The
Note
field shows that the query used the real-time statistics.Plan hash value: 3519235612 ------------------------------------------------------------------------ |Id| Operation |Name|Rows|Bytes|Cost (%CPU)|Time|Pstart|Pstop| ------------------------------------------------------------------------ | 0| SELECT STATEMENT | | | |921 (100)| | | | | 1| SORT AGGREGATE | | 1| 3| | | | | | 2| PARTITION RANGE ALL| | 1| 3|921 (3)|00:00:01| 1 | 28 | |*3| TABLE ACCESS FULL | SALES | 1| 3|921 (3)|00:00:01| 1 | 28 | ------------------------------------------------------------------------ Predicate Information (identified by operation id): --------------------------------------------------- 3 - filter("QUANTITY_SOLD">50) Note ----- - dynamic statistics used: stats for conventional DML
See Also:
Oracle Database Reference to learn about USER_TAB_COL_STATISTICS
and USER_TAB_STATISTICS
.
10.4 When the Database Gathers Optimizer Statistics
The database collects optimizer statistics at various times and from various sources.
10.4.1 Sources for Optimizer Statistics
The optimizer uses several different sources for optimizer statistics.
The sources are as follows:
-
DBMS_STATS
execution, automatic or manualThis PL/SQL package is the primary means of gathering optimizer statistics.
-
SQL compilation
During SQL compilation, the database can augment the statistics previously gathered by
DBMS_STATS
. In this stage, the database runs additional queries to obtain more accurate information on how many rows in the tables satisfy theWHERE
clause predicates in the SQL statement. -
SQL execution
During execution, the database can further augment previously gathered statistics. In this stage, Oracle Database collects the number of rows produced by every row source during the execution of a SQL statement. At the end of execution, the optimizer determines whether the estimated number of rows is inaccurate enough to warrant reparsing at the next statement execution. If the cursor is marked for reparsing, then the optimizer uses actual row counts from the previous execution instead of estimates.
-
SQL profiles
A SQL profile is a collection of auxiliary statistics on a query. The profile stores these supplemental statistics in the data dictionary. The optimizer uses SQL profiles during optimization to determine the most optimal plan.
The database stores optimizer statistics in the data dictionary and updates or replaces them as needed. You can query statistics in data dictionary views.
See Also:
-
Oracle Database PL/SQL Packages and Types Reference to learn about the
DBMS_STATS.GATHER_TABLE_STATS
procedure
10.4.2 SQL Plan Directives
A SQL plan directive is additional information and instructions that the optimizer can use to generate a more optimal plan.
The directive is a “note to self” by the optimizer that it is misestimating cardinalities of certain types of predicates, and also a reminder to DBMS_STATS
to gather statistics needed to correct the misestimates in the future. For example, when joining two tables that have a data skew in their join columns, a SQL plan directive can direct the optimizer to use dynamic statistics to obtain a more accurate join cardinality estimate.
10.4.2.1 When the Database Creates SQL Plan Directives
The database creates SQL plan directives automatically based on information learned during automatic reoptimization. If a cardinality misestimate occurs during SQL execution, then the database creates SQL plan directives.
For each new directive, the DBA_SQL_PLAN_DIRECTIVES.STATE
column shows the value USABLE
. This value indicates that the database can use the directive to correct misestimates.
The optimizer defines a SQL plan directive on a query expression, for example, filter predicates on two columns being used together. A directive is not tied to a specific SQL statement or SQL ID. For this reason, the optimizer can use directives for statements that are not identical. For example, directives can help the optimizer with queries that use similar patterns, such as queries that are identical except for a select list item.
The Notes section of the execution plan indicates the number of SQL plan directives used for a statement. Obtain more information about the directives by querying the DBA_SQL_PLAN_DIRECTIVES
and DBA_SQL_PLAN_DIR_OBJECTS
views.
See Also:
Oracle Database Reference to learn more about DBA_SQL_PLAN_DIRECTIVES
10.4.2.2 How the Database Uses SQL Plan Directives
When compiling a SQL statement, if the optimizer sees a directive, then it obeys the directive by gathering additional information.
The optimizer uses directives in the following ways:
-
Dynamic statistics
The optimizer uses dynamic statistics whenever it does not have sufficient statistics corresponding to the directive. For example, the cardinality estimates for a query whose predicate contains a specific pair of columns may be significantly wrong. A SQL plan directive indicates that the whenever a query that contains these columns is parsed, the optimizer needs to use dynamic sampling to avoid a serious cardinality misestimate.
Dynamic statistics have some performance overhead. Every time the optimizer hard parses a query to which a dynamic statistics directive applies, the database must perform the extra sampling.
Starting in Oracle Database 12c Release 2 (12.2), the database writes statistics from adaptive dynamic sampling to the SQL plan directives store, making them available to other queries.
-
Column groups
The optimizer examines the query corresponding to the directive. If there is a missing column group, and if the
DBMS_STATS
preferenceAUTO_STAT_EXTENSIONS
is set toON
(the default isOFF
) for the affected table, then the optimizer automatically creates this column group the next timeDBMS_STATS
gathers statistics on the table. Otherwise, the optimizer does not automatically create the column group.If a column group exists, then the next time this statement executes, the optimizer uses the column group statistics in place of the SQL plan directive when possible (equality predicates,
GROUP BY
, and so on). In subsequent executions, the optimizer may create additional SQL plan directives to address other problems in the plan, such as join orGROUP BY
cardinality misestimates.Note:
Currently, the optimizer monitors only column groups. The optimizer does not create an extension on expressions.
When the problem that occasioned a directive is solved, either because a better directive exists or because a histogram or extension exists, the DBA_SQL_PLAN_DIRECTIVES.STATE
value changes from USABLE
to SUPERSEDED
. More information about the directive state is exposed in the DBA_SQL_PLAN_DIRECTIVES.NOTES
column.
See Also:
-
Oracle Database PL/SQL Packages and Types Reference to learn more about the
AUTO_STAT_EXTENSIONS
preference forDBMS_STATS.SET_TABLE_STATS
10.4.2.3 SQL Plan Directive Maintenance
The database automatically creates SQL plan directives. You cannot create them manually.
The database initially creates directives in the shared pool. The database periodically writes the directives to the SYSAUX
tablespace. The database automatically purges any SQL plan directive that is not used after the specified number of weeks (SPD_RETENTION_WEEKS
), which is 53 by default.
You can manage directives by using the DBMS_SPD
package. For example, you can:
-
Enable and disable SQL plan directives (
ALTER_SQL_PLAN_DIRECTIVE
) -
Change the retention period for SQL plan directives (
SET_PREFS
) -
Export a directive to a staging table (
PACK_STGTAB_DIRECTIVE
) -
Drop a directive (
DROP_SQL_PLAN_DIRECTIVE
) -
Force the database to write directives to disk (
FLUSH_SQL_PLAN_DIRECTIVE
)
See Also:
Oracle Database PL/SQL Packages and Types Reference to learn about the DBMS_SPD
package
10.4.2.4 How the Optimizer Uses SQL Plan Directives: Example
This example shows how the database automatically creates and uses SQL plan directives for SQL statements.
Assumptions
You plan to run queries against the sh
schema, and you have privileges on this schema and on data dictionary and V$
views.
To see how the database uses a SQL plan directive:
-
Query the
sh.customers
table.SELECT /*+gather_plan_statistics*/ * FROM customers WHERE cust_state_province='CA' AND country_id='US';
The
gather_plan_statistics
hint shows the actual number of rows returned from each operation in the plan. Thus, you can compare the optimizer estimates with the actual number of rows returned. -
Query the plan for the preceding query.
The following example shows the execution plan (sample output included):
SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY_CURSOR(FORMAT=>'ALLSTATS LAST')); PLAN_TABLE_OUTPUT ------------------------------------- SQL_ID b74nw722wjvy3, child number 0 ------------------------------------- select /*+gather_plan_statistics*/ * from customers where CUST_STATE_PROVINCE='CA' and country_id='US' Plan hash value: 1683234692 --------------------------------------------------------------------------------------- | Id| Operation | Name |Starts|E-Rows|A-Rows| Time | Buffers| Reads | --------------------------------------------------------------------------------------- | 0 | SELECT STATEMENT | | 1 | | 29 |00:00:00.01 | 17 | 14 | |*1 | TABLE ACCESS FULL| CUSTOMERS | 1 | 8 | 29 |00:00:00.01 | 17 | 14 | --------------------------------------------------------------------------------------- Predicate Information (identified by operation id): --------------------------------------------------- 1 - filter(("CUST_STATE_PROVINCE"='CA' AND "COUNTRY_ID"='US'))
The actual number of rows (
A-Rows
) returned by each operation in the plan varies greatly from the estimates (E-Rows
). This statement is a candidate for automatic reoptimization. -
Check whether the
customers
query can be reoptimized.The following statement queries the
V$SQL.IS_REOPTIMIZABLE
value (sample output included):SELECT SQL_ID, CHILD_NUMBER, SQL_TEXT, IS_REOPTIMIZABLE FROM V$SQL WHERE SQL_TEXT LIKE 'SELECT /*+gather_plan_statistics*/%'; SQL_ID CHILD_NUMBER SQL_TEXT I ------------- ------------ ----------- - b74nw722wjvy3 0 select /*+g Y ather_plan_ statistics* / * from cu stomers whe re CUST_STA TE_PROVINCE ='CA' and c ountry_id=' US'
The
IS_REOPTIMIZABLE
column is markedY
, so the database will perform a hard parse of thecustomers
query on the next execution. The optimizer uses the execution statistics from this initial execution to determine the plan. The database persists the information learned from reoptimization as a SQL plan directive. -
Display the directives for the
sh
schema.The following example uses
DBMS_SPD
to write the SQL plan directives to disk, and then shows the directives for thesh
schema only:EXEC DBMS_SPD.FLUSH_SQL_PLAN_DIRECTIVE; SELECT TO_CHAR(d.DIRECTIVE_ID) dir_id, o.OWNER AS "OWN", o.OBJECT_NAME AS "OBJECT", o.SUBOBJECT_NAME col_name, o.OBJECT_TYPE, d.TYPE, d.STATE, d.REASON FROM DBA_SQL_PLAN_DIRECTIVES d, DBA_SQL_PLAN_DIR_OBJECTS o WHERE d.DIRECTIVE_ID=o.DIRECTIVE_ID AND o.OWNER IN ('SH') ORDER BY 1,2,3,4,5; DIR_ID OW OBJECT COL_NAME OBJECT TYPE STATE REASON ------------------- -- --------- ---------- ------ ------------- ------ ------------ 1484026771529551585 SH CUSTOMERS COUNTRY_ID COLUMN DYNAMIC_SAMPL USABLE SINGLE TABLE CARDINALITY MISESTIMATE 1484026771529551585 SH CUSTOMERS CUST_STATE COLUMN DYNAMIC_SAMPL USABLE SINGLE TABLE _PROVINCE CARDINALITY MISESTIMATE 1484026771529551585 SH CUSTOMERS TABLE DYNAMIC_SAMPL USABLE SINGLE TABLE CARDINALITY MISESTIMATE
Initially, the database stores SQL plan directives in memory, and then writes them to disk every 15 minutes. Thus, the preceding example calls
DBMS_SPD.FLUSH_SQL_PLAN_DIRECTIVE
to force the database to write the directives to theSYSAUX
tablespace.Monitor directives using the views
DBA_SQL_PLAN_DIRECTIVES
andDBA_SQL_PLAN_DIR_OBJECTS
. Three entries appear in the views, one for thecustomers
table itself, and one for each of the correlated columns. Because thecustomers
query has theIS_REOPTIMIZABLE
value ofY
, if you reexecute the statement, then the database will hard parse it again, and then generate a plan based on the previous execution statistics. -
Query the
customers
table again.For example, enter the following statement:
SELECT /*+gather_plan_statistics*/ * FROM customers WHERE cust_state_province='CA' AND country_id='US';
-
Query the plan in the cursor.
The following example shows the execution plan (sample output included):
SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY_CURSOR(FORMAT=>'ALLSTATS LAST')); PLAN_TABLE_OUTPUT ------------------------------------- SQL_ID b74nw722wjvy3, child number 1 ------------------------------------- select /*+gather_plan_statistics*/ * from customers where CUST_STATE_PROVINCE='CA' and country_id='US' Plan hash value: 1683234692 ------------------------------------------------------------------------ |Id| Operation |Name |Start|E-Rows|A-Rows| A-Time |Buffers| ------------------------------------------------------------------------ | 0| SELECT STATEMENT | | 1| | 29|00:00:00.01| 17| |*1| TABLE ACCESS FULL|CUSTOMERS| 1| 29| 29|00:00:00.01| 17| ------------------------------------------------------------------------ Predicate Information (identified by operation id): --------------------------------------------------- 1 - filter(("CUST_STATE_PROVINCE"='CA' AND "COUNTRY_ID"='US')) Note ----- - cardinality feedback used for this statement
The
Note
section indicates that the database used reoptimization for this statement. The estimated number of rows (E-Rows
) is now correct. The SQL plan directive has not been used yet. -
Query the cursors for the
customers
query.For example, run the following query (sample output included):
SELECT SQL_ID, CHILD_NUMBER, SQL_TEXT, IS_REOPTIMIZABLE FROM V$SQL WHERE SQL_TEXT LIKE 'SELECT /*+gather_plan_statistics*/%'; SQL_ID CHILD_NUMBER SQL_TEXT I ------------- ------------ ----------- - b74nw722wjvy3 0 select /*+g Y ather_plan_ statistics* / * from cu stomers whe re CUST_STA TE_PROVINCE ='CA' and c ountry_id=' US' b74nw722wjvy3 1 select /*+g N ather_plan_ statistics* / * from cu stomers whe re CUST_STA TE_PROVINCE ='CA' and c ountry_id=' US'
A new plan exists for the
customers
query, and also a new child cursor. -
Confirm that a SQL plan directive exists and is usable for other statements.
For example, run the following query, which is similar but not identical to the original
customers
query (the state isMA
instead ofCA
):SELECT /*+gather_plan_statistics*/ CUST_EMAIL FROM CUSTOMERS WHERE CUST_STATE_PROVINCE='MA' AND COUNTRY_ID='US';
-
Query the plan in the cursor.
The following statement queries the cursor (sample output included).:
SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY_CURSOR(FORMAT=>'ALLSTATS LAST')); PLAN_TABLE_OUTPUT ------------------------------------- SQL_ID 3tk6hj3nkcs2u, child number 0 ------------------------------------- Select /*+gather_plan_statistics*/ cust_email From customers Where cust_state_province='MA' And country_id='US' Plan hash value: 1683234692 ---------------------------------------------------------------------- |Id | Operation | Name |Starts|E-Rows|A-Rows|A-Time|Buffers| ---------------------------------------------------------------------- | 0 | SELECT STATEMENT | | 1 | | 2 |00:00:00.01| 16 | |*1 | TABLE ACCESS FULL| CUSTOMERS | 1 | 2 | 2 |00:00:00.01| 16 | ---------------------------------------------------------------------- Predicate Information (identified by operation id): --------------------------------------------------- 1 - filter(("CUST_STATE_PROVINCE"='MA' AND "COUNTRY_ID"='US')) Note ----- - dynamic sampling used for this statement (level=2) - 1 Sql Plan Directive used for this statement
The
Note
section of the plan shows that the optimizer used the SQL directive for this statement, and also used dynamic statistics.
See Also:
-
Oracle Database Reference to learn about
DBA_SQL_PLAN_DIRECTIVES
,V$SQL
, and other database views -
Oracle Database Reference to learn about
DBMS_SPD
10.4.2.5 How the Optimizer Uses Extensions and SQL Plan Directives: Example
The example shows how the database uses a SQL plan directive until the optimizer verifies that an extension exists and the statistics are applicable.
At this point, the directive changes its status to SUPERSEDED
. Subsequent compilations use the statistics instead of the directive.
Assumptions
This example assumes you have already followed the steps in "How the Optimizer Uses SQL Plan Directives: Example".
To see how the optimizer uses an extension and SQL plan directive:
-
Gather statistics for the
sh.customers
table.For example, execute the following PL/SQL program:
BEGIN DBMS_STATS.GATHER_TABLE_STATS('SH','CUSTOMERS'); END; /
-
Check whether an extension exists on the
customers
table.For example, execute the following query (sample output included):
SELECT TABLE_NAME, EXTENSION_NAME, EXTENSION FROM DBA_STAT_EXTENSIONS WHERE OWNER='SH' AND TABLE_NAME='CUSTOMERS'; TABLE_NAM EXTENSION_NAME EXTENSION --------- ------------------------------ ----------------------- CUSTOMERS SYS_STU#S#WF25Z#QAHIHE#MOFFMM_ ("CUST_STATE_PROVINCE", "COUNTRY_ID")
The preceding output indicates that a column group extension exists on the
cust_state_province
andcountry_id
columns. -
Query the state of the SQL plan directive.
Example 10-6 queries the data dictionary for information about the directive.
Although column group statistics exist, the directive has a state of
USABLE
because the database has not yet recompiled the statement. During the next compilation, the optimizer verifies that the statistics are applicable. If they are applicable, then the status of the directive changes toSUPERSEDED
. Subsequent compilations use the statistics instead of the directive. -
Query the
sh.customers
table.SELECT /*+gather_plan_statistics*/ * FROM customers WHERE cust_state_province='CA' AND country_id='US';
-
Query the plan in the cursor.
Example 10-7 shows the execution plan (sample output included).
The
Note
section shows that the optimizer used the directive and not the extended statistics. During the compilation, the database verified the extended statistics. -
Query the state of the SQL plan directive.
Example 10-8 queries the data dictionary for information about the directive.
The state of the directive, which has changed to
SUPERSEDED
, indicates that the corresponding column or groups have an extension or histogram, or that another SQL plan directive exists that can be used for the directive. -
Query the
sh.customers
table again, using a slightly different form of the statement.For example, run the following query:
SELECT /*+gather_plan_statistics*/ /* force reparse */ * FROM customers WHERE cust_state_province='CA' AND country_id='US';
If the cursor is in the shared SQL area, then the database typically shares the cursor. To force a reparse, this step changes the SQL text slightly by adding a comment.
-
Query the plan in the cursor.
Example 10-9 shows the execution plan (sample output included).
The absence of a
Note
shows that the optimizer used the extended statistics instead of the SQL plan directive. If the directive is not used for 53 weeks, then the database automatically purges it.
See Also:
-
Oracle Database Reference to learn about
DBA_SQL_PLAN_DIRECTIVES
,V$SQL
, and other database views -
Oracle Database Reference to learn about
DBMS_SPD
Example 10-6 Display Directives for sh Schema
EXEC DBMS_SPD.FLUSH_SQL_PLAN_DIRECTIVE;
SELECT TO_CHAR(d.DIRECTIVE_ID) dir_id, o.OWNER, o.OBJECT_NAME,
o.SUBOBJECT_NAME col_name, o.OBJECT_TYPE, d.TYPE, d.STATE, d.REASON
FROM DBA_SQL_PLAN_DIRECTIVES d, DBA_SQL_PLAN_DIR_OBJECTS o
WHERE d.DIRECTIVE_ID=o.DIRECTIVE_ID
AND o.OWNER IN ('SH')
ORDER BY 1,2,3,4,5;
DIR_ID OWN OBJECT_NA COL_NAME OBJECT TYPE STATE REASON
------------------- --- --------- ---------- ------- ---------------- ------ ------------
1484026771529551585 SH CUSTOMERS COUNTRY_ID COLUMN DYNAMIC_SAMPLING USABLE SINGLE TABLE
CARDINALITY
MISESTIMATE
1484026771529551585 SH CUSTOMERS CUST_STATE_ COLUMN DYNAMIC_SAMPLING USABLE SINGLE TABLE
PROVINCE CARDINALITY
MISESTIMATE
1484026771529551585 SH CUSTOMERS TABLE DYNAMIC_SAMPLING USABLE SINGLE TABLE
CARDINALITY
MISESTIMATE
Example 10-7 Execution Plan
SQL> SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY_CURSOR(FORMAT=>'ALLSTATS LAST'));
PLAN_TABLE_OUTPUT
-------------------------------------
SQL_ID b74nw722wjvy3, child number 0
-------------------------------------
select /*+gather_plan_statistics*/ * from customers where
CUST_STATE_PROVINCE='CA' and country_id='US'
Plan hash value: 1683234692
-----------------------------------------------------------------------------------------
| Id | Operation | Name | Starts | E-Rows | A-Rows | A-Time | Buffers |
-----------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | | 29 |00:00:00.01 | 16 |
|* 1 | TABLE ACCESS FULL| CUSTOMERS | 1 | 29 | 29 |00:00:00.01 | 16 |
-----------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
1 - filter(("CUST_STATE_PROVINCE"='CA' AND "COUNTRY_ID"='US'))
Note
-----
- dynamic sampling used for this statement (level=2)
- 1 Sql Plan Directive used for this statement
Example 10-8 Display Directives for sh Schema
EXEC DBMS_SPD.FLUSH_SQL_PLAN_DIRECTIVE;
SELECT TO_CHAR(d.DIRECTIVE_ID) dir_id, o.OWNER, o.OBJECT_NAME,
o.SUBOBJECT_NAME col_name, o.OBJECT_TYPE, d.TYPE, d.STATE, d.REASON
FROM DBA_SQL_PLAN_DIRECTIVES d, DBA_SQL_PLAN_DIR_OBJECTS o
WHERE d.DIRECTIVE_ID=o.DIRECTIVE_ID
AND o.OWNER IN ('SH')
ORDER BY 1,2,3,4,5;
DIR_ID OWN OBJECT_NA COL_NAME OBJECT TYPE STATE REASON
------------------- --- --------- ---------- ------ -------- --------- ------------
1484026771529551585 SH CUSTOMERS COUNTRY_ID COLUMN DYNAMIC_ SUPERSEDED SINGLE TABLE
SAMPLING CARDINALITY
MISESTIMATE
1484026771529551585 SH CUSTOMERS CUST_STATE_ COLUMN DYNAMIC_ SUPERSEDED SINGLE TABLE
PROVINCE SAMPLING CARDINALITY
MISESTIMATE
1484026771529551585 SH CUSTOMERS TABLE DYNAMIC_ SUPERSEDED SINGLE TABLE
SAMPLING CARDINALITY
MISESTIMATE
Example 10-9 Execution Plan
SQL> SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY_CURSOR(FORMAT=>'ALLSTATS LAST'));
PLAN_TABLE_OUTPUT
-------------------------------------
SQL_ID b74nw722wjvy3, child number 0
-------------------------------------
select /*+gather_plan_statistics*/ * from customers where
CUST_STATE_PROVINCE='CA' and country_id='US'
Plan hash value: 1683234692
-----------------------------------------------------------------------------------------
| Id | Operation | Name | Starts | E-Rows | A-Rows | A-Time | Buffers |
-----------------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | | 29 |00:00:00.01 | 17 |
|* 1 | TABLE ACCESS FULL| CUSTOMERS | 1 | 29 | 29 |00:00:00.01 | 17 |
-----------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
1 - filter(("CUST_STATE_PROVINCE"='CA' AND "COUNTRY_ID"='US'))
19 rows selected.
10.4.3 When the Database Samples Data
Starting in Oracle Database 12c, the optimizer automatically decides whether dynamic statistics are useful and which sample size to use for all SQL statements.
Note:
In earlier releases, dynamic statistics were called dynamic sampling.
The primary factor in the decision to use dynamic statistics is whether available statistics are sufficient to generate an optimal plan. If statistics are insufficient, then the optimizer uses dynamic statistics.
Automatic dynamic statistics are enabled when the OPTIMIZER_DYNAMIC_SAMPLING
initialization parameter is not set to 0
. By default, the dynamic statistics level is set to 2
.
In general, the optimizer uses default statistics rather than dynamic statistics to compute statistics needed during optimizations on tables, indexes, and columns. The optimizer decides whether to use dynamic statistics based on several factors, including the following:
-
The SQL statement uses parallel execution.
-
A SQL plan directive exists.
The following diagram illustrates the process of gathering dynamic statistics.
As shown in Figure 10-2, the optimizer automatically gathers dynamic statistics in the following cases:
-
Missing statistics
When tables in a query have no statistics, the optimizer gathers basic statistics on these tables before optimization. Statistics can be missing because the application creates new objects without a follow-up call to
DBMS_STATS
to gather statistics, or because statistics were locked on an object before statistics were gathered.In this case, the statistics are not as high-quality or as complete as the statistics gathered using the
DBMS_STATS
package. This trade-off is made to limit the impact on the compile time of the statement. -
Insufficient statistics
Statistics can be insufficient whenever the optimizer estimates the selectivity of predicates (filter or join) or the
GROUP BY
clause without taking into account correlation between columns, skew in the column data distribution, statistics on expressions, and so on.Extended statistics help the optimizer obtain accurate quality cardinality estimates for complex predicate expressions. The optimizer can use dynamic statistics to compensate for the lack of extended statistics or when it cannot use extended statistics, for example, for non-equality predicates.
Note:
The database does not use dynamic statistics for queries that contain the AS OF
clause.
See Also:
-
Oracle Database Reference to learn about the
OPTIMIZER_DYNAMIC_SAMPLING
initialization parameter
10.4.4 How the Database Samples Data
At the beginning of optimization, when deciding whether a table is a candidate for dynamic statistics, the optimizer checks for the existence of persistent SQL plan directives on the table.
For each directive, the optimizer registers a statistics expression that the optimizer computes when determining the cardinality of a predicate involving the table. In Figure 10-2, the database issues a recursive SQL statement to scan a small random sample of the table blocks. The database applies the relevant single-table predicates and joins to estimate predicate cardinalities.
The database persists the results of dynamic statistics as sharable statistics. The database can share the results during the SQL compilation of one query with recompilations of the same query. The database can also reuse the results for queries that have the same patterns.
See Also:
-
"Configuring Options for Dynamic Statistics" to learn how to set the dynamic statistics level
-
Oracle Database Reference for details about the
OPTIMIZER_DYNAMIC_SAMPLING
initialization parameter