34.3 Use Nested Data
A join between the tables for one-to-many relationship is represented through nested columns.
Oracle Machine Learning for SQL requires a case table in single-record case format, with each record in a separate row. What if some or all of your data is in multi-record case format, with each record in several rows? What if you want one attribute to represent a series or collection of values, such as a student's test scores or the products purchased by a customer?
This kind of one-to-many relationship is usually implemented as a join between tables. For example, you can join your customer table to a sales table and thus associate a list of products purchased with each customer.
OML4SQL supports dimensioned data through nested columns. To include dimensioned data in your case table, create a view and cast the joined data to one of the machine learning nested table types. Each row in the nested column consists of an attribute name/value pair. OML4SQL internally processes each nested row as a separate attribute.
Note:
O-Cluster is the only algorithm that does not support nested data.
Related Topics
34.3.1 Nested Object Types
Nested tables are object data types that can be used in place of other data types.
Oracle Database supports user-defined data types that make it possible to model real-world entities as objects in the database. Collection types are object data types for modeling multi-valued attributes. Nested tables are collection types. Nested tables can be used anywhere that other data types can be used.
OML4SQL supports the following nested object types:
DM_NESTED_BINARY_DOUBLES
DM_NESTED_BINARY_FLOATS
DM_NESTED_NUMERICALS
DM_NESTED_CATEGORICALS
Descriptions of the nested types are provided in this example.
Example 34-3 OML4SQL Nested Data Types
describe dm_nested_binary_double Name Null? Type ----------------------------------------- -------- ---------------------------- ATTRIBUTE_NAME VARCHAR2(4000) VALUE BINARY_DOUBLE describe dm_nested_binary_doubles DM_NESTED_BINARY_DOUBLES TABLE OF SYS.DM_NESTED_BINARY_DOUBLE Name Null? Type ------------------------------------------ -------- --------------------------- ATTRIBUTE_NAME VARCHAR2(4000) VALUE BINARY_DOUBLE describedm_nested_binary_float
Name Null? Type ----------------------------------------- -------- --------------------------- ATTRIBUTE_NAME VARCHAR2(4000) VALUE BINARY_FLOAT describedm_nested_binary_floats
DM_NESTED_BINARY_FLOATS TABLE OF SYS.DM_NESTED_BINARY_FLOAT Name Null? Type ----------------------------------------- -------- ---------------------------- ATTRIBUTE_NAME VARCHAR2(4000) VALUE BINARY_FLOAT describedm_nested_numerical
Name Null? Type ----------------------------------------- -------- ---------------------------- ATTRIBUTE_NAME VARCHAR2(4000) VALUE NUMBER describedm_nested_numericals
DM_NESTED_NUMERICALS TABLE OF SYS.DM_NESTED_NUMERICAL Name Null? Type ----------------------------------------- -------- ---------------------------- ATTRIBUTE_NAME VARCHAR2(4000) VALUE NUMBER describedm_nested_categorical
Name Null? Type ----------------------------------------- -------- ---------------------------- ATTRIBUTE_NAME VARCHAR2(4000) VALUE VARCHAR2(4000) describedm_nested_categoricals
DM_NESTED_CATEGORICALS TABLE OF SYS.DM_NESTED_CATEGORICAL Name Null? Type ----------------------------------------- -------- ---------------------------- ATTRIBUTE_NAME VARCHAR2(4000) VALUE VARCHAR2(4000)
Related Topics
34.3.2 Example: Transforming Transactional Data for Machine Learning
In this example, a comparison is shown for sale of products in four regions with data before transformation and then after transformation.
Example 34-4 shows data from a view of a sales table. It includes sales for three of the many products sold in four regions. This data is not suitable for machine learning at the product level because sales for each case (product), is stored in several rows.
Example 34-5 shows how this data can be transformed for machine learning. The case ID column is PRODUCT
. SALES_PER_REGION
, a nested column of type DM_NESTED_NUMERICALS
, is a data attribute. This table is suitable for machine learning at the product case level, because the information for each case is stored in a single row.
Note:
The presentation in this example is conceptual only. The data is not actually pivoted before being processed.
Example 34-4 Product Sales per Region in Multi-Record Case Format
PRODUCT REGION SALES ------- -------- ---------- Prod1 NE 556432 Prod2 NE 670155 Prod3 NE 3111 . . Prod1 NW 90887 Prod2 NW 100999 Prod3 NW 750437 . . Prod1 SE 82153 Prod2 SE 57322 Prod3 SE 28938 . . Prod1 SW 3297551 Prod2 SW 4972019 Prod3 SW 884923 . .
Example 34-5 Product Sales per Region in Single-Record Case Format
PRODUCT SALES_PER_REGION (ATTRIBUTE_NAME, VALUE) ------ -------------------------- Prod1 ('NE' , 556432) ('NW' , 90887) ('SE' , 82153) ('SW' , 3297551) Prod2 ('NE' , 670155) ('NW' , 100999) ('SE' , 57322) ('SW' , 4972019) Prod3 ('NE' , 3111) ('NW' , 750437) ('SE' , 28938) ('SW' , 884923) . .
Example 34-6 Model Attributes Derived From SALES_PER_REGION
PRODUCT SALES_PER_REGION.NE SALES_PER_REGION.NW SALES_PER_REGION.SE SALES_PER_REGION.SW ------- ------------------ ------------------- ------------------ ------------------- Prod1 556432 90887 82153 3297551 Prod2 670155 100999 57322 4972019 Prod3 3111 750437 28938 884923 . .