16 Performing Sentiment Analysis Using Oracle Text
Sentiment analysis enables you to identify a positive or negative sentiment in a search topic.
16.1 Overview of Sentiment Analysis
Sentiment analysis uses trained sentiment classifiers to provide sentiment information for documents or topics within documents.
16.1.1 About Sentiment Analysis
Oracle Text enables you to perform sentiment analysis for a topic or document by using sentiment classifiers that are trained to identify sentiment metadata.
With growing amounts of data, organizations need insight into their data, not just a list of hits in response to a search query. That insight could take the form of answers to certain basic types of queries (such as weather queries or queries about recent events) or opinions about user-specified topics. A keyword search returns a list of results that contain the search term. However, to identify a sentiment or opinion about the search term, you must browse through the results and manually locate the required sentiment information. Sentiment analysis provides a one-step process to identify sentiment information within a set of documents.
Sentiment analysis is the process of identifying and extracting sentiment metadata about a specified topic or entity from a set of documents. Trained sentiment classifiers identify the sentiment. When you run a query with sentiment analysis, in addition to the search results, sentiment metadata is also identified and displayed. Sentiment analysis provides answers to questions such as “Is a product review positive or negative?” or “Is the customer satisfied or dissatisfied?” For example, from a document set consisting of multiple reviews for a particular product, you can determine an overall sentiment that indicates if the product is good or bad.
16.1.2 About Sentiment Classifiers
A sentiment classifier is a type of document classifier that is used to extract sentiment metadata about a topic or document.
To perform sentiment analysis by using a sentiment classifier, you must first associate a sentiment classifier preference with the sentiment classifier and then train the sentiment classifier.
You can associate user-defined sentiment classifiers with a sentiment classifier preference of type SENTIMENT_CLASSIFIER.
A sentiment classifier preference specifies the parameters that are used to train a sentiment classifier. These parameters are defined as attributes of the sentiment classifier preference. You can either create a sentiment classifier preference or use the default CTXSYS.DEFAULT_SENTIMENT_CLASSIFIER.
To create a user-defined sentiment classifier preference, use the CTX_DDL.CREATE_PREFERENCE procedure to define the preference and the CTX_DDL.SET_ATTRIBUTE procedure to define its parameters.
To train a sentiment classifier, you provide an associated sentiment classifier preference, a training set of documents, and the sentiment categories. If you do not specify a classifier preference, then Oracle Text uses default values for the training parameters. You train the sentiment classifier by using the set of sample documents and the specified preference, and you assign each sample document to a category. From this training set, Oracle Text deduces a set of classification rules that define how sentiment analysis is performed. Use the CTX_CLS.SA_TRAIN_MODEL procedure to train a sentiment classifier.
Typically, you define and train separate sentiment classifiers for different categories of documents, such as finance, product reviews, and music. If you do not want to create your own sentiment classifier or if suitable training data is not available to train your classifier, you can use the default sentiment classifier provided by Oracle Text. The default sentiment classifier is unsupervised.
Note:
The default sentiment classifier works only with AUTO_LEXER.
Do not use AUTO_LEXER with user-defined sentiment classifiers.
16.1.3 About Performing Sentiment Analysis
To perform sentiment analysis, you run a sentiment query that names the sentiment classifier to use for identifying sentiment information. The classifier can be the default or a user-defined sentiment classifier.
You can perform sentiment analysis only as part of a search operation. Oracle Text searches for the specified keywords and generates a result set. Then, sentiment analysis is performed on the result set to identify a sentiment score for each result. If you do not explicitly specify a sentiment classifier in your query, the default classifier is used.
You can identify either a single sentiment for the entire document or a separate sentiment for each topic within a document. A document often contains multiple topics, and the author’s sentiment toward each topic may differ. In such cases, a document-level sentiment score may not be useful because it cannot distinguish the sentiments associated with the different topics. Topic-level sentiment scores provide the required answers. For example, when you search a set of documents containing reviews for a camera, a document-level sentiment tells you whether the camera is good or not. Suppose that you want the general opinion about the picture quality of the camera. A topic-level sentiment analysis with “picture quality” as one of the topics provides that information.
Note:
If you do not specify a topic of interest for sentiment analysis, then Oracle Text returns the overall sentiment for the entire document.
16.2 Creating a Sentiment Classifier Preference
Use the CTX_DDL.CREATE_PREFERENCE procedure to create a sentiment classifier preference and the CTX_DDL.SET_ATTRIBUTE procedure to define its attributes. The classifier type associated with a user-defined sentiment classifier preference is SENTIMENT_CLASSIFIER.
To create a sentiment classifier preference:
- To define a sentiment classifier preference, use the CTX_DDL.CREATE_PREFERENCE procedure. The classifier must be of type SENTIMENT_CLASSIFIER.
- To define attributes for the sentiment classifier preference, use the CTX_DDL.SET_ATTRIBUTE procedure. The attributes define the parameters that are used to train the sentiment classifier.
Example 16-1 Creating a Sentiment Classifier Preference
The following example creates a sentiment classifier preference named clsfier_camera.
This preference is used to classify a set of documents that contain reviews for SLR cameras.
- Define a sentiment classifier preference named clsfier_camera with type SENTIMENT_CLASSIFIER.
exec ctx_ddl.create_preference('clsfier_camera','SENTIMENT_CLASSIFIER');
- Define the attributes of the clsfier_camera sentiment classifier preference. Set 1000 for the maximum number of features to be extracted. Set 600 for the number of iterations for which the classifier runs.
exec ctx_ddl.set_attribute('clsfier_camera','MAX_FEATURES','1000');
exec ctx_ddl.set_attribute('clsfier_camera','NUM_ITERATIONS','600');
For attributes that are not explicitly defined, the default values are used.
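To confirm what was stored, the preference and its attributes can be inspected through the Oracle Text data dictionary views. The following is a minimal sketch, assuming that the preference was created in the current schema and that sentiment classifier preferences are listed in the CTX_USER_PREFERENCES and CTX_USER_PREFERENCE_VALUES views like other preferences. Preference names are stored in uppercase.

```sql
-- List the preference and the object type it was created with.
select pre_name, pre_class, pre_object
  from ctx_user_preferences
 where pre_name = 'CLSFIER_CAMERA';

-- List the attributes that were set explicitly for the preference.
-- Assumption: attributes left at their defaults do not appear in this view.
select prv_attribute, prv_value
  from ctx_user_preference_values
 where prv_preference = 'CLSFIER_CAMERA';
```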
16.3 Training Sentiment Classifiers
Training a sentiment classifier generates the classification rules that are used to provide a positive or negative sentiment for a search keyword.
The following example trains a sentiment classifier that can perform sentiment analysis on user reviews of cameras:
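The steps of such a training run might be sketched as follows. This is an illustrative outline, not the original example: the table names (training_camera, train_category), the index name (tc_idx), the sample rows, and the category codes are all assumptions, and a real training set needs far more than two documents. The argument order shown for CTX_CLS.SA_TRAIN_MODEL (classifier name, index, document ID column, category table, category table document ID column, category column, preference name) should be checked against the Oracle Text Reference.

```sql
-- Hypothetical training table: one row per sample review.
create table training_camera (
  review_id number primary key,
  text      varchar2(4000)
);
insert into training_camera values (1, 'Sharp lens, fast autofocus, and excellent battery life.');
insert into training_camera values (2, 'The flash stopped working after a month and the lens door jammed.');

-- Hypothetical category table: assigns each sample review to a sentiment category.
-- The numeric codes (1 = positive, 2 = negative) are an assumption; verify them
-- in the CTX_CLS.SA_TRAIN_MODEL reference before use.
create table train_category (
  doc_id        number,
  category      number,
  category_desc varchar2(100)
);
insert into train_category values (1, 1, 'positive');
insert into train_category values (2, 2, 'negative');
commit;

-- Context index on the training documents (index options are an assumption).
create index tc_idx on training_camera(text) indextype is ctxsys.context;

-- Train the clsfier_camera classifier with the clsfier_camera preference
-- created in Example 16-1.
exec ctx_cls.sa_train_model('clsfier_camera','tc_idx','review_id','train_category','doc_id','category','clsfier_camera');
```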
16.4 Performing Sentiment Analysis with the CTX_DOC Package
Use the procedures in the CTX_DOC package to perform sentiment analysis on a single document within a document set. For each document, you can either determine a single sentiment score for the entire document or individual sentiment scores for each topic within the document.
Before you perform sentiment analysis, you must create a context index on the document set. The following command creates a camera_revidx context index on the document set in the camera_reviews table:
create index camera_revidx on camera_reviews(review_text) indextype is
ctxsys.context parameters ('lexer mylexer stoplist ctxsys.default_stoplist');
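The statement above relies on two objects that must exist before it runs: the camera_reviews table and the mylexer lexer preference. Neither definition appears in this chapter, so the following is only a sketch under stated assumptions. The review_id key column is hypothetical, and BASIC_LEXER is a placeholder lexer type; choose the lexer according to the classifier you plan to use (see the note about AUTO_LEXER in "About Sentiment Classifiers").

```sql
-- Hypothetical document table; review_id is an assumed key column.
create table camera_reviews (
  review_id   number primary key,
  review_text clob
);

-- Assumed definition of the mylexer preference named in the index parameters.
-- BASIC_LEXER is only a placeholder; the lexer type actually used is not stated.
exec ctx_ddl.create_preference('mylexer','BASIC_LEXER');
```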
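To perform sentiment analysis with the CTX_DOC package, use one of the following methods:
- Run CTX_DOC.SENTIMENT_AGGREGATE to obtain a single aggregate sentiment score for the document.
- Run CTX_DOC.SENTIMENT to obtain a sentiment score for each segment of the document that mentions the topic. The results are written to a result table.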
Example 16-2 Obtaining a Single Sentiment Score for a Document
The following example uses the clsfier_camera sentiment classifier to provide a single aggregate sentiment score for the entire document. The sentiment classifier was created and trained. The table containing the document set has a camera_revidx context index. The doc_id of the document within the document table for which sentiment analysis must be performed is 49. The topic for which a sentiment score is being generated is ‘Nikon.’
select ctx_doc.sentiment_aggregate('camera_revidx','49','Nikon','clsfier_camera') from dual;
CTX_DOC.SENTIMENT_AGGREGATE('CAMERA_REVIDX','49','NIKON','CLSFIER_CAMERA')
--------------------------------------------------------------------------
74
1 row selected.
Example 16-3 Obtaining a Single Sentiment Score with the Default Classifier
The following example uses the default sentiment classifier to provide an aggregate sentiment score for the entire document. The table containing the document set has a camera_revidx context index. The doc_id of the document within the document table for which sentiment analysis must be performed is 1.
select ctx_doc.sentiment_aggregate('camera_revidx','1') from dual;
CTX_DOC.SENTIMENT_AGGREGATE('CAMERA_REVIDX','1')
--------------------------------------------
2
1 row selected.
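Because CTX_DOC.SENTIMENT_AGGREGATE is called as a function in a query, the same call can be applied to several documents at once. The following is a minimal sketch, assuming that camera_reviews has a primary key column named review_id (a hypothetical name) whose value serves as the document key.

```sql
-- review_id is a hypothetical primary key column of camera_reviews.
-- CONTAINS limits the scan to documents that mention the topic.
select r.review_id,
       ctx_doc.sentiment_aggregate('camera_revidx',
                                   to_char(r.review_id),
                                   'Nikon',
                                   'clsfier_camera') as sentiment_score
  from camera_reviews r
 where contains(r.review_text, 'Nikon') > 0
 order by sentiment_score desc;
```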
Example 16-4 Obtaining Sentiment Scores for Each Topic Within a Document
The following example uses the clsfier_camera sentiment classifier to generate sentiment scores for each segment within the document. The sentiment classifier was created and trained. The table containing the document set has a camera_revidx context index. The doc_id of the document within the document table for which sentiment analysis must be performed is 49. The topic for which a sentiment score is being generated is ‘Nikon.’ The restab result table, which will be populated with the analysis results, was created with the columns snippet (CLOB) and score (NUMBER).
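The result table must exist before the procedure is called. A minimal sketch of its definition, using the column names and types stated above:

```sql
-- Result table for CTX_DOC.SENTIMENT: one row per matching segment.
create table restab (
  snippet clob,
  score   number
);
```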
exec ctx_doc.sentiment('camera_revidx','49','Nikon','restab','clsfier_camera', starttag=>'<<', endtag=>'>>');
SQL> select * from restab;
SNIPPET
--------------------------------------------------------------------------------
SCORE
----------
It took <<Nikon>> a while to produce a superb compact 85mm lens, but this time they finally got it right.
65
Without a doubt, this is a fine portrait lens for photographing head-and-shoulder portraits (The only lens which is optically better is <<Nikon>>'s legendary 105mm f2.5 Nikkor lens, and its close optical twin, the 105mm f2.8 Micro Nikkor.
75
Since the 105mm f2.5 Nikkor lens doesn't have an autofocus version, then this might be the perfect moderate telephoto lens for owners of <<Nikon>> autofocus SLR cameras.
84
3 rows selected.
Example 16-5 Obtaining a Sentiment Score for a Topic Within a Document
The following example uses the tdrbrtsent03_cl sentiment classifier to generate a sentiment score for each segment within the document. The sentiment classifier was created and trained. The table containing the document set has a tdrbrtsent03_idx context index. The doc_id of the document within the document table for which sentiment analysis must be performed is 1. The topic for which a sentiment score is being generated is ‘movie.’ The tdrbrtsent03_rtab result table, which will be populated with the analysis results, was created with the columns snippet and score.
SQL> exec ctx_doc.sentiment('tdrbrtsent03_idx','1','movie','tdrbrtsent03_rtab','tdrbrtsent03_cl');
PL/SQL procedure successfully completed.
SQL> select * from tdrbrtsent03_rtab;
SNIPPET
--------------------------------------------------------------------------------
SCORE
----------
the <b>movie</b> is a bit overlong , but nicholson is such good fun that the running time passes by pretty quickly
-62
1 row selected.
See Also:
- CTX_DOC.SENTIMENT_AGGREGATE in the Oracle Text Reference
- CTX_DOC.SENTIMENT in the Oracle Text Reference
16.5 Performing Sentiment Analysis with the RSI
The XML Query Result Set Interface (RSI) enables you to perform sentiment analysis on a set of documents by using either the default sentiment classifier or a user-defined sentiment classifier. The documents on which sentiment analysis must be performed are stored in a document table.
Use the sentiment element in the input RSI to indicate that sentiment analysis, in addition to other operations specified in the Result Set Descriptor (RSD), must be performed at query time. If you specify a value for the classifier attribute of the sentiment element, then the specified sentiment classifier is used to perform the sentiment analysis. If the classifier attribute is omitted, then Oracle Text performs sentiment analysis by using the default sentiment classifier. The sentiment element contains a child element called item that specifies the topic or concept about which a sentiment must be generated during sentiment analysis.
You can generate either a single sentiment score for each document or separate sentiment scores for each topic within the document. Use the agg attribute of the item element to generate a single aggregated sentiment score for each document.
You can perform sentiment classification by using a keyword query or the ABOUT operator. When you use the ABOUT operator, the result set includes synonyms of the keyword that are identified by using the thesaurus.
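As a variation on the options described above, the following sketch uses the ABOUT operator for the driving query, omits the classifier attribute so that the default sentiment classifier is used, and sets agg="true" to request one aggregated score per document. It assumes the camera_revidx index used elsewhere in this chapter and an index lexer that is compatible with the default classifier. The topic and hit range are illustrative.

```sql
-- SQL*Plus bind variable that receives the XML result set.
var rs clob;
exec dbms_lob.createtemporary(:rs, TRUE, DBMS_LOB.SESSION);

begin
  ctx_query.result_set('camera_revidx', 'about(camera)', '
    <ctx_result_set_descriptor>
      <hitlist start_hit_num="1" end_hit_num="10" order="score desc">
        <sentiment>
          <item topic="lens" agg="true" />
        </sentiment>
      </hitlist>
    </ctx_result_set_descriptor>', :rs);
end;
/

-- Display the XML result set.
select xmltype(:rs) from dual;
```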
To perform sentiment analysis by using RSI:
Example 16-6 Input the RSD to Perform Sentiment Analysis
The following example performs sentiment analysis and generates a sentiment for the ‘lens’ topic. The driving query is a keyword query for ‘camera.’ The sentiment element specifies that sentiment analysis must be performed by using the clsfier_camera sentiment classifier. This classifier was previously created and trained by using the CTX_CLS.SA_TRAIN_MODEL procedure. The camera_revidx context index is on the document set table.
The sentiment score ranges from -100 to 100. A positive score indicates positive sentiment, whereas a negative score indicates negative sentiment. The absolute value of the score is indicative of the magnitude of positive and negative sentiment.
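One way to apply this interpretation is to map each score to a label. The sketch below runs against a result table with snippet and score columns, such as the restab table populated by CTX_DOC.SENTIMENT in Example 16-4. Treating a score of exactly 0 as neutral is an assumption made here for illustration.

```sql
-- Label each scored snippet by the sign of its sentiment score.
select score,
       case
         when score > 0 then 'positive'
         when score < 0 then 'negative'
         else 'neutral'   -- assumption: a score of 0 is treated as neutral
       end as sentiment,
       abs(score) as magnitude
  from restab
 order by score desc;
```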
To perform sentiment analysis and obtain a sentiment score for each topic within the document:
- Create the rs result set variable, a temporary CLOB that will store the results of the search operation.
SQL> var rs clob;
SQL> exec dbms_lob.createtemporary(:rs, TRUE, DBMS_LOB.SESSION);
- Perform sentiment analysis as part of a search query.
The keyword being searched for is ‘camera.’ Sentiment analysis is performed for two topics: ‘lens,’ which returns a score for each segment, and ‘picture quality,’ which returns a single aggregated score because agg="true" is set.
begin
  ctx_query.result_set('camera_revidx','camera','
    <ctx_result_set_descriptor>
      <hitlist start_hit_num="1" end_hit_num="10" order="score desc">
        <sentiment classifier="clsfier_camera">
          <item topic="lens" />
          <item topic="picture quality" agg="true" />
        </sentiment>
      </hitlist>
    </ctx_result_set_descriptor>',:rs);
end;
/
- View the results stored in the result set.
Other applications can use the XML result set for further processing. For brevity, some output was removed. Each segment element carries the sentiment score for that segment.
SQL> select xmltype(:rs) from dual;
XMLTYPE(:RS)
--------------------------------------------------------------------------------
<ctx_result_set>
  <hitlist>
    <hit>
      <sentiment>
        <item topic="lens">
          <segment>
            <segment_text>The first time it was sent in was because the <b>lens </b> door failed to turn on the camera and it was almost to come off of its track . Eight months later, the flash quit working in all modes AND the door was failing AGAIN!</segment_text>
            <segment_score>-81</segment_score>
          </segment>
        </item>
        <item topic="picture quality">
          <score> -75 </score>
        </item>
      </sentiment>
    </hit>
    <hit>
      <sentiment>
        <item topic="lens">
          <segment>
            <segment_text>I was actually quite impressed with it. Powerful zoom , sharp <b>lens</b>, decent picture quality. I also played with some other Panasonic models in various stores just to get a better feel for them, as well as spent a few hours on </segment_text>
            <segment_score> 67 </segment_score>
          </segment>
        </item>
        <item topic="picture quality">
          <score>-1</score>
        </item>
      </sentiment>
    </hit>
    . . .
    . . .
  </hitlist>
</ctx_result_set>
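To process the scores rather than read the XML, the result set can be shredded into rows with the standard XMLTABLE function. The following is a sketch only, assuming the element layout shown in the output above and that :rs still holds the result set. Scores are read as text and converted with TO_NUMBER because the values can be padded with spaces.

```sql
-- Per-segment scores: one row per <segment> under each <item>.
select i.topic,
       to_number(trim(s.segment_score)) as segment_score
  from xmltable('/ctx_result_set/hitlist/hit/sentiment/item'
         passing xmltype(:rs)
         columns topic    varchar2(100) path '@topic',
                 segments xmltype       path 'segment') i,
       xmltable('/segment'
         passing i.segments
         columns segment_score varchar2(20) path 'segment_score') s;

-- Aggregated scores: items requested with agg="true" carry a single <score>.
select i.topic,
       to_number(trim(i.agg_score)) as aggregate_score
  from xmltable('/ctx_result_set/hitlist/hit/sentiment/item'
         passing xmltype(:rs)
         columns topic     varchar2(100) path '@topic',
                 agg_score varchar2(20)  path 'score') i
 where i.agg_score is not null;
```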