News Article Analysis 3.0

Sentiment Analysis

Use a lexicon-based sentiment analysis technique to calculate a sentiment score for each article.

Then sort all articles by date and present the results in a table using pandas.

We can also visualize the results in a plot using Matplotlib.

Sentiment dictionary

First, process OpinionFinder_Lexicon.tff to get the sentiment strength and polarity of each word. The sentiment score of a single word is:

Neutral = ignored; Weak Positive = 1; Weak Negative = -1; Strong Positive = 3; Strong Negative = -3

In [1]:
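# Prepare the sentiment dictionary from the OpinionFinder lexicon
with open('OpinionFinder_Lexicon.tff', 'r') as f:
    senti = f.read().splitlines()

sentiment = {}

for line in senti:
    # Each line holds space-separated key=value fields:
    # type, len, word1, pos1, stemmed1, priorpolarity
    fields = dict(tok.split('=', 1) for tok in line.split() if '=' in tok)
    term = fields.get('word1')
    if term is None:
        continue                      # skip blank or malformed lines

    # Strength: weak subjective clue = 1, strong subjective clue = 3
    if fields.get('type') == 'weaksubj':
        score = 1
    elif fields.get('type') == 'strongsubj':
        score = 3
    else:
        continue

    # Polarity: skip neutral and any other label, so no stale value is reused
    if fields.get('priorpolarity') == 'negative':
        polarity = -1
    elif fields.get('priorpolarity') == 'positive':
        polarity = 1
    else:
        continue

    sentiment[term] = polarity * score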
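The scoring rule above can be checked in isolation on a few lexicon lines. A minimal sketch, assuming each line uses the space-separated `key=value` layout of the OpinionFinder file (`type=`, `word1=`, `priorpolarity=`). The helper `word_score` is hypothetical, and the sample lines are made up to match the format, not copied from the file.

```python
# Strength and polarity weights from the scoring rule (neutral words are ignored)
STRENGTH = {'weaksubj': 1, 'strongsubj': 3}
POLARITY = {'positive': 1, 'negative': -1}

def word_score(line):
    """Return (word, score) for one lexicon line, or None if the word is ignored.

    Hypothetical helper: assumes space-separated key=value fields.
    """
    fields = dict(tok.split('=', 1) for tok in line.split() if '=' in tok)
    strength = STRENGTH.get(fields.get('type'))
    polarity = POLARITY.get(fields.get('priorpolarity'))
    if strength is None or polarity is None or 'word1' not in fields:
        return None   # neutral (or unrecognised) entries are skipped
    return fields['word1'], strength * polarity

# Illustrative lines in the assumed lexicon format
samples = [
    'type=weaksubj len=1 word1=abandoned pos1=adj stemmed1=n priorpolarity=negative',
    'type=strongsubj len=1 word1=excellent pos1=adj stemmed1=n priorpolarity=positive',
    'type=weaksubj len=1 word1=absolute pos1=adj stemmed1=n priorpolarity=neutral',
]

for s in samples:
    print(word_score(s))
# ('abandoned', -1)
# ('excellent', 3)
# None
```

Looking weights up in dictionaries means any label outside the rule (for example a polarity that is neither positive nor negative) simply yields None instead of silently reusing a previous value.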

Calculate Sentiment

Define a function that looks up the score of each word in an article and adds them up into a net sentiment score.

Optional: uncomment the print line to show the score of each matched word.

In [2]:
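import re

def calculate_sentiment(d):
    """Return the net sentiment score of text d (sum of lexicon scores of its words)."""
    tokens = re.findall(r'\w+', d.lower())    # lowercase and split into word tokens
    sentiment_score = 0

    for token in tokens:
        if token in sentiment:
            sentiment_score += sentiment[token]
            #print(token, sentiment[token])   # optional: show each matched word

    return sentiment_score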
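To see how the net score behaves, here is the same counting logic run against a tiny hand-made lexicon. The entries in `toy_sentiment` and the function name `net_score` are hypothetical, not taken from OpinionFinder. Every matched word counts regardless of context, so a phrase such as "not good" still adds a positive score, and repeated words count every time.

```python
import re

# Hypothetical mini-lexicon using the same +/-1 (weak) and +/-3 (strong) scale
toy_sentiment = {'good': 1, 'excellent': 3, 'loss': -1, 'crisis': -3}

def net_score(text, lexicon):
    """Sum the lexicon scores of every word token in text (case-insensitive)."""
    tokens = re.findall(r'\w+', text.lower())
    return sum(lexicon.get(token, 0) for token in tokens)

print(net_score('Excellent results despite a small loss.', toy_sentiment))  # 3 - 1 = 2
print(net_score('Not good: the crisis deepens.', toy_sentiment))           # 1 - 3 = -2
print(net_score('Good, good, good.', toy_sentiment))                       # repeats count: 3
```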

Open News Article data saved earlier

Open news.json, continuing from News Article Analysis 1.0.

Recall that all_articles has the following structure:

all_articles = [[date, title, content, link],[date, ..., ..., ...],....,....]

In [3]:
# Open JSON file
import json
with open('news.json') as f:
    all_articles = json.load(f)
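Each entry of all_articles is a list whose first element is the date, so sorting the list orders articles chronologically as long as the dates are ISO strings such as 2015-08-22. A small sketch with made-up articles (titles and URLs are placeholders); passing an explicit `key` makes the intent visible and keeps the original order for articles that share a date instead of comparing their titles.

```python
# Hypothetical articles in the [date, title, content, link] layout
articles = [
    ['2018-11-10', 'B title', 'content b', 'https://example.com/b'],
    ['2015-08-22', 'A title', 'content a', 'https://example.com/a'],
    ['2016-03-01', 'C title', 'content c', 'https://example.com/c'],
]

# ISO dates compare correctly as plain strings, so sort on the first field
articles.sort(key=lambda article: article[0])

print([a[0] for a in articles])
# ['2015-08-22', '2016-03-01', '2018-11-10']
```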

Sentiment Analysis in Action

First, sort all articles by date. Then loop through all_articles and analyse the content of each article.

For each article, save the date and score as a tuple, date_score.

Append date_score to the date_score_table list.

Optional: print the title, score, date and URL of the first and last 5 articles for checking.

In [4]:
import re

all_articles.sort()   # each article is [date, title, content, link]; ISO dates sort by date

date_score_table = []
n = len(all_articles)

for i, (date, title, content, link) in enumerate(all_articles):
    sentiment_score = calculate_sentiment(content)     # calculate sentiment of content
    if i < 5 or i >= n - 5:                             # only print first and last 5 for review
        print(f'{i + 1}) {title}')                      # print title
        print(f'   Sentiment Score = {sentiment_score} -------- {date}')  # print score and date
        print(f'{link}\n')                              # print url for checking

    date_score_table.append((date, sentiment_score))
1) Challenges for Petronas
   Sentiment Score = 8 -------- 2015-08-22
https://www.thestar.com.my/business/business-news/2015/08/22/challenges-for-petronas

2) Petronas expects 1,000 redundancies under group-wide revamp
   Sentiment Score = -7 -------- 2016-03-01
https://www.thestar.com.my/business/business-news/2016/03/01/petronas-expects-1000-redundancies-under-group-wide-revamp

3) Wee looks forward to his break
   Sentiment Score = -16 -------- 2016-03-03
https://www.thestar.com.my/business/business-news/2016/03/03/wee-looks-forward-to-his-break

4) Petronas Lubricants CEO not ruling out IPO in growth drive
   Sentiment Score = 18 -------- 2016-04-12
https://www.thestar.com.my/business/business-news/2016/04/12/petronas-lubricants-ceo-not-ruling-out-ipo-in-growth-drive

5) End to Petronas-Sarawak controversy
   Sentiment Score = 36 -------- 2016-08-23
https://www.thestar.com.my/business/business-news/2016/08/23/end-to-controversy

267) Samsung 11.11 deals include 65in UHD TV and Gear S3 Frontier smartwatch
   Sentiment Score = 26 -------- 2018-11-09
https://www.thestar.com.my/tech/tech-news/2018/11/09/samsung-1111-deals-include-65in-uhd-tv-going-for-special-price

268) Solar energy generator for 15 orang asli households
   Sentiment Score = 9 -------- 2018-11-09
https://www.thestar.com.my/metro/metro-news/2018/11/09/solar-energy-generator-for-15-orang-asli-households

269) Budget 2019 – the good, the bad and the ugly
   Sentiment Score = 58 -------- 2018-11-10
https://www.thestar.com.my/business/business-news/2018/11/10/budget-2019-the-good-the-bad-and-the-ugly

270) New sales tax may be good but at what cost
   Sentiment Score = 29 -------- 2018-11-10
https://www.thestar.com.my/metro/metro-news/2018/11/10/new-sales-tax-may-be-good-but-at-what-cost

271) Petronas special dividend should be a special payout
   Sentiment Score = -24 -------- 2018-11-10
https://www.thestar.com.my/business/business-news/2018/11/10/petronas-special-dividend-should-be-a-special-payout

Pandas - present date_score results in table form

Load date_score_table into a pandas DataFrame with the column names date and score.

Save a copy to disk in CSV format.

Optional: print the first and last 5 date and score results.

In [5]:
import pandas as pd

# Load the (date, score) tuples into a DataFrame with named columns
df = pd.DataFrame(date_score_table, columns=['date', 'score'])

# Save a copy to disk (the integer index is written as the first column)
df.to_csv('date_score.csv')

print(df.head(5))
print(df.tail(5))
         date  score
0  2015-08-22      8
1  2016-03-01     -7
2  2016-03-03    -16
3  2016-04-12     18
4  2016-08-23     36
           date  score
266  2018-11-09     26
267  2018-11-09      9
268  2018-11-10     58
269  2018-11-10     29
270  2018-11-10    -24
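The CSV written above keeps the default integer index as its first column, which is why the plotting step reads it back with the date column as the index. A minimal sketch of that round trip, using an in-memory buffer (`io.StringIO`) instead of a file and the first five (date, score) pairs printed above:

```python
import io
import pandas as pd

# First five (date, score) pairs from the output above
date_score_table = [('2015-08-22', 8), ('2016-03-01', -7), ('2016-03-03', -16),
                    ('2016-04-12', 18), ('2016-08-23', 36)]

df = pd.DataFrame(date_score_table, columns=['date', 'score'])

buffer = io.StringIO()
df.to_csv(buffer)          # writes the integer index as an unnamed first column
buffer.seek(0)

# Select the date column by name and parse it into a DatetimeIndex
restored = pd.read_csv(buffer, parse_dates=True, index_col='date')
print(isinstance(restored.index, pd.DatetimeIndex))  # True
print(restored['score'].sum())                       # 39
```

Selecting the index by column name rather than by position keeps the read working whether or not the integer index was written to the file.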

Matplotlib - visualize in plot chart

Visualize the sentiment scores as a time-series plot. From the chart, we can analyse how the keyword we searched for is covered: how often it appears in positive articles compared with negative articles.

Example: the keyword searched here is Petronas. Although the sentiment score is positive overall, we can see the number of negative articles increasing recently.

In [6]:
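%matplotlib inline

import pandas as pd
import matplotlib.pyplot as plt
from matplotlib import rcParams

rcParams['figure.figsize'] = (16, 8)

# Reload the saved scores, using the parsed 'date' column as the index
df = pd.read_csv('date_score.csv', parse_dates=True, index_col='date')

ax = df['score'].plot(style='.')              # one dot per article
ax.axhline(y=0, color='b', linestyle='-')     # zero line separates positive from negative
# Set labels after plotting so pandas does not overwrite the x-axis label
ax.set_xlabel('Date')
ax.set_ylabel('Sentiment Score')
ax.set_title('News Sentiment Score')
plt.show()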
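The visual impression from the chart can also be checked numerically by counting positive and negative articles per year. A minimal sketch, assuming a DataFrame shaped like the one built earlier (a date column and a score column). The rows below are only the ten (date, score) pairs printed earlier, so the counts illustrate the method rather than summarise all 271 articles. Articles with a score of exactly 0 are labelled neutral.

```python
import pandas as pd

# The first and last five (date, score) pairs printed earlier (not the full data set)
rows = [('2015-08-22', 8), ('2016-03-01', -7), ('2016-03-03', -16),
        ('2016-04-12', 18), ('2016-08-23', 36), ('2018-11-09', 26),
        ('2018-11-09', 9), ('2018-11-10', 58), ('2018-11-10', 29),
        ('2018-11-10', -24)]
df = pd.DataFrame(rows, columns=['date', 'score'])
df['date'] = pd.to_datetime(df['date'])

# Label each article by the sign of its net score
df['label'] = 'neutral'
df.loc[df['score'] > 0, 'label'] = 'positive'
df.loc[df['score'] < 0, 'label'] = 'negative'

# Count labels per calendar year (missing combinations become 0)
counts = df.groupby([df['date'].dt.year, 'label']).size().unstack(fill_value=0)
print(counts)
```

Run on the full date_score.csv, the same table would show whether negative articles really make up a growing share in recent years.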