News Article Analysis 1.0

Web Scraping


Specify a 'keyword' to search for in Google News. The tool created for this example is designed only to scrape articles from The Star Online.

The search in Google News will look like this: "Petronas"


Specify the number of result pages to scrape from Google News. Generally, one page of Google News results contains 10 articles.

In [1]:
# KEYWORD to search in Google News
keyword = 'Petronas'

# PAGES to download from Google News
pages = 30

Storing the results

Results will be saved in all_articles, a list that contains one inner list per article.

all_articles = [[date, title, content, link],[date, ..., ..., ...],....,....]
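
As a minimal sketch of how this list of lists can be read back, the snippet below unpacks each record into its four fields. The sample record and the helper name titles_of are made up for illustration and are not scraped data.

```python
# Hypothetical sample in the same [date, title, content, link] layout
sample_articles = [
    ["2018-10-09", "Example title", "Example content about petronas.", "https://example.com/article-1"],
]

def titles_of(articles):
    """Return the title (index 1) of every [date, title, content, link] record."""
    return [title for _date, title, _content, _link in articles]

print(titles_of(sample_articles))
```

Unpacking by position keeps the code short, but it relies on every record having exactly four items in this order.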

Adjust keyword casing

Convert the keyword to lower case to standardize casing for text matching and analysis.

In [2]:
# All articles (date, title, content, link) will be saved in a list of lists
all_articles = []

# turn keyword into lower case
keyword = keyword.lower()

Import modules needed for Web Scraping

requests - used for downloading HTML code.

BeautifulSoup - used for parsing HTML code.

Regular Expression (re) - used for searching and matching.

time - its time.sleep() function slows down the scraping (to put less burden on the target website's server).

random - to make the sleep time a random number of seconds.

datetime - to reformat the dates downloaded from articles.

unicodedata - to clean up some Unicode characters in the article's content.

In [3]:
import requests
from bs4 import BeautifulSoup
import re
import time
import random
import datetime
import unicodedata  # to clean unicode eg. \xa0

Scrape Individual Article

Define scrapTheStar(link) - a function to scrape a single page of The Star Online.

The date, title, content and link of the article will be added to all_articles.

To scrape news articles from another site, this function will need to be rewritten, because different websites have different HTML structures and store data in different places.

In [4]:
# define function to scrape The Star Online (date, title, content, link)
def scrapTheStar(link):
    page_response = requests.get(link, timeout=5)
    page_content = BeautifulSoup(page_response.content, "html.parser")
    # scrape date (the CSS selector for the date element is not given here)
    date = page_content.select('')
    date = date[0].text.strip()     # Tuesday, 9 Oct 2018
    date = re.findall(r'\w+', date)  # ['Tuesday', '9', 'Oct', '2018']
    date = ' '.join(date[1:4])      # 9 Oct 2018
    date = datetime.datetime.strptime(date, '%d %b %Y').strftime('%Y-%m-%d')  # 2018-10-09
    # scrape title
    titles = page_content.select('h1')
    title = titles[0].text.strip()
    # scrape content
    nodes = page_content.select('div.story p')
    content = ''
    for node in nodes:
        content = content + node.text
    content = unicodedata.normalize("NFKD", content)
    # scrape content (alternative method, if the method above failed)
    if len(content) < 10:
        try:
            content = page_content.select('div.story')
            content = content[1].text
        except IndexError:
            content = ''  # keep content a string so the keyword check below still works
            print('unable to download content')
    # gather the information into a list
    date_title_content = [date, title, content, link]
    # Note: Google results may include articles where the keyword is not in the content
    # (it only appears in a related news title).
    # only append articles with the keyword in the content
    if keyword in date_title_content[2].lower():
        all_articles.append(date_title_content)
    else:
        print('keyword not in content')
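
The date and text clean-up steps used in scrapTheStar can be tried on their own, without downloading anything. The sketch below pulls them into two small helpers; the names reformat_date and clean_text are hypothetical and only used here. It assumes an English locale, since '%b' matches abbreviated month names such as 'Oct'.

```python
import datetime
import re
import unicodedata

def reformat_date(raw_date):
    """Turn a date like 'Tuesday, 9 Oct 2018' into '2018-10-09'."""
    words = re.findall(r'\w+', raw_date)      # ['Tuesday', '9', 'Oct', '2018']
    day_month_year = ' '.join(words[1:4])     # '9 Oct 2018'
    parsed = datetime.datetime.strptime(day_month_year, '%d %b %Y')
    return parsed.strftime('%Y-%m-%d')

def clean_text(text):
    """Apply NFKD normalization, which maps a non-breaking space (U+00A0) to a plain space."""
    return unicodedata.normalize("NFKD", text)

print(reformat_date('Tuesday, 9 Oct 2018'))
print(clean_text('Petronas\xa0Gas'))
```

Keeping these steps in separate functions makes it easier to check them when the date format of the target site changes.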

Scrape URLs from Google News

Define scrapit(googleNewsUrl) - a function to scrape news article URLs from Google News results.

When this function runs, it calls the scrapTheStar(link) function for each article URL.

In [5]:
# define function to scrape Google News, loop through all results on a page to get The Star Online url.
def scrapit(googleNewsUrl):
    res = requests.get(googleNewsUrl, timeout=5)
    soup = BeautifulSoup(res.content, "html.parser")
    links = soup.select('h3 a')

    for link in links:
        link = link.get('href')
        urlRegex = re.compile(r'https?://.*/&sa')  # define match https..../&sa
        link = urlRegex.findall(link)  # find match in link
        if not link:
            continue  # skip results whose href does not match
        link = link[0][:-4]   # findall returns a list, so take index [0]; [:-4] strips '/&sa', which is only needed for the match
        #print(link)
        scrapTheStar(link)
        # random sleep to slow down the scraping
        r = random.randint(1, 5)
        #print('sleep', r, 'seconds')
        time.sleep(r)
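
To see what the regular expression step does, here is a small sketch that runs it on a made-up href. The sample href, the pattern r'https?://.*/&sa' and the helper name extract_article_url are assumptions for illustration; real Google News hrefs may look different.

```python
import re

# Assumed pattern: everything from 'http' or 'https' up to the last '/&sa'
URL_REGEX = re.compile(r'https?://.*/&sa')

def extract_article_url(href):
    """Return the article URL embedded in a result href, or None if nothing matches."""
    matches = URL_REGEX.findall(href)
    if not matches:
        return None
    return matches[0][:-4]  # drop the trailing '/&sa' (4 characters)

# Hypothetical href in the shape the scraper expects
sample_href = '/url?q=https://example.com/news/story/&sa=U&ved=abc'
print(extract_article_url(sample_href))
print(extract_article_url('/search?q=petronas'))
```

Returning None for hrefs that do not match lets the caller skip them instead of failing with an IndexError.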

Web Scraping in Action

Insert the keyword into googleNewsUrl, loop through the number of pages specified, and call the scrapit(googleNewsUrl) function, which in turn calls scrapTheStar(link).

Optional: print out the URLs of Google News and of individual articles for checking. Optional: print out the sleep time in seconds.

In [6]:
for page in range(pages):
    keyword_in_link = '+'.join(keyword.split())  # add + between keywords
    googleNewsUrl = '' + keyword_in_link +'' + str(page) + '0&sa=N'
    #print(googleNewsUrl)
    scrapit(googleNewsUrl)
keyword not in content
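
The way the loop turns a keyword and a page number into a query string can be sketched with the standard library. Everything specific below is an assumption for illustration: the base URL is a placeholder and the parameter names q and start are hypothetical, not taken from this notebook. Multiplying the page number by 10 gives the same offsets as appending '0' to it (0, 10, 20, ...).

```python
from urllib.parse import urlencode

BASE_URL = 'https://news.example.com/search'  # hypothetical placeholder, not a real endpoint

def build_search_url(keyword, page, per_page=10):
    """Build a paged search URL; spaces in the keyword become '+'."""
    params = {'q': keyword, 'start': page * per_page}  # parameter names are assumed
    return BASE_URL + '?' + urlencode(params)

for page in range(3):
    print(build_search_url('petronas gas', page))
```

urlencode also escapes characters such as '&' in the keyword, which joining words with '+' by hand does not.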

Save all_articles as JSON

After saving a copy to the hard drive, we can use it for Text Analysis later.

In [7]:
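import json
stringOfJsonData = json.dumps(all_articles)
# write the JSON string to disk; the with block closes the file afterwards
with open('news.json', 'w') as jsonFile:
    jsonFile.write(stringOfJsonData)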
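
A quick way to check that saving and loading keep the data intact is a round trip through a temporary file. This is a sketch with a made-up record; the helper names save_articles and load_articles are hypothetical. It passes ensure_ascii=False so characters such as ’ in titles stay readable in the file.

```python
import json
import os
import tempfile

def save_articles(articles, path):
    """Write the list of [date, title, content, link] records as JSON."""
    with open(path, 'w', encoding='utf-8') as f:
        json.dump(articles, f, ensure_ascii=False)

def load_articles(path):
    """Read the records back from a JSON file."""
    with open(path, encoding='utf-8') as f:
        return json.load(f)

# Hypothetical sample record, not scraped data
sample = [["2018-10-09", "Moody’s example title", "example content", "https://example.com/a"]]

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, 'news_sample.json')
    save_articles(sample, path)
    restored = load_articles(path)

print(restored == sample)
```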

Results of the Web Scraping tool

Open news.json to check the web scraping results.

Example: print out all titles from the results.

Results are stored in: all_articles = [[date, title, content, link],[date, ..., ..., ...],....,....]

In [8]:
# Open JSON file
with open('news.json') as f:
    all_articles = json.load(f)

# Print out all titles, with index in front
for index, article in enumerate(all_articles, start=1):
    print(str(index) + ') ' + article[1])
1) Petronas special dividend should be a special payout
2) Moody’s leaves Petronas ratings unchanged but lowers outlook to negative
3) Petronas festive video warms the hearts of M’sians
4) Petronas can absorb negative cashflow of RM40b for 2 years, says S&P
5) Petronas to the rescue
6) Petronas set to buy 10% stake in Oman gas field
7) Petronas is top choice at Putra Brand Awards
8) Petronas calls for collective action in gas advocacy towards a sustainable LNG industry
9) Yinson appoints Abdullah as director
10) Petronas buys 10% stake in Oman's Al Khazzan field
11) Petronas completes its first LNG supply to world's largest LNG bunker vessel
12) CIMB Research retains price target for Petronas Gas at RM18.10
13) Kelantan to withdraw petroleum royalty suit against Petronas, federal govt
14) Moody's affirms MISC issuer rating, outlook stable
15) Entrepreneurs urged to complement each other
16) 5% tax on petroleum products will affect market, warns Azmin
17) Malaysians argue with Indonesians over who owns the Twin Towers
18) Malaysia's Kimanis crude supplies to drop in Dec
19) KLCI, key Asian markets close lower on US rate hike fears
20) Ahmad Nizam is new chairman of Petronas
21) Petronas opens new R&T centre for Latin America
22) Petronas ups dividend payment to the govt to RM24b
23) Solar energy generator for 15 orang asli households
24) Petronas launches new vendor devt scheme with 18 partners
25) Report: Former premier Abdullah expected to be dropped as Petronas advisor
26) Petronas expected to produce oil from D28 offshore Bintulu field next month
27) Dr M: Petronas cannot be 'killed' to pay oil royalties
28) Petronas-Saudi RAPID refinery offloads first oil cargo
29) Canadian natural gas project, with Petronas in it, bucks trend with bold deision
30) Budget 2019 – the good, the bad and the ugly
31) Pakatan’s maiden budget a surprise for all
32) Petronas-Aramco JV seeks US$9.7bil for Rapid project
33) Federal Court turns down Petronas bid to challenge Sarawak
34) Petronas stocks, Sime Plantation underpin KLCI’s early advance
35) Saudi Aramco, Petronas tap banks for jumbo financing
36) Higher budget deficit to impact ringgit in short term,  Franklin Templeton says
37) The Petronas-Sarawak oil intrigue
38) Listing of Petronas among options to increase revenue
39) Petronas is said near investment in US$31bil Canada project
40) Petronas continues to inspire Malaysians
41) Asian markets climb but another disappointing day at Bursa
42) Deleum units secure jobs from Petronas and Shell
43) KLCI stages relief rebound after heavy losses
44) South Sudan extends Malaysian Petronas and others' oil contracts
45) Sarawak govt to defend state's right following suit by Petronas
46) Mohd Sidek steps down as Petronas chairman
47) Petronas urges O&G players to react carefully to unpredictable business landscape
48) Maybank, Petronas stocks push KLCI to higher close
49) KLCI closes at day's low as Tenaga, Digi slump
50) Malaysia's Petronas-Saudi RAPID refinery to receive first oil cargo by end-Sept
51) Petronas Q1 net profit up 26% to RM13b on higher revenue, net write-back on impairment
52) BN to Govt: Don’t sell stake in Petronas
53) Petronas aims to maintain discipline
54) Samsung 11.11 deals include 65in UHD TV and Gear S3 Frontier smartwatch
55) KLCI slumps in volatile trade while MyEG, Datasonic in focus
56) Petronas-Saudi RAPID refinery to offload first oil cargo on Monday
57) KLCI crosses key 1,800 as Petronas stocks advance
58) Bursa ends in the red as foreign selling picks up pace
59) Report: Pak Lah to take leave from role as Petronas adviser
60) KKB secures wellhead platform contract from Petronas Carigali
61) Moody’s affirms Petronas A1 ratings with stable outlook
62) Petronas Chemicals to venture into specialty chemicals
63) 43 from Terengganu score prestigious scholarship from Petronas
64) Selling off part of Govt’s Petronas stake a bad idea, says BN
65) IHH gives KLCI slight boost as Maybank weighs
66) Telcos, Tenaga, banks push KLCI to higher close
67) Sarawak govt disappointed over postponement of Petronas' court case
68) Will Hassan leave SembMarine for Petronas?
69) Petronas warned of July 1 being the cut-off date
70) 43 from Terengganu score prestigious scholarship from Petronas
71) Selling off part of Govt’s Petronas stake a bad idea, says BN
72) Sarawak govt disappointed over postponement of Petronas' court case
73) Will Hassan leave SembMarine for Petronas?
74) Petronas warned of July 1 being the cut-off date
75) Genting M'sia drags KLCI lower, oil prices fall on US sanctions waiver
76) Omar Mustapha resigns from all positions in Petronas
77) Petronas: No spike in upstream capex
78) Najib: Don’t plunder funds from Petronas for bailouts
79) Consumer stocks top gainers, Digi lifts KLCI
80) Kelantan PAS Youth challenges Guan Eng, tells him not to waste time politicking
81) Another weaker closing for Bursa as Public Bank declines
82) Musa Aman wants Sabah AG to intervene in Petronas-Sarawak case
83) Chance to study abroad
84) AmInvest Research maintain Sell on Petronas Gas, fair value RM16.80
85) IHH gives KLCI slight boost as Maybank weighs
86) 'Mysterious' land-clearing along Federal Highway in Shah Alam is for LRT3
87) Petronas Chemical Q1 earnings at RM1.065b, stronger ringgit weighs
88) Sabah granted observer status in Petronas vs Sarawak case
89) Near-term uncertainties remain in oil and gas sector
90) KLCI slides 14 points, tracks Wall St, Asian markets decline
91) Petronas mulls listing lubricants business
92) Chosen as the best in their fields
93) KLCI bucks trend to rise at midday
94) Petronas and Saudi Aramco set up two Rapid joint ventures
95) Late fund buying shores up KLCI, tech stocks routed
96) KLCI inches up amid cautious key Asian markets
97) Bursa moves ahead on GDP results, Petronas stocks lift
98) Blue chips extend gains, Petronas stocks lead KLCI higher
99) Petronas, Saudi Aramco launch corporate identity of Pengerang JVs
100) Petronas maintains it has exclusive oil ownership
101) Tenaga, Maybank power KLCI slightly higher
102) Govt ready to review royalty
103) Govt to continue with its expansionary budget
104) Petronas earnings up 26% in first quarter
105) Students from four universities benefit from Prestige programme
106) Petronas appoints Ainul Azhar to its board
107) Crew member missing after Petronas vessel catches fire
108) Boustead reaches settlement with Petronas Carigali, others
109) Petronas to sell Prince Court Medical Centre to Khazanah
110) Petronas JV LNG Canada progressing with local support
111) CIMB Research upgrades Petronas Dagangan to Add
112) PetChem eyes acquisitions to boost specialty business
113) South Korea's SK Group keen on JV with Petronas
114) Public Bank, Petronas Gas lead KLCI  higher early Thursday
115) Digi teams up with PetDag for service centres in petrol stations
116) Possible delays in awarding of Sarawak-based PSCs
117) ‘Sarawak govt must do all it can to win against Petronas’
118) Petronas inks LNG supply deal with India's  Hiranandani
119) Petronas hits oil in West Africa
120) Bursa ends October on strong note, KLCI above 1,700
121) Project manager wins top Petronas prize out of two million entries
122) Shell close to easing out Petronas to clinch Hong Kong's LNG import deal
123) Petronas 'Everybody Wins' campaign overwhelmed by response, ends early
124) New sales tax may be good but at what cost
125) PETRONAS sets up its own MotoGP outfit
126) Petronas Gas leads KLCI higher at midday
127) Cautious start to the week as Public Bank, Petronas Gas dip
128) Barakah shares jump 16% after Petronas Gas extends its contract
129) Petronas Gas earnings higher in Q1
130) Petronas paints a sombre picture of the oil and gas industry
131) ‘We deserve more than 5% oil royalty’
132) Petronas' FY17 results underpinned by transformation drive
133) Petronas: We have three-pronged strategy
134) Residents want safety assurance on pipeline
135) Petronas wins maiden contract to supply LNG to India
136) Calls for fire safety outreach programme
137) Stronger close for KLCI but trading volume shrinks
138) Petronas Dagangan posts record-breaking FY17 earnings of RM1.59b
139) PetDag and Grab jointly introduce Grab driver pit stop
140) KLCI rises on post-US election rally
141) Petronas Gas remains a hold at CIMB Research
142) Industry players must emulate Petronas
143) Petronas Chemicals posts 42% earnings growth in FY2017
144) Lawyer: Sarawak can enforce laws for oil and gas activities
145) Oil and gas company takes humorous approach in its festive ad campaign
146) Oil and Gas stocks not gaining despite earnings visibility
147) Petronas inks deal to supply LNG for up to 13 years
148) Handal Resources gets umbrella contract from Petronas Carigali
149) Charity run and ride returns
150) Petronas sells Bertam crude for March at premium
151) Perk up Hari Raya drive with coffee
152) Najib: Why sideline Caltex and BHP?
153) Foreign funds turn net buyers of RM322.7m
154) Transparency required in Petronas accounts
155) KLCI picks up heading into midday; TNB, Maybank lift
156) Cautious start to November for Bursa
157) Perunding Ranhill Worley secures contract from Petronas JV KPOC
158) Petronas sees stronger 2017 results as earnings surge in Q3
159) Petronas: LNG cargo deliveries not affected by pipeline leakage
161) Oil and gas still subject to volatility, players ‘tread cautiously’
162) Economic Report 2019: Fiscal deficit at 3.4% vs 3.7% this year
163) Tussle for O&G resources
164) Fuel juggernaut coming as Saudi backed Petronas refinery starts
165) Petronas Dagangan offers giveaways worth RM8,888
166) Take a break from driving to enjoy free cuppa
167) KKB Engineering subsidiary gets Petronas Carigali contract
168) Petronas acquires 30% of Senegal's Rufisque Offshore Profond block
169) Wong: Give Sarawak back what it is due under MA63
170) Restaurant operator collaborates with petroleum company to widen reach
171) Malaysia's Petronas buys Australian spot LNG after domestic outage
172) Petronas offers new Bergading condensate for March
173) T7 Global unit Tanjung Offshore bags Petronas Carigali job
174) Bursa rallies in broad-based advance, Asian markets jump
175) Students tell patriotic stories via murals
176) Cola brand presentsits can designs from yesteryear
177) Bumi Armada, Petronas Dagangan on UOB Kay Hian Research buy list
178) Petronas giving away prizes worth RM72mil in appreciation campaign until Feb 28
179) Petronas Gas weighs on KLCI, ringgit eases against dollar
180) O&G industry to do better due to stable crude oil prices, say analysts
181) Heated debate over oil royalties
182) Harapan govt must stop the blame game and deliver on promises, says Daim
183) RM8.4bil methanol plant to be built in Bintulu
184) Don’t drop austerity mindset
185) Petronas projects oil prices to remain at US$50 to US$60 levels
186) Sapura Energy, partners to develop Sarawak gas fields
187) Malaysia resumes exports of Bentara crude after output rises
188) Adam gets rave reviews despite returning empty-handed
189) Malaysia LNG exports hit four year low on pipeline issues
190) Petronas JV secures RM5.1b to finance Pengerang terminal
191) Another milestone for local oil company
192) Winners and losers from higher oil prices by CIMB Research
193) Pengerang Integrated Complex on track for overall start up in Q1 next year
194) Mixed views on Petronas capex allocation
195) Asian markets slide on hawkish US Fed minutes, Bursa awaits guidance on economic policy
196) Debater advises scholarship applicants to stay calm
197) Petronas renews Wan Zulkiflee’s contract
198) Petronas Carigali official held
199) Uzma secures umbrella contracts from Petronas Carigali
200) Higher dividend from Petronas?
201) Maybank Research has Buy calls on Petronas Chemicals, Lotte Chemical
202) Carimin secures MCM contract from Petronas Carigali
203) Global landmarks go dark as Earth Hour climate campaign kicks off
204) Sapura Energy JV moves ahead to develop gas fields off Sarawak
205) Aramco plans to ship first crude oil to Malaysia JV refinery in Oct
206) Petronas 2.0 - tougher on costs and more downstream
207) Petronas raises dividend payment to government
208) Petronas to open 100 more 'grab to go' stores at its stations
209) Touching webfilm wraps up Petronas’ 2017 festive campaigns
210) PetChem monitors potential impact of US-China trade war
211) Petronas’ 2016 profit margins improve despite tough environment
212) What’s in it for Petronas?
213) Syed Saddiq hopes Malaysian football can mature from defeat
214) Enra subsidiary bags RM206mil Petronas job off Myanmar
215) PetGas Q2 net profit grows 20% on contribution from new Pengerang terminal
216) PKR deputy president gets wider powers
217) Petronas Dagangan steps up green plan with 100 EV charging stations
218) TPA may have Neutral impact on PetGas
219) Petronas able to reduce costs with Blue Ocean Strategy
220) Petronas plans RM644mil investments in India
221) Petronas expects 1,000 redundancies under group-wide revamp
222) Petronas’ Sarawak O & G fields generate an average of 850,000 bpd
223) Brazil lets Petronas Carigali, 10 others to bid for presalt areas
224) Rising crude oil prices underpin KLCI's early Thursday gains
225) Petronas delivers first LNG cargo to Thailand under 15-year deal
226) KLCI climbs early Wednesday, Petronas Dagangan lifts
227) Petronas likely to take a hit, may make provisions
228) Petronas Carigali to terminate helicopter charter contract
229) Abang Johari: Sarawak govt to assert right to regulate oil and gas industry
230) Petronas Gas to gain from new leasing deals
231) South Sudan targets oilfield security, including Petronas'
232) Petronas counters lead fall on KLCI, Asian markets slip
233) Petronas' foiled Canadian dream
234) Walking away with big prizes
235) Energas enables Petronas to tap sanctioned nations
236) Petronas Dagangan finds new buyer for LPG unit in Vietnam
237) Three in Petronas Carigali graft case remanded
238) End to Petronas-Sarawak controversy
239) Petronas sells 10% stake in Bintulu LNG train for RM2.14bil
240) Petronas Lubricants CEO not ruling out IPO in growth drive
241) Petronas clears the air on partnerships
242) Capturing 43 years of achievements
243) Petronas stations free to set lower petrol prices
244) Petronas' Canadian unit to look at other LNG opportunities
245) Set up kiosks in rural areas
246) Petronas to terminate RM3bil contract given to Boustead subsidiary
247) Oil majors eye Petronas’ stake
248) Petronas tougher on costs, focuses more on downstream
249) PetGas on track for a profit rebound this year
250) Challenges for Petronas
251) Long deadline for Saudi Aramco deal with Petronas
252) Petronas Coral 2.0 results in cost savings of RM5bil
253) Malaysia-Saudi Aramco venture seeks commitments for US$9.7bil project finance
254) Dayang bags MCM services contract from Petronas Carigali
255) Petronas says RAPID project remains on track after Aramco's snub
256) Bursa weighed by TNB, PetChem
257) What is Aramco’s capital outlay for Pengerang projects?
258) Maybank Research retains buy on Dialog Group
259) Wee looks forward to his break
260) Fluor awarded contract for Petronas isononanol chemical plant
261) Petronas Dagangan expects good performance this year
262) Petronas decision a blow to Canada’s biggest shale play
263) Petronas’ first floating LNG facility to start operating next month
264) Cost-cutting measures help lift Petronas earnings
265) Petronas secures another block in Mexico
266) JERA's new LNG contract with Petronas foretells smaller, shorter deals
267) CIMB Research starts coverage on Petronas Dagangan with Hold rating
268) Petronas scraps $29bil western Canada LNG project
269) Petronas to bid for jobs in Iran
270) Analyst reports
271) Oil majors eye Petronas’ stake
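
Some titles appear more than once in the list above (for example entries 63 and 70). If repeated articles are not wanted in later analysis, one possible approach is to keep only the first record for each title. The sketch below is an assumption about how that could be done, not part of the original tool; the helper name dedupe_by_title and the sample records are hypothetical.

```python
def dedupe_by_title(articles):
    """Keep the first [date, title, content, link] record for each distinct title."""
    seen_titles = set()
    unique_articles = []
    for article in articles:
        title = article[1]
        if title not in seen_titles:
            seen_titles.add(title)
            unique_articles.append(article)
    return unique_articles

# Hypothetical records: the first and third share a title
sample_articles = [
    ["2018-01-01", "Title A", "content 1", "https://example.com/1"],
    ["2018-01-02", "Title B", "content 2", "https://example.com/2"],
    ["2018-01-01", "Title A", "content 1", "https://example.com/1"],
]
print(len(dedupe_by_title(sample_articles)))
```

Matching on the title alone would also drop different articles that happen to share a headline; comparing the link instead is a stricter alternative.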