[Image: dsprojects.png]

This page presents my data science projects. Its purpose is to showcase my skills in data science, Python, and SQL. I am looking for a career in data science. If I am the candidate you are looking for, please contact me; you are also welcome to reach out to learn more about my work.

Email: yapsoonchung@yahoo.com

LinkedIn: https://www.linkedin.com/in/yapsoonchung

Website: https://scyap.github.io/DataScience

1) Exploratory Data Analysis (Credit Card Default)

This is my EDA of credit card default, following the OSEMN data science framework.

  • Obtain data: from Kaggle.
  • Scrub / clean data: rename columns and modify values.
  • Explore data: with Pandas DataFrames and Seaborn/Matplotlib visualizations.
  • Model: Logistic Regression with Scikit-Learn.
  • iNterpret: summary of findings and suggestion.
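The OSEMN steps above can be sketched in a few lines. This is a toy illustration, not the project notebook: the column names follow the Kaggle "Default of Credit Card Clients" dataset, but the rows here are made up.

```python
# Minimal OSEMN sketch on a toy credit-card-default frame.
import pandas as pd
from sklearn.linear_model import LogisticRegression

# Obtain: in the real project the CSV comes from Kaggle.
df = pd.DataFrame({
    "LIMIT_BAL": [20000, 120000, 90000, 50000, 200000, 30000],
    "AGE": [24, 26, 34, 37, 57, 29],
    "default.payment.next.month": [1, 0, 0, 0, 0, 1],
})

# Scrub: rename the unwieldy target column.
df = df.rename(columns={"default.payment.next.month": "default"})

# Explore: quick summary statistics (plots via Seaborn/Matplotlib in the notebook).
print(df.groupby("default")["LIMIT_BAL"].mean())

# Model: logistic regression on the numeric features.
X, y = df[["LIMIT_BAL", "AGE"]], df["default"]
model = LogisticRegression().fit(X, y)

# iNterpret: training accuracy on this toy sample.
print(model.score(X, y))
```

On the full Kaggle dataset the same shape applies, with a train/test split before scoring.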

Project link: https://www.kaggle.com/yapsoonchung/eda-on-default-of-credit-card-clients-dataset

[Image: eduDefault.png]

2) ETL from CSV to MySQL (Credit Card Default dataset)

I built this ETL tool using Python. It goes through the process below.

  • Extract the CSV file,
  • Transform some of the column names & values,
  • Load the data into a MySQL database.

Additional notes:

  • Original data from Kaggle.
  • The database and table are created in MySQL before running the ETL.
  • Further analysis can be done by connecting MySQL to Redash visualization tool.
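The extract-transform-load flow can be sketched as follows. SQLite stands in for MySQL here so the example is self-contained; with MySQL you would use a driver such as mysql-connector-python with the same INSERT statements. The inline CSV and table name are stand-ins.

```python
# ETL sketch: CSV -> transform -> database (SQLite standing in for MySQL).
import csv
import io
import sqlite3

# Extract: a small inline CSV standing in for the Kaggle file.
raw = """ID,LIMIT_BAL,default.payment.next.month
1,20000,1
2,120000,0
"""
rows = list(csv.DictReader(io.StringIO(raw)))

# Transform: rename the target column in each row.
for r in rows:
    r["default"] = int(r.pop("default.payment.next.month"))

# Load: the table is created up front, as in the original project.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE card_default (id INTEGER, limit_bal INTEGER, is_default INTEGER)")
con.executemany(
    "INSERT INTO card_default VALUES (?, ?, ?)",
    [(r["ID"], r["LIMIT_BAL"], r["default"]) for r in rows],
)
con.commit()
print(con.execute("SELECT COUNT(*) FROM card_default").fetchone()[0])
```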

Project link: https://scyap.github.io/ETL_card_default/

[Image: etltool.png]

3) SQL query and Redash visualization (Human Resource Analytics)

This is an analysis of a Human Resources dataset. The Accounting department has the highest dissatisfaction, which leads to a high attrition rate. The analysis shows that the number of projects assigned per year affects both the satisfaction level and the attrition rate.

SQL clauses used include:

  • LEFT JOIN
  • WHERE
  • GROUP BY
  • ORDER BY

Stacked charts are visualized using Redash.
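The query shape behind those clauses can be sketched as below. The table and column names are invented for illustration, and SQLite stands in for the project's MySQL database.

```python
# LEFT JOIN / WHERE / GROUP BY / ORDER BY demonstrated on toy HR tables.
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE employees (id INTEGER, dept TEXT, left_company INTEGER);
CREATE TABLE satisfaction (employee_id INTEGER, score REAL);
INSERT INTO employees VALUES (1,'accounting',1),(2,'accounting',1),(3,'sales',0);
INSERT INTO satisfaction VALUES (1,0.2),(2,0.3),(3,0.8);
""")

query = """
SELECT e.dept, COUNT(*) AS leavers, AVG(s.score) AS avg_satisfaction
FROM employees e
LEFT JOIN satisfaction s ON s.employee_id = e.id
WHERE e.left_company = 1
GROUP BY e.dept
ORDER BY leavers DESC
"""
for dept, leavers, avg_sat in con.execute(query):
    print(dept, leavers, round(avg_sat, 2))
```

A per-department result like this is what feeds the stacked charts in Redash.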

Project link: https://www.dropbox.com/s/q2fvdvqf89r4whq/HR%20dataset%20Analysis.pdf?dl=0

[Image: hrAnalytics.png]

4) Web Scraping (News Article Analysis 1.0)

As the first step of my News Article Analysis, I built a web scraping tool with Python. It enables me to download more than 200 news articles in minutes. Depending on the topic I'm interested in, I can specify a Keyword and the number of result Pages to scrape from Google News. The process goes as below.

  • search for the specified Keyword (e.g. Petronas),
  • scrape article URLs from the Google News results page,
  • scrape each article's Date, Title, and Content,
  • loop through the number of result pages specified (e.g. 30 pages),
  • save everything as JSON for further analysis (e.g. NLP, Sentiment Analysis, Text Mining, K-means clustering).
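The URL-scraping step can be sketched as follows. Google News markup changes often, so the HTML below is a made-up stand-in; the real tool paged through live result pages before fetching each article.

```python
# Sketch of the URL-scraping step on a stand-in result page.
import json
import re

result_page = """
<a href="https://example.com/petronas-q1">Petronas Q1 results</a>
<a href="https://example.com/petronas-deal">Petronas signs deal</a>
"""

# Scrape article URLs out of the (stand-in) result page.
urls = re.findall(r'href="(https?://[^"]+)"', result_page)

# Each article would then be fetched and stored with its metadata.
articles = [{"url": u, "date": None, "title": None, "content": None} for u in urls]

# Save as JSON for the later NLP / sentiment-analysis stages.
payload = json.dumps(articles)
print(len(urls))
```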

Project link: https://scyap.github.io/NewsArticleAnalysis_WebScraping/

[Image: googleNews]

5) Natural Language Processing (News Article Analysis 2.0)

Having previously saved 200+ articles into a JSON file, I can now start the News Article Analysis with Natural Language Processing (NLP).

  • Load the JSON file, which contains the Date, Title, Content, and Link of each article.
  • Run tokenization and remove stopwords.
  • Optionally, remove some non-meaningful words manually.
  • Count word frequencies.
  • Visualize them as a word cloud.
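The tokenize/filter/count pipeline can be sketched with the standard library alone. The inline text and the tiny stopword list are stand-ins (the project would use the saved JSON and a fuller list such as NLTK's).

```python
# Tokenization -> stopword removal -> word frequency on a stand-in text.
import re
from collections import Counter

text = "Petronas announced record profits. Petronas will invest the profits."
stopwords = {"the", "a", "will", "and", "of"}

tokens = re.findall(r"[a-z]+", text.lower())        # tokenize
tokens = [t for t in tokens if t not in stopwords]  # remove stopwords
freq = Counter(tokens)                              # word frequency

print(freq.most_common(2))
# A word cloud can then be drawn, e.g. with the `wordcloud` package:
# WordCloud().generate_from_frequencies(freq)
```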

Project link: https://scyap.github.io/NewsArticleAnalysis_NLP/

[Image: wordCloud]

6) Sentiment Analysis (News Article Analysis 3.0)

This time, I’m applying Sentiment Analysis to the news articles. The analysis proceeds as below.

  • Prepare the sentiment dictionary.
  • Define a function that calculates a sentiment score.
  • Load the JSON file, which contains the Date, Title, Content, and Link of each article.
  • Calculate the sentiment score for each article, and print out a sample for review.
  • Load the dates and scores into a Pandas DataFrame.
  • Visualize them with Matplotlib.

The results show there are more positive articles overall; however, the ratio of negative articles has been rising recently.
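The dictionary-based scoring step can be sketched as below. The tiny lexicon and the two articles are invented for illustration; the project scores the full JSON file with a fuller dictionary.

```python
# Dictionary-based sentiment scoring on two stand-in articles.
import re

sentiment_dict = {"profit": 1, "growth": 1, "loss": -1, "decline": -1}

def sentiment_score(text):
    """Sum the lexicon values of every word in the text."""
    words = re.findall(r"[a-z]+", text.lower())
    return sum(sentiment_dict.get(w, 0) for w in words)

articles = [
    {"date": "2020-01-05", "content": "Strong profit and growth reported."},
    {"date": "2020-02-10", "content": "A loss followed by further decline."},
]
scores = [sentiment_score(a["content"]) for a in articles]
print(scores)  # one positive article, one negative
```

The dates and scores would then go into a Pandas DataFrame for the Matplotlib plot.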

Project link: https://scyap.github.io/NewsArticleAnalysis_SentimentAnalysis/

[Image: newsSenti]

7) Automation with Python (Weekly Report)

This is a Python program built to automate a weekly summary report.

  • Selenium is used for Chrome browser automation (logging in, selecting filters, and extracting information).
  • OpenPyXL is used to extract information from Excel.
  • Regular expressions are used for text search.
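The text-search step can be sketched with the standard library. The supplier status lines below are invented; in the real program the text comes from the browser (via Selenium) and from Excel (via OpenPyXL).

```python
# Regex sketch of the text-search step on stand-in report text.
import re

page_text = """
Supplier: Acme Sdn Bhd  Status: Approved
Supplier: Beta Corp     Status: Pending
"""

# Pull (supplier, status) pairs out of the extracted text.
pattern = re.compile(r"Supplier:\s*(.+?)\s+Status:\s*(\w+)")
rows = pattern.findall(page_text)
print(rows)
```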

Project link: https://www.dropbox.com/s/cwh41yw26s1ao28/Automate%20Supplier%20Status%20Summary.pdf?dl=0

[Image: automation]
