Sentiment analysis of Ecuadorian political tweets from 2021 elections using Natural Language Processing

Authors: Sebastián Ayala, David Mena, Sebastián Lucero

21st December 2020

About this project

In this project, we captured the sentiment of replies to tweets from two Ecuadorian presidential candidates in the 2021 elections (@LassoGuillermo and @ecuarauz). A neural network model was used to classify tweets by sentiment as positive, negative, or neutral. The training dataset was obtained from the 2020, 2019, and 2012 editions of the Workshop on Semantic Analysis at SEPLN (TASS). We merged the data from the three editions into one dataset containing the id, text, and sentiment of each tweet. Due to privacy policies we can't share the datasets here, but you can register on this page to access all of them.

Data extraction

We extracted all replies to tweets of both candidates between 01/12/2021 and 18/12/2021. For this task, we tried both the Python package Tweepy and the R package rtweet. We obtained better results with rtweet, so it was chosen for the extraction. Then, a sample of 1000 tweets for each candidate was selected. Both the Python and R scripts are available in the Data_extraction folder. We can't share the tweet data because of the privacy policies of the Twitter developer account, but you can access this information by applying for a Twitter developer account.
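The sampling step can be sketched as follows. This is a minimal illustration, not the code in the Data_extraction folder: it assumes simple random sampling without replacement (the text above does not state how the sample was drawn), and the function name `sample_replies`, the dictionary layout, and the seed are hypothetical.

```python
import random


def sample_replies(replies_by_candidate, n=1000, seed=42):
    """Draw up to n replies per candidate, without replacement.

    replies_by_candidate maps a candidate handle to a list of replies
    (hypothetical layout, used only for this sketch).
    """
    rng = random.Random(seed)  # fixed seed so the sample is reproducible
    sampled = {}
    for candidate, replies in replies_by_candidate.items():
        k = min(n, len(replies))  # a candidate may have fewer than n replies
        sampled[candidate] = rng.sample(replies, k)
    return sampled
```

Fixing the seed makes it possible to redraw the same sample later, which helps when the classified replies need to be traced back to the extracted data.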

In addition, we obtained the TASS datasets as XML and CSV files. We then applied the script Merge_TASS_data.py to join all the data into one file.
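A merge of this kind might look like the sketch below. It is not the content of Merge_TASS_data.py. The XML element names (`tweet`, `tweetid`, `content`, `sentiment/polarity/value`) and the CSV column names (`id`, `text`, `sentiment`) are assumptions made for illustration and may differ between TASS editions; the function names are hypothetical.

```python
import csv
import io
import xml.etree.ElementTree as ET

FIELDS = ("id", "text", "sentiment")


def read_tass_xml(xml_text):
    """Parse tweets from an XML string (element names are assumed)."""
    root = ET.fromstring(xml_text)
    rows = []
    for tweet in root.iter("tweet"):
        rows.append({
            "id": tweet.findtext("tweetid", default="").strip(),
            "text": tweet.findtext("content", default="").strip(),
            "sentiment": tweet.findtext(
                "sentiment/polarity/value", default="").strip(),
        })
    return rows


def read_tass_csv(csv_text):
    """Parse tweets from a CSV string with id, text, sentiment columns."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return [{field: row[field].strip() for field in FIELDS} for row in reader]


def merge_rows(*sources):
    """Concatenate sources, keeping the first row seen for each tweet id."""
    seen = set()
    merged = []
    for rows in sources:
        for row in rows:
            if row["id"] not in seen:
                seen.add(row["id"])
                merged.append(row)
    return merged


def to_csv(rows):
    """Serialize the merged rows as CSV text with a header line."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=FIELDS, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()
```

Dropping repeated ids is a design choice of this sketch, so that a tweet appearing in more than one edition is counted once.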

Preprocessing, Feature extraction, Model and Results (PFMR)

All these processes were done in Python. We applied two preprocessing steps, tokenization and stop word removal, using Keras and nltk tools. For feature extraction, we used a simple one-hot encoding method to convert the text into numerical data for the model, building the vocabulary from all the words in the TASS dataset and the candidate tweets. The dataset was split into training, validation, and test sets in 70-15-15 proportions. The model was a neural network built with Keras. Replies to the presidential candidates' tweets were then classified using our model. The code for this section was run on Google Colaboratory servers and is available as a Jupyter notebook (Ec_candidates_tweets_sentiment_an.ipynb) and as a Python script (Ec_candidates_tweets_sentiment_an.py).
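The preprocessing, encoding, and splitting steps described above can be illustrated with a dependency-free sketch. It does not reproduce the Keras and nltk calls used in the notebook. It assumes that "one-hot encoding" means a binary vector per tweet with a 1 for every vocabulary word present, uses a tiny hand-written stop word list in place of the nltk Spanish list, and all function names are hypothetical.

```python
import random
import re

# Tiny illustrative stop word list (assumption; the notebook uses nltk).
STOP_WORDS = {"de", "la", "el", "que", "y", "en", "a", "los", "es", "un", "no"}


def preprocess(text):
    """Lowercase, tokenize on word characters, and drop stop words."""
    tokens = re.findall(r"\w+", text.lower())
    return [t for t in tokens if t not in STOP_WORDS]


def build_vocabulary(token_lists):
    """Map each distinct token to an index, in order of first appearance."""
    vocab = {}
    for tokens in token_lists:
        for token in tokens:
            vocab.setdefault(token, len(vocab))
    return vocab


def one_hot(tokens, vocab):
    """Binary vector with a 1 at the index of every known token present."""
    vector = [0] * len(vocab)
    for token in tokens:
        if token in vocab:
            vector[vocab[token]] = 1
    return vector


def split_70_15_15(items, seed=42):
    """Shuffle and split into training, validation, and test lists."""
    items = list(items)
    random.Random(seed).shuffle(items)
    n_train = len(items) * 70 // 100
    n_val = len(items) * 15 // 100
    train = items[:n_train]
    val = items[n_train:n_train + n_val]
    test = items[n_train + n_val:]  # remainder goes to the test set
    return train, val, test
```

Building the vocabulary from both the TASS data and the candidate replies, as the text describes, means every word in the replies has an index at classification time; words outside the vocabulary are simply ignored by `one_hot` in this sketch.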

Python libraries required to run the PFMR script

pandas
nltk
keras
matplotlib
numpy
sklearn
