This repository contains a Reddit flair detector built with Python and Flask. It is live at https://flair-it-up.herokuapp.com/.
The directory contains the web application's subdirectories and a subdirectory for the model and other scripts:
- app.py: the main backend logic of the website; also used to run the Flask server locally.
- Procfile: used to set up Heroku.
- model: contains the saved model.
- requirements.txt: lists all the dependencies.
- templates: contains the HTML file.
- static: contains the CSS file.
- nltk.txt: lists the nltk dependency.
- Scripts: contains the notebooks for data extraction, modelling, exploratory data analysis, and the experiment log manager.
The entire project is written in Python and hosted on Heroku. The analysis and models use the nltk library and several machine learning algorithms; the website is built with Flask.
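To illustrate how a Flask backend such as app.py can expose a prediction endpoint, here is a minimal sketch. It is not the repository's actual code: the `/predict` route, the `text` form field, and the `predict_flair` helper are all hypothetical, and the helper is a placeholder for whatever the saved model in the model directory does.

```python
from flask import Flask, jsonify, request

app = Flask(__name__)


def predict_flair(text):
    """Hypothetical stand-in for the saved model."""
    # Assumption: the real app would load a trained classifier and call
    # its predict method here; this rule only keeps the sketch runnable.
    return "Politics" if "election" in text.lower() else "Other"


@app.route("/predict", methods=["POST"])
def predict():
    # Read the submitted text from the form and return the predicted flair.
    text = request.form.get("text", "")
    return jsonify({"flair": predict_flair(text)})
```

Saved as app.py, a file like this would be picked up by `flask run`. The real application renders an HTML template rather than returning JSON; JSON is used here only to keep the sketch short.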
- Open the `Terminal`.
- Clone the repository by entering `git clone https://github.com/abhishek-parashar/Reddit-flair-detection`.
- Ensure that `Python3` and `pip` are installed on the system.
- Create a `virtualenv` by executing the following command: `virtualenv venv`.
- Activate the `venv` virtual environment by executing the following command: `source venv/bin/activate`.
- Enter the cloned repository directory and execute `pip install -r requirements.txt`.
- Now, execute the following command: `flask run`. The server will start on `localhost`, port `5000`.
- Open `http://localhost:5000` in a web browser and use the application.
The dependencies are listed in requirements.txt.
I went through a lot of literature and YouTube videos for this task; the resources are listed in the resources section. I then collected data from Reddit using the PRAW module, used nltk to remove bad words, and applied various machine learning models to the cleaned text. Only the top 10 comments of each post were taken. I used a TF-IDF vectorizer to convert the text into numeric feature vectors. Finally, I deployed the model using Flask and Heroku.
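The vectorize-then-classify step described above can be sketched with scikit-learn as follows. This is an illustration under assumptions, not the notebooks' code: the toy texts and flairs are invented, and scikit-learn's built-in English stop-word list stands in for the nltk cleaning step.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Toy data invented for illustration; the project used posts scraped with PRAW.
texts = [
    "election results announced by the commission",
    "parliament passes new budget bill",
    "cricket team wins the test series",
    "football league final ends in a draw",
]
flairs = ["Politics", "Politics", "Sports", "Sports"]

# TfidfVectorizer turns each document into a sparse TF-IDF feature vector;
# the classifier is then trained on those vectors.
model = make_pipeline(
    TfidfVectorizer(stop_words="english"),  # assumption: stand-in for nltk cleaning
    LogisticRegression(max_iter=1000),
)
model.fit(texts, flairs)

predicted = model.predict(["the team wins the cricket final"])
print(predicted[0])
```

Wrapping both steps in one pipeline means the same vocabulary and weights are applied at prediction time, which is convenient when the fitted object is saved and later loaded by the web app.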
Machine Learning Algorithm | Test Accuracy |
---|---|
Linear SVM | 0.7418032786885246 |
Logistic Regression | 0.75409836 |
Random Forest | 0.7336065573770492 |
MLP | 0.5327868852459017 |
XGBoost | 0.7008196721311475 |

Machine Learning Algorithm | Test Accuracy |
---|---|
Linear SVM | 0.3442622950819672 |
Logistic Regression | 0.3237704918032787 |
Random Forest | 0.3770491803 |
MLP | 0.2663934426229508 |
XGBoost | 0.3688524590163934 |

Machine Learning Algorithm | Test Accuracy |
---|---|
Linear SVM | 0.2745901639344262 |
Logistic Regression | 0.3073770491 |
Random Forest | 0.2622950819672131 |
MLP | 0.2254098360655737 |
XGBoost | 0.2172131147540983 |

Machine Learning Algorithm | Test Accuracy |
---|---|
Linear SVM | 0.430327868852459 |
Logistic Regression | 0.4344262295081967 |
Random Forest | 0.438524590163 |
MLP | 0.3073770491803279 |
XGBoost | 0.4180327868852459 |

Machine Learning Algorithm | Test Accuracy |
---|---|
Linear SVM | 0.7090163934426229 |
Logistic Regression | 0.7131147540983607 |
Random Forest | 0.7745901639344263 |
MLP | 0.5532786885245902 |
XGBoost | 0.8278688 |
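A comparison like the ones tabulated above can be produced by fitting each classifier on the same train/test split and recording its test accuracy. The sketch below shows one way to do that with scikit-learn. The corpus, split ratio, and random seed are assumptions made for illustration, and MLP and XGBoost are left out to keep it short.

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

# Toy corpus invented for illustration (two flairs, ten posts each).
politics = [f"election vote parliament minister policy {i}" for i in range(10)]
sports = [f"cricket match team score tournament {i}" for i in range(10)]
texts = politics + sports
flairs = ["Politics"] * 10 + ["Sports"] * 10

# Hold out a quarter of the posts for testing, keeping the flair balance.
X_train, X_test, y_train, y_test = train_test_split(
    texts, flairs, test_size=0.25, random_state=42, stratify=flairs
)

classifiers = {
    "Linear SVM": LinearSVC(),
    "Logistic Regression": LogisticRegression(max_iter=1000),
    "Random Forest": RandomForestClassifier(n_estimators=50, random_state=42),
}

results = {}
for name, clf in classifiers.items():
    # Each model gets its own TF-IDF vectorizer fitted on the training set only.
    pipeline = make_pipeline(TfidfVectorizer(), clf)
    pipeline.fit(X_train, y_train)
    results[name] = accuracy_score(y_test, pipeline.predict(X_test))

for name, acc in results.items():
    print(f"{name}: {acc:.4f}")
```

Running the same loop once per input feature (for example, title only or combined text) would give one table per feature.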
Several inferences are discussed in the EDA notebook. From the results, we can infer that the combined features give the best result, probably because combining fields gives the model more text, and therefore more features, to learn from. We can also infer that the title on its own gives comparatively good results; this can be attributed to the fact that titles mostly contain the words that are in line with the flairs.
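As a rough illustration of what a "combined" feature could look like, the sketch below joins a post's title, body, and first ten comments into one string. The dict layout and the `build_features` helper are hypothetical; the notebooks may assemble the fields differently.

```python
def build_features(post, max_comments=10):
    """Return per-field texts and a combined text for one post.

    `post` is a hypothetical dict with "title", "body" and "comments" keys.
    """
    title = post.get("title", "")
    body = post.get("body", "")
    # Keep only the first `max_comments` comments of the post.
    comments = " ".join(post.get("comments", [])[:max_comments])
    # Skip empty fields so the combined text has no stray spaces.
    combined = " ".join(part for part in (title, body, comments) if part)
    return {"title": title, "body": body, "comments": comments, "combined": combined}


post = {
    "title": "Election results announced",
    "body": "",
    "comments": [f"comment {i}" for i in range(15)],
}
features = build_features(post)
print(features["combined"])
```

Each entry of the returned dict can then be vectorized and evaluated separately, which is how single features and the combined feature can be compared.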
Since I am not well versed in HTML and CSS, I took the HTML and CSS files from https://github.com/krishnaik06/Deployment-flask.
- https://towardsdatascience.com/scraping-reddit-data-1c0af3040768
- https://api.mongodb.com/python/current/tutorial.html
- https://medium.com/themlblog/splitting-csv-into-train-and-test-data-1407a063dd74
- https://towardsdatascience.com/multi-class-text-classification-model-comparison-and-selection-5eb066197568
- https://medium.com/@robert.salgado/multiclass-text-classification-from-start-to-finish-f616a8642538
- https://www.analyticsvidhya.com/blog/2018/04/a-comprehensive-guide-to-understand-and-implement-text-classification-in-python/
- https://www.districtdatalabs.com/text-analytics-with-yellowbrick
- Applied AI course: https://www.appliedaicourse.com/
- https://towardsdatascience.com/designing-a-machine-learning-model-and-deploying-it-using-flask-on-heroku-9558ce6bde7b
- https://towardsdatascience.com/deploying-a-deep-learning-model-on-heroku-using-flask-and-python-769431335f66
- https://medium.com/analytics-vidhya/deploy-machinelearning-model-with-flask-and-heroku-2721823bb653
- https://www.youtube.com/watch?v=UbCWoMf80PY
- https://www.youtube.com/watch?v=mrExsjcvF4o
- https://blog.cambridgespark.com/deploying-a-machine-learning-model-to-the-web-725688b851c7