
Topic modelling using SparkML’s LDA on news articles collected from NewsAPI, covering more than 30 sources.


supreetkt/Text-Analysis-using-newsAPI


RUNNING.txt

1] First run the data_acquisition.py script to fetch the latest news headlines and metadata and store them in a Cassandra table. The program takes two arguments: the keyspace and the table. The data is currently stored in the vshukla keyspace. Use command: python3 data_acquisition.py vshukla newsdata

2] Run newsTopicModels.py, which contains the main code for text pre-processing and topic modelling with LDA. It exports two tables in CSV format. Use command: spark-submit --packages anguenot:pyspark-cassandra:0.6.0 newsTopicModels.py

3] Run the post-processing script tablesToFeedToTableau.py to convert the CSV files exported by the LDA step into tables that Tableau can use as data sources.

4] For the Tableau visualisations, refer to the Tableau Public link https://public.tableau.com/views/TextAnalysisBidDataProject/START?:embed=y&:display_count=yes&publish=yes
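As a rough illustration of step 1, the sketch below shows how a script might read the keyspace and table arguments and flatten one NewsAPI article into a row before it is written to Cassandra. The function names parse_args and article_to_row and the output column names are hypothetical; the columns used by the actual data_acquisition.py are not documented here. The input field names (source, author, title, description, url, publishedAt) follow the NewsAPI article format.

```python
import sys


def parse_args(argv):
    """Return (keyspace, table) from the command line, as in step 1."""
    if len(argv) != 3:
        raise SystemExit("usage: python3 data_acquisition.py <keyspace> <table>")
    return argv[1], argv[2]


def article_to_row(article):
    """Flatten one NewsAPI article dict into a flat row.

    The output column names are assumptions made for this sketch.
    """
    source = article.get("source") or {}
    return {
        "source_id": source.get("id"),
        "source_name": source.get("name"),
        "author": article.get("author"),
        "title": article.get("title"),
        "description": article.get("description"),
        "url": article.get("url"),
        "published_at": article.get("publishedAt"),
    }


if __name__ == "__main__":
    keyspace, table = parse_args(sys.argv)
    print("would write rows to", keyspace + "." + table)
```

Writing the rows to Cassandra is left out on purpose, since the driver and schema used by the project are not stated above.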
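For step 2, a minimal sketch of a pre-processing and LDA pipeline in PySpark is given below. It is not the code in newsTopicModels.py: the function names, the "title" text column, and the values of k, max_iter and vocab_size are assumptions chosen for illustration. The PySpark imports sit inside fit_lda so that the small helper can be used without a Spark installation.

```python
def terms_for_topic(vocabulary, term_indices, term_weights):
    """Map the term indices of one LDA topic back to (word, weight) pairs."""
    return [(vocabulary[i], float(w)) for i, w in zip(term_indices, term_weights)]


def fit_lda(df, text_col="title", k=10, max_iter=20, vocab_size=5000):
    """Tokenise, remove stop words, count terms, and fit LDA on a Spark DataFrame.

    text_col, k, max_iter and vocab_size are illustrative defaults, not the
    project's settings.
    """
    from pyspark.ml import Pipeline
    from pyspark.ml.clustering import LDA
    from pyspark.ml.feature import CountVectorizer, RegexTokenizer, StopWordsRemover

    stages = [
        # Split on non-word characters; RegexTokenizer lowercases by default.
        RegexTokenizer(inputCol=text_col, outputCol="tokens", pattern=r"\W+"),
        StopWordsRemover(inputCol="tokens", outputCol="filtered"),
        CountVectorizer(inputCol="filtered", outputCol="features", vocabSize=vocab_size),
        LDA(k=k, maxIter=max_iter, featuresCol="features"),
    ]
    model = Pipeline(stages=stages).fit(df)

    vocabulary = model.stages[2].vocabulary  # fitted CountVectorizerModel
    topics = model.stages[3].describeTopics(maxTermsPerTopic=10).collect()
    return [
        (row["topic"], terms_for_topic(vocabulary, row["termIndices"], row["termWeights"]))
        for row in topics
    ]
```

The returned list holds one entry per topic with its top weighted words, which is the kind of table that could then be exported to CSV.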
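For step 3, one common way to make a table easier to use as a Tableau data source is to reshape it from wide to long form, with one row per topic and term. The sketch below assumes a hypothetical input layout (a topic column followed by one column per term); the layout of the CSV files exported by newsTopicModels.py and the logic of tablesToFeedToTableau.py may differ.

```python
import csv
import io


def wide_to_long(wide_csv_text):
    """Reshape 'topic,term_1,term_2,...' rows into 'topic,rank,term' rows.

    The input layout is an assumption made for this sketch.
    """
    reader = csv.reader(io.StringIO(wide_csv_text))
    next(reader)  # skip the header row
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(["topic", "rank", "term"])
    for row in reader:
        topic, terms = row[0], row[1:]
        for rank, term in enumerate(terms, start=1):
            writer.writerow([topic, rank, term])
    return out.getvalue()


if __name__ == "__main__":
    print(wide_to_long("topic,term_1,term_2\n0,election,vote\n1,match,goal\n"))
```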

GitLab code repo: https://csil-git1.cs.surrey.sfu.ca/ksanduja/BD-NewsAnalysts.git
