A rest-API to serve word phrases and rating scores from text data using NLP.


Tally AI

A Word-Trend Business Intelligence Dashboard That Provides Actionable Business Insights.

Business owners don’t have time to decode what people are saying about their business online - they just want to know what to improve - so our goal for Tally AI was to provide actionable suggestions to help businesses grow profit.

The app is currently piloting its functionality on hundreds of cafes and restaurants around the Phoenix, AZ area.

Tally is a one-stop snapshot for understanding your business's Yelp reviews.

 "Data analytics is not just for big corporations. 
 Your small business can stay on top of an ever changing marketplace 
 with the power of Tally."

Check the App Out


Data Science

Previous Product Manager | UX Designer

Elizabeth Ter Sahakyan Colton Mortensen Bobby Hall Tara Bramwell Shanthi Madheswaran

Previous Data Science (LABS 19)

Wenjing Liu Lily Su Rohan Kulkarni Anh Vu Blake Lobato

Project Overview

Tally AI is a word trend analysis application that takes online business reviews and provides clear insights on what people are saying and feeling. Business owners don’t have time to decode what people are saying about their business online - they just want to know what to improve - so our goal for Tally AI was to provide actionable insights to help these businesses grow profit.

At Tally, we believe that there is a network effect in helping small businesses - that when small businesses gain more revenue, everyone benefits.

We aimed to build a product that surfaces the key insights from any business's reviews. Where does the data come from, and how do we analyze the customer reviews? The data came from a combination of the Yelp dataset and scraped data. The Yelp dataset is a sampling of each business's review data; it contains over 190,000 businesses from select cities and over 6 million reviews. To get more comprehensive data, we chose to focus on one specific area, Phoenix, AZ, and scrape as many businesses in our target market as were available. We piloted our functionality on 942 cafes and restaurants around the Phoenix area, collected in a 10-hour scraping job. To get this data, we used a paid proxy pool of 20,000 shared data center IPs from a service named Luminati. Our code auto-rotates through the IPs, dedicating a different IP address to each page being scraped.
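The per-page IP rotation described above can be sketched roughly as follows. This is an illustration only, not the project's scraper: the function name `assign_proxies`, the proxy addresses, and the page URLs are hypothetical placeholders, and no network requests are made.

```python
from itertools import cycle


def assign_proxies(page_urls, proxy_pool):
    """Pair each page URL with the next proxy in the pool, wrapping around."""
    if not proxy_pool:
        raise ValueError("proxy_pool must not be empty")
    rotation = cycle(proxy_pool)
    return [(url, next(rotation)) for url in page_urls]


# Hypothetical placeholders; real proxy addresses would come from the provider.
proxies = ["10.0.0.1:8000", "10.0.0.2:8000", "10.0.0.3:8000"]
pages = [f"https://example.com/biz/some-cafe?start={n}" for n in range(0, 80, 20)]

plan = assign_proxies(pages, proxies)
for url, proxy in plan:
    print(proxy, url)
```

With a pool much larger than the number of pages per business, consecutive pages never share an address, which is the property the paragraph above describes.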

We kept only the customer review text, the star rating, and the date of each review. To process the reviews, the data science team used two tools. Scattertext determines the correlation between word frequency and high and low ratings. TextRank is a graph-based ranking algorithm similar to Google's PageRank; we used it for keyword extraction, then tracked word occurrences over time.
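To make the idea of relating word frequency to ratings concrete, here is a minimal sketch that computes a plain Pearson correlation between each term's per-review count and the star rating. It is a simplified stand-in, not Scattertext's scoring method, and the sample reviews and function names are made up for illustration.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation; returns 0.0 when either series is constant."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / sqrt(var_x * var_y)


def term_rating_correlations(reviews):
    """Map each term to the correlation between its count per review and the rating."""
    tokenized = [(text.lower().split(), stars) for text, stars in reviews]
    vocabulary = {word for words, _ in tokenized for word in words}
    ratings = [stars for _, stars in tokenized]
    return {
        term: pearson([words.count(term) for words, _ in tokenized], ratings)
        for term in vocabulary
    }


# Made-up sample data: (review text, star rating)
reviews = [
    ("great coffee friendly staff", 5),
    ("great pastries", 4),
    ("rude staff slow service", 1),
    ("slow service cold coffee", 2),
]
scores = term_rating_correlations(reviews)
print(sorted(scores, key=scores.get, reverse=True)[:3])
```

Terms with strongly positive scores tend to appear in high-rated reviews, and strongly negative ones in low-rated reviews; a term spread evenly across ratings scores near zero.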

Everything was built from the ground up for the purpose of learning, and here's how it works: after the user inputs a business name and location, we send a POST request to the Yelp API with that information and retrieve a list of results, which we populate in the React components.

Once a specific business is selected, data is retrieved from the data science API and displayed in the widgets on the dashboard. The web team used Redux to manage data on the front end, with actions and reducers fetching data from the Yelp and data science APIs. The visualizations were created using Recharts, and the dashboard was built with React and Material UI. A custom script was written to allow users to rearrange dashboard modules.

The data science endpoint that powers the dashboard uses the Django REST Framework and is deployed on Elastic Beanstalk. The Django app takes a business_id and checks whether we already have the business in our database. If the business is new, we scrape the reviews in real time.
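The lookup-then-scrape flow described above can be sketched as below. This is a hypothetical illustration: a dict stands in for the database, a stub stands in for the scraper, and `get_reviews` is not a function from this repository.

```python
def get_reviews(business_id, database, scrape):
    """Return stored reviews, scraping and storing them first if the business is new."""
    if business_id not in database:
        database[business_id] = scrape(business_id)
    return database[business_id]


# Stand-ins: a dict plays the database and a stub plays the scraper.
calls = []


def fake_scrape(business_id):
    calls.append(business_id)
    return [{"text": "Great coffee", "stars": 5}]


database = {}
first = get_reviews("some-business-id", database, fake_scrape)
second = get_reviews("some-business-id", database, fake_scrape)
print(len(calls))  # the scraper runs only for the first request
```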

For registered users, the analysis data that we generate for businesses is stored in our database for future retrieval.

Our Django app runs scheduled jobs using the Advanced Python Scheduler to update the analysis data that we generate for businesses and store it in our database.

Web Application UI

Example conclusions a business owner might draw from the dashboard above

I might look into training my staff on customer service etiquette 
since people are complaining about the service.
I'm relieved that my half-price bottle service is getting 
buzz from the word trend chart.
Seeing a snapshot of trending phrases from my competitors 
has made me realize that I might think about introducing happy hour.

This is a Django app for the data science microservice, run locally on Windows 10 and deployed on AWS Elastic Beanstalk.

【Tally AI Front End】 for work with Front End UI Design

【Tally AI Back End】 for additional repos regarding authentication

【Tally AI Documentation】 for technical details on our project.

【AWS EB deployment logs】 for logs of our AWS Elastic Beanstalk Deployments

【All SQLs used in this project】 for useful SQL queries we used

【A D3.js line chart】 for exploratory data visualization work prior to migrating to Recharts

Product Canvas

Deployed Front End

Tech Stack & Architecture

React, Material UI, Recharts, Python, Django, Postgres, AWS

NLP Packages Used

spaCy, TextRank, Scattertext

Data Sources

Release Canvas Presentation Slides 1-3

Web | Data Science Release Canvas Deliverables

Python Notebooks

Exploratory Data Analysis Yelp Dataset

NLP - BERT, word vectors, sentence vectors

Calculating Word Frequency Correlations with Ratings

NLP - Spacy Named Entity Recognition POS Tagging Exploration

Finding Context in Words Correlated with Highest and Lowest Ratings

Refactored Context in Words Correlated with Highest and Lowest Ratings

WordNet and Vader Sentiment Explorations

LDA Topic Modeling Explorations

How to Connect to the Data Science API

Web Scraped Endpoints

Returns 10 positive and 10 negative word phrases associated with a business.

viztype0: {
   positive: [
      { term: "cool cats", score: 0.08981400595659608 },
      { term: "rescued cats", score: 0.08956279306536073 }
   ],
   negative: [
      { term: "just bad business", score: 0.0442848147595502 },
      { term: "a refund", score: 0.03511932390225489 }
   ]
}

Cumulative average of review star ratings for the past 8 weeks vs. the average rating per week (timespan: 8 weeks). E.g.

8 weeks ago: ratings 1,1,1,1,1; weekly_avg_rating=1, cumulative_avg_rating=1
7 weeks ago: ratings 2,2,2,2,2; weekly_avg_rating=2, cumulative_avg_rating=1.5
6 weeks ago: ratings 3,3,3,3,3; weekly_avg_rating=3, cumulative_avg_rating=2

[
   { date: '2020-01-10', cumulative_avg_rating: 3, weekly_avg_rating: 2 },
   { date: 'Date 2', cumulative_avg_rating: 4, weekly_avg_rating: 3 }
]
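The weekly and cumulative averages in the worked example can be reproduced with a short sketch like the one below. It assumes the cumulative average is taken over all individual ratings seen so far (not over the weekly averages); the function name and the input shape are hypothetical, and the API's actual implementation may differ.

```python
def weekly_and_cumulative_averages(ratings_by_week):
    """For each week (oldest first), return the weekly and running average rating."""
    rows = []
    total, count = 0, 0
    for week_ratings in ratings_by_week:
        total += sum(week_ratings)
        count += len(week_ratings)
        rows.append({
            # None marks a week (or a history) with no reviews at all
            "weekly_avg_rating": sum(week_ratings) / len(week_ratings) if week_ratings else None,
            "cumulative_avg_rating": total / count if count else None,
        })
    return rows


# The three weeks from the example: five 1-star, five 2-star, five 3-star reviews
weeks = [[1] * 5, [2] * 5, [3] * 5]
averages = weekly_and_cumulative_averages(weeks)
for row in averages:
    print(row)
```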

Endpoints Looking Through Yelp Dataset

Returns "trending" word phrases and their comparative fluctuations over segments of time.

[
   { date: 'string with date',
     data: [ { phrase: "phrase 1", rank: 1 },
             { phrase: "phrase 2", rank: 1 },
             { phrase: "phrase 3", rank: 1 } ] },
   { date: 'string with date',
     data: [ { phrase: "phrase 1", rank: 2 },
             { phrase: "phrase 2", rank: 2 },
             { phrase: "phrase 3", rank: 1.5 } ] },
   { date: 'string with date',
     data: [ { phrase: "phrase 1", rank: 2 },
             { phrase: "phrase 2", rank: 4 },
             { phrase: "phrase 3", rank: 2 } ] }
]

Review frequency - shows change in number of reviews over time

[{"date": "2017-8-31", "reviews": 4}, {"date": "2017-12-31", "reviews": 2}, 
{"date": "2018-1-31", "reviews": 1}, {"date": "2018-2-28", "reviews": 2}, 
{"date": "2018-3-31", "reviews": 1}, {"date": "2018-4-30", "reviews": 4}, 
{"date": "2018-5-31", "reviews": 2}, {"date": "2018-6-30", "reviews": 1}, 
{"date": "2018-7-31", "reviews": 3}, {"date": "2018-8-31", "reviews": 1}, 
{"date": "2018-9-30", "reviews": 1}, {"date": "2018-11-30", "reviews": 1}]
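A response of this shape can be produced by grouping review dates by month and labeling each group with the month's last day. The sketch below is an assumption about how such counts could be computed, not the project's code; it mirrors the sample's unpadded date format and omits months without reviews, as the sample does.

```python
from calendar import monthrange
from collections import Counter
from datetime import date


def monthly_review_counts(review_dates):
    """Count reviews per calendar month, labeled by the month's last day."""
    counts = Counter((d.year, d.month) for d in review_dates)
    rows = []
    for (year, month), n in sorted(counts.items()):
        last_day = monthrange(year, month)[1]
        rows.append({"date": f"{year}-{month}-{last_day}", "reviews": n})
    return rows


# Made-up review dates
dates = [date(2017, 8, 3), date(2017, 8, 19), date(2017, 12, 25), date(2018, 2, 14)]
rows = monthly_review_counts(dates)
print(rows)
```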

【Testing URLs】
【Testing data documents】
【Testing script Colab】

Activate Virtual Environment

Miniconda3 or Anaconda3 Python 3.7 【Logs】
(If you are using Python 3.6 or manage your environments in some other way, skip this step.)

$ conda create -n python3.6 python=3.6
$ pip install pipenv
$ conda activate python3.6

(base) PS D:\github\django-tally>

$ pipenv install
$ pipenv shell

Install dependencies:
(If you have downloaded the repo, you can skip this step.)

$ pipenv install django psycopg2-binary djangorestframework pyyaml lxml "spacy>=2.0.0,<3.0.0" pytextrank "apscheduler>=3.6.3" django-apscheduler gensim scikit-learn

Generate requirements.txt

$ pip freeze > requirements.txt

Or $ pip freeze | Out-File -Encoding UTF8 requirements.txt
In the requirements.txt file, remove entries for spacy and en_core_web_sm, and add the following lines.


Frequently Used Django Commands

$ python manage.py runserver
$ python manage.py makemigrations
$ python manage.py migrate
$ python manage.py test --keepdb
$ python manage.py inspectdb >
$ python manage.py collectstatic
$ python -m django --version

Deploy to AWS Elastic Beanstalk

During the deployment, you may need to use the following EB CLI and AWS CLI commands.

$ eb init -p python-3.6 django-tally
$ eb create django-tally
$ eb status
$ eb deploy
$ eb open
$ eb logs
$ eb config
$ eb terminate django-tally
$ aws elasticbeanstalk restart-app-server --environment-name django-tally


(base) PS C:\Users\guido> aws2 --version
aws-cli/2.0.0dev3 Python/3.7.5 Windows/10 botocore/2.0.0dev2
(base) PS C:\Users\guido> python --version
Python 3.7.4
(base) PS C:\Users\guido> aws --version
File association not found for extension .py
aws-cli/1.17.5 Python/3.7.4 Windows/10 botocore/1.13.50
(base) PS C:\Users\guido> aws2 --version
aws-cli/2.0.0dev3 Python/3.7.5 Windows/10 botocore/2.0.0dev2
(base) PS C:\Users\guido> eb --version
EB CLI 3.17.0 (Python 3.7.4)
(django-tally-QTYVOJb0) (python3.6) D:\github\django-tally>python manage.py collectstatic
163 static files copied to 'D:\github\django-tally\static'.

【AWS Elastic Beanstalk Configuration】
All Applications -> django-tally -> Configuration -> Software -> Change:
Set WSGIPath = tally/
Set system environment variables here too

Testing URLs
The links below are for【testing】. (by business alias) (by business ID) (Butters Pancakes & Café) (Jarrod's Coffee, Tea & Gallery)
You should get trendy phrases such as "beautiful art", "art gallery", "downtown mesa", etc. (view job logs by business ID) The links below are 【examples】.
You should get monthly rating counts like below.

[{"date": "2017-8-31", "reviews": 4}, {"date": "2017-12-31", "reviews": 2}, 
{"date": "2018-1-31", "reviews": 1}, {"date": "2018-2-28", "reviews": 2}, 
{"date": "2018-3-31", "reviews": 1}, {"date": "2018-4-30", "reviews": 4}, 
{"date": "2018-5-31", "reviews": 2}, {"date": "2018-6-30", "reviews": 1}, 
{"date": "2018-7-31", "reviews": 3}, {"date": "2018-8-31", "reviews": 1}, 
{"date": "2018-9-30", "reviews": 1}, {"date": "2018-11-30", "reviews": 1}] (create) (get, put, delete) (APScheduler background job)


Create A Project


$ cd C:\Users\guido\.virtualenvs\django-tally-QTYVOJb0\Scripts\
$ django-admin startproject tally D:\github\django-tally

project name: tally
project created in directory: D:\github\django-tally

Run Django app

$ cd path/to/django-tally
$ python manage.py runserver


Watching for file changes with StatReloader
Performing system checks...

System check identified no issues (0 silenced).

You have 17 unapplied migration(s). Your project may not work properly until you apply the migrations for app(s): admin, auth, contenttypes, sessions.
Run 'python manage.py migrate' to apply them.
January 07, 2020 - 01:05:29
Django version 3.0.2, using settings 'tally.settings'
Starting development server at
Quit the server with CTRL-BREAK.
[07/Jan/2020 01:05:55] "GET / HTTP/1.1" 200 16351
[07/Jan/2020 01:05:55] "GET /static/admin/css/fonts.css HTTP/1.1" 200 423
[07/Jan/2020 01:05:55] "GET /static/admin/fonts/Roboto-Light-webfont.woff HTTP/1.1" 200 85692
[07/Jan/2020 01:05:55] "GET /static/admin/fonts/Roboto-Bold-webfont.woff HTTP/1.1" 200 86184
[07/Jan/2020 01:05:55] "GET /static/admin/fonts/Roboto-Regular-webfont.woff HTTP/1.1" 200 85876


(If you have downloaded the repo, you can skip this step.)

# Internationalization
TIME_ZONE = 'US/Central' # 'UTC'
USE_I18N = True
USE_L10N = True
USE_TZ = True

Database configuration

In the tally/ file, edit the database connection configuration.
(If you have downloaded the repo, you can skip this step.)

# Database
import os

if 'RDS_HOSTNAME' in os.environ:
    DATABASES = {
        'default': {
            # 'postgresql' replaces the older 'postgresql_psycopg2' alias,
            # which recent Django versions no longer provide
            'ENGINE': 'django.db.backends.postgresql',
            'NAME': os.environ['RDS_DB_NAME'],
            'USER': os.environ['RDS_USERNAME'],
            'PASSWORD': os.environ['RDS_PASSWORD'],
            'HOST': os.environ['RDS_HOSTNAME'],
            'PORT': os.environ['RDS_PORT'],
            # Look up tables in the "django" schema
            'OPTIONS': {
                'options': '-c search_path=django'
            },
            'TEST': {
                'ENGINE': 'django.db.backends.sqlite3',
            },
        }
    }

【Local Environment】
Add system environment variables in the Python virtual environment (NO quotation marks).
You can add a .env file in the django-tally folder, then add the following lines to the file (replace * with your credentials). Every time you start the virtual environment, those variables are set automatically. (Make sure that .env is listed in the .gitignore file; otherwise you will expose your credentials on the Internet.)


Or you can set them manually each time you start the virtual environment.
For Windows Command Prompt, use set VARNAME=value (in PowerShell, use $env:VARNAME = "value").
For macOS/Linux, use export VARNAME=value.

(django-tally-QTYVOJb0) (base) D:\github\django-tally>set RDS_DB_NAME=*
(django-tally-QTYVOJb0) (base) D:\github\django-tally>set RDS_USERNAME=*
(django-tally-QTYVOJb0) (base) D:\github\django-tally>set RDS_PASSWORD=*
(django-tally-QTYVOJb0) (base) D:\github\django-tally>set RDS_HOSTNAME=*.*
(django-tally-QTYVOJb0) (base) D:\github\django-tally>set RDS_PORT=*

To make sure the variables are properly created, type python then print out os.environ[<varname>].

(django-tally-QTYVOJb0) (base) D:\github\django-tally>python
Python 3.7.4 (default, Aug  9 2019, 18:34:13) [MSC v.1915 64 bit (AMD64)] :: Anaconda, Inc. on win32
This Python interpreter is in a conda environment, but the environment has
not been activated.  Libraries may fail to load.  To activate this environment
please see
Type "help", "copyright", "credits" or "license" for more information.
>>> import os
>>> os.environ['RDS_DB_NAME']
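Instead of printing each variable by hand, a small helper can report which of the database variables are missing. This is a convenience sketch rather than part of the repo; the variable names come from the database configuration above, while `missing_variables` is a hypothetical helper name.

```python
import os

REQUIRED_VARS = ["RDS_DB_NAME", "RDS_USERNAME", "RDS_PASSWORD", "RDS_HOSTNAME", "RDS_PORT"]


def missing_variables(environ=os.environ, required=REQUIRED_VARS):
    """Return the required variable names that are unset or empty."""
    return [name for name in required if not environ.get(name)]


missing = missing_variables()
if missing:
    print("Missing environment variables:", ", ".join(missing))
else:
    print("All database environment variables are set.")
```

The helper only reports names, so credentials never end up in the console output.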

To configure the instance deployed on AWS Elastic Beanstalk, go to the application's Configuration page and choose Software.

Add system environment variables there.


If you have downloaded this repo, you can skip this step.

$ cd path/to/django-tally
$ python manage.py migrate


Operations to perform:
  Apply all migrations: admin, auth, contenttypes, sessions
Running migrations:
  Applying contenttypes.0001_initial... OK
  Applying auth.0001_initial... OK
  Applying admin.0001_initial... OK
  Applying admin.0002_logentry_remove_auto_add... OK
  Applying admin.0003_logentry_add_action_flag_choices... OK
  Applying contenttypes.0002_remove_content_type_name... OK
  Applying auth.0002_alter_permission_name_max_length... OK
  Applying auth.0003_alter_user_email_max_length... OK
  Applying auth.0004_alter_user_username_opts... OK
  Applying auth.0005_alter_user_last_login_null... OK
  Applying auth.0006_require_contenttypes_0002... OK
  Applying auth.0007_alter_validators_add_error_messages... OK
  Applying auth.0008_alter_user_username_max_length... OK
  Applying auth.0009_alter_user_last_name_max_length... OK
  Applying auth.0010_alter_group_name_max_length... OK
  Applying auth.0011_update_proxy_permissions... OK
  Applying sessions.0001_initial... OK

Django migration will create tables automatically in the database.

Create Django Admin User

$ cd path/to/django-tally
$ python manage.py createsuperuser


Username (leave blank to use 'guido'): ***
Email address:
Password (again):
This password is too short. It must contain at least 8 characters.
This password is too common.
This password is entirely numeric.
Bypass password validation and create user anyway? [y/N]: n
Password (again):
Superuser created successfully.

Use Django REST Framework for APIs

(If you have downloaded the repo, you can skip this step.)

PS D:\github\django-tally>

# D:\github\django-tally\tally\
# Application definition
INSTALLED_APPS = [
    # ... keep the default Django apps here ...
    'rest_framework',             # Add this line; the name must be exactly 'rest_framework'
    'example',                    # Add this line; you can use app names other than "example"
    'yelp',                       # Add this app as well for this project
]

Create an app called "example".

$ python manage.py startapp example

Setting up URL patterns
E.g. regular expression match UUID as primary key (?P<pk>[0-9a-f-]+):

urlpatterns = {
        YelpYelpScrapingCreateView.as_view(), name="create"),
        YelpYelpScrapingDetailsView.as_view(), name="details"),
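To see what the `(?P<pk>[0-9a-f-]+)` group captures, the pattern can be tried outside Django with the standard `re` module. The sketch below is illustrative only: the trailing slash and the helper name `extract_pk` are assumptions, not the project's actual routes.

```python
import re
import uuid

# The named group from above, anchored and followed by an assumed trailing slash
PK_PATTERN = re.compile(r"^(?P<pk>[0-9a-f-]+)/$")


def extract_pk(path_tail):
    """Return the captured primary key, or None if the path does not match."""
    match = PK_PATTERN.match(path_tail)
    return match.group("pk") if match else None


sample = str(uuid.UUID("12345678-1234-5678-1234-567812345678"))
print(extract_pk(sample + "/"))
print(extract_pk("not-a-uuid!/"))
```

Note that the character class is permissive: it accepts any run of lowercase hex digits and hyphens (for example `abc/`), so a view that needs a real UUID should still validate the captured value.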

E.g. query strings

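import json
from django.http import HttpResponse
from django.urls import path

def home(request, business_id):
    # Pick the visualization from the ?viztype= query string
    viztype = request.GET.get('viztype')
    if viztype == '1':
        result = json.dumps(yelpTrendyPhrases(business_id))
    elif viztype == '2':
        result = json.dumps(yelpReviewCountMonthly(business_id))
    else:
        # Any other value (or none) falls back to viztype 0
        result = json.dumps(getDataViztype0(business_id))
    return HttpResponse(result)

# A list keeps the patterns ordered (a set is not)
urlpatterns = [path('<slug:business_id>', home, name='home')]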
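On the client side, a request URL for this view combines the business ID with an optional `viztype` query string. The sketch below only builds the URL with the standard library; the base URL `http://localhost:8000/` and the helper name `build_url` are placeholders, not the deployed address.

```python
from urllib.parse import quote, urlencode


def build_url(base_url, business_id, viztype=None):
    """Build a request URL; omitting viztype lets the view use its default branch."""
    url = base_url.rstrip("/") + "/" + quote(business_id, safe="")
    if viztype is not None:
        url += "?" + urlencode({"viztype": viztype})
    return url


# Placeholder host and business ID
print(build_url("http://localhost:8000/", "some-business-id", 1))
print(build_url("http://localhost:8000/", "some-business-id"))
```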

Follow this tutorial to build a REST API.

Django Auto-Generate Data Models from Database Tables

$ python manage.py inspectdb >

After running this command, modify the class names in the file.
Add the app name as a prefix to every class name. E.g.
For app "example", change class Bucketlist -> class ExampleBucketlist
For app "yelp", change class Business -> class YelpBusiness
Follow the instructions in the file and make sure the model definitions are correct.
Then move the file to the corresponding app folder,
so that every app has its own models without conflicting with other apps.
This is an example of the Django data models created.
You can query with or without Django data models. E.g.
Issue: Django “ValueError: source code string cannot contain null bytes”
Solution: You can simply create a new .py file, copy and paste the content to it, then replace the file with it.
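The class-renaming step described earlier in this section can be automated with a small script. The sketch below is one possible approach, not something shipped with the repo: `prefix_model_classes` is a hypothetical helper, and it assumes each model is declared as `class Name(models.Model):` at the start of a line, which is how generated model files are normally laid out.

```python
import re

CLASS_LINE = re.compile(r"^class (\w+)\(models\.Model\):", flags=re.MULTILINE)


def prefix_model_classes(source, app_name):
    """Prefix every model class name with the capitalized app name."""
    prefix = app_name.capitalize()

    def rename(match):
        name = match.group(1)
        if name.startswith(prefix):
            return match.group(0)  # already prefixed, leave untouched
        return f"class {prefix}{name}(models.Model):"

    return CLASS_LINE.sub(rename, source)


generated = (
    "class Business(models.Model):\n"
    "    name = models.TextField()\n"
    "\n"
    "class Review(models.Model):\n"
    "    stars = models.IntegerField()\n"
)
renamed = prefix_model_classes(generated, "yelp")
print(renamed)
```

Only the class declarations are rewritten; references to the old names elsewhere in the file (for example in foreign key fields) still need to be updated by hand.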


spaCy models
How to install models
Download spaCy model manually (Not in use)

You can install spaCy models just like installing a Python package.
pipenv install
Then import the models in your code.

import en_core_web_sm
nlp = en_core_web_sm.load()

Or load the model by name:

import spacy
nlp = spacy.load("en_core_web_sm")

【Deployment】 Make sure the following 2 lines are in the requirements.txt.


Make sure to remove spacy==2.2.3 and en_core_web_sm==2.2.5 from the file, or you will get an error when deploying that says "Could not find a version that satisfies the requirement en-core-web-sm==2.2.5".
【Manually】 Put the following folder in the repo (same level with
spacy.load("en_core_web_sm/en_core_web_sm-2.2.5") with
CAUTION: You can do it this way, but deploying from Windows 10 to AWS Elastic Beanstalk might raise a UnicodeDecodeError when loading a model, whereas launching the server locally on Windows 10 or deploying from macOS both seem fine.

Background Job Scheduling

Advanced Python Scheduler

  • APScheduler official document
  • Django-apscheduler GitHub repo
  • An important tutorial
  • A simple example of setting up a background job by using apscheduler.schedulers.background.BackgroundScheduler
  • 【My example code】, 【Logs】

$ pipenv install apscheduler
$ pipenv install django-apscheduler



When contributing to this repository, please first discuss the change you wish to make via issue, email, or any other method with the owners of this repository before making a change.

Please note we have a code of conduct. Please follow it in all your interactions with the project.

Issue/Bug Request

If you are having an issue with the existing project code, please submit a bug report under the following guidelines:

  • Check first to see if your issue has already been reported.
  • Check to see if the issue has recently been fixed by attempting to reproduce the issue using the latest master branch in the repository.
  • Create a live example of the problem.
  • Submit a detailed bug report including your environment & browser, steps to reproduce the issue, actual and expected outcomes, where you believe the issue is originating from, and any potential solutions you have considered.

Feature Requests

We would love to hear from you about new features which would improve this app and further the aims of our project. Please provide as much detail and information as possible to show us why you think your new feature should be implemented.

Pull Requests

If you have developed a patch, bug fix, or new feature that would improve this app, please submit a pull request. It is best to communicate your ideas with the developers first before investing a great deal of time into a pull request to ensure that it will mesh smoothly with the project.

Remember that this project is licensed under the MIT license, and by submitting a pull request, you agree that your work will be, too.

Pull Request Guidelines

  • Ensure any install or build dependencies are removed before the end of the layer when doing a build.
  • Update the with details of changes to the interface, including new plist variables, exposed ports, useful file locations and container parameters.
  • Ensure that your code conforms to our existing code conventions and test coverage.
  • Include the relevant issue number, if applicable.
  • You may merge the Pull Request in once you have the sign-off of two other developers, or if you do not have permission to do that, you may request the second reviewer to merge it for you.


These contribution guidelines have been adapted from this


See Project Documentation for technical details on our project.


