Coursera Data Science Capstone Project
========================================================
author: Zvonko Kosic

Next Word Prediction [Application](https://zvonkok.shinyapps.io/CourseraDataScienceCapstone/)

A brief presentation of the Coursera Data Science Capstone Project, offered
in cooperation with Johns Hopkins University and SwiftKey.

Introduction
========================================================
The objective of this project is to build a predictive text model and to
demonstrate the resulting algorithm in the form of a Shiny application.

The model was built from a corpus called [HC Corpora](http://www.corpora.heliohost.org/).

A data science workflow (data cleaning, exploratory analysis, ...) was used to
create the application.

Finally, the application, algorithm, and model were developed entirely within
the R ecosystem.

Approach
========================================================
The word prediction application is based on a classic **N-Gram** model.

The input to the **N-Gram** model is a cleaned subset of the supplied data
(blogs, Twitter, and news): the text was converted to lowercase, and
punctuation, numbers, ... were removed.

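The cleaning step can be sketched in a few lines. The project itself was built in R; this Python version is only an illustration, and the exact rules (lowercase, keep ASCII letters, apostrophes, and whitespace, drop everything else including digits) are assumptions rather than the author's code. `clean_text` is a hypothetical name.

```python
import re


def clean_text(text: str) -> str:
    """Lowercase text and strip digits and punctuation (assumed cleaning rules)."""
    text = text.lower()
    # Replace every character that is not a lowercase ASCII letter, an
    # apostrophe (kept so contractions like "they're" survive), or whitespace.
    text = re.sub(r"[^a-z'\s]", " ", text)
    # Collapse runs of whitespace into single spaces and trim the ends.
    return re.sub(r"\s+", " ", text).strip()


print(clean_text("I bought 3 apples, and they're GREAT!"))
# → i bought apples and they're great
```
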
The sample was tokenized into N-Grams (unigrams, bigrams, trigrams, and
quadgrams), and each set was sorted by frequency.

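A minimal sketch of building frequency-sorted N-Gram tables, assuming whitespace tokenization of already cleaned text and counting within each line so that N-Grams never cross line boundaries. `build_tables` and the toy corpus are hypothetical; this is not the author's R implementation.

```python
from collections import Counter


def ngrams(tokens, n):
    """Return every contiguous run of n tokens as a tuple."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def build_tables(lines, max_n=4):
    """Count 1- to max_n-grams per line and sort each table by frequency."""
    tables = {}
    for n in range(1, max_n + 1):
        counts = Counter()
        for line in lines:
            # Counting per line keeps N-Grams from spanning two lines.
            counts.update(ngrams(line.split(), n))
        # most_common() lists (ngram, count) pairs, highest count first.
        tables[n] = counts.most_common()
    return tables


corpus = ["the cat sat on the mat", "the cat ate the fish"]
tables = build_tables(corpus)
print(tables[2][:2])
# → [(('the', 'cat'), 2), (('cat', 'sat'), 1)]
```
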
These frequency tables were then used to predict the next word based on the
input.

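The slides do not say how the tables are combined at prediction time. One common choice, assumed here, is a simple backoff: look up the last three input words in the quadgram table; if nothing matches, retry with the last two words in the trigram table, and so on, ending with the most frequent unigram. This plain longest-match rule omits the discounting of Katz backoff and the score scaling of "stupid backoff". `build_counts` and `predict_next` are hypothetical names.

```python
from collections import Counter


def build_counts(lines, max_n=4):
    """Map each order n (1..max_n) to a Counter of n-token tuples."""
    counts = {n: Counter() for n in range(1, max_n + 1)}
    for line in lines:
        tokens = line.split()
        for n in counts:
            for i in range(len(tokens) - n + 1):
                counts[n][tuple(tokens[i:i + n])] += 1
    return counts


def predict_next(text, counts, max_n=4):
    """Predict one word, backing off from the longest context (assumed strategy)."""
    # Only lowercases and splits; input is assumed to be cleaned like the corpus.
    words = text.lower().split()
    for n in range(max_n, 1, -1):
        context = tuple(words[-(n - 1):])
        if len(context) < n - 1:
            continue  # not enough input words for this order
        # Next-word counts for every n-gram that starts with the context.
        matches = Counter({g[-1]: c for g, c in counts[n].items() if g[:-1] == context})
        if matches:
            return matches.most_common(1)[0][0]
    # No context matched: fall back to the most frequent unigram.
    return counts[1].most_common(1)[0][0][0] if counts[1] else None


counts = build_counts(["the cat sat on the mat", "the cat ate the fish", "the cat sat down"])
print(predict_next("I think the cat", counts))  # → sat (trigram "the cat sat")
print(predict_next("eat the", counts))  # → cat (backs off to bigrams)
```
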
The Application (Simple Chat App)
========================================================
The application is a simple chat in which the user enters a message and the
app predicts the next three words, starting with the best match (1).

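One way to produce three ranked suggestions, assumed here rather than taken from the app, is to walk the tables in backoff order (longest context first) and collect distinct candidates until three are found, so that matches from longer contexts rank ahead of shorter ones. `suggest` is a hypothetical helper, and breaking frequency ties by first-seen order is an arbitrary choice.

```python
from collections import Counter


def build_counts(lines, max_n=4):
    """Map each order n (1..max_n) to a Counter of n-token tuples."""
    counts = {n: Counter() for n in range(1, max_n + 1)}
    for line in lines:
        tokens = line.split()
        for n in counts:
            for i in range(len(tokens) - n + 1):
                counts[n][tuple(tokens[i:i + n])] += 1
    return counts


def suggest(text, counts, k=3, max_n=4):
    """Return up to k distinct next words, longest-context matches first."""
    words = text.lower().split()
    suggestions = []
    for n in range(max_n, 0, -1):
        # Unigrams (n == 1) need no context; words[-0:] would be the whole list.
        context = tuple(words[-(n - 1):]) if n > 1 else ()
        if len(context) < n - 1:
            continue
        # Candidates for this order, most frequent first (stable for ties).
        ranked = sorted(
            ((g[-1], c) for g, c in counts[n].items() if g[:-1] == context),
            key=lambda pair: pair[1],
            reverse=True,
        )
        for word, _ in ranked:
            if word not in suggestions:
                suggestions.append(word)
                if len(suggestions) == k:
                    return suggestions
    return suggestions[:k]


counts = build_counts(["the cat sat on the mat", "the cat ate the fish", "the cat sat down"])
print(suggest("I think the cat", counts))  # → ['sat', 'ate', 'the']
```
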
In addition to the predicted words, the application offers autocompletion to
reduce typing, as seen in (2).

![Application Screenshot](app.png)
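Autocompletion of a partially typed word can be sketched as a prefix lookup in the unigram vocabulary. The sketch below assumes completions are ranked by unigram frequency with ties broken alphabetically; keeping the vocabulary sorted means all words sharing a prefix form one contiguous run that `bisect` can locate. `Completer` is a hypothetical class, not part of the app.

```python
import bisect
from collections import Counter


class Completer:
    """Prefix completion over a frequency-weighted vocabulary (hypothetical)."""

    def __init__(self, counts):
        self.counts = counts
        self.vocab = sorted(counts)  # alphabetical order enables binary search

    def complete(self, prefix, k=3):
        prefix = prefix.lower()
        # Words sharing a prefix form one contiguous run in sorted order.
        start = bisect.bisect_left(self.vocab, prefix)
        end = start
        while end < len(self.vocab) and self.vocab[end].startswith(prefix):
            end += 1
        # Rank the run by frequency, breaking ties alphabetically.
        candidates = sorted(self.vocab[start:end], key=lambda w: (-self.counts[w], w))
        return candidates[:k]


counts = Counter("the cat sat on the mat the cat ate the fish then there".split())
completer = Completer(counts)
print(completer.complete("th"))  # → ['the', 'then', 'there']
```
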
Last Words
========================================================
The chat app is hosted on shinyapps.io: [https://zvonkok.shinyapps.io/CourseraDataScienceCapstone/](https://zvonkok.shinyapps.io/CourseraDataScienceCapstone/)

The shinyapps.io page includes a brief description of how the app works and
how to use its features.

The RPubs presentation is located here: [http://rpubs.com/zvonkok/CourseraDataScienceCapstone](http://rpubs.com/zvonkok/CourseraDataScienceCapstone)