Sept 2017 - April 2018
Have you ever answered a poll on Twitter, where you saw a question, answered it, and then got to see the results?
That is what this app does. You see questions, answer them, and then get to see the results. You can also ask your own questions that others can answer.
TS, Python, Angular, Ionic, Django, scikit-learn, NLTK, Chart.js
Since we were collecting a lot of information about our users, we could do some pretty cool things with that data, specifically finding interesting correlations.
For example, suppose one question asked how often you exercise and another asked how often you eat fast food. Among the users who answered both, you might find that those who ate the most fast food exercised the least.
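The exercise and fast-food comparison can be sketched in a few lines. This is only an illustration: the answer data is made up, the 0-4 answer scale is an assumption, and Pearson correlation is one possible measure (the app's actual method is not described here). `answer_correlation` is a hypothetical helper.

```python
from math import sqrt

# Made-up answers on an assumed 0-4 scale, keyed by user id.
exercise = {"ann": 4, "bob": 1, "cy": 3, "dee": 0}
fast_food = {"ann": 0, "bob": 3, "cy": 2, "eve": 4}


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


def answer_correlation(answers_a, answers_b):
    """Correlate two questions using only users who answered both."""
    shared = sorted(answers_a.keys() & answers_b.keys())
    return pearson([answers_a[u] for u in shared],
                   [answers_b[u] for u in shared])


r = answer_correlation(exercise, fast_food)
print(f"correlation between exercise and fast food: {r:.2f}")
```

A value close to -1 would match the pattern described above: the more fast food, the less exercise.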
The prospect of having such a rich set of data to explore is what kept us working so hard on this.
This app lets users create surveys and answer surveys that other users have created. The surveys are ranked by a model trained on tweets scraped from Twitter.
The ability to log in with your Facebook account.
NLP for question ranking
All of the questions that show up on your feed are ranked using NLP. You can read how we created the model below.
Randomly generated data
The app is filled with mock data generated through NLP. You can read how we created the model below.
Once the project was finished, we went through our code to identify code smells and refactor them.
To create a question-ranking model and generate random data, we first needed some data of our own, so we pulled it from Twitter using their APIs. We looked for tweets that asked questions, and for each one we collected:
- user's follower count
- user's friend count
- user's favorites count
- user's statuses count
- time the tweet was created
- time of the query
- and whether the tweet contains any media
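The fields listed above could be pulled out of a raw tweet roughly as follows. This sketch assumes the tweet arrives as a dictionary shaped like a Twitter API v1.1 tweet object; which API version or client library was actually used is not stated here. `extract_features` is a hypothetical helper, and the sample tweet is made up.

```python
from datetime import datetime, timezone

# Timestamp layout used by v1.1 tweet objects,
# e.g. "Wed Oct 10 20:19:24 +0000 2018".
TWITTER_TIME_FORMAT = "%a %b %d %H:%M:%S %z %Y"


def extract_features(tweet: dict, query_time: datetime) -> dict:
    """Collect the per-tweet metadata described in the list above."""
    user = tweet["user"]
    return {
        "followers": user["followers_count"],
        "friends": user["friends_count"],
        "favorites": user["favourites_count"],  # the API uses British spelling
        "statuses": user["statuses_count"],
        "created_at": datetime.strptime(tweet["created_at"], TWITTER_TIME_FORMAT),
        "query_time": query_time,
        # Assumed: a tweet has media when its entities carry a non-empty list.
        "has_media": bool(tweet.get("entities", {}).get("media")),
    }


# Made-up tweet used only to exercise the function.
sample = {
    "created_at": "Wed Oct 10 20:19:24 +0000 2018",
    "user": {
        "followers_count": 120,
        "friends_count": 80,
        "favourites_count": 42,
        "statuses_count": 900,
    },
    "entities": {"media": [{"type": "photo"}]},
}
features = extract_features(sample, datetime.now(timezone.utc))
```

Keeping both the creation time and the query time makes it possible to compute how old a tweet was when it was scraped.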
To create the model, we take the 1000 most common words and build an occurrence matrix with one row per tweet. We then concatenate the other data we collected onto the matrix, and the number of retweets is used as the label. A linear regression maps all of the data about a tweet to its number of retweets. Lastly, the coefficients corresponding to the unigrams are saved. We then use this model to rank incoming questions.
Generating Random Questions
We wanted to fill the app with mock data, so we generated some random sentences. To do this, we save all of the bigrams found in the collection of tweets. Random sentences are then generated by choosing a random first word and then repeatedly making a random choice among the words that could follow it (based on the bigram list).
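Here is a minimal sketch of that bigram approach. The example sentences, the `max_words` cutoff, and the function names are assumptions made for illustration. Repeated bigrams are stored more than once, so common follow-up words are picked more often; a uniform choice over distinct followers would work just as well.

```python
import random
from collections import defaultdict


def build_bigrams(sentences):
    """Map each word to the list of words seen directly after it."""
    followers = defaultdict(list)
    for sentence in sentences:
        words = sentence.lower().split()
        for first, second in zip(words, words[1:]):
            followers[first].append(second)
    return followers


def generate_sentence(followers, max_words=10, rng=random):
    """Pick a random first word, then keep choosing a random follower."""
    word = rng.choice(sorted(followers))
    sentence = [word]
    # Stop at the length cap or when the current word has no known follower.
    while len(sentence) < max_words and word in followers:
        word = rng.choice(followers[word])
        sentence.append(word)
    return " ".join(sentence)


# Made-up stand-ins for the collected tweets.
corpus = [
    "what is your favorite food",
    "what is your name",
    "is pizza your favorite food",
]
followers = build_bigrams(corpus)
print(generate_sentence(followers, max_words=8))
```

Every adjacent pair of words in the output is a bigram that appeared somewhere in the corpus, which is why the sentences look locally plausible even when they make little sense as a whole.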
Have a question?
Have a question or want to see code in a private repo? Feel free to email me at the address below.