Week 5 Blog - λ Lovelace

This is a requested blog post by the module coordinators, it will cover:

The Final Recommender System design
The current prototype Recommender System
How the Recommender System will interact with the rest of the system

The Goal

The ultimate goal of this part of the system is to suggest more relevant tweets to the user. As of the time of writing, we have decided on two algorithms for the ultimate Recommender System: Case-based recommendation and the Bounded Greedy algorithm. This is not binding however, and may change in the future. Our preliminary algorithm is based on the bag-of-words approach, which is already functional.

Final Design Make-up

The finalized software produced from this project should be capable of providing recommendations that are more relevant than those currently returned by Twitter. It will use the Case-based recommendation format and the Bounded Greedy algorithm to provide personalized tweets to the user that are more relevant to their interests than the default tweets returned by Twitter.

Current Recommender System

The current Recommender system uses Python and scikit-learn to create a bag-of-words which are then used to rank tweets. In order to do this, scikit-learn is used with some dummy data to create a set of distinct words tweeted by the user. The “sorted()” method is used in combination with a bag-of-words function to sort the dummy tweets based on the “score” returned by the bag-of-words function. The bag-of-words function essentially counts up the number of occurences of a word in a tweet that is also present in a separate list of words. This separate list represents the users own tweets on their timeline, the implication being that the more similar that the contents of a a tweet are to the bag-of-words, the more similar the tweet is to the bag-of-words. Hence, the more similar the tweet is to the users timeline, which means that it should be a good recommendation.

This is a poor form of recommendation for the following reasons:

It does not weigh different words more heavily than others. For example, the word “the” has the same significance as the more specific word “programming”.
The semantics behind a tweet are lost. There is no way to differentiate between the serpent “python” and the programming language “Python”.
There is no weighting on various factors which could influence the quality of the recommendation.
It is, at best, a very crude method of finding similarities between texts.

tweet_two = 'tweet one two'
tweet_one = 'tweet one two three'
tweet_four = 'tweet'
tweet_three = 'tweet one'
list_of_tweets = [tweet_three, tweet_four, tweet_one, tweet_two]
users_tweets = ['tweet one', 'tweet two', 'tweet three']

recommender_object = Recommender(list_of_tweets)
recommended_tweets = recommender_object.generate(users_tweets, 3, None, None)

print(recommended_tweets)

def generate(self, user_tweets, number_of_recommendations, followed_accounts, how_many_days_ago):
        self.vectorizer.fit_transform(user_tweets)
        terms = self.vectorizer.get_feature_names()
        words = user_tweets
        results = sorted(self.followed_tweets, key=self.count_bag, reverse=True)
        
        return results

Other implementations for the Recommender System were considered, but found to not be relevant for this project (Opinion-mining and sentiment analysis, for example).

Interaction With The Rest Of The System

The Recommender System will sit in the back-end of the system on a web server. It will be used by a framework/service (E.g. Flask) that communicates with the iOS app and returns the results of the Recommender Systems findings. This is a simple approach that should work well for scaling purposes as it ties the Recommender System to a web service that easily allows for more endpoints to be added as they are needed.

On behalf of λ Lovelace
- Marc Laffan