Quantifying Desires: The Art and Science of Recommender Algorithms

Let's go back to the days of the Sears catalog showing up at your door. Excitedly, you'd mark the gifts you wanted, trusting that Santa would somehow know exactly what you wished for. With cookies laid out in hopeful expectation, you looked forward to the delivery of presents on your big day. What's interesting here is how little data and numbers had to do with figuring out what you liked; your selections were simply your genuine preferences.

Fast-forward to the present, where Amazon, boasting a market capitalization larger than the GDP of roughly 90% of the countries sharing our massive planetary home, seems to predict your preferences with uncanny precision. In this post, I will delve into the various recommender techniques used by tech giants to anticipate your tastes.

The key to their success lies in the sophisticated use of recommender algorithms, enabling online retailers to precisely anticipate your preferences. Surprisingly, at its core, this capability boils down to some basic algebra, calculus, and statistics.

First, some history

In 1979, Elaine Rich pioneered the development of the first recommender system, named Grundy, with the aim of assisting users in discovering books tailored to their preferences. Her innovative approach involved prompting users with specific questions, assigning them stereotypes based on their responses, and subsequently providing personalized book recommendations aligned with their identified stereotypes.


Since then, recommendation algorithms have seen plenty of changes and tweaks. Now, almost every big tech company runs its own recommendation-driven machine learning system. These systems have gotten pretty smart, digging into massive piles of data to give us better suggestions, and they've become a big part of how we interact with things online.

Picture you and a friend making the same dish. The result is delicious, but the cooking methods differ. Likewise, when predicting what you'd like, two primary methods emerge: Collaborative Filtering, which suggests things similar to what your friends enjoyed, and Content-Based Filtering, focused on understanding the ingredients of what you've liked before. Now, there's also a hybrid approach, like crafting a recipe by blending elements from different ones. It's similar to adding a pinch of this and a dash of that to create a customized experience just for you.
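If you're curious what that blend can look like, here is a rough sketch of one common hybrid formulation - a simple weighted average of the two scores. The function name, the example scores, and the alpha weight are all illustrative assumptions, and this isn't something the examples below implement:

# Hypothetical hybrid blend: mix a collaborative score and a content-based score
def hybrid_score(collab_score, content_score, alpha=0.7):
    # alpha controls how much weight the collaborative signal gets
    return alpha * collab_score + (1 - alpha) * content_score

print(hybrid_score(3.8, 4.5))  # 0.7*3.8 + 0.3*4.5 = 4.01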

Collaborative Filtering

I hate to break it to you, but unfortunately, you're not as unique as you might think. Scientifically speaking, your DNA is roughly 99.9% the same as the rest of us, leaving you with only about 0.1% of uniqueness. Fortunately, collaborative filtering operates on the idea that people with similar preferences are likely to continue having similar preferences. This method heavily relies on data-driven variables such as user interactions, ratings, reviews, and purchase or watch history. While it's incredibly effective with substantial amounts of data, it faces challenges when dealing with new users due to the sparsity of data affecting the user-item matrix.

Let's run through a little example:

Three friends with similar tastes all decide to watch movies over the week. Each friend watches 3 out of the 4 movies and rates the ones they've seen - this will be our example of item-based collaborative filtering.

Using Python, I have made an array where the rows are the friends and the columns are the movies - np.nan represents a movie that hasn't been watched.

# Import some packages and their dependencies
import numpy as np
from sklearn.metrics.pairwise import cosine_similarity

# User-Item Matrix - Rows = friends, Columns = movies
ratings_matrix = np.array([
    [4, 5, 2, np.nan],
    [3, 2, np.nan, 5],
    [np.nan, 3, 5, 2]
])
First things first, let's talk about figuring out cosine similarity:

Imagine your data as coordinates on a map – each piece of info points you in a certain direction. Now, cosine similarity is like checking how much your direction matches with someone else's. If you're both heading in exactly the same way, the similarity score is high (near 1). But if you're going in different directions, the score drops (closer to 0). Formally, it's just the dot product of two vectors divided by the product of their magnitudes. Basically, it's a handy tool to see whether two sets of information point in the same general direction, no matter how long or short they are. Here is a little function that determines the cosine similarity between the items (columns) of the array:

# Function calculating Cosine Similarity
def item_similarity(matrix):
    # Treat missing ratings as 0 so the math works
    matrix = np.nan_to_num(matrix)
    # Transpose so each column (movie) becomes a vector, giving item-item similarity
    similarity = cosine_similarity(matrix.T)
    # Zero out the diagonal so a movie isn't counted as similar to itself
    np.fill_diagonal(similarity, 0)
    return similarity
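To see it in action outside of our ratings matrix, here is a tiny standalone check with made-up vectors - two that point roughly the same way, and one that doesn't:

# Quick illustration with made-up vectors (not our ratings data)
import numpy as np
from sklearn.metrics.pairwise import cosine_similarity

a = np.array([[4, 5, 1]])  # loves the first two movies
b = np.array([[5, 4, 2]])  # similar taste
c = np.array([[1, 1, 5]])  # mostly into the third movie

print(cosine_similarity(a, b))  # close to 1
print(cosine_similarity(a, c))  # noticeably lower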
Second off, let's write a function to predict ratings for the movies each friend hasn't watched.

The function calculates the dot product between the user_ratings matrix and the item_similarity matrix. This results in a weighted sum of ratings for each user-item pair, where the weights are determined by the similarity between items.

The sum of similarity values is calculated along the columns (axis=0) of the item_similarity matrix. This sum is used as the denominator in the next step, with any zero sums replaced by 1 to avoid dividing by zero.

The predicted ratings are calculated by dividing the weighted sum of ratings by the sum of similarity values. This step is the core of collaborative filtering, where the similarity between items influences the prediction of ratings for unrated items.

Finally, the function sets the predicted ratings for items that have already been rated by the user to NaN. This is done to ensure that the predicted ratings only apply to unrated items.

def predict_ratings(user_ratings, item_similarity):
    user_ratings = np.nan_to_num(user_ratings)
    # Weighted sum of each user's ratings, weighted by item-item similarity
    weighted_sum = np.dot(user_ratings, item_similarity)
    # Normalizer: total similarity per item (guard against dividing by zero)
    sum_similarity = np.sum(item_similarity, axis=0)
    sum_similarity[sum_similarity == 0] = 1
    predicted_ratings = weighted_sum / sum_similarity
    # Keep predictions only for the movies the user hasn't rated
    predicted_ratings[user_ratings != 0] = np.nan
    return predicted_ratings
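To make that weighted average concrete for a single cell, here is a tiny hand-checkable illustration with made-up numbers (not the similarity values from our matrix):

# One unrated movie, two rated movies the user has seen (made-up numbers)
ratings = [4, 2]          # user's ratings for movies A and B
similarity = [0.75, 0.25] # similarity of A and B to the unrated movie
prediction = sum(r * s for r, s in zip(ratings, similarity)) / sum(similarity)
print(prediction)         # (4*0.75 + 2*0.25) / (0.75 + 0.25) = 3.5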
Now let's use these functions:
# Calculate item similarity
item_sim = item_similarity(ratings_matrix)
# Predict ratings for unrated items
predicted_ratings = predict_ratings(ratings_matrix, item_sim)
actual_ratings_matrix = np.nan_to_num(ratings_matrix)
Lastly, let's print a combined matrix that shows each friend's actual ratings alongside the predicted ratings for the movies they haven't watched.
print("Combined Ratings Matrix:")
for i in range(ratings_matrix.shape[0]):
    combined_row = []
    for j in range(ratings_matrix.shape[1]):
        actual_rating = int(ratings_matrix[i, j]) if not np.isnan(ratings_matrix[i, j]) else 0
        predicted_rating = predicted_ratings[i, j]
        combined_value = "{:.2f}".format(predicted_rating) if actual_rating == 0 else actual_rating
        combined_row.append("{:<12}".format(str(combined_value)))
    print(" ".join(combined_row))
Output:

The output, cleaned up a little - the orange cells are the predicted ratings we've worked out for each friend.
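For reference, running the snippets above end to end on this toy matrix gives a combined matrix roughly like this (actual ratings as whole numbers, predictions rounded to two decimals):

Combined Ratings Matrix:
4       5       2       3.85
3       2       2.95    5
3.02    3       5       2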

Collaborative filtering proves to be a powerful tool for making data-driven decisions grounded in user preferences. While our example is modest, it shows how a friend's missing ratings can be estimated from their peers' preferences. This approach provides platforms with valuable insights, fostering a more engaging user experience.

Content-Based Filtering

Content-based filtering recommends items to a user based on the features or characteristics of items the user has interacted with or liked in the past. This method involves analyzing the content of items to understand their attributes. These attributes include not only explicit features like genre, keywords, or authors but also implicit features such as sentiment, complexity, or style.

In a generic context, a user profile is created, and the user is matched with items that align with their profile. For this example, we will use basic user input to simulate a customer profile. A numerical genre mapping will be used, corresponding to a small selection of book titles.

# Genre mapping
genre_mapping = {
    1: 'Mystery',
    2: 'Romance',
    3: 'Thriller',
    4: 'Drama',
    5: 'Science Fiction',
    6: 'Adventure',
    7: 'Comedy',
    8: 'Fantasy'
}

# Example books data with numerical genres
books_data = {
    'The Silent Detective': {'genres': [1, 3]},      # Mystery, Thriller
    'Love in Bloom': {'genres': [2, 4]},              # Romance, Drama
    'Galactic Odyssey': {'genres': [5, 6]},           # Science Fiction, Adventure
    'Laugh Out Loud': {'genres': [7, 8]},             # Comedy, Fantasy
    'Twisted Secrets': {'genres': [3, 1]},            # Thriller, Mystery
    'Starry Nights': {'genres': [2, 5]},              # Romance, Science Fiction
    'Drama on Everest': {'genres': [4, 6]},           # Drama, Adventure
    'The Enchanted Mystery': {'genres': [1, 8]},      # Mystery, Fantasy
    'Comedy Central': {'genres': [7, 3]},             # Comedy, Thriller
    'Whimsical Tales': {'genres': [4, 7]}             # Drama, Comedy
}
Now, for some basic Python setup:

Let's initialize our class with the dataset ('books_data'), which contains all the book information defined above.

class BookRecommender:
    def __init__(self, books_data):
        self.books_data = books_data
Let's now add a method to the class to recommend the books.

This method takes selected_genre as input and iterates through the dataset, checking whether the selected genre appears in each book's genre list. If a match is found, the book title is added to recommended_books.

    # Method of the BookRecommender class defined above
    def recommend_books(self, selected_genre):
        recommended_books = []

        for book_title, book_info in self.books_data.items():
            if selected_genre in book_info['genres']:
                recommended_books.append(book_title)

        return recommended_books
Now, let's display the genre options, collect the user's choice, and call the recommender:
# Display genre options
print("Genre Options:")
for num, genre in genre_mapping.items():
    print(f"{num}: {genre}")

# Get user input for the selected genre
selected_genre = int(input("Enter the number of the genre you want to explore: "))

# Create a book recommender instance
book_recommender = BookRecommender(books_data)

# Get recommended books based on the selected genre
recommended_books = book_recommender.recommend_books(selected_genre)
Finally, a print statement to display the contents of recommended_books.
if recommended_books:
    print("Recommended Books:")
    for book in recommended_books:
        print(f"- {book}: {books_data[book]['genres']}")
else:
    print("No books found for the selected genre.")
Output:

As demonstrated below, with the Mystery genre selected, all the recommended books include mystery in their genre list.
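To push the content-based idea a little closer to the profile matching described earlier, here is a minimal sketch, separate from the script above, that builds a hypothetical user profile from a couple of liked books, encodes each book's genres as a binary vector, and ranks the unread books by cosine similarity to that profile. The liked titles and the averaging approach are illustrative assumptions; it reuses the genre_mapping and books_data defined above:

# A minimal profile-matching sketch (assumes genre_mapping and books_data above)
import numpy as np
from sklearn.metrics.pairwise import cosine_similarity

def genre_vector(genres, n_genres=len(genre_mapping)):
    # Encode a list of genre numbers as a binary vector
    vec = np.zeros(n_genres)
    for g in genres:
        vec[g - 1] = 1  # genres are numbered starting at 1
    return vec

liked_books = ['The Silent Detective', 'Twisted Secrets']  # hypothetical likes

# User profile = average genre vector of the liked books
profile = np.mean([genre_vector(books_data[b]['genres']) for b in liked_books], axis=0)

# Score every unread book against the profile and show the top matches
scores = {}
for title, info in books_data.items():
    if title not in liked_books:
        sim = cosine_similarity([profile], [genre_vector(info['genres'])])[0][0]
        scores[title] = sim

for title, sim in sorted(scores.items(), key=lambda x: x[1], reverse=True)[:3]:
    print(f"{title}: {sim:.2f}")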


To sum it up, going back to the Sears catalog days reminds us of a time when picking gifts was straightforward and personal. Nowadays, modern recommenders use a mix of methods to make suggestions, trying to balance personal preferences and data analysis. As these systems get more complex, it's important to keep things simple and make sure recommendations stay true to what users really like. In the quest for improvement, let's not forget the charm of simplicity that made choosing from a catalog special in the first place.

A little chaotic of a post, but here you are.

Below is a list of recommenders I have found online.

Movies

Movie-Map - Find Similar Movies
TasteDive | Movie recommendations

Music

Gnoosic - Discover new Music
Last.fm | Play music, find songs, and discover artists

Books

The Reader Powered Book Publisher - Inkitt
What Should I Read Next? Book recommendations from readers like you

Link to full python scripts

GitHub - 0xJeh/Recommender-Python

Human behavior flows from three main sources: desire, emotion, and knowledge.
Plato

Nevertheless,

0xJeh