
Recommendation Engine In Python

The right technique for a recommendation engine in Python depends on the type of system you want to build. Broadly speaking, there are two main types: collaborative filtering, which recommends items based on the ratings of users with similar tastes, and content-based filtering, which recommends items whose attributes resemble those a user already liked. Hybrid systems, which combine both approaches, are also common.
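To make the distinction concrete, here is a minimal sketch of the content-based idea in plain Python. The movie names and genre weights are invented for illustration, and summing feature vectors into a user profile is only one simple way to do it; treat this as a sketch under those assumptions, not a reference implementation.

```python
import math

# Hypothetical item profiles: genre weights per movie (invented for illustration)
ITEM_FEATURES = {
    "Movie A": {"action": 1.0, "sci-fi": 1.0},
    "Movie B": {"action": 1.0, "thriller": 1.0},
    "Movie C": {"romance": 1.0, "comedy": 1.0},
    "Movie D": {"sci-fi": 1.0, "thriller": 1.0},
}


def cosine(a, b):
    """Cosine similarity between two sparse feature dicts."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def build_profile(liked_items):
    """Sum the feature vectors of the items the user liked."""
    profile = {}
    for item in liked_items:
        for feature, weight in ITEM_FEATURES[item].items():
            profile[feature] = profile.get(feature, 0.0) + weight
    return profile


def recommend_by_content(liked_items, n=2):
    """Rank items the user has not seen by similarity to the user's profile."""
    profile = build_profile(liked_items)
    scores = [
        (item, cosine(profile, features))
        for item, features in ITEM_FEATURES.items()
        if item not in liked_items
    ]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)[:n]


content_recs = recommend_by_content(["Movie A"])
print(content_recs)
```

Note that this approach needs no ratings from other users, only item attributes, which is the key difference from collaborative filtering.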

Here, I’ll provide a simple example of a collaborative filtering recommendation system using the Surprise library, which is a Python scikit for building and analyzing recommender systems.

First, you need to install the Surprise library. You can do this using pip:

pip install scikit-surprise

Now, let’s create a basic collaborative filtering recommendation system using the MovieLens dataset.

from surprise import Dataset, Reader
from surprise.model_selection import train_test_split
from surprise import KNNBasic
from surprise import accuracy

# Load the built-in MovieLens 100k dataset (Surprise offers to download it on first use)
data = Dataset.load_builtin('ml-100k')

# To use your own ratings file instead, define a Reader and load from a path:
# reader = Reader(line_format='user item rating timestamp', sep='\t', rating_scale=(1, 5))
# data = Dataset.load_from_file('path_to_dataset', reader=reader)

# Split the dataset into training and testing sets
trainset, testset = train_test_split(data, test_size=0.25)

# Use the KNNBasic algorithm for collaborative filtering
sim_options = {
    'name': 'cosine',
    'user_based': True
}

model = KNNBasic(sim_options=sim_options)

# Train the model
model.fit(trainset)

# Make predictions on the test set
predictions = model.test(testset)

# Evaluate the model
accuracy.rmse(predictions)

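# Make recommendations for a user
user_id = 'userId'  # replace with an actual raw user ID from your dataset (a string in ml-100k)

# Find the items this user has not rated in the training set.
# Surprise stores ratings under inner IDs, so convert between raw and inner IDs.
inner_uid = trainset.to_inner_uid(user_id)  # raises ValueError for an unknown user
rated_items = {inner_iid for inner_iid, _ in trainset.ur[inner_uid]}
unrated_movies = [
    trainset.to_raw_iid(inner_iid)
    for inner_iid in trainset.all_items()
    if inner_iid not in rated_items
]

# Predict ratings for unrated movies
predictions = [model.predict(user_id, movie_id) for movie_id in unrated_movies]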

# Get the top N recommendations
top_n = sorted(predictions, key=lambda x: x.est, reverse=True)[:10]

# Print the top recommendations
for movie in top_n:
    print(f"Movie ID: {movie.iid}, Estimated Rating: {movie.est}")
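To show roughly what a user-based cosine approach does under the hood, here is a simplified sketch in plain Python. The ratings are toy data invented for illustration, and the function names are hypothetical. It predicts a rating as the similarity-weighted mean of the nearest neighbours' ratings and includes an RMSE helper. Surprise's own implementation handles more details, so this is an illustration of the idea rather than a copy of the library.

```python
import math

# Toy ratings invented for illustration: user -> {item: rating}
RATINGS = {
    "alice": {"m1": 5, "m2": 3, "m3": 4},
    "bob": {"m1": 4, "m2": 2, "m3": 5, "m4": 4},
    "carol": {"m1": 1, "m2": 5, "m4": 2},
}


def cosine_sim(u, v):
    """Cosine similarity computed over the items both users rated."""
    common = RATINGS[u].keys() & RATINGS[v].keys()
    if not common:
        return 0.0
    dot = sum(RATINGS[u][i] * RATINGS[v][i] for i in common)
    norm_u = math.sqrt(sum(RATINGS[u][i] ** 2 for i in common))
    norm_v = math.sqrt(sum(RATINGS[v][i] ** 2 for i in common))
    return dot / (norm_u * norm_v)


def predict_rating(user, item, k=2):
    """Similarity-weighted mean of the k most similar users' ratings for item."""
    neighbours = [
        (cosine_sim(user, other), ratings[item])
        for other, ratings in RATINGS.items()
        if other != user and item in ratings
    ]
    top_k = sorted(neighbours, reverse=True)[:k]
    total_sim = sum(sim for sim, _ in top_k)
    if total_sim == 0:
        return None  # no usable neighbours; a real system would fall back to a default
    return sum(sim * rating for sim, rating in top_k) / total_sim


def rmse(pairs):
    """Root mean squared error over (true_rating, estimated_rating) pairs."""
    return math.sqrt(sum((true - est) ** 2 for true, est in pairs) / len(pairs))


estimate = predict_rating("alice", "m4")
print(f"Estimated rating for alice on m4: {estimate:.2f}")
```

Because alice's ratings track bob's more closely than carol's, the estimate lands nearer to bob's rating for m4 than to carol's.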

This example uses the Surprise library and the MovieLens dataset; you can adapt it to your own use case and data. Keep in mind that KNNBasic computes a full user-to-user similarity matrix, so it does not scale well as the number of users grows. For large-scale production systems, you will likely need approaches that scale better, such as matrix factorisation or deep learning models.
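As a rough illustration of the matrix factorisation idea, the sketch below learns a small vector of latent factors per user and per item with stochastic gradient descent, in plain Python. The rating triples, function names, and hyperparameters are all assumptions made for illustration, and the model omits the bias terms that fuller implementations usually include. Surprise also ships its own `SVD` algorithm, which is the practical choice if you stay within that library.

```python
import random

# Toy (user, item, rating) triples invented for illustration
TRIPLES = [
    ("u1", "m1", 5), ("u1", "m2", 3), ("u2", "m1", 4),
    ("u2", "m3", 2), ("u3", "m2", 4), ("u3", "m3", 1),
]


def train_mf(triples, n_factors=2, n_epochs=500, lr=0.02, reg=0.02, seed=0):
    """Learn user and item factor vectors with stochastic gradient descent."""
    rng = random.Random(seed)
    users = sorted({u for u, _, _ in triples})
    items = sorted({i for _, i, _ in triples})
    # Small random starting values for every latent factor
    p = {u: [rng.gauss(0, 0.1) for _ in range(n_factors)] for u in users}
    q = {i: [rng.gauss(0, 0.1) for _ in range(n_factors)] for i in items}
    mean = sum(r for _, _, r in triples) / len(triples)
    for _ in range(n_epochs):
        for u, i, r in triples:
            pred = mean + sum(pf * qf for pf, qf in zip(p[u], q[i]))
            err = r - pred
            for f in range(n_factors):
                pu, qi = p[u][f], q[i][f]
                # Gradient step with L2 regularisation on both factors
                p[u][f] += lr * (err * qi - reg * pu)
                q[i][f] += lr * (err * pu - reg * qi)
    return mean, p, q


def predict_mf(model, user, item):
    """Global mean plus the dot product of the user and item factors."""
    mean, p, q = model
    return mean + sum(pf * qf for pf, qf in zip(p[user], q[item]))


def squared_error(model, triples):
    """Total squared error of the model on the given triples."""
    return sum((r - predict_mf(model, u, i)) ** 2 for u, i, r in triples)


mf_model = train_mf(TRIPLES)
print(f"Predicted rating for u1 on m3: {predict_mf(mf_model, 'u1', 'm3'):.2f}")
```

Unlike the neighbourhood approach, this stores only one short vector per user and per item, so memory grows linearly with the number of users and items instead of quadratically.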
