How to Create a Product Recommendation System with Python, Pandas, NumPy, and Scikit-learn
20 Mar, 2026
Introduction
Recommendation systems help businesses suggest products to customers based on their preferences and behavior. With Python libraries like Pandas, NumPy, and Scikit-learn, you can build a simple recommendation engine that analyzes user ratings and predicts what products they might like.
This guide shows you how to create a basic product recommendation system step by step.
Prerequisites
Before you start:
- Purchase an Ubuntu 24.04 VPS server. If you don't have a VPS server, sign up with Vultr and get up to $300 worth of free credit to test the Vultr platform.
- SSH to your VPS server using PuTTY for Windows, or run the following command if you're using Linux or Mac.
  ```console
  $ ssh username@vps_server_public_ip_address
  ```
- Create a non-root user with sudo privileges. Read our guide on How to Create a Non-Root Sudo User on Ubuntu 24.04. You'll use this user's account to run the commands in this guide.
- Install Python 3.10 or later by following our How to Install Python on Ubuntu 24.04 guide.
Set Up a Virtual Environment
It’s best practice to use a virtual environment so your project dependencies don’t interfere with other Python projects.
- Create and switch to a new directory for your project:
  ```console
  $ mkdir product-recommendation && cd product-recommendation
  ```
- Create a virtual environment:
  ```console
  $ python3 -m venv venv
  ```
- Activate the virtual environment:
  ```console
  $ source venv/bin/activate
  ```
- Install the required libraries inside this environment:
  ```console
  (venv) $ pip install pandas numpy scikit-learn
  ```
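Before moving on, you may want to confirm that the libraries import correctly. The following is a minimal check, assuming you run it with `python` inside the activated environment. The `versions` dictionary is only an illustrative name, and the version numbers you see depend on when you install.

```python
# Confirm the three libraries import correctly inside the virtual environment
import numpy as np
import pandas as pd
import sklearn

# Collect the installed versions (illustrative helper dictionary)
versions = {
    "pandas": pd.__version__,
    "numpy": np.__version__,
    "scikit-learn": sklearn.__version__,
}

for name, version in versions.items():
    print(f"{name}: {version}")
```

If any import fails, the virtual environment is probably not active, or the `pip install` step did not complete.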
Create a Sample Dataset
- Use Pandas to build a simple dataset of users rating products.
  ```python
  import pandas as pd

  # Sample data: user, product, rating
  data = {
      'user_id': [1, 1, 1, 2, 2, 3, 3, 4],
      'product_id': [101, 102, 103, 101, 104, 102, 104, 103],
      'rating': [5, 3, 4, 4, 5, 2, 4, 5]
  }

  df = pd.DataFrame(data)
  print(df)
  ```
- You'll use this dataset in the next step.
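It can help to check how sparse the ratings are before building anything on top of them, since user-based recommendations depend on users having rated some of the same products. The sketch below recreates the same sample ratings and computes a few summary numbers. The variable names (`n_users`, `n_products`, `density`, `ratings_per_user`) are illustrative and not part of the guide's code.

```python
import pandas as pd

# Same sample ratings as in the dataset step
df = pd.DataFrame({
    "user_id": [1, 1, 1, 2, 2, 3, 3, 4],
    "product_id": [101, 102, 103, 101, 104, 102, 104, 103],
    "rating": [5, 3, 4, 4, 5, 2, 4, 5],
})

n_users = df["user_id"].nunique()
n_products = df["product_id"].nunique()

# Share of all possible user-product pairs that actually have a rating
density = len(df) / (n_users * n_products)

# Number of ratings each user has given
ratings_per_user = df.groupby("user_id")["rating"].count()

print(f"Users: {n_users}, products: {n_products}, density: {density:.2f}")
print(ratings_per_user)
```

Here half of the possible user-product pairs have a rating. Real datasets are usually far sparser than this.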
Create a User-Product Matrix
- Pivot the data so that each row represents a user and each column represents a product.
  ```python
  user_product_matrix = df.pivot_table(
      index='user_id',
      columns='product_id',
      values='rating'
  )

  print(user_product_matrix)
  ```
This matrix shows which user rated which product. Missing values mean the user hasn’t rated that product yet.
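One detail worth knowing: `pivot_table` aggregates duplicate entries, and its default aggregation is the mean. If a user rates the same product more than once, the matrix holds the average of those ratings. The sketch below uses a small hypothetical dataset (`dup_df`, not part of the guide's sample data) to show this. It also shows one way to keep only the last rating instead, assuming rows are ordered from oldest to newest.

```python
import pandas as pd

# Hypothetical data: user 1 rated product 101 twice (5, then 3)
dup_df = pd.DataFrame({
    "user_id": [1, 1, 2],
    "product_id": [101, 101, 102],
    "rating": [5, 3, 4],
})

# pivot_table averages duplicate (user, product) pairs by default
dup_matrix = dup_df.pivot_table(
    index="user_id", columns="product_id", values="rating"
)
print(dup_matrix)

# Alternative: keep only the last rating per (user, product) pair,
# assuming the rows are in chronological order
latest = dup_df.drop_duplicates(subset=["user_id", "product_id"], keep="last")
latest_matrix = latest.pivot_table(
    index="user_id", columns="product_id", values="rating"
)
print(latest_matrix)
```

Which behavior is appropriate depends on your data. Averaging smooths out repeated ratings, while keeping the last one reflects the user's most recent opinion.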
Handle Missing Values
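- Replace missing values with 0 for simplicity. Note that the similarity calculation in the next step will then treat an unrated product the same as a rating of 0.
  ```python
  import numpy as np  # NumPy is used later in the recommendation function

  matrix_filled = user_product_matrix.fillna(0)
  print(matrix_filled)
  ```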
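Filling with 0 is simple, but it makes "not rated" indistinguishable from a very low rating. A common alternative is to subtract each user's average rating first, so that 0 comes to mean "this user's average" and filling with 0 becomes neutral. The following is a minimal sketch of that idea, not the method this guide uses in later steps. The names `user_means`, `centered`, and `centered_filled` are illustrative.

```python
import pandas as pd

# Same sample ratings as in the dataset step
df = pd.DataFrame({
    "user_id": [1, 1, 1, 2, 2, 3, 3, 4],
    "product_id": [101, 102, 103, 101, 104, 102, 104, 103],
    "rating": [5, 3, 4, 4, 5, 2, 4, 5],
})
user_product_matrix = df.pivot_table(
    index="user_id", columns="product_id", values="rating"
)

# Mean rating per user; NaN cells are skipped by mean()
user_means = user_product_matrix.mean(axis=1)

# Subtract each user's mean from that user's row
centered = user_product_matrix.sub(user_means, axis=0)

# After centering, 0 represents the user's own average, so it is a neutral fill
centered_filled = centered.fillna(0)
print(centered_filled.round(2))
```

One limitation: a user with a single rating (user 4 here) ends up with an all-zero row, so mean-centering gives you no information about that user's tastes.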
Compute Similarity Between Users
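- Use cosine similarity from Scikit-learn to measure how similar users are based on their ratings.
  ```python
  from sklearn.metrics.pairwise import cosine_similarity

  # Compute similarity between every pair of users
  user_similarity = cosine_similarity(matrix_filled)
  print(user_similarity)
  ```
- The result is a square matrix in which entry (i, j) is the similarity between the i-th and j-th users, in the same order as the rows of the user-product matrix. Because ratings are non-negative, values range from 0 to 1, and values closer to 1 mean the two users are more similar. The diagonal is 1 because each user is compared with themselves.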
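To make the numbers less opaque, the sketch below computes the cosine similarity for users 1 and 2 by hand, as the dot product of their rating vectors divided by the product of the vectors' lengths, and compares it with the Scikit-learn result. It also wraps the result in a DataFrame so rows and columns are labeled with user IDs. The names `manual` and `similarity_df` are illustrative additions, not part of the guide's code.

```python
import numpy as np
import pandas as pd
from sklearn.metrics.pairwise import cosine_similarity

# Same sample ratings as in the dataset step, pivoted and zero-filled
df = pd.DataFrame({
    "user_id": [1, 1, 1, 2, 2, 3, 3, 4],
    "product_id": [101, 102, 103, 101, 104, 102, 104, 103],
    "rating": [5, 3, 4, 4, 5, 2, 4, 5],
})
matrix_filled = df.pivot_table(
    index="user_id", columns="product_id", values="rating"
).fillna(0)

# Cosine similarity by hand for users 1 and 2: dot(a, b) / (|a| * |b|)
a = matrix_filled.loc[1].to_numpy()
b = matrix_filled.loc[2].to_numpy()
manual = a @ b / (np.linalg.norm(a) * np.linalg.norm(b))

# Label the scikit-learn result with user IDs on both axes
similarity_df = pd.DataFrame(
    cosine_similarity(matrix_filled),
    index=matrix_filled.index,
    columns=matrix_filled.index,
)

print(similarity_df.round(3))
print(f"Manual similarity for users 1 and 2: {manual:.3f}")
```

With a labeled DataFrame you can look up a pair directly, for example `similarity_df.loc[1, 2]`, instead of translating user IDs into array positions.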
Make Recommendations
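- Recommend products to a user based on what similar users liked.
  ```python
  def recommend_products(user_id, matrix, similarity, top_n=2):
      # Look up the user's row position by label instead of assuming IDs are 1..N
      user_pos = matrix.index.get_loc(user_id)

      # Get similarity scores for the user and exclude the user themselves
      sim_scores = similarity[user_pos].copy()
      sim_scores[user_pos] = -1

      # Find the most similar other user
      similar_user = matrix.index[np.argmax(sim_scores)]
      print(f"User {user_id} is most similar to User {similar_user}")

      # Keep products the similar user rated that the target user has not rated
      similar_ratings = matrix.loc[similar_user]
      unseen = similar_ratings[(similar_ratings > 0) & (matrix.loc[user_id] == 0)]

      # Return the highest-rated products first
      return unseen.sort_values(ascending=False).index.tolist()[:top_n]

  print(recommend_products(1, matrix_filled, user_similarity))
  ```
- This function finds the most similar user and recommends the products that user rated and the target user has not rated yet, highest-rated first. With this sample data, user 1's closest match is user 4, whose only rated product (103) user 1 has already rated, so the call above returns an empty list.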
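Relying on a single most similar user can leave you with few or no suggestions on a small dataset. A common variation scores each unrated product by the similarity-weighted average of the ratings from all other users. The following is a minimal sketch of that idea under the same zero-filled matrix. The function name `recommend_weighted` and the labeled `similarity_df` are hypothetical additions, not part of the guide's code.

```python
import numpy as np
import pandas as pd
from sklearn.metrics.pairwise import cosine_similarity

# Same sample ratings as in the dataset step, pivoted and zero-filled
df = pd.DataFrame({
    "user_id": [1, 1, 1, 2, 2, 3, 3, 4],
    "product_id": [101, 102, 103, 101, 104, 102, 104, 103],
    "rating": [5, 3, 4, 4, 5, 2, 4, 5],
})
matrix_filled = df.pivot_table(
    index="user_id", columns="product_id", values="rating"
).fillna(0)

# Similarity matrix labeled with user IDs on both axes
similarity_df = pd.DataFrame(
    cosine_similarity(matrix_filled),
    index=matrix_filled.index,
    columns=matrix_filled.index,
)


def recommend_weighted(user_id, matrix, similarity_df, top_n=2):
    """Score unrated products by similarity-weighted ratings of all other users."""
    weights = similarity_df.loc[user_id].drop(user_id)  # similarity to other users
    others = matrix.drop(index=user_id)                 # other users' ratings
    rated = (others > 0).astype(float)                  # 1.0 where a rating exists

    # Weighted average per product, counting only the users who rated it
    numerator = others.mul(weights, axis=0).sum()
    denominator = rated.mul(weights, axis=0).sum()
    scores = (numerator / denominator.replace(0, np.nan)).dropna()

    # Keep only products the target user has not rated yet
    unseen = matrix.columns[(matrix.loc[user_id] == 0).to_numpy()]
    return scores.reindex(unseen).dropna().sort_values(ascending=False).head(top_n)


weighted_recs = recommend_weighted(1, matrix_filled, similarity_df)
print(weighted_recs)
```

For user 1, the only unrated product is 104, so it is the single recommendation, scored from the ratings of users 2 and 3 weighted by how similar each is to user 1. The trade-off is that users with very low similarity still contribute a little, so you may want to restrict the calculation to the top few neighbors on larger datasets.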
Test the Recommendation System
- Try recommending products for different users:
  ```python
  print("Recommendations for User 1:", recommend_products(1, matrix_filled, user_similarity))
  print("Recommendations for User 2:", recommend_products(2, matrix_filled, user_similarity))
  ```
Conclusion
In this guide, you built a simple product recommendation system using Pandas, NumPy, and Scikit-learn. You learned how to create a user-product matrix, compute similarity between users, and recommend products based on similar users' ratings. This is a basic form of user-based collaborative filtering. You can improve it by using larger datasets, other similarity measures such as Pearson correlation, or model-based methods such as matrix factorization, which can improve recommendation quality.