How to Create a Product Recommendation System with Python, Pandas, NumPy, and Scikit-learn

20 Mar, 2026

Introduction

Recommendation systems help businesses suggest products to customers based on their preferences and behavior. With Python libraries like Pandas, NumPy, and Scikit-learn, you can build a simple recommendation engine that analyzes user ratings and suggests products a user might like.

This guide shows you how to create a basic product recommendation system step by step.

Prerequisites

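Before you start, make sure Python 3 and pip are available on your machine. The commands in this guide assume a Linux or macOS shell.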

Set Up a Virtual Environment

It’s best practice to use a virtual environment so your project dependencies don’t interfere with other Python projects.

  1. Create and switch to a new directory for your project:

    console
    $ mkdir product-recommendation && cd product-recommendation
    
  2. Create a virtual environment:

    console
    $ python3 -m venv venv
    
  3. Activate the virtual environment:

    console
    $ source venv/bin/activate
    
  4. Install the required libraries inside this environment:

    console
    (venv) $ pip install pandas numpy scikit-learn
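
To confirm the installation worked, a quick check like the one below imports each library and prints its version. This is a minimal sketch rather than part of the setup steps, and the versions printed will depend on your environment.

```python
# Sanity check: import each library and collect the installed version.
import pandas as pd
import numpy as np
import sklearn

versions = {
    "pandas": pd.__version__,
    "numpy": np.__version__,
    "scikit-learn": sklearn.__version__,
}

for name, version in versions.items():
    print(f"{name}: {version}")
```

If any import fails, the virtual environment is probably not activated in the current shell.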
    

Create a Sample Dataset

  1. Use Pandas to build a simple dataset of users rating products.

    Python
    import pandas as pd
    
    # Sample data: user, product, rating
    data = {
        'user_id': [1, 1, 1, 2, 2, 3, 3, 4],
        'product_id': [101, 102, 103, 101, 104, 102, 104, 103],
        'rating': [5, 3, 4, 4, 5, 2, 4, 5]
    }
    
    df = pd.DataFrame(data)
    print(df)
    
  2. You'll use this dataset in the next step.

Create a User-Product Matrix

  1. Pivot the data so that each row represents a user and each column represents a product.

    Python
    user_product_matrix = df.pivot_table(
        index='user_id',
        columns='product_id',
        values='rating'
    )
    
    print(user_product_matrix)
    

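Each cell in this matrix holds the rating a user gave a product. Missing values (shown as NaN) mean the user hasn’t rated that product yet. If the same user rated a product more than once, pivot_table would average those ratings by default.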
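
Before moving on, it can help to check how sparse the matrix is and whether any user rated the same product twice. The sketch below rebuilds the sample data so it runs on its own. The names duplicate_pairs and sparsity are illustrative and do not appear elsewhere in this guide.

```python
import pandas as pd

# Same sample data as in the dataset step
df = pd.DataFrame({
    'user_id': [1, 1, 1, 2, 2, 3, 3, 4],
    'product_id': [101, 102, 103, 101, 104, 102, 104, 103],
    'rating': [5, 3, 4, 4, 5, 2, 4, 5],
})

# pivot_table averages repeated (user, product) pairs by default,
# so count duplicates before pivoting
duplicate_pairs = int(df.duplicated(subset=['user_id', 'product_id']).sum())

user_product_matrix = df.pivot_table(
    index='user_id', columns='product_id', values='rating'
)

# Fraction of user-product cells that have no rating
sparsity = float(user_product_matrix.isna().to_numpy().mean())

print("Duplicate pairs:", duplicate_pairs)          # 0
print("Matrix shape:", user_product_matrix.shape)   # (4, 4)
print(f"Share of missing ratings: {sparsity:.0%}")  # 50%
```

With 8 ratings spread over 4 users and 4 products, half of the cells are empty. Real rating data is usually far sparser.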

Handle Missing Values

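  1. Replace missing values with 0 for simplicity. Keep in mind that this treats an unrated product as if it had a rating of 0, which is lower than any real rating in the dataset.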

    Python
    import numpy as np
    
    matrix_filled = user_product_matrix.fillna(0)
    print(matrix_filled)
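
Filling with 0 makes an unrated product look like a very low rating. One common alternative is to subtract each user's average rating before filling, so that 0 means "average for this user" rather than "disliked". The sketch below is an assumption added for illustration and is not part of the walkthrough. The name matrix_centered is hypothetical.

```python
import pandas as pd

# Same sample data and pivot as in the earlier steps
df = pd.DataFrame({
    'user_id': [1, 1, 1, 2, 2, 3, 3, 4],
    'product_id': [101, 102, 103, 101, 104, 102, 104, 103],
    'rating': [5, 3, 4, 4, 5, 2, 4, 5],
})
user_product_matrix = df.pivot_table(
    index='user_id', columns='product_id', values='rating'
)

# Average rating per user; NaN cells are skipped by default
user_means = user_product_matrix.mean(axis=1)

# Subtract each user's mean from their own row, then fill the gaps with 0
matrix_centered = user_product_matrix.sub(user_means, axis=0).fillna(0)

print(matrix_centered)
```

One trade-off: a user with a single rating, such as User 4 here, ends up with an all-zero row and contributes no signal to a similarity calculation. The rest of this guide continues with the simpler zero-filled matrix.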
    

Compute Similarity Between Users

  1. Use cosine similarity from Scikit-learn to measure how similar users are based on their ratings.

    Python
    from sklearn.metrics.pairwise import cosine_similarity
    
    # Compute similarity
    user_similarity = cosine_similarity(matrix_filled)
    
    print(user_similarity)
    
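  2. Each value is the cosine similarity between two users, with rows and columns in the same order as the rows of matrix_filled. Because ratings are non-negative, values range from 0 to 1, and values closer to 1 mean more similar rating patterns. The diagonal is 1 because every user is identical to themselves.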
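
To see where one of these numbers comes from, the sketch below computes the similarity between User 1 and User 2 by hand and compares it with Scikit-learn's result. The two vectors are the zero-filled rating rows from the sample data. The variable names are illustrative.

```python
import numpy as np
from sklearn.metrics.pairwise import cosine_similarity

# Zero-filled rating rows for User 1 and User 2 (products 101 to 104)
u1 = np.array([5, 3, 4, 0])
u2 = np.array([4, 0, 0, 5])

# Cosine similarity = dot product divided by the product of the vector lengths
manual = np.dot(u1, u2) / (np.linalg.norm(u1) * np.linalg.norm(u2))

# Same pair computed by Scikit-learn, which expects 2-D input
from_sklearn = cosine_similarity(u1.reshape(1, -1), u2.reshape(1, -1))[0, 0]

print(round(float(manual), 4))                  # 0.4417
print(bool(np.isclose(manual, from_sklearn)))   # True
```

The only product both users rated is 101, so the dot product is 5 × 4 = 20, and the vector lengths are √50 and √41.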

Make Recommendations

  1. Recommend products to a user based on what similar users liked.

    Python
    def recommend_products(user_id, matrix, similarity, top_n=2):
        # Look up the user's row position instead of assuming IDs are 1, 2, 3, ...
        pos = matrix.index.get_loc(user_id)
        # Copy the user's similarity scores and mask out the user themselves
        sim_scores = similarity[pos].copy()
        sim_scores[pos] = -1
        # Find the most similar remaining user and map the position back to an ID
        similar_user = matrix.index[np.argmax(sim_scores)]
        print(f"User {user_id} is most similar to User {similar_user}")
    
        # Get products rated by the similar user (0 means "not rated")
        user_ratings = matrix.loc[similar_user]
        recommended = user_ratings[user_ratings > 0].index.tolist()
    
        return recommended[:top_n]
    
    print(recommend_products(1, matrix_filled, user_similarity))
    
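  2. This function finds the most similar user and returns up to top_n of the products they rated. It does not filter out products the target user has already rated, so some results may be items the user already knows.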
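
A possible refinement is to score each product using all other users, weighted by similarity, and to skip anything the target user has already rated. The function below is a hedged sketch of that idea and not the method used above. The name recommend_unseen and its scoring rule are assumptions made for illustration.

```python
import pandas as pd
from sklearn.metrics.pairwise import cosine_similarity

# Same sample data, pivoted and zero-filled as in the earlier steps
df = pd.DataFrame({
    'user_id': [1, 1, 1, 2, 2, 3, 3, 4],
    'product_id': [101, 102, 103, 101, 104, 102, 104, 103],
    'rating': [5, 3, 4, 4, 5, 2, 4, 5],
})
matrix_filled = df.pivot_table(
    index='user_id', columns='product_id', values='rating'
).fillna(0)

def recommend_unseen(user_id, matrix, top_n=2):
    # Labeled similarity table so users are looked up by ID, not by position
    sim = pd.DataFrame(
        cosine_similarity(matrix), index=matrix.index, columns=matrix.index
    )
    # Similarity of the target user to every other user
    weights = sim.loc[user_id].drop(user_id)
    # Score each product as the similarity-weighted sum of other users' ratings
    scores = matrix.drop(index=user_id).mul(weights, axis=0).sum(axis=0)
    # Keep only products the target user has not rated (stored as 0)
    unseen = matrix.loc[user_id] == 0
    ranked = scores[unseen].sort_values(ascending=False)
    # Drop products that no similar user has rated
    return ranked[ranked > 0].head(top_n).index.tolist()

print(recommend_unseen(1, matrix_filled))  # [104]
print(recommend_unseen(2, matrix_filled))  # [102, 103]
```

User 1 has already rated products 101, 102, and 103, so 104 is the only candidate left. For User 2, product 102 ranks first because the two users who rated it are both similar to User 2.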

Test the Recommendation System

  1. Try recommending products for different users:

    Python
    print("Recommendations for User 1:", recommend_products(1, matrix_filled, user_similarity))
    print("Recommendations for User 2:", recommend_products(2, matrix_filled, user_similarity))
    

Conclusion

In this guide, you built a simple product recommendation system using Pandas, NumPy, and Scikit-learn. You learned how to create a user-product matrix, compute similarity between users, and recommend products based on similar users’ ratings. This is a basic user-based collaborative filtering approach. You can improve it by using larger datasets, other similarity measures, or model-based methods such as matrix factorization.