How AI can learn to recommend products to customers from Shopify descriptions.

Deep Learning refers to a specific sub-category of the massive field of artificial intelligence. Its applications are far-reaching and cover exciting new areas such as self-driving cars and computer vision. The goal of this project was to use deep learning tools to build an application that could recommend products to customers on a Shopify site, based on their pre-existing browsing behaviour.

Shopify is an e-commerce platform that provides the means for anyone to set up, maintain, and grow an online store to sell their products. Shopify is a leading entity in its industry thanks to its quick, easy, non-technical setup and rapid scalability. Its vast array of offerings doesn't stop at setting up an online store. For example, clients can make use of its app store, which provides over 1200 integrations that can be used to boost store performance and sell more. Shopify's API suite allows stores to make use of custom software to extend the platform's built-in features. For now, let's delve into the project.

The brains behind this operation is a machine learning language model known as Bidirectional Encoder Representations from Transformers, or BERT for short. This model was proposed by Devlin et al. (2018) in their paper ‘BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding’. In machine learning, BERT has enabled cutting-edge innovation within the sub-field of natural language processing (NLP), which is the ability of artificial intelligence to work with human language. This project makes use of a version of BERT known as Sentence-BERT, or SBERT.

What do BERT and SBERT do Exactly?

BERT makes use of the bidirectional training of the Transformer, an attention-based model which captures the association between words in a text by focusing on each word’s surroundings. Essentially, BERT splits a sentence or text into tokens, and each token is converted into a numerical vector. A neural network then processes these vectors to produce a sequence of contextual vectors within a shared vector space. BERT is pre-trained on the entire English Wikipedia and BooksCorpus.

The result is an impressively clever model that can derive the semantics, or meaning, of words based on the context in which they are found. For example, the vector for king, minus the vector for man, plus the vector for woman, should land close to the vector for queen. And it does.
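This vector arithmetic can be illustrated with a minimal sketch. The three-dimensional toy vectors below are invented purely for illustration; real embeddings have hundreds of dimensions, but the idea is the same:

```python
import numpy as np

# Hypothetical toy word vectors, invented purely for illustration.
# The dimensions might loosely stand for "royalty", "age", "femininity".
king  = np.array([0.9, 0.8, 0.1])
man   = np.array([0.1, 0.8, 0.1])
woman = np.array([0.1, 0.8, 0.9])
queen = np.array([0.9, 0.8, 0.9])

# king - man + woman lands (essentially) on queen
result = king - man + woman
print(np.allclose(result, queen))  # True
```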

SBERT is a version of BERT which uses siamese and triplet network architectures to produce these vectors, also known as embeddings, at the sentence level. These embeddings can be compared using cosine similarity, which is quite literally a measure of the similarity of two texts in their embedding form. Two identical texts will have a cosine similarity of 1.
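Cosine similarity is straightforward to compute by hand. The short sketch below implements it with NumPy for two toy vectors; the vectors and the function name are illustrative, not part of any library:

```python
import numpy as np

def cosine_sim(u, v):
    # Normalised dot product of the two vectors.
    return np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v))

a = np.array([1.0, 2.0, 3.0])
b = np.array([-2.0, 1.0, 0.0])  # orthogonal to a

print(round(cosine_sim(a, a), 6))  # identical vectors -> 1.0
print(round(cosine_sim(a, b), 6))  # orthogonal vectors -> 0.0
```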

How We Used SBERT to Recommend Products Within Gentleman’s Journal

What we essentially wanted out of this project was to recommend different Gentleman’s Journal products to a customer, given the current product they were viewing. If the customer clicked on some sunglasses, then more sunglasses would be recommended. If the customer clicked on a pair of shoes, then shoe polish may be recommended. The idea is to suggest products to the consumer that they might also like to buy based on their viewing actions to ultimately increase checkout value.

To do this, product descriptions were compared using SBERT. Product descriptions are often whole paragraphs, whereas SBERT works at the sentence level. The workaround here was to treat each product description as a group of sentences, each with its own unique vector embedding. An average vector can then be taken over this group of embeddings to summarise the product description as a single embedding. Averaging also suppresses noise, keeping the focus on the information that runs through the whole description.
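As a sketch of this averaging step, suppose we already have one SBERT embedding per sentence of a description. The tiny four-dimensional vectors below are made up (real SBERT embeddings are far larger); the product embedding is simply their element-wise mean:

```python
import numpy as np

# Hypothetical per-sentence embeddings for a three-sentence description.
sentence_embeddings = np.array([
    [0.2, 0.5, -0.1, 0.3],   # sentence 1
    [0.4, 0.3, -0.3, 0.1],   # sentence 2
    [0.0, 0.4,  0.1, 0.2],   # sentence 3
])

# One embedding summarising the whole product description.
product_embedding = sentence_embeddings.mean(axis=0)
print(product_embedding)  # [ 0.2  0.4 -0.1  0.2]
```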

Once an embedding was created for each product description across the entire Gentleman’s Journal shop, cluster analysis could be used to process these embeddings. Cluster analysis is an unsupervised machine learning task which involves grouping a set of objects based on shared characteristics. Parameters such as the distance threshold allow fine-tuning of how similar vectors had to be to fall within the same grouping.

The result was a program that, when presented with a product, could condense the product description into a single vector and retrieve similar vectors based upon embedding clustering. These retrieved vectors represent other, similar product descriptions attached to other, similar products.

The Code

Let’s now have a look at an example where we process and compare two similar product descriptions for cotton shirts to retrieve their cosine similarity. We will be using two Python libraries. The first is sentence-transformers, which provides the SBERT models we’ve been talking about. The second is scikit-learn, whose sklearn.metrics.pairwise module will allow us to measure the cosine similarity between vectors. Recall that embeddings are simply vector representations of text, so we can use their cosine similarity as a measure of how similar the texts are. Our libraries are imported as below:

from sentence_transformers import SentenceTransformer
from sklearn.metrics.pairwise import cosine_similarity

Our next step is to define a list of two product descriptions which we want to compare. In this example we compare the descriptions of two similar cotton shirts:

product_descriptions = ["If you’re looking for the perfect blue shirt - New & Lingwood's tailored-fit button down, featuring a classic oxford collar, is the best in business. It’s been crafted with a soft 100% Egyptian cotton, perfectly versatile for both a casual and smart ensemble.", "Featuring an ultra-breathable soft cotton blend knitted in Nottingham especially for us. Our contemporary yet relaxed London made Polo Shirts are perfect for dressing up or down."]

We can now load our pre-trained SBERT model as sentence_model. This is the ‘all-MiniLM-L6-v2’ model that you can see below:

sentence_model = SentenceTransformer('all-MiniLM-L6-v2')

Our sentences are now processed into embeddings using .encode():

embeddings = sentence_model.encode(product_descriptions)

In other words, this function is using SBERT to transform each product description into a vector that can be represented within a 384-dimensional vector space. To show the sheer amount of information contained within one embedding, this is what the first 3 of 96 lines of the first product description embedding look like:

-7.12266341e-02 -2.27154493e-02  5.85283944e-03  6.39021164e-03
 1.20765250e-02  2.06742473e-02  3.28435823e-02 -1.94635373e-02
-4.33507301e-02  7.83237964e-02  6.62037656e-02 -1.08795159e-03

We can now measure the cosine similarity of the vectors we have just encoded. As mentioned before, the cosine similarity captures how similar these two vectors are.

But how does it do this?

To simplify, imagine these two 384-dimensional vectors plotted in the same space. Picture a simple 3D representation of what this would look like, with our two vectors as U and V pointing out from the origin.


The cosine similarity is a function which measures the angle between these two vectors in order to determine to what extent they point in the same direction, and therefore contain similar information. The mathematical formula for this function is the normalised dot product of the two vectors and reads as follows, where U and V are the two vectors:

cos(U, V) = (U · V) / (‖U‖ ‖V‖)

As you can see, it is all too easy to get bogged down in the mathematics, but for now all you need to know is that the cosine similarity measures to what extent the two embeddings contain the same information.

Luckily, with the sklearn.metrics.pairwise module from Python's scikit-learn, we have the cosine_similarity function that can compute this for our two embeddings. We do this below, storing the result as values. The final line simply prints out the resulting cosine similarity for us!

values = cosine_similarity(embeddings[0].reshape(1,-1), embeddings[1].reshape(1,-1))
print('Cosine Similarity = {}'.format(values[0]))

This yields the following output:

Cosine Similarity = [0.5167135]

Our two product descriptions have a cosine similarity of 0.5167135. This is relatively high, and with a bit of fine-tuning, a threshold cosine similarity can be set to group relevant product descriptions when clustering. For example, we could set 0.4 as the minimum cosine similarity for any two embeddings to belong to the same group, which would put these two shirt products in the same cluster. We can then recommend the products with the highest cosine similarity within each cluster. So if we are recommending 3 products based on the first shirt, and our second shirt in the same cluster, with its cosine similarity of 0.5167135, is among the highest ranking, then the second shirt will be recommended!
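Putting the threshold and the ranking together, here is a hedged sketch of the final recommendation step. The two-dimensional catalogue embeddings are invented for illustration; row 0 stands for the product currently being viewed:

```python
import numpy as np
from sklearn.metrics.pairwise import cosine_similarity

# Hypothetical catalogue embeddings; row 0 is the product being viewed.
catalogue = np.array([
    [1.0, 0.0],   # viewed shirt
    [0.9, 0.3],   # similar shirt
    [0.8, 0.4],   # another shirt
    [0.0, 1.0],   # unrelated product
])

viewed = catalogue[0].reshape(1, -1)
similarities = cosine_similarity(viewed, catalogue[1:])[0]

# Keep candidates above the cluster threshold, then rank by similarity.
threshold = 0.4
candidates = [(i + 1, s) for i, s in enumerate(similarities) if s >= threshold]
recommendations = sorted(candidates, key=lambda t: t[1], reverse=True)[:3]
print([index for index, _ in recommendations])  # [1, 2]
```

The unrelated product falls below the threshold and is never recommended, while the two remaining shirts are returned in order of similarity.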

This is the technology that we use at AI Commerce to recommend products to customers browsing our clients' Shopify stores. The results are impressive, with our app doing exactly what it was intended for: recommending products that customers might like, to increase conversion and shopping cart value. For more information about our services, message us via the chat-bot on our website.

Now that you know how to encode texts into embeddings and measure their cosine similarity you can have a go at using SBERT yourself! Play around with comparing different sentences and words to get a feel for how this impressive model can think.
