(Part #1) Product matching via Machine Learning – Introduction to the project

Best practices in price monitoring, New Price2Spy features. 19.6.2020

In the last couple of years, we have all witnessed the rise of a new technology: Artificial Intelligence, or as we at Price2Spy prefer to call it, Machine Learning (ML).

The whole concept was new to us. None of the Price2Spy development crew had any experience with it, but we sensed that it had huge potential, and we were eager to learn.

After a couple of months of courses and theoretical introductions, we asked ourselves: how can we apply ML in everyday Price2Spy operations?

We had several candidate projects, but one of them was our favorite from the very start – Product Matching.

Not because it was an easy win. On the contrary, it was the hardest ML problem we could think of – but our clients needed it very badly. That meant we needed it as well.

Product matching is an essential part of Price2Spy’s services. To put it simply, without product matching our clients wouldn’t be able to perform any kind of price comparison.

So far, we’ve had three ways to match products:

A) Automatch – a fully automated process, applicable when the client’s products (and the products listed on competitor sites) have something we call a ‘unique identifier’. This can be an EAN, UPC or ASIN, or in the most general case an MPN (Manufacturer Part Number). As you may guess, this method is not always applicable.

B) Manual product matching – since humans perform the matching, it is always applicable. However, if a client has hundreds of thousands of products and wants results fast, this can be a problem: manual matching is simply not cost-effective enough, nor can it be done at the snap of a finger.

C) Hybrid product matching – a combination of A) and B). Automatch provides candidate matches (which are not reliable enough to be trusted automatically), and humans check whether these matches are good (to be promoted) or bad (to be rejected).
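To make the division of labour between A), B) and C) more concrete, here is a minimal Python sketch of how a product could be routed to one of the three methods. It is an illustration only, not Price2Spy's actual implementation: the `Product` class, the `route` function, the field names and the sample data are all hypothetical. In particular, the rule that exactly one identifier hit is trusted while several hits need a human check is an assumption made for the example; the article does not say how candidate matches are judged unreliable.

```python
from dataclasses import dataclass, field

# Identifier types named in the article; the lowercase keys are hypothetical.
ID_FIELDS = ("ean", "upc", "asin", "mpn")


@dataclass
class Product:
    name: str
    ids: dict = field(default_factory=dict)  # e.g. {"ean": "111"}


def shared_identifier(a, b):
    """Return True if both products carry the same value for any identifier."""
    for key in ID_FIELDS:
        value = a.ids.get(key)
        if value and value == b.ids.get(key):
            return True
    return False


def route(client_product, competitor_products):
    """Pick a matching method for one client product.

    Assumption (not from the article): exactly one identifier hit is
    trusted, several hits are ambiguous and go to a human, and zero
    hits leave only manual matching.
    """
    hits = [p for p in competitor_products
            if shared_identifier(client_product, p)]
    if len(hits) == 1:
        return "automatch", hits   # A) trusted automatically
    if len(hits) > 1:
        return "hybrid", hits      # C) candidates for human review
    return "manual", []            # B) a human searches from scratch


# Made-up sample data for illustration.
competitor = [
    Product("Acme Drill X200 18V", {"ean": "111"}),
    Product("Acme X-200 cordless drill", {"mpn": "X200"}),
    Product("Acme X200 drill kit", {"mpn": "X200"}),
]

print(route(Product("Acme X200", {"ean": "111"}), competitor)[0])   # automatch
print(route(Product("Acme X200", {"mpn": "X200"}), competitor)[0])  # hybrid
print(route(Product("Acme X200 drill"), competitor)[0])             # manual
```

The last call shows the gap described in this post: a product with no usable identifier falls straight through to manual matching, even when its name makes the match obvious to a human.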

The problem is that Automatch could not handle cases like the ones below, where the match is obvious (or nearly obvious) to the human eye, but a search on the competitor site does not yield any results.

Here are several such examples:

[Image: product matching table]

The idea was to introduce a fourth method, universally applicable and reliable enough to be trusted. We had a feeling that ML should be helpful, but we had no idea where to start.

But before jumping into the project, we wanted to check whether anyone had done it before us, and whether a solution was already available in the public domain:

  • Attribute Extraction from Product Titles in eCommerce (https://arxiv.org/abs/1608.04670) – our colleagues from Walmart tackled a problem which is seemingly similar, but which does not actually deal with matching.
  • Product Matching in eCommerce using deep learning (https://medium.com/walmartlabs/product-matching-in-ecommerce-4f19b6aebaca) – a continuation of the above study, this article does deal with product matching, which is what we’re trying to do as well. However, it deals with the matching of a single product (while we deal with the problem of matching the whole set of products). To be honest, we got a bit discouraged by the fact that the authors themselves state that the matching accuracy is between 85% and 90% (we were aiming for much more).
  • A Machine Learning Approach for Product Matching and Categorization (https://www.semantic-web-journal.net/system/files/swj1470.pdf) – this article is helpful, but only if you’re very deep into ML. At the beginning of our project, we were simply not at this level.
  • Unraveling product matching in retail with AI (https://towardsdatascience.com/unravelling-product-matching-with-ai-1a6ef7bd8614) – this article was posted long after we had embarked on our project. Unfortunately, it does not reveal much about the technical details of the ML implementation.

So, we had to start digging ourselves.

Miša Krunić
Father of 2, Husband of 1, CEO of 3 :-)