(Part #2) Product matching via Machine Learning – Important decisions to be made

19.6.2020.

Before kicking the project off, we had to make some really important decisions regarding the project scope.

1. Language-specific or universal ML model?
  • Of course, one would like a solution to be as broadly applicable as possible.
  • A language-specific model would probably be more precise, but it would require training for each language individually. And preparing a training set, as you will see, is a very difficult task
  • As Price2Spy has clients from literally all over the world, we would need to cover at least 15 different languages, some of them written in non-Latin scripts
  • Pretty often we face situations where competitor A uses the English wording for a product, while competitor B goes for the local language. For example: iPhone 11 Red vs. iPhone 11 Rot. Our ML model would need to be ready for such cases
  • Decision: try to go for a universal solution, by all means
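
To illustrate why a language-independent approach is plausible for cases like iPhone 11 Red vs. iPhone 11 Rot, here is a minimal sketch. It is not the Price2Spy model: it is a simple character-trigram Jaccard similarity, and the function names `char_ngrams` and `jaccard_similarity` are made up for this example. It uses no dictionary or language-specific rules, so the shared brand and model tokens dominate the score even when the color word is translated. Note that plain character overlap like this would not help when the two names are written in different scripts.

```python
from typing import Set


def char_ngrams(text: str, n: int = 3) -> Set[str]:
    """Return the set of character n-grams of a lowercased, whitespace-normalized string."""
    normalized = " ".join(text.lower().split())
    if len(normalized) < n:
        # Too short to form an n-gram: fall back to the whole string (or nothing).
        return {normalized} if normalized else set()
    return {normalized[i:i + n] for i in range(len(normalized) - n + 1)}


def jaccard_similarity(a: str, b: str, n: int = 3) -> float:
    """Jaccard similarity (intersection over union) of the two n-gram sets."""
    grams_a, grams_b = char_ngrams(a, n), char_ngrams(b, n)
    if not grams_a and not grams_b:
        return 1.0  # two empty names are treated as identical
    return len(grams_a & grams_b) / len(grams_a | grams_b)


if __name__ == "__main__":
    # English vs. German color name: 9 of 13 distinct trigrams are shared.
    print(round(jaccard_similarity("iPhone 11 Red", "iPhone 11 Rot"), 3))
    # An unrelated product scores much lower.
    print(round(jaccard_similarity("iPhone 11 Red", "Samsung Galaxy S10"), 3))
```

A score like this could only ever be one input feature among many; on its own it cannot tell "Red" from "Rot" as a translation rather than a different variant.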

2. Industry-specific or universal ML model?

  • We all know how few similarities there are between the wording of fashion and luxury products and that of tires or fresh food
  • Again, an industry-specific model would probably be more precise, but it would require training for each industry individually. And preparing a training set that is representative enough, as you will see, is a very difficult task
  • Decision: try to go for a universal solution, by all means

3. Matching accuracy

  • One thing we have learned in 9 years in this business is that a wrong match is something we cannot afford to have in Price2Spy. Wrong match => wrong pricing decision. Our customers cannot have that => we cannot have that!
  • 99% matching accuracy is not sufficient. Even if only 1% of the matches are wrong, how can the client know which 1% that is?
  • ML is all about math and probability. Even when the model reports a match as 99% probable, that is not good enough. A human needs to verify it
  • Fortunately, verifying a match takes much less human time than establishing one. So ML will not fully replace the need for human work, but it will significantly reduce it while keeping the match quality at 100%
  • Decision: we’re striving for 100% matching accuracy
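
The reasoning above can be made concrete with a small sketch. Everything in it is an assumption made for illustration (the `CandidateMatch` record, the function names, the sample products and probabilities); it is not the actual Price2Spy workflow. It shows two things: how many wrong matches a 99% accurate model would be expected to leave in a catalog, and a human-in-the-loop flow where the ML probability only orders the review queue, while a match counts as confirmed only after a person approves it.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class CandidateMatch:
    """A match proposed by the model (hypothetical record for this example)."""
    our_product: str
    competitor_product: str
    ml_probability: float
    human_verified: Optional[bool] = None  # None = not reviewed yet


def expected_wrong_matches(total_matches: int, accuracy: float) -> int:
    """Expected number of wrong matches if nobody verifies the model's output."""
    return round(total_matches * (1 - accuracy))


def review_queue(candidates: List[CandidateMatch]) -> List[CandidateMatch]:
    """Unreviewed candidates, most probable first, so reviewers confirm quickly."""
    pending = [c for c in candidates if c.human_verified is None]
    return sorted(pending, key=lambda c: c.ml_probability, reverse=True)


def confirmed_matches(candidates: List[CandidateMatch]) -> List[CandidateMatch]:
    """Only human-approved matches are used, whatever probability the model gave."""
    return [c for c in candidates if c.human_verified is True]


if __name__ == "__main__":
    # 10,000 matches at 99% accuracy still means about 100 wrong ones.
    print(expected_wrong_matches(10_000, 0.99))

    candidates = [
        CandidateMatch("iPhone 11 Red", "iPhone 11 Rot", 0.99),
        CandidateMatch("iPhone 11 Red", "iPhone 11 Pro Red", 0.93),
    ]
    for candidate in review_queue(candidates):
        print(candidate.our_product, "<->", candidate.competitor_product)
```

The design choice here is that `ml_probability` never promotes a match on its own; it only decides what a human looks at first.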

So, we have our 3 key ML matching decisions. On to the next task – preparing the training set!


Author

Miša Krunić
Father of 2, Husband of 1, CEO of 3 :-)