(Part #2) Product matching via Machine Learning – Important decisions to be made
Before kicking the project off, we had to make some really important decisions regarding the project scope.
1. Language-specific or universal ML model?
- Of course, everyone wants their solution to be as broadly applicable as possible.
- A language-specific model would probably be more precise, but it would have to be trained for each language individually. And preparing a training set, as you will see, is very difficult.
- As Price2Spy has clients from literally all over the world, we would need to cover at least 15 different languages, some of them written in non-Latin scripts.
- Quite often, competitor A uses the English product name while competitor B uses the local language, for example "iPhone 11 Red" vs "iPhone 11 Rot" (German for red). Our ML model would need to handle such cases.
- Decision: do everything we can to build a universal solution.
2. Industry-specific or universal ML model?
- Price2Spy works with over 25 different industries. Preparing 25 training sets to build 25 different ML models seemed like a nightmare.
- On the other hand, we all know how little the wording of fashion and luxury products has in common with that of tires or fresh food.
- Again, an industry-specific model would probably be more precise, but it would have to be trained for each industry individually. And preparing a sufficiently representative training set, as you will see, is very difficult.
- Decision: do everything we can to build a universal solution.
3. Matching accuracy
- One thing we have learned in 9 years in this business is that we cannot afford a wrong match in Price2Spy. Wrong match => wrong pricing decision. Our customers cannot have that => we cannot have that!
- 99% matching accuracy is not sufficient. Even if only 1% of matches are wrong, how can the client know which 1%?
- ML is all about math and probability. Even when the model reports a 99% probability that two products match, that is not good enough. A human still needs to verify it.
- Fortunately, verifying a match takes much less human time than establishing one. So ML will not fully replace human work, but it will significantly reduce it while keeping match quality at 100%.
- Decision: we’re striving for 100% matching accuracy
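The accuracy reasoning above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not Price2Spy's actual implementation: the `CandidateMatch` record, the function names, and the probabilities in the demo are all hypothetical. It shows two points from the list: even 99% accuracy leaves roughly 100 wrong matches per 10,000, and the model's probability is used only to order suggestions for human review, never to accept a match automatically.

```python
from dataclasses import dataclass


@dataclass
class CandidateMatch:
    # Hypothetical record: a client's product paired with a competitor
    # listing that the ML model proposes as the same product.
    client_product: str
    competitor_product: str
    probability: float  # model's estimated probability that they match


def expected_wrong_matches(n_matches: int, accuracy: float) -> int:
    """Roughly how many wrong matches to expect among n_matches at a given accuracy."""
    return round(n_matches * (1.0 - accuracy))


def build_review_queue(candidates: list[CandidateMatch]) -> list[CandidateMatch]:
    """Send every candidate to a human reviewer, most probable first.

    No probability threshold auto-accepts a match: the model only decides
    the order in which humans confirm or reject its suggestions.
    """
    return sorted(candidates, key=lambda c: c.probability, reverse=True)


def confirmed_matches(queue: list[CandidateMatch], human_verdicts: list[bool]) -> list[CandidateMatch]:
    """Keep only the candidates a human reviewer explicitly confirmed."""
    return [c for c, ok in zip(queue, human_verdicts) if ok]


if __name__ == "__main__":
    print(expected_wrong_matches(10_000, 0.99))  # 100

    # Illustrative candidates; the probabilities are made up.
    candidates = [
        CandidateMatch("iPhone 11 Red", "iPhone 11 Pro Red", 0.62),
        CandidateMatch("iPhone 11 Red", "iPhone 11 Rot", 0.97),
    ]
    queue = build_review_queue(candidates)
    # A reviewer confirms the first suggestion and rejects the second.
    print(confirmed_matches(queue, [True, False]))
```

Sorting by probability does not change how many matches a human checks; it only lets reviewers confirm the likely matches quickly and spend their attention on the doubtful ones.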
So, we have our 3 key ML matching decisions. On to the next task – preparing the training set!