The Product Matching project had been going on for a while; now it was time to revisit our initial decisions and put them to the test.
Fortunately, Price2Spy clients constantly keep us busy with new matching tasks, so we had plenty of good evaluation samples.
Therefore, we did the following tests:
1. German musical products, but from smaller websites outside the 12 major websites used for training the ML model. As expected, it worked great: same language and same industry as the ML training set, so good results were no surprise.
2. Italian musical products – same industry, but a very different language (though both use Latin script). Results were only slightly worse than in 1). However, one should keep in mind that both Italian and German stores use English wording quite often (but not always!).
3. Australian consumer electronics products – a different industry and a different language compared to the ML training set. At first, the results were rather poor. This is when we figured out that we needed extra features.
4. Pool cleaning equipment – Italy, Spain, France, Benelux – many different languages, and an industry which has nothing in common with the ML training set. Here the results were rather poor. After deep troubleshooting, it turned out that some websites did not use standardized product names, but rather introduced model names of their own. So the ML model is not a piece of magic; it won't always work!
5. Books, perfumes, and toys from Romania – again, a language which differs from our ML training data, and a very different industry. The results were great for perfumes and toys, but not for books: sequels gave us a lot of trouble, since their names are almost identical even though they are definitely not the same books. So, again, a good lesson in when ML can be trusted and when extra human work is needed.
6. Mixed products from the Middle East (food, consumer electronics, office supplies, etc.) – English language (so, differing from our ML training set), and very different industries. The results were great!
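The sequel problem from case 5 is easy to illustrate with plain string similarity. The sketch below uses Python's standard-library difflib and a hypothetical pair of book titles (not our actual matching model or data): two clearly different books score well above any reasonable name-similarity threshold.

```python
from difflib import SequenceMatcher

# Two hypothetical book titles: different books, nearly identical names.
title_a = "The Hunger Games: Catching Fire (Book 2)"
title_b = "The Hunger Games: Mockingjay (Book 3)"

# Character-level similarity ratio in [0, 1].
ratio = SequenceMatcher(None, title_a.lower(), title_b.lower()).ratio()
print(f"similarity = {ratio:.2f}")
```

A matcher relying on name similarity alone would happily pair these titles up, which is exactly why books needed extra human verification.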
As we performed these tests, we kept learning and improving our ML model.
Most importantly, we proved that the concept works: it can be used for (almost) any language, and for most industries.
Both accuracy and sensitivity figures were going up, which was good. But one thing was troubling us: remember the initial set of decisions – we were striving for 100% matching accuracy.
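As a reminder of what those two figures measure, here is a minimal sketch of the standard definitions (the counts are illustrative, not our real evaluation numbers):

```python
def accuracy(tp: int, tn: int, fp: int, fn: int) -> float:
    # Share of all candidate pairs the model classifies correctly.
    return (tp + tn) / (tp + tn + fp + fn)

def sensitivity(tp: int, fn: int) -> float:
    # Also called recall: share of true matches the model actually finds.
    return tp / (tp + fn)

# Hypothetical evaluation sample of 1,000 candidate pairs.
print(accuracy(tp=90, tn=880, fp=10, fn=20))  # 0.97
print(sensitivity(tp=90, fn=20))              # ~0.82
```

Note that accuracy can look high even when sensitivity lags, because true non-matches vastly outnumber true matches in candidate pairs; that is why we tracked both.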
There was more work to be done.
More about it on the following links: