(Part #8) Product matching via ML: Testing on various industries/languages

Written by: Miša Krunić. Categories: Best practices in price monitoring, New Price2Spy features. Published: 19.6.2020.

Previous topic: (Part #7) Product matching via ML: Post-processing

Next topic: (Part #9) ML does work, but it’s not magic

The Product Matching project had been going on for a while. Now was the time to revisit our initial decisions and put them to the test:

Languages – we have too many languages, we need a universal solution
Industries – again, too many industries, we need a universal solution

Fortunately enough, Price2Spy clients constantly keep us busy with new matching tasks, so we had good evaluation samples.

Therefore, we ran the following tests:

1. German musical products, but from smaller websites, outside the 12 major websites used for training the ML model. As expected, it worked great: same language and same industry as in the ML training set.

2. Italian musical products – same industry, but a very different language (both use Latin script, though). Results were just slightly worse than in 1). However, one should keep in mind that both Italian and German stores use English wording quite often (but not always!).

3. Australian consumer electronics products – a different industry and a different language compared to the ML training set. At first, the results were rather poor. This is when we figured out that we needed extra features, namely:

Entity recognition
Additional alpha-numeric features (to cater for broader variations of MPNs)
After adding these, the results got much better.

4. Pool cleaning equipment – Italy, Spain, France, Benelux – many different languages, and an industry which has nothing in common with the ML training set. Here the results were rather poor. After deep troubleshooting, it turned out that some websites did not use standardized product names, but rather introduced model names of their own. So, the ML model is not a piece of magic; it won’t always work!

5. Books, perfumes, and toys from Romania – again, a language which differs from our ML training data, and very different industries. The results were great for perfumes and toys, but not for books (sequels gave us a lot of trouble – their names are almost identical, but they are definitely not the same books). So, again, a good lesson on when ML can be trusted more, and when extra human work is needed.

6. Mixed products from the Middle East (food, consumer electronics, office supplies, etc.) – English language (so, differing from our ML training set), and very different industries. The results were great!
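To make two of the observations above more concrete, here is a minimal Python sketch. It is not Price2Spy's actual feature set: the function names, the normalization rules, and the product titles are all made up for illustration. The first part shows one way an alpha-numeric feature could tolerate MPN variations such as "XR-200B" vs "xr200b". The second part shows why book sequels are hard: their titles are almost identical as strings, and they carry no MPN-like code to tell them apart.

```python
import re
from difflib import SequenceMatcher
from typing import Set


def alnum_tokens(title: str) -> Set[str]:
    """Return normalized tokens that mix letters and digits (MPN-like codes).

    Assumption: separators such as '-', '/' and '.' inside a code are noise,
    so 'XR-200B' and 'xr200b' should normalize to the same token.
    """
    tokens = set()
    for raw in re.findall(r"[A-Za-z0-9][A-Za-z0-9\-/.]*", title):
        norm = re.sub(r"[^A-Za-z0-9]", "", raw).upper()
        has_digit = any(c.isdigit() for c in norm)
        has_alpha = any(c.isalpha() for c in norm)
        if has_digit and has_alpha and len(norm) >= 3:
            tokens.add(norm)
    return tokens


def alnum_overlap(title_a: str, title_b: str) -> float:
    """Jaccard overlap of MPN-like tokens; 0.0 if either title has none."""
    a, b = alnum_tokens(title_a), alnum_tokens(title_b)
    if not a or not b:
        return 0.0
    return len(a & b) / len(a | b)


def title_similarity(title_a: str, title_b: str) -> float:
    """Plain character-level similarity between two titles (0.0 to 1.0)."""
    return SequenceMatcher(None, title_a.lower(), title_b.lower()).ratio()


if __name__ == "__main__":
    # Hypothetical product titles, for illustration only.
    print(alnum_overlap("Acme XR-200B stage piano", "ACME xr200b piano 88 keys"))
    print(alnum_overlap("Acme XR-200B", "Acme XR-300B"))

    # Hypothetical sequel titles: very similar strings, different books.
    book_1 = "The Lost Kingdom: Book 1"
    book_2 = "The Lost Kingdom: Book 2"
    print(title_similarity(book_1, book_2))
    print(alnum_overlap(book_1, book_2))
```

In this sketch the two spellings of the same code produce a full overlap, while two different codes produce none. The sequel titles score close to 1 on plain string similarity and get no help from the alpha-numeric feature, which is consistent with the trouble described for books.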

As we performed these tests, we kept learning and improving our ML model.

Most importantly, we proved that the concept works: it can be used for (almost) any language, and for most industries.

Both accuracy and sensitivity figures were going up, which was good. But one thing was troubling us: remember the initial set of decisions? We were striving for 100% matching accuracy.

There was more work to be done.
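For reference, the two figures mentioned above can be computed from a confusion matrix of match decisions. The sketch below assumes "sensitivity" is meant in its standard sense (recall, or true positive rate), with a "positive" being a product pair that really is a match. The counts are invented for illustration and are not Price2Spy results.

```python
def accuracy(tp: int, tn: int, fp: int, fn: int) -> float:
    """Share of all decisions (match or non-match) that were correct."""
    total = tp + tn + fp + fn
    return (tp + tn) / total if total else 0.0


def sensitivity(tp: int, fn: int) -> float:
    """Share of true matches that the model actually found (recall)."""
    actual_matches = tp + fn
    return tp / actual_matches if actual_matches else 0.0


if __name__ == "__main__":
    # Illustrative counts only:
    # tp = true matches found, fn = true matches missed,
    # fp = wrong matches proposed, tn = non-matches correctly rejected.
    tp, tn, fp, fn = 80, 900, 5, 15
    print(f"accuracy:    {accuracy(tp, tn, fp, fn):.3f}")
    print(f"sensitivity: {sensitivity(tp, fn):.3f}")
```

With counts like these, accuracy can look high simply because most candidate pairs are non-matches, while sensitivity shows how many real matches are still being missed. That is one reason to watch both figures.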

More about it at the following link:

Product matching in Price2Spy
