The Product Matching project had been going on for a while; now it was time to revisit our initial decisions and put them to the test.
Fortunately, Price2Spy clients constantly keep us busy with new matching tasks, so we had plenty of good evaluation samples.
Therefore, we did the following tests:
1. German musical products, but from smaller websites outside the 12 major websites used for training the ML model. As expected, it worked great: same language and same industry as the ML training set, so good results were no surprise.
2. Italian musical products – same industry, but a very different language (though both use Latin script). Results were only slightly worse than in 1). However, one should keep in mind that both Italian and German stores use English wording quite often (but not always!).
3. Australian consumer electronics products – a different industry and a different language compared to the ML training set. At first, the results were rather poor. This is when we figured out that we needed extra features.
4. Pool cleaning equipment – Italy, Spain, France, Benelux – many different languages, and an industry which has nothing in common with the ML training set. Here the results were rather poor. After deep troubleshooting, it turned out that some websites did not use standardized product names, but rather introduced model names of their own. So the ML model is not a piece of magic; it won't always work!
5. Books, perfumes, and toys from Romania – again, a language which differs from our ML training data, and a very different industry. The results were great for perfumes and toys, but not for books: sequels gave us a lot of trouble, since their names are almost identical even though they are definitely not the same books. So, again, a good lesson in when ML can be trusted and when extra human work is needed.
6. Mixed products from the Middle East (food, consumer electronics, office supplies, etc.) – English language (so, differing from our ML training set), and very different industries. The results were great!
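The sequel problem from case 5 is easy to illustrate with plain string similarity. The sketch below uses Python's standard-library difflib and a hypothetical pair of book titles (not our actual matching model or data): two clearly different books score well above any reasonable name-similarity threshold.

```python
from difflib import SequenceMatcher

# Two hypothetical book titles: different books, nearly identical names.
title_a = "The Hunger Games: Catching Fire (Book 2)"
title_b = "The Hunger Games: Mockingjay (Book 3)"

# Character-level similarity ratio in [0, 1].
ratio = SequenceMatcher(None, title_a.lower(), title_b.lower()).ratio()
print(f"similarity = {ratio:.2f}")
```

A matcher relying on name similarity alone would happily pair these titles up, which is exactly why books needed extra human verification.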
As we performed these tests, we kept learning and improving our ML model.
Most importantly, we proved that the concept works: it can be used for (almost) any language, and for most industries.
Both accuracy and sensitivity figures were going up, which was good. But one thing was troubling us: remember the initial set of decisions – we were striving for 100% matching accuracy.
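As a reminder of what those two figures measure, here is a minimal sketch of the standard definitions (the counts are illustrative, not our real evaluation numbers):

```python
def accuracy(tp: int, tn: int, fp: int, fn: int) -> float:
    # Share of all candidate pairs the model classifies correctly.
    return (tp + tn) / (tp + tn + fp + fn)

def sensitivity(tp: int, fn: int) -> float:
    # Also called recall: share of true matches the model actually finds.
    return tp / (tp + fn)

# Hypothetical evaluation sample of 1,000 candidate pairs.
print(accuracy(tp=90, tn=880, fp=10, fn=20))  # 0.97
print(sensitivity(tp=90, fn=20))              # ~0.82
```

Note that accuracy can look high even when sensitivity lags, because true non-matches vastly outnumber true matches in candidate pairs; that is why we tracked both.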
There was more work to be done.
More about it on the following links: