(Part #7) Product matching via ML: Post-processing

Written by: Miša Krunić Best practices in price monitoring, New Price2Spy features 19.6.2020. Reading Time: 2 minutes

Previous topic: (Part #6) Evaluating ML training results

Next topic: (Part #8) Testing on various industries/languages

This part was logically the most difficult to comprehend, at least for me. Let’s give it a try:

We have 50K products from Set A and 40K products from Set B.
We have already eliminated the improbable A/B combinations by blocking.
Each A/B combination that is probable enough has been scored – it has received a matching score between 0 and 1. So, let’s say a product from Set A has 12 potential matches among the products from Set B, each with its own matching score.
So, which B is it? It should be the one with the highest score, right? OK, let’s assume that we have taken the combination with the highest score, and that is A1 and B9. That means:
- a. A1 is matched with B9, so A1 cannot be matched with any other B
- b. B9 is matched with A1, so B9 cannot be matched with any other A
That means that when we start evaluating the next product from Set A (A2), B9 should be left out as a potential match.
But can we trust that A1 and B9 are a certain match, and that they cannot be used as matches for any other product? If we do trust it but we make a mistake, the result will be a chain reaction of wrong matches – a disaster for both accuracy and sensitivity.
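
To make the chain reaction concrete, here is a minimal Python sketch of the naive approach described above: each A takes its best-scoring B, and that B is then left out for every later A. The function name, the data layout and the scores are made up for illustration; this is not Price2Spy's actual code.

```python
def naive_exclusive_matching(candidates):
    """candidates: dict mapping a_id -> list of (b_id, score) pairs.

    Hypothetical sketch: every A grabs its best still-available B,
    no matter how low the score is.
    """
    taken_b = set()
    matches = {}
    for a, scored in candidates.items():
        # Leave out every B that an earlier A has already claimed.
        available = [(b, s) for b, s in scored if b not in taken_b]
        if not available:
            continue
        best_b, _ = max(available, key=lambda pair: pair[1])
        matches[a] = best_b
        taken_b.add(best_b)
    return matches


# Illustrative scores only: A1 has two weak candidates, A2 has a strong one.
candidates = {
    "A1": [("B9", 0.62), ("B3", 0.60)],
    "A2": [("B9", 0.97), ("B4", 0.40)],
}
result = naive_exclusive_matching(candidates)
print(result)
```

In this toy data A1 claims B9 on a weak score of 0.62, so A2 never gets to use its 0.97 candidate and is pushed to B4 instead. One doubtful decision has produced two wrong matches.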

We have done a lot of experimenting on this one, and came up with the following solution:

If the matching score is above X1 (a configurable threshold) – consider it a certain match. Do not consider either this A or this B as a potential match for any other product (consider them exhausted)! The most typical value for X1 is 0.95.
If the matching score is below X1, take the N best matches (according to matching score). Such matches are not certain, so do not consider either A or B to be exhausted!
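
The two rules above could be sketched roughly like this in Python. Several details are assumptions, since the post does not spell them out: the function name, the default N of 3, handling certain matches in descending score order, and treating a score exactly equal to X1 as not certain.

```python
def post_process(scores, x1=0.95, n_best=3):
    """scores: dict mapping (a_id, b_id) -> matching score in [0, 1].

    Returns (certain, uncertain). Hypothetical sketch of the two rules.
    """
    ranked = sorted(scores.items(), key=lambda kv: kv[1], reverse=True)

    # Rule 1: a score above X1 is a certain match and exhausts both products.
    certain = {}
    exhausted_a, exhausted_b = set(), set()
    for (a, b), score in ranked:
        if score <= x1:
            break  # ranked is sorted, so nothing further is above X1
        if a in exhausted_a or b in exhausted_b:
            continue
        certain[a] = b
        exhausted_a.add(a)
        exhausted_b.add(b)

    # Rule 2: for every other A keep up to N best candidates.
    # These are not certain, so nothing gets exhausted here.
    uncertain = {}
    for (a, b), score in ranked:
        if a in exhausted_a or b in exhausted_b:
            continue
        bucket = uncertain.setdefault(a, [])
        if len(bucket) < n_best:
            bucket.append((b, score))
    return certain, uncertain


# Illustrative scores only.
scores = {
    ("A1", "B9"): 0.97, ("A1", "B3"): 0.60,
    ("A2", "B9"): 0.80, ("A2", "B4"): 0.70,
    ("A2", "B5"): 0.65, ("A2", "B6"): 0.50,
}
certain, uncertain = post_process(scores, x1=0.95, n_best=2)
print(certain)
print(uncertain)
```

Here only A1/B9 clears the threshold, so B9 disappears from A2's candidates, while A2 keeps its two best remaining options (B4 and B5) without blocking them for anyone else.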

Yet, this was not enough – so we had to introduce some algorithmic rules, mostly to prevent bad matches from being established.

For example: if both products have the same entity (such as color or pack size) but with different values, disregard the pair as a potential match. That meant that the following cases were eliminated:

Red vs Black
Red vs Schwarz (German for black)
100g pack size vs 250g pack size
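
A rule of this kind might look like the following Python sketch. It assumes the entities (color, pack size) have already been extracted into a dictionary per product, and it uses a tiny hand-made synonym table so that the German "Schwarz" and the English "Black" count as the same value. The names and the table are hypothetical.

```python
# Hypothetical synonym table: raw value -> canonical value.
CANONICAL = {"schwarz": "black", "black": "black", "red": "red"}


def normalize(value):
    """Lower-case the value and map known synonyms to one canonical form."""
    v = value.strip().lower()
    return CANONICAL.get(v, v)


def has_entity_conflict(entities_a, entities_b):
    """True if both products carry the same entity with different values."""
    shared = entities_a.keys() & entities_b.keys()
    return any(
        normalize(entities_a[name]) != normalize(entities_b[name])
        for name in shared
    )


print(has_entity_conflict({"color": "Red"}, {"color": "Black"}))
print(has_entity_conflict({"color": "Red"}, {"color": "Schwarz"}))
print(has_entity_conflict({"pack_size": "100g"}, {"pack_size": "250g"}))
print(has_entity_conflict({"color": "Black"}, {"color": "Schwarz"}))
```

Any pair for which `has_entity_conflict` returns true would be dropped before the scores are compared. A pair where only one product carries a given entity is not treated as a conflict in this sketch.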

More information can be found here:

Product matching in Price2Spy

