{"id":7309,"date":"2020-06-19T12:25:48","date_gmt":"2020-06-19T12:25:48","guid":{"rendered":"https:\/\/www.price2spy.com\/blog\/?p=7309"},"modified":"2020-07-31T08:41:56","modified_gmt":"2020-07-31T08:41:56","slug":"part-9-ml-does-work-but-its-not-magic","status":"publish","type":"post","link":"https:\/\/www.price2spy.com\/blog\/part-9-ml-does-work-but-its-not-magic\/","title":{"rendered":"(Part #9) ML does work, but it\u2019s not magic"},"content":{"rendered":"\n<p> <a rel=\"noreferrer noopener\" aria-label=\"Product matching in Price2Spy (opens in a new tab)\" href=\"https:\/\/www.price2spy.com\/en\/pricing\/product-matching.html\" target=\"_blank\">Product matching in Price2Spy<\/a> <\/p>\n\n\n\n<p> <strong>Previous topic:<\/strong>  <a rel=\"noreferrer noopener\" aria-label=\"(Part #8) Product matching via ML: Testing on various industries\/languages (opens in a new tab)\" href=\"https:\/\/www.price2spy.com\/blog\/part-8-product-matching-via-ml-testing-on-various-industries-languages\/\" target=\"_blank\">(Part #8) Product matching via ML: Testing on various industries\/languages<\/a> <\/p>\n\n\n\n<p><strong>Next topic: <\/strong> <a rel=\"noreferrer noopener\" aria-label=\"(Part #10) Product matching via ML: The results  (opens in a new tab)\" href=\"https:\/\/www.price2spy.com\/blog\/part-10-product-matching-via-ml-the-results\/\" target=\"_blank\">(Part #10) Product matching via ML: The results <\/a><\/p>\n\n\n\n<p>Since we said that we cannot afford to have wrong matches in <a href=\"https:\/\/www.price2spy.com\/\">Price2Spy<\/a>, we were particularly careful when testing \u2018false positives\u2019 \u2013 these were matches where ML scored a potential match with a very high score (97%), while in fact, it was not a match. 
Why were such cases happening in the first place?<\/p>\n\n\n\n<div class=\"wp-block-image\"><figure class=\"aligncenter size-full is-resized\"><img decoding=\"async\" src=\"https:\/\/www.price2spy.com\/blog\/wp-content\/uploads\/2020\/06\/9.png\" alt=\"ML matches\" class=\"wp-image-7310\" width=\"600\" srcset=\"https:\/\/www.price2spy.com\/blog\/wp-content\/uploads\/2020\/06\/9.png 809w, https:\/\/www.price2spy.com\/blog\/wp-content\/uploads\/2020\/06\/9-768x230.png 768w\" sizes=\"(max-width: 809px) 100vw, 809px\" \/><\/figure><\/div>\n\n\n\n<p>We spent quite some time trying to find a solution to the above problem. The proper solution would be to introduce Chicken and Cheese as entities, in order to help the ML process learn that two such words cannot be considered a match, even though the remaining product attributes are 100% identical. However, where would such an approach take us \u2013 how many entities would we end up needing? The idea was great in theory, but not applicable in practice.<\/p>\n\n\n\n<p>We have encountered a few more problems, which we categorized in the following way:<\/p>\n\n\n\n<ul class=\"wp-block-list\"><li><strong>Problems that we can solve with improvements in the ML process<\/strong><ul><li>Missing data \u2013 we have noticed that in many cases product data taken from certain websites was incomplete. For example, the Brand field was empty. We were able to fix this particular problem with an addition to our ML pre-processor (which fills in missing Brand names)<\/li><li>Spelling mistakes \u2013 product data is not free of such problems. 
Fortunately, the ML blocking mechanism is configurable, so we were able to eliminate such problems by loosening its parameter values<\/li><li>Brands with synonyms \u2013 for example, General Electric is a synonym for GE<\/li><\/ul><\/li><\/ul>\n\n\n\n<ul class=\"wp-block-list\"><li><strong>Problems that we cannot solve (and thus we will not attempt to solve them)<\/strong><ul><li>Wrong brands \u2013 again, caused by data impurity on competitor websites. Since Brand is an important feature in the ML model, having a wrong brand almost certainly means that the match will not be established<\/li><li>Wrong prices \u2013 believe it or not, we have also encountered situations where the price was wrong (423 instead of 42.3 EUR). Price is another important feature in the ML model, so a price that is far off will certainly mean that the match will not be established<\/li><\/ul><\/li><\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">ML working together with humans<\/h3>\n\n\n\n<p>In the previous couple of minutes we have reached 2 important conclusions:<\/p>\n\n\n\n<ul class=\"wp-block-list\"><li>There are certain gaps in data that we won\u2019t be able to fix in an automated way<\/li><li>ML accuracy is above 95%; we can improve it further, but it will never reach the 100% goal (and we cannot afford to have wrong matches in <a href=\"https:\/\/www.price2spy.com\/en\/pricing\/product-matching.html\">Price2Spy product matching<\/a>)<\/li><\/ul>\n\n\n\n<p>Conclusion: we cannot eliminate the human factor in product matching!<\/p>\n\n\n\n<p>So, instead of aiming for full automation, we should aim for something else: an ML process which will reduce the amount of human product matching effort.<\/p>\n\n\n\n<p>Let\u2019s formalize the idea:<\/p>\n\n\n\n<ul class=\"wp-block-list\"><li>There will be a very small number of matches established by ML (the ones with the highest scores) which won\u2019t need to be approved by humans. 
These products will be considered as exhausted for other potential matches.<\/li><li>In all other cases, ML will return the N best matching candidates (below we will explain how we arrived at the optimal N). These candidates will need to be checked by humans and confirmed as good matches (we call this the promote\/reject process)<\/li><\/ul>\n\n\n\n<p>The next question is \u2013 if we are using the human workforce to confirm whether matches are OK \u2013 why do we go for the ML process in the first place? The question is perfectly valid, and it brought us to the last stage of our project \u2013 the Benefit Calculator.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">Benefit Calculator<\/h3>\n\n\n\n<p>Let\u2019s suppose that we need to match A products from the client\u2019s site to 1 competitor\u2019s site. This means that our human colleagues in charge of manual matching will have to perform A matching combinations.<\/p>\n\n\n\n<p>However, what our human colleagues need to do in the case of the ML matching process is different:<\/p>\n\n\n\n<ul class=\"wp-block-list\"><li>Instead of looking for a match, they will need to promote\/reject matching candidates. This requires less effort per candidate \u2013 and we\u2019ll call this effort M. We have calculated that the typical M is between 0.3 and 0.5, as opposed to 1.0 (in the case of pure manual matching)<\/li><li>When checking N matching candidates:<ul><li>If we promote the very 1<sup>st<\/sup> one \u2013 there will be no need to check other candidates for the same product (they should be automatically rejected). So, we have spent a human effort of M.<\/li><li>If we promote the 2<sup>nd<\/sup> one \u2013 there will be no need to check other candidates for the same product (they should be automatically rejected). 
So, we have spent a human effort of 2xM.<\/li><li>And so on for each further candidate<\/li><li>If we reject all N matching candidates, and none of them is promoted, we will need to do manual product matching once again \u2013 meaning that the total effort is 1+NxM (which is of course not good \u2013 but such cases will be inevitable)<\/li><\/ul><\/li><\/ul>\n\n\n\n<ul class=\"wp-block-list\"><li>So, in order for our ML model to be cost-effective, we need to have a moderate value for N (we have established that in most cases N = 3 is optimal)<\/li><li>The benefit of the ML matching model grows as M gets lower (which depends on the industry).<\/li><li>The ML matching process does come with an overhead \u2013 we need to account for:<ul><li>Data collection (someone needs to prepare B data sets)<\/li><li>QA of data from Set B<\/li><li>Executing the ML process<\/li><\/ul><\/li><\/ul>\n\n\n\n<p>So, what does the Benefit Calculator do? It compares the amount of human work needed to check the matching candidates provided by the ML matching process with the effort needed for pure human matching.<\/p>\n\n\n\n<p>What results did we get from the Benefit Calculator? 
Leaving it to the next chapter\u2026<\/p>\n\n\n\n<p><strong>More useful links:<\/strong><\/p>\n\n\n\n<p> <a rel=\"noreferrer noopener\" href=\"https:\/\/www.price2spy.com\/en\/pricing\/product-matching.html\" target=\"_blank\">Product matching in Price2Spy<\/a>  <\/p>\n\n\n\n<p> <strong>Previous topic:<\/strong>  <a rel=\"noreferrer noopener\" href=\"https:\/\/www.price2spy.com\/blog\/part-8-product-matching-via-ml-testing-on-various-industries-languages\/\" target=\"_blank\">(Part #8) Product matching via ML: Testing on various industries\/languages<\/a>  <\/p>\n\n\n\n<p> <strong>Next topic:<\/strong>  <a rel=\"noreferrer noopener\" aria-label=\"(Part #10) Product matching via ML: The results  (opens in a new tab)\" href=\"https:\/\/www.price2spy.com\/blog\/part-10-product-matching-via-ml-the-results\/\" target=\"_blank\">(Part #10) Product matching via ML: The results <\/a> <\/p>\n","protected":false},"excerpt":{"rendered":"<p>Product matching in Price2Spy Previous topic: (Part #8) Product matching via ML: Testing on various industries\/languages Next topic: (Part #10) Product matching via ML: The results Since we said that we cannot afford to have wrong matches in Price2Spy, we were particularly careful when testing&#8230;<\/p>\n","protected":false},"author":4,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[108,167],"tags":[190,645,646,15,81],"class_list":["post-7309","post","type-post","status-publish","format-standard","hentry","category-best-practices","category-new-price2spy-features","tag-ecommerce","tag-machine-learning","tag-ml","tag-price2spy","tag-product-matching"],"yoast_head":"<!-- This site is optimized with the Yoast SEO plugin v27.4 - https:\/\/yoast.com\/product\/yoast-seo-wordpress\/ -->\n<title>(Part #9) ML does work, but it\u2019s not magic<\/title>\n<meta name=\"description\" content=\"Find more about what ML can do, and what is still 
impossible.\" \/>\n<meta name=\"robots\" content=\"index, follow, max-snippet:-1, max-image-preview:large, max-video-preview:-1\" \/>\n<link rel=\"canonical\" href=\"https:\/\/www.price2spy.com\/blog\/part-9-ml-does-work-but-its-not-magic\/\" \/>\n<meta property=\"og:locale\" content=\"en_US\" \/>\n<meta property=\"og:type\" content=\"article\" \/>\n<meta property=\"og:title\" content=\"(Part #9) ML does work, but it\u2019s not magic\" \/>\n<meta property=\"og:description\" content=\"Find more about what ML can do, and what is still impossible.\" \/>\n<meta property=\"og:url\" content=\"https:\/\/www.price2spy.com\/blog\/part-9-ml-does-work-but-its-not-magic\/\" \/>\n<meta property=\"og:site_name\" content=\"Price2Spy\u00ae Blog\" \/>\n<meta property=\"article:publisher\" content=\"https:\/\/www.facebook.com\/Price2Spy\/\" \/>\n<meta property=\"article:published_time\" content=\"2020-06-19T12:25:48+00:00\" \/>\n<meta property=\"article:modified_time\" content=\"2020-07-31T08:41:56+00:00\" \/>\n<meta property=\"og:image\" content=\"https:\/\/www.price2spy.com\/blog\/wp-content\/uploads\/2020\/06\/9.png\" \/>\n<meta name=\"author\" content=\"Mi\u0161a Kruni\u0107\" \/>\n<meta name=\"twitter:card\" content=\"summary_large_image\" \/>\n<meta name=\"twitter:creator\" content=\"@Price2Spy\" \/>\n<meta name=\"twitter:site\" content=\"@Price2Spy\" \/>\n<meta name=\"twitter:label1\" content=\"Written by\" \/>\n\t<meta name=\"twitter:data1\" content=\"Mi\u0161a Kruni\u0107\" \/>\n\t<meta name=\"twitter:label2\" content=\"Est. reading time\" \/>\n\t<meta name=\"twitter:data2\" content=\"5 minutes\" \/>\n<!-- \/ Yoast SEO plugin. 
-->","yoast_head_json":{"title":"(Part #9) ML does work, but it\u2019s not magic","description":"Find more about what ML can do, and what is still impossible.","robots":{"index":"index","follow":"follow","max-snippet":"max-snippet:-1","max-image-preview":"max-image-preview:large","max-video-preview":"max-video-preview:-1"},"canonical":"https:\/\/www.price2spy.com\/blog\/part-9-ml-does-work-but-its-not-magic\/","og_locale":"en_US","og_type":"article","og_title":"(Part #9) ML does work, but it\u2019s not magic","og_description":"Find more about what ML can do, and what is still impossible.","og_url":"https:\/\/www.price2spy.com\/blog\/part-9-ml-does-work-but-its-not-magic\/","og_site_name":"Price2Spy\u00ae Blog","article_publisher":"https:\/\/www.facebook.com\/Price2Spy\/","article_published_time":"2020-06-19T12:25:48+00:00","article_modified_time":"2020-07-31T08:41:56+00:00","og_image":[{"url":"https:\/\/www.price2spy.com\/blog\/wp-content\/uploads\/2020\/06\/9.png","type":"","width":"","height":""}],"author":"Mi\u0161a Kruni\u0107","twitter_card":"summary_large_image","twitter_creator":"@Price2Spy","twitter_site":"@Price2Spy","twitter_misc":{"Written by":"Mi\u0161a Kruni\u0107","Est. 
reading time":"5 minutes"},"schema":{"@context":"https:\/\/schema.org","@graph":[{"@type":"Article","@id":"https:\/\/www.price2spy.com\/blog\/part-9-ml-does-work-but-its-not-magic\/#article","isPartOf":{"@id":"https:\/\/www.price2spy.com\/blog\/part-9-ml-does-work-but-its-not-magic\/"},"author":{"name":"Mi\u0161a Kruni\u0107","@id":"https:\/\/www.price2spy.com\/blog\/#\/schema\/person\/08e388ab2e43e97b3618363fbbe94ded"},"headline":"(Part #9) ML does work, but it\u2019s not magic","datePublished":"2020-06-19T12:25:48+00:00","dateModified":"2020-07-31T08:41:56+00:00","mainEntityOfPage":{"@id":"https:\/\/www.price2spy.com\/blog\/part-9-ml-does-work-but-its-not-magic\/"},"wordCount":981,"commentCount":0,"image":{"@id":"https:\/\/www.price2spy.com\/blog\/part-9-ml-does-work-but-its-not-magic\/#primaryimage"},"thumbnailUrl":"https:\/\/www.price2spy.com\/blog\/wp-content\/uploads\/2020\/06\/9.png","keywords":["ecommerce","machine learning","ml","price2spy","product matching"],"articleSection":["Best practices in price monitoring","New Price2Spy features"],"inLanguage":"en-US","potentialAction":[{"@type":"CommentAction","name":"Comment","target":["https:\/\/www.price2spy.com\/blog\/part-9-ml-does-work-but-its-not-magic\/#respond"]}]},{"@type":"WebPage","@id":"https:\/\/www.price2spy.com\/blog\/part-9-ml-does-work-but-its-not-magic\/","url":"https:\/\/www.price2spy.com\/blog\/part-9-ml-does-work-but-its-not-magic\/","name":"(Part #9) ML does work, but it\u2019s not 
magic","isPartOf":{"@id":"https:\/\/www.price2spy.com\/blog\/#website"},"primaryImageOfPage":{"@id":"https:\/\/www.price2spy.com\/blog\/part-9-ml-does-work-but-its-not-magic\/#primaryimage"},"image":{"@id":"https:\/\/www.price2spy.com\/blog\/part-9-ml-does-work-but-its-not-magic\/#primaryimage"},"thumbnailUrl":"https:\/\/www.price2spy.com\/blog\/wp-content\/uploads\/2020\/06\/9.png","datePublished":"2020-06-19T12:25:48+00:00","dateModified":"2020-07-31T08:41:56+00:00","author":{"@id":"https:\/\/www.price2spy.com\/blog\/#\/schema\/person\/08e388ab2e43e97b3618363fbbe94ded"},"description":"Find more about what ML can do, and what is still impossible.","breadcrumb":{"@id":"https:\/\/www.price2spy.com\/blog\/part-9-ml-does-work-but-its-not-magic\/#breadcrumb"},"inLanguage":"en-US","potentialAction":[{"@type":"ReadAction","target":["https:\/\/www.price2spy.com\/blog\/part-9-ml-does-work-but-its-not-magic\/"]}]},{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/www.price2spy.com\/blog\/part-9-ml-does-work-but-its-not-magic\/#primaryimage","url":"https:\/\/www.price2spy.com\/blog\/wp-content\/uploads\/2020\/06\/9.png","contentUrl":"https:\/\/www.price2spy.com\/blog\/wp-content\/uploads\/2020\/06\/9.png","width":809,"height":243},{"@type":"BreadcrumbList","@id":"https:\/\/www.price2spy.com\/blog\/part-9-ml-does-work-but-its-not-magic\/#breadcrumb","itemListElement":[{"@type":"ListItem","position":1,"name":"Home","item":"https:\/\/www.price2spy.com\/blog\/"},{"@type":"ListItem","position":2,"name":"(Part #9) ML does work, but it\u2019s not magic"}]},{"@type":"WebSite","@id":"https:\/\/www.price2spy.com\/blog\/#website","url":"https:\/\/www.price2spy.com\/blog\/","name":"Price2Spy\u00ae 
Blog","description":"Price2Spy\u00ae","potentialAction":[{"@type":"SearchAction","target":{"@type":"EntryPoint","urlTemplate":"https:\/\/www.price2spy.com\/blog\/?s={search_term_string}"},"query-input":{"@type":"PropertyValueSpecification","valueRequired":true,"valueName":"search_term_string"}}],"inLanguage":"en-US"},{"@type":"Person","@id":"https:\/\/www.price2spy.com\/blog\/#\/schema\/person\/08e388ab2e43e97b3618363fbbe94ded","name":"Mi\u0161a Kruni\u0107","image":{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/secure.gravatar.com\/avatar\/31aa4afb2464eca1f1ca0c7979628c87e54e7a6b53ebcb371749e9349d27c850?s=96&d=mm&r=g","url":"https:\/\/secure.gravatar.com\/avatar\/31aa4afb2464eca1f1ca0c7979628c87e54e7a6b53ebcb371749e9349d27c850?s=96&d=mm&r=g","contentUrl":"https:\/\/secure.gravatar.com\/avatar\/31aa4afb2464eca1f1ca0c7979628c87e54e7a6b53ebcb371749e9349d27c850?s=96&d=mm&r=g","caption":"Mi\u0161a Kruni\u0107"},"description":"Father of 2, Husband of 1, CEO of 3 :-)","sameAs":["http:\/\/www.price2spy.com"],"url":"https:\/\/www.price2spy.com\/blog\/author\/misha\/"}]}},"_links":{"self":[{"href":"https:\/\/www.price2spy.com\/blog\/wp-json\/wp\/v2\/posts\/7309","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.price2spy.com\/blog\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.price2spy.com\/blog\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.price2spy.com\/blog\/wp-json\/wp\/v2\/users\/4"}],"replies":[{"embeddable":true,"href":"https:\/\/www.price2spy.com\/blog\/wp-json\/wp\/v2\/comments?post=7309"}],"version-history":[{"count":5,"href":"https:\/\/www.price2spy.com\/blog\/wp-json\/wp\/v2\/posts\/7309\/revisions"}],"predecessor-version":[{"id":7420,"href":"https:\/\/www.price2spy.com\/blog\/wp-json\/wp\/v2\/posts\/7309\/revisions\/7420"}],"wp:attachment":[{"href":"https:\/\/www.price2spy.com\/blog\/wp-json\/wp\/v2\/media?parent=7309"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href
":"https:\/\/www.price2spy.com\/blog\/wp-json\/wp\/v2\/categories?post=7309"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.price2spy.com\/blog\/wp-json\/wp\/v2\/tags?post=7309"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}