Previous topic: (Part #5) ML training Implementation
Next topic: (Part #7) Post-processing
How do you know if your ML model works well? The answer is seemingly simple: you run all of the potential matching combinations through Random Forest (RF) to get a matching score for each combination. If the matching score is above X (the matching threshold), you consider it a match; otherwise you consider it a non-match. Since we’re talking about the training set, where the correct matches have already been established, we know the correct answer for each combination (whether it’s a good match or not). The more correct answers your model gives, the better its accuracy.
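The thresholding step described above can be sketched in a few lines of Python. This is only an illustration under assumptions: the function name, the 0-100 score scale and the toy data are made up for the example and are not taken from our production code.

```python
def accuracy_at_threshold(scores, labels, threshold):
    """Return the share of combinations classified correctly at `threshold`.

    scores: matching scores on a 0-100 scale (assumed model output)
    labels: True where the training set marks the combination as a match
    """
    if not scores or len(scores) != len(labels):
        raise ValueError("scores and labels must be non-empty and equally long")
    correct = 0
    for score, is_match in zip(scores, labels):
        predicted_match = score > threshold  # above X counts as a match
        if predicted_match == is_match:
            correct += 1
    return correct / len(scores)


# Made-up scores and labels, for illustration only.
toy_scores = [97, 88, 40, 12, 91, 65]
toy_labels = [True, True, False, False, False, True]
print(accuracy_at_threshold(toy_scores, toy_labels, threshold=85))
```

Every prediction that agrees with the training-set label counts as a correct answer, so the function is simply the number of correct answers divided by the number of combinations.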
All of the above was easier said than done. Soon enough, we faced a series of unexpected questions in our product matching project.
Please check out the following chart showing how our matching accuracy and sensitivity evolved, based on the value of X (matching threshold). I must say that from day 1 accuracy was doing great; it was sensitivity (the share of true matches that the model actually finds) that made us work so hard.
One way of evaluating your model is finding the spot where the accuracy and sensitivity curves meet. The higher the X at that meeting point, the better. In our early days, X was 74; as we improved the features and the post-processing methods, it went up to 85, and it is now close to 95.
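Here is a minimal sketch of how such a meeting point could be located. It assumes that "meet" means the threshold where the gap between accuracy and sensitivity is smallest, that scores are on a 0-100 scale, and that sensitivity is TP / (TP + FN). The function names and toy data are hypothetical.

```python
def accuracy_and_sensitivity(scores, labels, threshold):
    """Compute accuracy and sensitivity (recall) for one threshold."""
    tp = fp = tn = fn = 0
    for score, is_match in zip(scores, labels):
        predicted_match = score > threshold
        if predicted_match and is_match:
            tp += 1
        elif predicted_match and not is_match:
            fp += 1
        elif not predicted_match and is_match:
            fn += 1
        else:
            tn += 1
    accuracy = (tp + tn) / len(scores)
    # Sensitivity is undefined without true matches; report 0.0 in that case.
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    return accuracy, sensitivity


def meeting_point(scores, labels, thresholds=range(0, 101)):
    """Return (X, accuracy, sensitivity) where the two curves are closest.

    If several thresholds tie, the lowest one is returned.
    """
    best = None
    for x in thresholds:
        accuracy, sensitivity = accuracy_and_sensitivity(scores, labels, x)
        gap = abs(accuracy - sensitivity)
        if best is None or gap < best[0]:
            best = (gap, x, accuracy, sensitivity)
    return best[1], best[2], best[3]


# Made-up data, for illustration only.
toy_scores = [95, 90, 80, 70, 60, 50, 40, 30, 20, 10]
toy_labels = [True, True, True, False, True, False, False, False, False, False]
print(meeting_point(toy_scores, toy_labels))
```

On real data the curves are built from many more combinations, so the meeting point moves smoothly as features and post-processing improve, rather than jumping in steps as it does with ten toy rows.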
Pretty soon in our evaluation process we noticed something odd. We had combinations with a great matching score (close to 100) which counted as wrong matches, because they were not marked as matches in our training set. When Price2Spy’s manual product matching team checked these combinations manually, they turned out to be perfectly good matches. Alarm bell!!! What is wrong with our process?
The answer was not that obvious, but it was rather astounding: the ML model had found matches that humans had not! How is that possible? After giving it a lot of thought, we came up with several explanations:
OK, this was really great: ML was finding something humans could not find. But how do we distinguish which cases are good matches and which are not? The answer was tough: the check had to be done manually.
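To keep the manual check manageable, the suspicious combinations can at least be collected automatically. The sketch below assumes each combination is a hypothetical `(pair_id, score, labeled_match)` tuple; it lists the pairs that score above the threshold but are not marked as matches in the training set, highest score first.

```python
def review_candidates(scored_pairs, threshold):
    """Return (pair_id, score) for high-scoring pairs not labeled as matches."""
    suspicious = [
        (pair_id, score)
        for pair_id, score, labeled_match in scored_pairs
        if score > threshold and not labeled_match
    ]
    # Highest scores first, so reviewers start with the most likely matches.
    return sorted(suspicious, key=lambda item: item[1], reverse=True)


# Made-up pairs, for illustration only.
toy_pairs = [
    ("A-1", 99, True),
    ("B-7", 98, False),
    ("C-3", 35, False),
    ("D-2", 96, False),
    ("E-9", 60, True),
]
for pair_id, score in review_candidates(toy_pairs, threshold=95):
    print(pair_id, score)
```

The code only builds the review queue. Deciding whether each queued pair is a real match remains a human task.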
This is where our manual matching team came in (thank you, Sandra). They did the hard work, so we knew which newly discovered matches were good and could get a better picture of the real accuracy and sensitivity.
And not only that: the fact that ML discovered some matches that were not in our original training set meant that our original training set was not perfect. In that case – why not add newly discovered matches to the training set, and repeat the process?
This is what we called the ‘iterative ML process’. It took us 4 iterations to reach satisfactory results. Fortunately enough, each new iteration was easier than the previous one, as there were fewer and fewer newly discovered matches.
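The loop can be sketched roughly as follows. Everything here is an assumption made for illustration: `train_and_score` stands in for retraining the model and scoring all combinations, `manual_review` stands in for the manual matching team, and the toy functions below only imitate their behaviour with hard-coded data.

```python
def iterative_matching(initial_matches, train_and_score, manual_review,
                       threshold, max_iterations=10):
    """Retrain, review high-scoring unlabeled pairs, add confirmed ones, repeat."""
    labeled = set(initial_matches)
    rejected = set()  # pairs already reviewed and found not to be matches
    confirmed_per_iteration = []
    for _ in range(max_iterations):
        scores = train_and_score(labeled)  # expected shape: {pair_id: score}
        suspicious = [
            pair for pair, score in scores.items()
            if score > threshold and pair not in labeled and pair not in rejected
        ]
        confirmed = set(manual_review(suspicious))
        rejected |= set(suspicious) - confirmed
        confirmed_per_iteration.append(len(confirmed))
        if not confirmed:
            break  # nothing new was discovered, so the training set is stable
        labeled |= confirmed
    return labeled, confirmed_per_iteration


# Toy stand-ins, for illustration only.
TRUE_MATCHES = {"A-1", "B-7", "D-2", "F-4"}


def toy_train_and_score(labeled):
    # Pretend that retraining with more labels exposes one further match.
    scores = {"A-1": 99, "B-7": 98, "C-3": 97, "D-2": 96, "F-4": 70}
    if "B-7" in labeled:
        scores["F-4"] = 97
    return scores


def toy_manual_review(pairs):
    # A human reviewer would decide here; the toy version consults a lookup set.
    return [pair for pair in pairs if pair in TRUE_MATCHES]


final_labels, history = iterative_matching(
    {"A-1"}, toy_train_and_score, toy_manual_review, threshold=95
)
print(sorted(final_labels), history)
```

The `history` list records how many new matches were confirmed in each pass, which is a simple way to see whether the process is converging.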