Website crawl / scrape: how it works, benefits and use cases

Best practices in price monitoring 2.7.2019. Reading Time: 4 minutes

Online retailing has been growing year after year, and there are no signs of this trend stopping any time soon. As a result, more companies than ever are trying their luck in e-commerce: retailers, brands, and distributors are all doing business online. This puts pressure on companies to know what their competitors are doing at every moment. New products are added daily, offers change daily, and it's hard to keep track of everything.

The number of companies selling online grew because the number of people willing to shop online grew first. The Internet and technology let customers shop without limits: a shopper in Europe can order something from Asia one day and have it delivered within a couple of days. Because customers have so many options, companies need to understand their behavior: what they like and don't like, and what kind of reviews they leave.

The only way companies can keep up with competitors and customers is to obtain as much data as possible. One of the best ways to do that is a site crawl/scrape.


What is site crawl / scrape?

A website crawl or scrape is the process of extracting content and data from a website. It allows companies to obtain any publicly available information from any website. A company can try to build its own software for this, but that is expensive and takes a lot of time and resources. Not to mention that some websites make crawling them almost impossible. That is why it often makes more sense to use a tool that has already been developed and is ready to use, like Price2Spy.
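At its core, a scrape turns the markup of a page into structured fields. The sketch below illustrates the idea on a static, made-up product page using only Python's standard library; the HTML, the class names, and the field mapping are all invented for illustration. A real crawl would fetch pages over HTTP and, as the article notes, handle JavaScript rendering and anti-bot protection as well.

```python
from html.parser import HTMLParser

# A toy product page. In practice this HTML would be fetched over HTTP;
# the markup and class names here are purely illustrative.
PAGE = """
<html><body>
  <h1 class="product-name">Acme Widget</h1>
  <span class="price">19.99</span>
</body></html>
"""

class ProductParser(HTMLParser):
    """Collects the text of elements whose class marks a field of interest."""
    FIELDS = {"product-name": "name", "price": "price"}

    def __init__(self):
        super().__init__()
        self.data = {}
        self._current = None  # field name we expect text for, if any

    def handle_starttag(self, tag, attrs):
        cls = dict(attrs).get("class", "")
        if cls in self.FIELDS:
            self._current = self.FIELDS[cls]

    def handle_data(self, data):
        if self._current and data.strip():
            self.data[self._current] = data.strip()
            self._current = None

parser = ProductParser()
parser.feed(PAGE)
print(parser.data)  # {'name': 'Acme Widget', 'price': '19.99'}
```

The same pattern generalizes: each field you want to capture becomes a rule mapping some part of the markup to a named column in your output.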

Price2Spy is a price monitoring, price comparison, and repricing tool used by everyone from small family businesses to big international corporations across the world. Over the years, the Price2Spy team has mastered a crawl process that helps companies gather valuable data in bulk. Price2Spy's technical capabilities allow you to perform a crawl/scrape on any kind of website, regardless of its complexity. For example:

  • Websites that have a very complex page navigation structure;
  • Websites that have complex JavaScript menus and/or paging implementations;
  • Websites that have strong anti-bot protection (sites that do not want to be crawled, e.g. Amazon);
  • Websites requiring browser interaction before scraping data;
  • Websites with huge numbers of products;
  • Websites that show multiple product variations on the same product page.

Even the location of the website isn’t an obstacle. Price2Spy can crawl/scrape websites that show different prices and information for different countries. It can also crawl entire websites or only specific product categories/brands.

What kind of data can you get?

Companies can get any data they need from a competitor's website. Some of the data points a crawl/scrape can capture are:

  • product name,
  • product URL,
  • product description,
  • product category,
  • product price (list/sale price),
  • brand information,
  • stock levels,
  • manufacturer part number (MPN),
  • product image, etc.

The list doesn't end here. A crawl/scrape can also capture contact information, reviews, or any other publicly available data. It's even possible to capture data fields that are not shown on the product page itself (for example, fields shown on the category page, before reaching the product page).
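Once captured, fields like the ones listed above are typically normalized into one record per product. A minimal sketch of such a record is shown below; the field names and sample values are illustrative assumptions, not Price2Spy's actual output schema.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class ProductRecord:
    """One row of crawl output. Field names mirror the data points in the
    article; this is an illustrative schema, not a fixed format."""
    name: str
    url: str
    price: float                       # sale price
    list_price: Optional[float] = None
    description: Optional[str] = None
    category: Optional[str] = None
    brand: Optional[str] = None
    mpn: Optional[str] = None          # manufacturer part number
    in_stock: Optional[bool] = None
    image_url: Optional[str] = None

# Hypothetical example record:
item = ProductRecord(
    name="Acme Widget",
    url="https://example.com/p/42",
    price=17.99,
    list_price=19.99,
    brand="Acme",
    in_stock=True,
)
print(item.price)  # 17.99
```

Keeping one consistent record shape makes it easy to feed crawl output into downstream tools such as price comparison or repricing.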

Use Cases

While performing crawls for different clients, the Price2Spy team concluded that the work can be roughly grouped into use cases for online retailers and use cases for brands/distributors.

  • For Online Retailers

For online retailers, a crawl can first be performed to capture a competitor's complete assortment. The output of this process can be used as a data source for adding new products to the retailer's own store. The second use case is capturing deltas in the competitor's assortment. This gives retailers two important types of information: which products have been added to the competitor's website and which ones have been discontinued. With this information in front of them, retailers can create a better, more competitive offer for their customers. One thing to note: capturing deltas requires periodic recrawls, since a delta can only be computed by comparing snapshots of the same site over time.
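The delta computation itself is simple once two snapshots exist: products in the new crawl but not the old one were added, and vice versa for discontinued items. A sketch, assuming each crawl is keyed by product URL (the data below is invented):

```python
# Two crawl snapshots of the same competitor site, keyed by product URL.
# The URLs and names are illustrative.
crawl_week_1 = {
    "https://example.com/p/1": "Widget A",
    "https://example.com/p/2": "Widget B",
}
crawl_week_2 = {
    "https://example.com/p/2": "Widget B",
    "https://example.com/p/3": "Widget C",
}

# Set difference on the keys yields the two deltas the article describes.
added = crawl_week_2.keys() - crawl_week_1.keys()
discontinued = crawl_week_1.keys() - crawl_week_2.keys()

print(sorted(added))         # ['https://example.com/p/3']
print(sorted(discontinued))  # ['https://example.com/p/1']
```

This is also why recrawls are unavoidable: without at least two snapshots there is nothing to diff.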

  • For Brands / Distributors

Brands and distributors, just like retailers, use crawls to find out which new products their competitors have released. But it's also very common for them to use a crawl to capture product reviews from retail websites in order to determine consumer sentiment towards a product. If they find out that customers aren't fans of a product, they'll stop making or selling it.
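Once reviews are scraped, sentiment can be estimated in many ways; production systems typically use trained NLP models. Purely to illustrate the idea, here is a naive keyword-tally sketch over invented review texts — the word lists and reviews are assumptions, not how any particular tool scores sentiment.

```python
# Tiny, hand-picked word lists; a real system would use an NLP model.
POSITIVE = {"great", "love", "excellent", "sturdy"}
NEGATIVE = {"broke", "poor", "disappointed", "flimsy"}

# Invented example reviews, as might be scraped from a retail site.
reviews = [
    "Love it, excellent build",
    "Broke after a week, very disappointed",
    "Great value",
]

def score(text: str) -> int:
    """Positive-word count minus negative-word count."""
    words = {w.strip(".,!").lower() for w in text.split()}
    return len(words & POSITIVE) - len(words & NEGATIVE)

scores = [score(r) for r in reviews]
print(scores)  # [2, -2, 1]
```

Even a crude aggregate like this can flag which products attract mostly negative reviews and deserve a closer look.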

A site crawl/scrape provides valuable data to companies, whether they are online retailers, brands, or distributors. It's becoming an essential part of e-commerce, giving companies the insight they need to develop good strategies. With it, they can create better offers, be more competitive, understand the market, and, most importantly, make better business decisions. Although a crawl/scrape is a complex process, it's easy when you do it with the right tool.

Have you ever used a tool for website crawl? Share your thoughts with us down in the comments.