Problem

Your collection's index contains no records, or some records are missing from it.

Investigation steps for no records:

  1. Use our page debug tool to see if our crawler encounters an error on your homepage

  2. Has the domain been added correctly to the Domains section of our platform?

  3. If the entered domain has a www prefix, the crawler will not visit any subdomains. See here

  4. Is the site password protected? See here

  5. Are your servers blocking our crawler? See here

  6. Our crawler will only crawl pages under 10 MB in file size. If your webpages exceed this, our page debug tool will throw the following error: error cleaning HTML: content too large
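
The size limit in step 6 can also be checked locally before you reach for the page debug tool. The following is a minimal Python sketch, not part of our platform: the names is_too_large and fetch_body are hypothetical, and it assumes that "10 MB" means 10 * 1024 * 1024 bytes of response body, which may differ from how the crawler actually measures size.

```python
from urllib.request import urlopen

# Assumption: the limit is 10 * 1024 * 1024 bytes. The crawler's exact
# threshold and how it measures page size are not documented here.
SIZE_LIMIT_BYTES = 10 * 1024 * 1024


def is_too_large(body: bytes, limit: int = SIZE_LIMIT_BYTES) -> bool:
    """Return True if the body is not strictly under the limit."""
    return len(body) >= limit


def fetch_body(url: str, timeout: float = 10.0) -> bytes:
    """Download the raw response body of a page (hypothetical helper)."""
    with urlopen(url, timeout=timeout) as response:
        return response.read()


if __name__ == "__main__":
    # Replace with one of your own URLs.
    page = fetch_body("https://www.yourwebsite.com/")
    print(len(page), "bytes; too large:", is_too_large(page))
```

If the check reports that a page is too large, reducing inline scripts, inline images, or other embedded data is a likely place to start.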

Investigation steps for records missing from your collection:

  1. Use our page debug tool to see if our crawler encounters an error on any of the URLs that are missing

  2. Does your HTML contain absolute or relative URLs? The crawler will only follow relative URLs if there is a <base> tag on the page, for example: <base href="https://www.yourwebsite.com/">

  3. If the content was added or altered recently, our crawler may not have visited your site yet according to its schedule. See here

  4. If you’re using our ping-back / instant indexing code, have you met the conditions documented here?

  5. Do you have any exclusion rules set up in the Crawler rules section preventing the pages from being crawled?

  6. Do your pages have the noindex or data-sj-noindex tags preventing crawling?

  7. Is any of your content only accessible via query parameters, e.g. www.mysite/news-articles/press-releases?page=2? If so, please contact our support team here.

  8. Do the pages you expect to be indexed have canonical tags pointing our crawler to a different page?

  9. Do you have any canonical loops, e.g. the homepage redirects to a different page, which redirects back to the homepage?

  10. Do you have a sitemap? This page describes the benefits

  11. Our crawler will only index sitemaps named sitemap.xml

  12. Our crawler will only crawl pages under 10 MB in file size. If your webpages exceed this, our page debug tool will throw the following error: error cleaning HTML: content too large
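
Several of the checks above (the <base> tag in step 2, noindex in step 6, and canonical tags in step 8) can be run against a page's HTML in one pass. The sketch below uses only the Python standard library and is an illustration rather than a description of how our crawler parses pages. The names CrawlHintParser, inspect_page and resolve_link are hypothetical, and it assumes that data-sj-noindex appears as an attribute on some element and that noindex appears in a robots meta tag.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class CrawlHintParser(HTMLParser):
    """Collects the <base> href, canonical URL and noindex markers."""

    def __init__(self):
        super().__init__()
        self.base_href = None
        self.canonical = None
        self.noindex = False

    def handle_starttag(self, tag, attrs):
        attr = dict(attrs)
        if tag == "base" and attr.get("href") and self.base_href is None:
            self.base_href = attr["href"]
        elif tag == "link" and "canonical" in (attr.get("rel") or "").lower().split():
            self.canonical = attr.get("href")
        elif tag == "meta" and (attr.get("name") or "").lower() == "robots":
            if "noindex" in (attr.get("content") or "").lower():
                self.noindex = True
        # Assumption: data-sj-noindex is set as an attribute on any element.
        if "data-sj-noindex" in attr:
            self.noindex = True


def inspect_page(html: str) -> dict:
    """Return the crawl-related hints found in an HTML document."""
    parser = CrawlHintParser()
    parser.feed(html)
    return {
        "base_href": parser.base_href,
        "canonical": parser.canonical,
        "noindex": parser.noindex,
    }


def resolve_link(base_href: str, href: str) -> str:
    """Resolve a relative link against the <base> href."""
    return urljoin(base_href, href)
```

A page that reports base_href as None while linking with relative URLs, noindex as True, or a canonical value that differs from the page's own URL matches one of the causes listed above.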