How do I see when a webpage was last crawled and its crawl status?
Navigate to the Crawler crawl statuses section of the console and enter the URL of the webpage into the textbox beneath the page heading. Click “Diagnose” to open a modal that shows the page's crawl state and the last time the crawler attempted to crawl it.
Crawl states
The crawl states are:
PENDING - The page is in a queue to be visited by the crawler.
INDEXED - When the crawler last visited this page, it added it to the index.
NOINDEX - There is an explicit reason preventing this page from being indexed. Either:
The page triggered an exclusion Rule in the Crawler rules menu.
The crawler encountered a noindex or data-sj-noindex attribute in the page's HTML.
ERROR - When the crawler last visited the page it encountered an error. The error message should be displayed in the Diagnose modal. You can use the Diagnose tool to prompt the crawler to try to index the page again to see if the error persists.
REDIRECT - The page redirects to another URL, so we will not index it. If it was previously in the index, it will be removed. The crawler will visit the page it redirects to and assign a crawl state to that page.
Re-crawling webpages
The method to re-crawl varies depending on a page’s crawl state:
Only webpages with the INDEXED state are automatically revisited by the crawler, every 6-7 days.
PENDING
Webpages with this state should not need re-crawling, because the state lasts at most a few hours before a new state is assigned. If you see webpages that stay in this state for longer, please contact us at support@search.io.
INDEXED
These webpages will be revisited by the crawler automatically every 6-7 days.
You can also use our instant indexing to automatically re-crawl as content is published, or use the “Re-crawl all pages” or “Diagnose” features on the Crawler crawl statuses page to manually add webpages to the current crawler queue.
NOINDEX
If NOINDEX was assigned due to a Rule then simply removing the rule will add the associated webpages to the crawler queue.
If NOINDEX was assigned because the crawler encountered noindex or data-sj-noindex in the HTML, remove them from the HTML and then use the Diagnose tool on the Crawler crawl statuses page to re-crawl the URL of each webpage individually. If you have many webpages then please contact us at support@search.io.
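Before re-crawling, it can help to confirm that the markers are really gone from the page. The following is a minimal sketch in Python, not an official tool: it scans an HTML string for any element carrying a data-sj-noindex attribute and, as an assumption about what "noindex" refers to, for a robots meta tag whose content includes noindex. The names NoindexFinder and find_noindex_markers are hypothetical, and the crawler's own detection logic may differ.

```python
from html.parser import HTMLParser


class NoindexFinder(HTMLParser):
    """Collects markers that may lead to a NOINDEX crawl state."""

    def __init__(self):
        super().__init__()
        self.markers = []

    def handle_starttag(self, tag, attrs):
        # Valueless attributes arrive with a value of None.
        attr_map = {name: (value or "") for name, value in attrs}

        # Any element carrying the data-sj-noindex attribute.
        if "data-sj-noindex" in attr_map:
            self.markers.append(f"<{tag}> has data-sj-noindex")

        # Assumption: "noindex" means a robots meta tag such as
        # <meta name="robots" content="noindex">.
        if (
            tag == "meta"
            and attr_map.get("name", "").lower() == "robots"
            and "noindex" in attr_map.get("content", "").lower()
        ):
            self.markers.append('<meta name="robots"> contains noindex')


def find_noindex_markers(html):
    """Return a list describing each marker found in the HTML string."""
    finder = NoindexFinder()
    finder.feed(html)
    return finder.markers
```

An empty list from find_noindex_markers(page_html) suggests the page is ready to be re-crawled with the Diagnose tool. A non-empty list points at the elements that still need editing.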
ERROR and REDIRECT
The only way to re-crawl webpages with an ERROR or a REDIRECT state is to use the Diagnose tool on the Crawler crawl statuses page to re-crawl the URL of each webpage individually. If you have many webpages then please contact us at support@search.io.
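The re-crawl rules above can be summarised as a small lookup. This is only an illustrative sketch in Python: the CrawlState enum mirrors the state names listed in this article, while the recrawl_action function and the "rule" and "html" cause labels are hypothetical names introduced for the example and are not part of any Search.io API.

```python
from enum import Enum


class CrawlState(Enum):
    PENDING = "PENDING"
    INDEXED = "INDEXED"
    NOINDEX = "NOINDEX"
    ERROR = "ERROR"
    REDIRECT = "REDIRECT"


def recrawl_action(state, noindex_cause=None):
    """Return the suggested re-crawl step for a crawl state.

    noindex_cause is a hypothetical label: "rule" for an exclusion
    rule, "html" for a noindex or data-sj-noindex marker in the page.
    """
    if state is CrawlState.PENDING:
        return "Wait; contact support if the state lasts longer than a few hours."
    if state is CrawlState.INDEXED:
        return "No action needed; revisited automatically every 6-7 days."
    if state is CrawlState.NOINDEX:
        if noindex_cause == "rule":
            return "Remove the exclusion rule; the pages are queued automatically."
        if noindex_cause == "html":
            return "Remove the marker from the HTML, then run Diagnose on each URL."
        return "Identify the cause (rule or HTML marker) before re-crawling."
    # ERROR and REDIRECT share the same manual step.
    return "Run Diagnose on each URL."
```

For example, recrawl_action(CrawlState.NOINDEX, "rule") returns the advice to remove the exclusion rule, while ERROR and REDIRECT both return the manual Diagnose step.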