How do I see when a webpage was last crawled and its crawl status?
In the Domains menu, click the Diagnose button and paste in the URL of your webpage. This shows the date the crawler last attempted to crawl the webpage and its current crawl status.
The crawl states are:
PENDING - The page is in a queue to be visited by the crawler
INDEXED - When the crawler last visited this page, the page was added to the index
NOINDEX - There is an explicit reason preventing this page from being indexed. Either:
The page triggered an exclusion rule in the Rules menu
The crawler encountered a noindex or data-sj-noindex attribute in the page's HTML
ERROR - When the crawler last visited the page, it encountered an error. The error message should be displayed in the Diagnose menu. You can use the Diagnose menu to prompt the crawler to try indexing the page again and see whether the error persists
REDIRECT - We will not index this page. If it was previously in the index, it will be removed. The crawler will visit the page it redirects to and assign a crawl state to that page
Recrawling according to crawl state:
Only web pages with the INDEXED state will be automatically visited by the crawler every 6-7 days
PENDING
Web pages with this state should not need re-crawling, as the state only lasts up to a few hours before a new one is assigned. If you see web pages that stay in this state for longer, please contact us at support@search.io
INDEXED
These web pages will be revisited by the crawler automatically every 6-7 days
You can also use our instant indexing to automatically re-crawl as content is published, or visit the Domains screen and use the 'Re-index all domains' or 'Diagnose' feature to manually add web pages to the current crawler queue
NOINDEX
If NOINDEX was assigned due to a rule, removing the rule will add the associated web pages to the crawler queue
If NOINDEX was assigned due to encountering noindex or data-sj-noindex in the HTML, remove these from the HTML, then use the Diagnose tool in the Domains screen to re-index the URL of each web page individually. If you have many web pages, please contact us at support@search.io
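Before re-indexing a page, it can help to confirm that the markers really are gone from its HTML. The following is a minimal sketch using only the Python standard library. It is not part of any Search.io tooling, and the names NoindexFinder and find_noindex_markers are hypothetical. It assumes that noindex appears as a robots meta directive and that data-sj-noindex appears as an attribute on any element; adjust it if your pages mark exclusions differently.

```python
from html.parser import HTMLParser


class NoindexFinder(HTMLParser):
    """Collects markers that could explain a NOINDEX crawl state (illustrative only)."""

    def __init__(self):
        super().__init__()
        self.findings = []

    def handle_starttag(self, tag, attrs):
        # Attributes without a value (e.g. <div data-sj-noindex>) arrive as None.
        attr_map = {name.lower(): (value or "") for name, value in attrs}

        # Assumption: "noindex" is expressed as <meta name="robots" content="noindex">.
        if tag == "meta" and attr_map.get("name", "").lower() == "robots":
            if "noindex" in attr_map.get("content", "").lower():
                self.findings.append("meta robots noindex")

        # Assumption: data-sj-noindex may be placed on any element.
        if "data-sj-noindex" in attr_map:
            self.findings.append(f"data-sj-noindex on <{tag}>")


def find_noindex_markers(html):
    """Return a list of noindex markers found in the given HTML string."""
    finder = NoindexFinder()
    finder.feed(html)
    return finder.findings
```

An empty list from find_noindex_markers suggests the page no longer carries either marker, at which point the Diagnose tool can be used to re-index it.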
ERROR and REDIRECT
The only way to re-crawl web pages with an ERROR or REDIRECT state is to use the Diagnose tool in the Domains screen to re-index the URL of each web page individually. If you have many web pages, please contact us at support@search.io
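The re-crawl behaviour for each state can be summarised as a simple lookup. This is only an illustrative sketch for your own notes or scripts: RECRAWL_GUIDANCE and recrawl_guidance are hypothetical names, not a Search.io API, and the wording restates the guidance above.

```python
# Hypothetical summary table: crawl state -> how a re-crawl happens.
MANUAL = "Re-index each URL individually with the Diagnose tool in the Domains screen"

RECRAWL_GUIDANCE = {
    "PENDING": "No action needed; a new state is normally assigned within a few hours",
    "INDEXED": "Revisited automatically every 6-7 days",
    "NOINDEX": "Remove the rule or the noindex marker; for HTML markers, then: " + MANUAL,
    "ERROR": MANUAL,
    "REDIRECT": MANUAL,
}


def recrawl_guidance(state):
    """Return the re-crawl guidance for a crawl state name (case-insensitive)."""
    return RECRAWL_GUIDANCE[state.strip().upper()]
```

For example, recrawl_guidance("indexed") returns the automatic 6-7 day revisit note, while ERROR and REDIRECT both map to the same manual Diagnose step.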