Answer:

This can occur when the crawler has not yet revisited the webpage, so the page has not been classified as a 404/403 and removed from the index.

Solution:

Webpages that respond with a 404/403 are removed from a collection's index when any one of the following occurs:

  1. The webpage fires our instant indexing tracker. See how to set up instant indexing and the conditions that apply to it.

  2. The crawler automatically revisits the webpage on its next scheduled crawl cycle and removes it.

  3. The webpage is manually removed from the index using the Diagnose tool in the crawl statuses section of the Crawler page.

  4. A manual re-crawl of the entire site is triggered via the “Re-crawl all pages” button in the crawl statuses subpage of the Crawler page. Note that because this re-crawls the entire site, it may take some time before the 404 is removed.

  5. Our API Delete endpoint is manually called to remove the webpage from the index.
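
Before using any of the options above, it can help to confirm that the webpage really does respond with a 404/403, since only those responses lead to removal. The following is a minimal sketch using only the Python standard library. The function names `fetch_status` and `is_removal_status` are hypothetical helpers written for this illustration and are not part of our API, and the URL in the usage note is a placeholder.

```python
import urllib.error
import urllib.request

# Status codes that, per this article, lead to removal from the index.
REMOVAL_STATUSES = {403, 404}


def is_removal_status(status: int) -> bool:
    """Return True if the HTTP status is one the crawler treats as removable."""
    return status in REMOVAL_STATUSES


def fetch_status(url: str, timeout: float = 10.0) -> int:
    """Return the HTTP status code a URL responds with.

    Sends a HEAD request so the page body is not downloaded. urllib follows
    redirects, so the status returned is that of the final response.
    """
    request = urllib.request.Request(url, method="HEAD")
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as err:
        # urllib raises HTTPError for 4xx/5xx responses; the code is on the error.
        return err.code
```

For example, `is_removal_status(fetch_status("https://www.example.com/old-page"))` would indicate whether that placeholder URL currently returns a 404/403. Some servers answer HEAD requests differently from GET requests, so treat the result as a quick check rather than a guarantee of what the crawler will see.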