Problem

Some PDF and DOC files are not added to your collection, or are added with the wrong title or description.

Solution

  1. First, check how the crawler sees your document by adding the document's URL to the debug page.

  2. If the debug page shows that the document is indexed correctly, go to the Domains section of the console and use “Diagnose” to see the document's current crawl status. An error is shown if the file size exceeds 10 MB. A status of no-index or redirect means we cannot crawl the document, either because of rules in the collection or because of a no-index tag on the document.

  3. If the debug page shows an error saying that it can't download the document, the file is likely corrupt. Some programs may still be able to open the file, but not all. We recommend re-saving or exporting it with a different program or version.
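Before using the debug page, you can run a rough local check for the two file-level problems mentioned above (size and corruption). The following is a minimal sketch, not the crawler's own logic: the function name `precheck` is hypothetical, "10 MB" is assumed to mean 10 × 1024 × 1024 bytes, and the signature test only catches files whose first bytes are wrong, so a file can pass it and still be corrupt.

```python
from pathlib import Path

# Assumption: the 10 MB limit is read as 10 * 1024 * 1024 bytes.
MAX_BYTES = 10 * 1024 * 1024

# Well-known leading bytes for each file format.
SIGNATURES = {
    ".pdf": b"%PDF-",
    ".doc": bytes.fromhex("D0CF11E0A1B11AE1"),  # OLE compound file
    ".docx": b"PK\x03\x04",                     # ZIP container
}


def precheck(path):
    """Return a list of problems found in a local file (empty if none)."""
    p = Path(path)
    problems = []

    size = p.stat().st_size
    if size > MAX_BYTES:
        problems.append(f"file is {size} bytes, over the {MAX_BYTES}-byte limit")

    expected = SIGNATURES.get(p.suffix.lower())
    if expected is not None:
        with p.open("rb") as fh:
            head = fh.read(len(expected))
        if head != expected:
            problems.append("file does not start with the expected signature")

    return problems
```

An empty list means neither check found a problem. If the signature check fails, re-saving or exporting the file as described in step 3 is the likely fix.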

For documents that have the wrong title or description: we take the title and description from the document's metadata. For DOC files, we use the Subject metadata field as the description.

If no title is present, then we use the filename instead. If no description is present, then we use some of the text from within the document.
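The fallback rules above can be sketched as follows. This is an illustration only, not our actual implementation: the function names, the metadata dictionary keys (`title`, `subject`, `description`), the 160-character excerpt length, and keeping the file extension in the fallback title are all assumptions made for the example.

```python
from pathlib import PurePosixPath
from urllib.parse import unquote, urlparse


def resolve_title(metadata, url):
    """Use the metadata title if present, otherwise fall back to the filename."""
    title = (metadata.get("title") or "").strip()
    if title:
        return title
    # Assumption: the filename is the last path segment of the URL.
    return PurePosixPath(unquote(urlparse(url).path)).name


def resolve_description(metadata, body_text, is_doc=False, max_chars=160):
    """Use the metadata description if present, otherwise an excerpt of the text."""
    # DOC files use the Subject field; the key used for other files is hypothetical.
    key = "subject" if is_doc else "description"
    description = (metadata.get(key) or "").strip()
    if description:
        return description
    # Assumption: the excerpt is the first max_chars characters,
    # with runs of whitespace collapsed to single spaces.
    return " ".join(body_text.split())[:max_chars]
```

For example, `resolve_title({}, "https://example.com/files/annual%20report.pdf")` falls back to the filename `annual report.pdf`, which is why a document with no title metadata appears under its filename.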

You can do the following to update the title:

  1. Update either the metadata or the filename, then upload the file to your CMS/website.

  2. Once the file is uploaded, we will index it on the next crawl cycle. If you want the change to take effect immediately, re-index the document's URL via the Diagnose tool in the Domains section.
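Before uploading, it can help to confirm that the metadata change was actually saved. As a minimal sketch for DOCX files only, the snippet below reads the Title and Subject fields from the file's core properties using the Python standard library. The function name is hypothetical, and whether the crawler reads exactly these fields for DOCX files is an assumption; legacy DOC and PDF files store their metadata differently and are not covered here.

```python
import xml.etree.ElementTree as ET
import zipfile

# Dublin Core namespace used by the title and subject elements.
NS = {"dc": "http://purl.org/dc/elements/1.1/"}


def docx_title_and_subject(path):
    """Return (title, subject) from a .docx file; a missing field is None."""
    with zipfile.ZipFile(path) as zf:
        try:
            raw = zf.read("docProps/core.xml")
        except KeyError:
            # The file has no core properties part at all.
            return None, None
    root = ET.fromstring(raw)
    title = root.findtext("dc:title", default=None, namespaces=NS)
    subject = root.findtext("dc:subject", default=None, namespaces=NS)
    return title, subject
```

If the returned title is missing or empty, the fallback described earlier applies and the filename is used instead.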


Documentation

Can I index Doc/Docx/PDFs?