How can I fix PDFs and DOCs that fail to index or have the wrong title or description?

Problem

Some PDF and DOC files are not added to your collection, or they are indexed with the wrong title or description.

Solution

First, check how the crawler sees your document by adding the document's URL to the debug page.
If the debug page shows that the document is indexed correctly, go to the crawl statuses section of the Crawler page in the console and use “Diagnose” to view the document's current crawl status. You will see an error if the file size exceeds 10 MB. If the status is no-index or redirect, then rules in the collection or a no-index meta tag in the document are preventing us from crawling it.
If the debug page shows an error saying that it can't download the document, the file is likely corrupt. Some programs may still be able to open the file, but not all. We recommend re-saving or exporting it with a different program or version.
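
Before re-uploading, a quick local check can rule out the two file-level causes mentioned above: size and an unreadable file. The following is a minimal sketch, not part of the crawler. The function name precheck is hypothetical, and it assumes the 10 MB limit means 10 * 1024 * 1024 bytes. It only compares the first bytes of the file with the usual signature for its extension, so a mismatch is a hint that the file should be re-saved, not proof of corruption, and a match does not guarantee the file is valid.

```python
from pathlib import Path

# Assumption: the 10 MB limit is read as 10 * 1024 * 1024 bytes.
MAX_BYTES = 10 * 1024 * 1024

# Leading bytes that files of each type normally start with.
SIGNATURES = {
    ".pdf": b"%PDF-",
    ".doc": bytes.fromhex("D0CF11E0A1B11AE1"),  # OLE2 compound file
    ".docx": b"PK\x03\x04",                     # ZIP container
}


def precheck(path):
    """Return a list of likely problems with a local file (empty if none found)."""
    p = Path(path)
    problems = []

    size = p.stat().st_size
    if size > MAX_BYTES:
        problems.append(f"file is {size} bytes, over the {MAX_BYTES}-byte limit")

    expected = SIGNATURES.get(p.suffix.lower())
    if expected is not None:
        with p.open("rb") as fh:
            head = fh.read(len(expected))
        if head != expected:
            problems.append(
                f"file does not start with the usual {p.suffix.lower()} signature"
            )

    return problems
```

Calling precheck("report.pdf") on a local copy returns an empty list when neither problem is detected. The debug page remains the authoritative check.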

For documents with the wrong title or description, note that we take both values from the document's metadata. For DOC files, we use the Subject metadata field as the description.

If no title is present, then we use the filename instead. If no description is present, then we use some of the text from within the document.
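
The fallback order described above can be sketched as follows. This is an illustration under stated assumptions, not the crawler's actual code. The function name, the metadata keys ("title", "description", "subject"), the 200-character excerpt length, and the treatment of .docx like .doc are all hypothetical.

```python
from pathlib import PurePosixPath
from urllib.parse import unquote, urlparse


def pick_title_and_description(url, metadata, body_text, excerpt_chars=200):
    """Apply the documented fallbacks: metadata first, then filename or body text."""
    filename = PurePosixPath(unquote(urlparse(url).path)).name

    # Title: metadata title if present, otherwise the filename.
    title = (metadata.get("title") or "").strip() or filename

    # Description: DOC files use the Subject field.
    # Assumption: .docx is handled the same way as .doc.
    is_doc = filename.lower().endswith((".doc", ".docx"))
    key = "subject" if is_doc else "description"
    description = (metadata.get(key) or "").strip()

    # No description in the metadata: fall back to text from the document.
    # Assumption: the excerpt is the first excerpt_chars characters.
    if not description:
        description = " ".join(body_text.split())[:excerpt_chars]

    return title, description
```

In this sketch, a PDF with empty metadata at https://example.com/files/Annual%20Report.pdf gets the title "Annual Report.pdf" and a description taken from the start of its text. This is why filling in the metadata is the most direct way to control both values.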

You can do the following to update the title:

Update either the metadata or the filename, then upload the file to your CMS or website.
Once the file is uploaded, we will index it on the next crawl cycle. If you want the change to take effect immediately, re-index the document's URL with the Diagnose tool in the crawl statuses section of the Crawler page.

Documentation

Can I index Doc/Docx/PDFs?
