What is "near good content"?
We use a summarization algorithm that examines each page and determines which content is useful for search. It discards navigation, headers, footers, ads, and any blocks that contain very little text.
We then use the remaining content. "Near good content" means that after this extraction no text scored as "good", so we fall back to text that scored lower (usually shorter blocks of text, heavier punctuation, etc.).
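The fallback described above can be pictured with a rough sketch. This is not our actual algorithm: the `Block` type, the `classify` and `select_content` functions, and every threshold below are hypothetical, chosen only to illustrate how "good" text is preferred and "near good" text is used when nothing better survives.

```python
import string
from dataclasses import dataclass

# Hypothetical thresholds, for illustration only.
MIN_WORDS = 5            # below this, a block is discarded as too short
GOOD_MIN_WORDS = 30      # "good" blocks are longer...
GOOD_MAX_PUNCT = 0.05    # ...and contain relatively little punctuation


@dataclass
class Block:
    text: str
    is_boilerplate: bool = False  # navigation, header, footer, ads


def punctuation_ratio(text: str) -> float:
    """Fraction of characters that are ASCII punctuation."""
    if not text:
        return 0.0
    return sum(ch in string.punctuation for ch in text) / len(text)


def classify(block: Block) -> str:
    """Label a block as 'discard', 'good', or 'near good'."""
    words = len(block.text.split())
    if block.is_boilerplate or words < MIN_WORDS:
        return "discard"
    if words >= GOOD_MIN_WORDS and punctuation_ratio(block.text) <= GOOD_MAX_PUNCT:
        return "good"
    return "near good"


def select_content(blocks: list[Block]) -> tuple[str, list[str]]:
    """Prefer 'good' blocks; fall back to 'near good' blocks if none exist."""
    labelled = [(classify(b), b.text) for b in blocks]
    good = [text for label, text in labelled if label == "good"]
    if good:
        return "good content", good
    # Nothing scored as "good", so use the lower-scoring text instead.
    # This list may be empty if every block was discarded.
    near = [text for label, text in labelled if label == "near good"]
    return "near good content", near
```

For a page whose only surviving text is a short, punctuation-heavy line such as "Call us today: 555-0100, open Mon-Fri, 9-5!", this sketch would report "near good content"; once a long paragraph of ordinary prose is present, it reports "good content" and ignores the shorter line.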