How an SEO Resolved a Strange Crawled Currently Not Indexed Problem

Google canonicalized a feed over a web page, resulting in a Crawled Currently Not Indexed error.

seoadmin

5 years ago

A technical SEO wrote a case study about how he solved a strange Crawled Currently Not Indexed issue on his site. While the solution he discovered may not apply to everyone experiencing the same issue, his method for identifying and fixing it is a useful walkthrough for troubleshooting technical SEO problems.

What happened to his site indexing was extremely strange. However, his solution was simple and logical.

Crawled – Currently Not Indexed

There have been numerous anecdotal reports of Crawled Currently Not Indexed on Facebook, Twitter, and even John Mueller’s Office-hours hangouts.

Someone asked in a recent Office-hours hangout why Google Search Console (GSC) reported pages as Crawled Currently Not Indexed when a closer look showed that those pages were in fact indexed. John Mueller responded that it is simply a lag between reports.

In another Office-hours hangout, John Mueller mentioned that it’s completely normal for a site to have many pages that aren’t indexed.

He noted:

“…if you have a smaller site and you’re seeing a significant part of your pages are not being indexed, then I would take a step back and try to reconsider the overall quality of the website and not focus so much on technical issues for those pages.
The other thing to keep in mind with regards to indexing, is it’s completely normal that we don’t index everything off of the website.
And over time, when you get to like 200 pages on your website and we index 180 of them, then that percentage gets a little bit smaller.”

While both of these are plausible explanations for why the Crawled Currently Not Indexed status appears in some cases, neither is what Adam Gent found on his site.

Adam Gent discovered a completely different issue that appeared to be an algorithm problem at Google. The issue wasn’t with the site itself; it was with Google’s indexing.

Why Crawled – Currently Not Indexed

Adam examined the GSC Index Coverage report and discovered that Google was crawling and indexing his feeds in the same way it crawls and indexes HTML pages.

He took random words from those pages and used them in a site: search to discover that the feed page content was indeed indexed.
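The check described above can be sketched in a few lines of Python. This is only an illustration, not Adam's actual tooling: the function names, the `example.com` domain, and the sample phrase are all hypothetical. It builds a `site:` query that restricts results to one domain and wraps the phrase in quotes for an exact match.

```python
from urllib.parse import quote_plus


def build_site_query(domain: str, phrase: str) -> str:
    """Build a site: query that looks for an exact phrase on one domain."""
    return f'site:{domain} "{phrase}"'


def search_url(query: str) -> str:
    """Turn the query into a Google search URL you can open in a browser."""
    return "https://www.google.com/search?q=" + quote_plus(query)


# Hypothetical domain and phrase, used purely for illustration.
query = build_site_query("example.com", "unusual phrase from the post")
print(query)
print(search_url(query))
```

If the result that comes back is a feed URL rather than the page itself, that suggests the feed is the version Google has indexed.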

To make matters worse, Google appears to have canonicalized the RSS feed content over the actual web page, which explains why the real web pages were crawled but not indexed.

WordPress generated the RSS feed.

The feed appears to be rendered as a web page rather than as raw XML, which is unusual.

I could be mistaken, but that does not appear to be a standard RSS feed. It appears to be an HTML page.

Although the underlying code is XML, most feeds do not look like this.

Could this have influenced Google’s decision to canonicalize the feed?

It’s difficult to understand how that could happen because there are so many signals, such as internal linking, that would normally cause Google to favor HTML pages as canonical.
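One possible reason a feed looks like a web page in a browser is that it references an XSL stylesheet through an `xml-stylesheet` processing instruction. Whether that was the case on Adam's site is not confirmed by the case study, so treat the following as a hedged sketch for inspecting your own feed. The function name and the sample feed text are hypothetical.

```python
import re
import xml.etree.ElementTree as ET

# Matches an <?xml-stylesheet ... ?> processing instruction anywhere in the text.
STYLESHEET_PI = re.compile(r"<\?xml-stylesheet\b.*?\?>", re.DOTALL)


def inspect_feed(body: str) -> dict:
    """Report the root element and whether the document asks for a stylesheet."""
    root = ET.fromstring(body.encode("utf-8"))
    # Drop any namespace prefix such as "{http://www.w3.org/2005/Atom}".
    root_name = root.tag.rsplit("}", 1)[-1]
    return {
        "root": root_name,
        "is_feed": root_name in ("rss", "feed"),  # RSS 2.0 or Atom
        "has_stylesheet": bool(STYLESHEET_PI.search(body)),
    }


# Hypothetical feed text, used purely for illustration.
sample = (
    '<?xml version="1.0" encoding="UTF-8"?>\n'
    '<?xml-stylesheet type="text/xsl" href="/feed.xsl"?>\n'
    '<rss version="2.0"><channel><title>Example</title></channel></rss>'
)
print(inspect_feed(sample))
```

A result with `is_feed` and `has_stylesheet` both true would mean the document is still XML underneath but is being styled for display, which matches the appearance described above.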

How Adam Resolved the Issue

After determining what had occurred, Adam deleted the WordPress-generated feed pages, submitted the feed URLs for a crawl, and then 404’d the pages.

After those pages were removed from the index, he submitted the correct URLs to Google, and the problem was resolved within a few days.
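Before resubmitting pages, it can help to confirm that the feed URLs really do return a 404. The sketch below is an assumption-laden illustration rather than Adam's process: it assumes the feeds live at the page URL followed by `/feed/`, and the function names and `example.com` URLs are hypothetical.

```python
from urllib.error import HTTPError
from urllib.request import Request, urlopen


def feed_url_for(page_url: str) -> str:
    """Assumed pattern: the feed lives at <page URL>/feed/."""
    return page_url.rstrip("/") + "/feed/"


def fetch_status(url: str, timeout: float = 10.0) -> int:
    """Return the final HTTP status code for url (makes a real request)."""
    request = Request(url, method="HEAD")
    try:
        with urlopen(request, timeout=timeout) as response:
            return response.status
    except HTTPError as error:
        # urlopen raises for 4xx/5xx responses; the code is what we want.
        return error.code


def is_removed(status: int) -> bool:
    """A 404 tells crawlers the URL no longer exists."""
    return status == 404


# Example usage (hypothetical URL; uncomment to make a live request):
# feed = feed_url_for("https://example.com/my-post/")
# print(feed, is_removed(fetch_status(feed)))
```

Note that `urlopen` follows redirects, so a feed URL that redirects to the live page would report the status of that page rather than a 404.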

What caused the issue?

According to Adam, the issue appears to be on Google’s end.

I asked around, and someone told me that Google apparently began indexing feeds a few years ago, but that he thought the problem had been resolved.

I’m not an XML expert, but it seems unusual that the feed resembles an HTML page rather than the standard XML layout that displays without HTML styling.

The feed does not look normal, so whatever makes it render that way may be an underlying cause.

Regardless, if you’re experiencing Crawled Currently Not Indexed issues, this is another thing to look into in case it’s also happening to you.
