Fix Indexed Though Blocked by Robots.txt


If you have seen the warning 'Indexed, though blocked by robots.txt' in Google Search Console, it means some pages on your blog or website have been indexed even though the robots.txt file blocks them from being crawled.

The robots.txt file contains Allow and Disallow directives that tell search crawlers which pages they may crawl and which they must not. Even so, URLs blocked by robots.txt can still end up in Google's index. This is because robots.txt directives prevent the crawling of a page, but they do not prevent its indexation.
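As an illustration, a robots.txt file with both kinds of directives might look like this (the paths are hypothetical examples, not taken from any real site):

```text
# Applies to all crawlers
User-agent: *
# Block crawling of everything under /private/
Disallow: /private/
# But still allow one specific page inside it
Allow: /private/public-page.html
```

Note that a Disallow rule here only stops compliant crawlers from fetching the pages; it says nothing about whether those pages may appear in the index.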

To prevent the indexation of a page, you need the noindex meta robots tag on the page. The 'Indexed, though blocked by robots.txt' warning commonly appears when you have blocked Google from crawling a directory but some pages inside that directory are linked internally or externally. If Googlebot comes across a link to such a page on an external site, it may index the page even though crawling is blocked by your robots.txt file.
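For reference, the noindex meta robots tag is a single line placed inside the page's head element:

```html
<head>
  <!-- Tells compliant crawlers not to include this page in their index -->
  <meta name="robots" content="noindex">
</head>
```

SEO plugins add this tag for you, but it is worth knowing what it looks like when checking a page's source.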

If you want a page out of Google's index, robots.txt is not the right tool for the job. Instead, apply noindex to the page to prevent its indexation by Google.

When you face the 'Indexed, though blocked by robots.txt' warning, first make a list of all the URLs that have been indexed despite being blocked in robots.txt. You can export these URLs from Search Console, then go through the list to check whether any of them were blocked by mistake. Separate the URLs you want indexed from the ones you do not.
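Sorting an exported list by robots.txt status can be scripted. A minimal sketch using Python's standard-library robots.txt parser, with hypothetical rules and URLs:

```python
# Split a list of URLs (e.g. exported from Search Console) into those
# currently blocked by robots.txt and those allowed for crawling.
# The robots.txt content and URLs below are hypothetical examples.
from urllib import robotparser

ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
"""

def split_by_robots(urls, robots_txt=ROBOTS_TXT, agent="Googlebot"):
    parser = robotparser.RobotFileParser()
    parser.parse(robots_txt.splitlines())
    blocked, allowed = [], []
    for url in urls:
        # can_fetch() answers: may this user agent crawl this URL?
        (allowed if parser.can_fetch(agent, url) else blocked).append(url)
    return blocked, allowed

exported = [
    "https://example.com/private/old-page",
    "https://example.com/blog/useful-post",
]
blocked, allowed = split_by_robots(exported)
print(blocked)   # URLs whose crawling robots.txt currently blocks
print(allowed)   # URLs Google is free to crawl
```

In practice you would read your real robots.txt and the exported CSV instead of the inline samples.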

For the URLs you do want indexed (the ones Google flagged as indexed though blocked by robots.txt), check the robots.txt file for the rules blocking their crawling and remove the corresponding Disallow rules.

If you do not want these URLs indexed, you need to do two things. First, apply the noindex meta tag to them, which can be done with a WordPress SEO plugin such as Yoast, Rank Math or AIOSEO. Second, open these URLs up for crawling. This step is essential: if Google cannot crawl the URLs, it will never see that they have been set to noindex.
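After applying the tag, it is worth verifying that the rendered page actually carries it. A small standard-library sketch that scans HTML for a robots meta tag containing noindex (the sample page head is hypothetical):

```python
# Check whether a page's HTML contains <meta name="robots" content="...noindex...">.
# Uses only the standard library; pass it the fetched HTML of a page.
from html.parser import HTMLParser

class RobotsMetaFinder(HTMLParser):
    def __init__(self):
        super().__init__()
        self.noindex = False

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        a = dict(attrs)
        if a.get("name", "").lower() == "robots" and "noindex" in a.get("content", "").lower():
            self.noindex = True

def has_noindex(html):
    finder = RobotsMetaFinder()
    finder.feed(html)
    return finder.noindex

page = '<html><head><meta name="robots" content="noindex, follow"></head></html>'
print(has_noindex(page))  # True
```

Fetching each URL and running it through has_noindex() confirms the plugin applied the tag where you expect.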


To ensure that Google can see the noindex tag, remove the Disallow rule in robots.txt that prevents crawling of those URLs. Once Google recrawls them, the URLs will start dropping out of the index over the following days or weeks. It takes Google some time to crawl and deindex them, but they will be deindexed as long as the noindex tag is in place and crawling is allowed.

Sometimes a wrong directive in the robots.txt file is preventing URLs from being crawled by mistake. In that case, removing the directive will allow Google to crawl those URLs again, and they will be indexed.

In other cases, these URLs do not belong in Google's index and need to be removed quickly. If so, you can use Google's URL removal tool to get them out of the index fast; it can also remove URLs in bulk if they share the same directory or prefix. The removal is only temporary, however, so after using the tool you must still apply the noindex tag to these URLs and allow Google to crawl them again.


You might also be keeping your entire website out of search by mistakenly leaving the 'Discourage search engines from indexing this site' box ticked in WordPress. To change it, go to Settings -> Reading, where you will find the checkbox at the bottom. Untick it to allow your website to be crawled and indexed.

In another scenario, the URLs flagged by Google may include both kinds: ones that should be indexed and ones that should not. Either way, remove the Disallow directive that blocks crawling of these URLs, and apply the noindex tag to the ones you want removed from Google's index.

In some cases you might want to keep certain pages out of search results, and it is fine to use a Disallow directive to stop Google from crawling them. If Google nevertheless finds some of these pages through external or internal links and adds them to its index, you can still keep them out with password protection: password-protected URLs cannot be crawled or indexed by Google.

How to prevent pages from getting indexed

Sometimes you apply a Disallow directive to a group of URLs to manage your crawl budget and stop Google from crawling low-value or duplicate pages, yet Google indexes some of them anyway because it found external or internal links pointing to them. The fix is the same: add the noindex tag and open the pages to crawling so they get deindexed faster. Remove any internal links to these pages, and if possible delete the pages entirely so that they return 404.

You can remove URLs from Google's index quickly using the URL removal tool, but use it sparingly and only in an emergency.

Remember that robots.txt is not the right method to keep content out of the index. Use the noindex meta tag instead. And do not combine a robots.txt Disallow directive with a noindex tag: if Google cannot crawl a page, it cannot see the noindex tag.
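This conflicting setup can be detected programmatically. A sketch, again with hypothetical inputs, that flags a URL which is both disallowed in robots.txt and tagged noindex:

```python
# Flag the self-defeating combination this article warns about: a URL that
# carries noindex but whose crawling is blocked, so Google never sees the tag.
from urllib import robotparser

def conflicting(url, robots_txt, page_has_noindex, agent="Googlebot"):
    parser = robotparser.RobotFileParser()
    parser.parse(robots_txt.splitlines())
    crawl_blocked = not parser.can_fetch(agent, url)
    # The conflict: noindex is set, but the crawler cannot fetch the page to read it.
    return crawl_blocked and page_has_noindex

robots = "User-agent: *\nDisallow: /private/\n"
print(conflicting("https://example.com/private/page", robots, page_has_noindex=True))  # True
print(conflicting("https://example.com/blog/post", robots, page_has_noindex=True))     # False
```

Running a check like this over your flagged URLs quickly shows which ones need the Disallow rule lifted before the noindex tag can take effect.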

Suggested Reading

How to limit revisions in WordPress

On-page SEO tips to boost website rankings

How to fix WordPress posts returning 404

Noindex meta tags vs the disallow directive

Host Google Fonts locally in WordPress