Prevent Indexation of Specific Pages on Your Website

How to prevent specific pages on your website from being indexed by Google

If you are running a blog, you want Google to crawl and index your post URLs. Apart from the home page, the post URLs and a few static pages such as the about and contact pages, there is very little that you would want indexed and shown in search results. You can allow Google to crawl all of your posts and pages, but getting everything on your website indexed is not desirable: it can cause duplicate content issues, and it can make your own URLs compete against each other for the same keyword. Sometimes too many URLs with parameters get indexed, for example the same page reachable as /shop?sort=asc and /shop?sort=desc, which causes index bloat. In that case too, you need to prevent the crawling or indexation of such URLs by one means or another.

For example, you might want to noindex feeds, author archives and date archives, since these URLs lead to the same content that is already indexed via the sitemap. Your sitemap mainly includes your post URLs, home page, blog page and a few static pages, unless you have chosen to also include category archives or image URLs. The post URLs, along with the home and blog URLs, are your first priority; the rest can be marked as noindex.

If you are using WordPress, applying the noindex tag becomes easier thanks to dedicated SEO plugins, which let you easily apply noindex to archives and other pages. If you want to noindex a post URL, you can also do that with the help of an SEO plugin like Yoast, Rank Math or AIOSEO.

Please note that marking URLs as noindex is considered a better way of keeping them out of Google's index than blocking their crawling through robots.txt. Robots.txt can block the crawling of URLs, but if external pages link to those URLs, they may still get indexed; and because a blocked page is never crawled, Google will never see a noindex tag placed on it. Robots.txt is effective when you want Google not to crawl certain resources, such as PDF files or specific folders and subfolders. Otherwise, your best option for keeping URLs out of the index is the noindex meta tag.

   <meta name="robots" content="noindex, follow">

This tag goes inside the <head> and </head> tags of the HTML page to prevent it from being indexed. Just add the meta robots tag to the head section of the specific page.
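
As a quick illustration, here is a minimal, hypothetical page with the tag in place (the title and content are placeholders):

<!DOCTYPE html>
<html>
<head>
  <title>A page kept out of the index</title>
  <!-- Tells search engines not to index this page, but still follow its links -->
  <meta name="robots" content="noindex, follow">
</head>
<body>
  <p>Content that should not appear in search results.</p>
</body>
</html>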

The meta robots tag is a very useful tag for managing how your pages appear in search results. You can use it to set a page to noindex, nofollow, nosnippet and so on. If you want Google to crawl a page but not index it, set it to noindex, follow. Setting it to noindex, nofollow additionally tells Google not to follow the links on the page. Adding nosnippet tells Google not to show a text snippet from the page in search results.
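
For instance, these are the common combinations, each placed in the head of the page in question:

<meta name="robots" content="noindex, follow">
<meta name="robots" content="noindex, nofollow">
<meta name="robots" content="nosnippet">

You can also address a single crawler by name; for example, <meta name="googlebot" content="noindex"> applies only to Google.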

Applying these tags is important to ensure that your important content gets indexed and unimportant content stays out of Google's index. If you are using WordPress, you can apply a meta robots tag without any coding by using a dedicated SEO plugin like Yoast, AIOSEO or Rank Math.

How to Noindex Pages with Yoast SEO

Yoast is the most popular SEO plugin in the WordPress repository, used and liked by millions. It makes it very easy to mark taxonomies, archives or individual pages as noindex. Install the plugin on your WordPress blog and start configuring the settings.

Individual posts or pages, categories and tags can all be set to noindex with Yoast, and you can just as easily do the same for author archives, date archives and format archives.

For example, say you want to set the category and tag pages on your website to noindex. Go to the Yoast SEO settings and turn the toggle for showing categories and tags in search results to off. This will mark those pages as noindex.

You can do the same for the author archives, date archives and format archives.

The settings to mark individual posts or pages as noindex, however, live inside the post editor. You can change them while editing an individual post.
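
If you prefer not to use a plugin for a particular case, WordPress core (version 5.7 and later) also exposes a wp_robots filter that lets you add the directives yourself. Below is a minimal sketch, assuming you want to noindex date archives; it would go in your theme's functions.php:

// Minimal sketch: mark date archives as noindex, follow via the core wp_robots filter.
add_filter( 'wp_robots', function ( $robots ) {
    if ( is_date() ) {             // Only touch date archive pages.
        $robots['noindex'] = true; // Ask search engines not to index them.
        $robots['follow']  = true; // Still allow links to be followed.
    }
    return $robots;
} );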

Noindex Pages with Rank Math

Rank Math also provides simple options for marking categories and tags as noindex. In the Rank Math settings, go to Titles & Meta and select Categories. There you can adjust the robots meta for categories to your preference.

Then check the box for noindex or nofollow as needed. You can follow the same process for tags. Save the changes and you are done.

You can also mark individual posts and pages as noindex from the post or page editor. Open the Rank Math panel on the right side of the post editor, and from there check the box for noindex and/or nofollow.

Noindex Pages with AIOSEO

AIOSEO is another popular and effective SEO plugin that you can use to mark content as noindex. For example, if you want taxonomies to be noindex, go to the advanced settings for taxonomies, where you will find the robots meta settings; tick the boxes of your choice to mark category and tag pages as noindex.

Similarly, if you want a specific post to be noindex, open it in the post editor and scroll to the bottom. There you will find the settings to mark that post or page as noindex. Just check the appropriate boxes and click Update.

Noindex Pages with The SEO Framework

The SEO Framework is another popular SEO plugin that can be used to mark specific pages as noindex. All the settings for marking posts, pages, taxonomies, author archives and other archives as noindex are found under the Robots Meta settings of The SEO Framework. Just check the boxes you need.

Prevent Crawling with Robots.txt

The robots.txt file can prevent the crawling of specific pages, especially ones that cannot be marked as noindex using a plugin. For example, if you have PDF files for download on your website that you do not want Google to crawl, you can easily block them this way; just remember that, as noted above, blocking crawling does not by itself guarantee the URLs stay out of the index. You can also use the robots.txt file to block the crawling of URLs with specific parameters using a Disallow rule. For example, to prevent the crawling of PDF files on your website, you can do it in the following manner.

User-agent: *
Disallow: /*.pdf$
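
The same Disallow mechanism works for parameterized URLs. Here is a small sketch, assuming the offending query parameter is called sort (substitute whatever parameter is bloating your index):

User-agent: *
Disallow: /*?sort=
Disallow: /*&sort=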

Prevent Indexing of Specific Files with the X-Robots-Tag Header

If you want to prevent the indexing of non-HTML files, the X-Robots-Tag HTTP header is also an effective method. However, it is a slightly more involved process than robots.txt: you will need to make server-level changes or edit your .htaccess file, and the file must remain crawlable so that Google can see the header. If you are using an Apache server, you can easily mark files as noindex with the X-Robots-Tag. Just paste the following into the .htaccess file:

<Files ~ "\.pdf$">
Header set X-Robots-Tag "noindex, follow"
</Files>
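
If your server runs Nginx instead of Apache, the equivalent is an add_header directive inside the relevant location block; here is a minimal sketch:

location ~* \.pdf$ {
    # Send a noindex header with every PDF response
    add_header X-Robots-Tag "noindex, follow";
}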

Conclusion

In many instances, you may need to mark a page as noindex to keep it out of Google's index. It is also good to hide certain parts of your website from Googlebot to manage your crawl budget more effectively. The noindex tag is the most effective way to prevent indexation of specific pages, and it can be applied by adding the meta tag to the page head or by using a WordPress plugin. Certain files, like PDFs, cannot carry a noindex meta tag, so for those you should use the X-Robots-Tag header, or robots.txt to prevent crawling. Remember that robots.txt can prevent the crawling of pages, but it does not guarantee they stay out of the index, since pages linked from external sources may still get indexed. For HTML pages, the noindex meta robots tag is the better option.

Suggested Reading:

Noindex Vs Disallow

Disable Author Archives and RSS Feeds in WordPress

Configure Bunny CDN for WordPress

Remove Proudly Powered by WordPress

Enable Headers Module on Apache Server

Google’s Policies Regarding Spam

Install NGINX on Ubuntu 22.04