How to Add and Use Robots.txt in WordPress

Robots.txt file for WordPress

A robots.txt file plays a vital role in managing how your blog is crawled and, to some extent, in your SEO. While it is not compulsory, having one on your site is good practice.

Basically, a robots.txt file tells crawlers which pages on the website they can access and which pages they must not crawl. This helps you manage your crawl budget and avoid overloading the site with crawl requests.

You can allow or disallow the crawling of specific pages. However, to keep specific pages out of the search index, you must use the noindex rule instead of blocking them in robots.txt. Blocking only disallows crawling; it does not guarantee exclusion from indexing, because if a link to the page is found on other pages, it might still get indexed. That said, some pages do still need to be blocked from crawling, and robots.txt is the tool for that.
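
For reference, the noindex rule lives on the page itself rather than in robots.txt. On an HTML page it is a meta tag in the head section, and for non-HTML files (such as PDFs) it can be sent as an HTTP response header:

<meta name="robots" content="noindex">

X-Robots-Tag: noindex

Note that crawlers can only see the noindex rule if they are allowed to fetch the page, so a page you want removed from the index should not also be blocked in robots.txt.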

The robots.txt file lives in your root folder (the folder in which WordPress is installed). If your website already has one, that is where you will find it. Otherwise, you can create one to manage the crawling of your website.

There are two ways to add a robots.txt file to your website: a virtual robots.txt file generated by a plugin, or a physical robots.txt file that you create manually and upload to the root folder.

Create a Virtual Robots.txt File with an SEO Plugin

Generally, SEO plugins like Yoast SEO and Rank Math include a virtual robots.txt for your website when you install them on your blog. You can go to the URL yourdomain.com/robots.txt to check.

Both Yoast and Rank Math, as well as some other SEO plugins, include tools to manage the robots.txt file. In the Yoast SEO plugin, you can use the Tools section to edit your robots.txt file and add or remove rules.

If you are using the Rank Math plugin, you can go to its General Settings and edit your robots.txt from there.

Here are the contents of a basic robots.txt file used on WordPress:

User-agent: *
Disallow: /wp-admin/
Allow: /wp-admin/admin-ajax.php

Sitemap: https://example.com/sitemap.xml

This allows the crawling of all posts and pages, blocks the wp-admin folder, and makes an exception for admin-ajax.php, which themes and plugins use for front-end requests.

Additionally, you can include more rules if required. For example, if you want to block the crawling of GIF files on your website, add the following line to your robots.txt file:

Disallow: /*.gif$

This blocks the crawling of any URL on your blog that ends in .gif; the * wildcard matches any path and the $ anchors the rule to the end of the URL.
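
The same pattern syntax covers other common cases. For example, WordPress serves internal search results at URLs containing ?s=, so you can block them with the rules below (the /search/ line is only needed if your site uses pretty search permalinks):

Disallow: /*?s=
Disallow: /search/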

Add a Physical Robots.txt File to Your Blog

The second method is to use a physical robots.txt file. You can easily create one in any text editor, save it as robots.txt, and upload it to the root folder through cPanel or SFTP.

First, open a new file in Notepad (or any text editor) and paste in the contents of your robots.txt file. Save the file as robots.txt. Then access your root folder using cPanel's File Manager or SFTP and upload the file.

Now, the file will be available at yourwebsite.com/robots.txt. When you want to make changes, just download and edit the file and upload it again.
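
If you prefer the command line, the SFTP upload looks something like this (a sketch; replace user, yourserver.com, and the web-root path with your own values):

$ sftp user@yourserver.com
sftp> cd /var/www/html
sftp> put robots.txt
sftp> exit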

Alternatively, if you have SSH access to your server, you can create a robots.txt file directly from the terminal. SSH into your server and change to the directory WordPress is installed in (often /var/www/html, though the exact path depends on your host):

$ cd /var/www/html

Now, create a new robots.txt file using the following command:

$ sudo nano robots.txt

Paste or enter the contents of your robots.txt file, then save and exit (in nano, press Ctrl+O, Enter, then Ctrl+X). You now have a physical robots.txt file on your blog, accessible at example.com/robots.txt.
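
You can verify the file from the same terminal (assuming curl is installed; use your own domain in place of example.com):

$ curl https://example.com/robots.txt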

Do not forget to include a Sitemap line pointing to your sitemap at the bottom of the robots.txt file.
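
The exact sitemap URL depends on what generates it. For example, Yoast SEO publishes its sitemap index at /sitemap_index.xml, so the line would look like this:

Sitemap: https://example.com/sitemap_index.xml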

Having a robots.txt file on your blog (virtual or physical) is good in terms of SEO. However, there are certain limitations to using the robots.txt file.

Not all crawlers respect robots.txt rules. So, if you want to reliably prevent access to specific pages or files, you must password-protect them.

Since different crawlers interpret the syntax differently, be careful with your robots.txt syntax. A single misplaced rule, such as a stray Disallow: /, can block crawling of your entire site.

Disallowing a page in the robots.txt file does not guarantee exclusion from the index. Google might find a link to the URL on another page and index it anyway. In such cases, rely on noindex tags to prevent indexing.

Here are some common robots.txt rules.

Block crawling of the entire website:

User-agent: *
Disallow: /

Block a directory or directories from crawling:

User-agent: *
Disallow: /privatefolder/
Disallow: /junkfiles/
Disallow: /books/fiction/

Block access for all crawlers except one:

User-agent: Googlebot-News
Allow: /

User-agent: *
Disallow: /
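
The inverse is just as simple: to block a single crawler while leaving the site open to everyone else, give that crawler its own group (BadBot is a placeholder; substitute the user-agent token of the crawler you want to keep out). Crawlers not matched by any group are allowed by default, so no second group is needed.

User-agent: BadBot
Disallow: /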

If you want to test your robots.txt file for errors, go to Google's robots.txt tester and open the specific property. Then click the Test button and check the result. You can also check whether specific URLs are blocked by your robots.txt or not.

https://www.google.com/webmasters/tools/robots-testing-tool