Webmasters and marketers know how important the indexing of a site by search engines is. This is why they do everything they can to help search engines like Google and Yandex crawl and index their sites correctly.
A lot of time and resources are spent on internal and external optimization such as content, links, tags, image optimization and site structure.
All of this plays a major role in promotion. However, if you neglect the technical optimization of the site, or have never heard of the robots.txt and sitemap.xml files, your site may not be crawled and indexed correctly.
In this article, I will explain how to properly configure and use a robots.txt file and a robots meta tag. So, let’s begin!
What is a robots.txt file?
Robots.txt is a text file that tells search engine robots (also known as crawlers, bots, or spiders) how to crawl and index the pages of a site.
In simple terms, robots.txt tells robots which pages or files of the site they may crawl and which they should skip.
The robots.txt file must be placed in the root directory of your website (https://site.com/robots.txt), because that is the only location where robots look for its instructions.
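Since the file always lives at the root of the host, its address can be derived from any page URL. Here is a minimal Python sketch of that idea; the function name robots_txt_url and the sample page URL are my own illustrations, not part of any tool mentioned in this article.

```python
from urllib.parse import urlsplit, urlunsplit


def robots_txt_url(page_url: str) -> str:
    """Return the robots.txt address for the host that serves page_url."""
    parts = urlsplit(page_url)
    # Keep the scheme and host (including any port); drop path, query, fragment.
    return urlunsplit((parts.scheme, parts.netloc, "/robots.txt", "", ""))


print(robots_txt_url("https://site.com/blog/post-1/?utm_source=mail"))
# prints: https://site.com/robots.txt
```

Note that every subdomain is a separate host, so a subdomain needs its own robots.txt file at its own root.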
If you are using WordPress, you will see the file at the above address, but you will not find the file itself in your site's folder on the server. This is because WordPress automatically generates a virtual robots.txt file (with default parameters) if it does not find one in the site's root directory.
The virtual robots.txt file that WordPress generates does not cover everything you need, so it is highly advisable to write your own.
What is robots.txt for?
The robots.txt file is needed to prevent search robots from visiting certain sections of your site, for example:
- pagination pages;
- pages with search results on the site;
- administrative files;
- service pages;
- links with utm tags;
- data on sorting, filtering, comparison parameters;
- personal account page, etc.
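To see how such rules behave, you can feed a few of them to the robots.txt parser from the Python standard library. This is only a sketch: the three Disallow paths and the test URLs are hypothetical examples, and as far as I know urllib.robotparser does plain prefix matching, so it will not understand wildcard rules such as *?s=.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical rules closing site search, the admin area and the personal account.
RULES = """\
User-agent: *
Disallow: /search/
Disallow: /wp-admin/
Disallow: /account/
"""

parser = RobotFileParser()
parser.parse(RULES.splitlines())  # parse the text directly, no network request


def is_crawlable(url: str, agent: str = "*") -> bool:
    """Return True if the rules above let the given robot fetch the URL."""
    return parser.can_fetch(agent, url)


print(is_crawlable("https://site.com/search/?q=shoes"))    # prints: False
print(is_crawlable("https://site.com/blog/robots-guide/"))  # prints: True
```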
Important! The directives in robots.txt are advisory: search robots are not obliged to follow them. So if you want to be 100% sure that a page of your site will not appear in the search results, use the robots meta tag.
According to Google Help, the robots.txt file is not intended to prevent web pages from showing in Google search results: a blocked page can still be indexed if other pages link to it.
If you do not want a page of your site to appear in search, add a robots meta tag with the noindex value to the <head> of that page:
<meta name="robots" content="noindex,nofollow">
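If you want to confirm that a page really carries this tag, a short script can read it back. The sketch below uses only the Python standard library; the class name RobotsMetaParser and the function robots_directives are names I made up for illustration, and the HTML string is a stand-in for a page you would download yourself.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collect the directives of every <meta name="robots"> tag in a page."""

    def __init__(self) -> None:
        super().__init__()
        self.directives: set[str] = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr = {name: (value or "") for name, value in attrs}
        if attr.get("name", "").lower() != "robots":
            return
        # The content attribute is a comma-separated list such as "noindex,nofollow".
        for token in attr.get("content", "").split(","):
            token = token.strip().lower()
            if token:
                self.directives.add(token)


def robots_directives(html: str) -> set[str]:
    """Return the set of robots meta directives found in the HTML text."""
    parser = RobotsMetaParser()
    parser.feed(html)
    return parser.directives


page = '<html><head><meta name="robots" content="noindex,nofollow"></head></html>'
print("noindex" in robots_directives(page))  # prints: True
```

Keep in mind that a robot can only read this tag if it is allowed to crawl the page, so a page with noindex should not also be closed in robots.txt.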
How to edit robots.txt
There are two ways to edit your robots.txt file in WordPress: add the required code to the functions.php file, or use a plugin.
In our agency, we prefer the second way.
Install the Virtual Robots.txt plugin from the WordPress repository and open it from the Settings tab of the admin panel. In the plugin field that opens, enter the required code and press the Save button, and voilà, your robots.txt file is ready.
Proper robots.txt for WordPress CMS
User-agent: * # general rules for robots of all search engines
Disallow: /cgi-bin # service folder for storing server scripts
Disallow: /? # all request parameters on the main page
Disallow: /wp- # all WP files: /wp-json/, /wp-includes, /wp-content/plugins
Disallow: /wp/ # if the CMS is installed in a /wp/ subdirectory (if not, the rule can be removed)
Disallow: *?s= # site search
Disallow: *&s= # site search
Disallow: /search/ # site search
Disallow: /author/ # author's archive
Disallow: /users/ # user archive
Disallow: */trackback # trackbacks, notifications in comments about a link to a web document
Disallow: */feed # all feeds
Disallow: */rss # rss feed
Disallow: */embed # all embeddings
Disallow: */wlwmanifest.xml # xml Windows Live Writer manifest file (if not used, the rule can be removed)
Disallow: /xmlrpc.php # WordPress API file
Disallow: *utm*= # utm tags
Disallow: *openstat= # openstat tags
Allow: */uploads # open the folder with uploaded files
Allow: /*/*.js # open js files
Allow: /*/*.css # open css files
Allow: /wp-*.png # allow indexing images
Allow: /wp-*.jpg # allow indexing images
Allow: /wp-*.jpeg # allow indexing images
Allow: /wp-*.gif # allow indexing gifs
Allow: /wp-admin/admin-ajax.php # allow ajax
# Specify one or more Sitemap.xml files. Google XML Sitemap Plugin automatically
# creates 2 sitemaps like in the example below.
Sitemap: http://yoursite./sitemap.xml
Sitemap: http://yoursite.ru/sitemap.xml.gz
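The sample file mixes Disallow and Allow rules that overlap, for example Disallow: /wp- and Allow: */uploads. Google documents that the most specific (longest) matching rule wins and that Allow wins a tie, with * standing for any sequence of characters and $ marking the end of the URL. The Python sketch below is my own simplified illustration of that precedence, not the code of any search engine; other robots may resolve conflicts differently, and the function names are hypothetical.

```python
import re


def pattern_to_regex(pattern: str) -> re.Pattern:
    """Translate a robots.txt path pattern into a regex matched from the start."""
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    # '*' matches any run of characters; everything else is taken literally.
    body = ".*".join(re.escape(part) for part in pattern.split("*"))
    return re.compile(body + ("$" if anchored else ""))


def is_allowed(path: str, rules: list[tuple[str, str]]) -> bool:
    """Apply the longest-match rule; path should include the query string."""
    best_length, allowed = -1, True  # with no matching rule, crawling is allowed
    for kind, pattern in rules:
        if not pattern or not pattern_to_regex(pattern).match(path):
            continue
        is_allow = kind == "allow"
        if len(pattern) > best_length or (len(pattern) == best_length and is_allow):
            best_length, allowed = len(pattern), is_allow
    return allowed


# A few rules copied from the sample file above.
RULES = [
    ("disallow", "/wp-"),
    ("disallow", "*?s="),
    ("disallow", "*/feed"),
    ("allow", "*/uploads"),
    ("allow", "/wp-admin/admin-ajax.php"),
]

print(is_allowed("/wp-admin/options.php", RULES))               # prints: False
print(is_allowed("/wp-content/uploads/2020/photo.png", RULES))  # prints: True
```

In the second call both /wp- and */uploads match, but */uploads is the longer pattern, so the uploads folder stays open even though the rest of /wp- is closed.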
How to check the robots.txt file
If the robots.txt file is configured incorrectly, it can cause multiple errors in the indexing of the site. You can check your robots.txt settings with a free tool, the Google Robots Testing Tool.
Choose your site in the tool.
As a result, there should be no errors or warnings, and the file should be shown as accessible to robots.
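Before you paste the file into the tool, a quick local check can catch simple typos. The following Python sketch is a rough pre-check of my own and not a substitute for the tool: the list of directive names in KNOWN is an assumption (different search engines recognise different directives), and the function name lint_robots is hypothetical.

```python
# Directive names treated as valid in this sketch (an assumption, not a standard list).
KNOWN = {"user-agent", "allow", "disallow", "sitemap", "crawl-delay", "host", "clean-param"}


def lint_robots(text: str) -> list[str]:
    """Return a list of human-readable problems found in robots.txt text."""
    problems = []
    seen_agent = False
    for number, raw in enumerate(text.splitlines(), start=1):
        line = raw.split("#", 1)[0].strip()  # drop comments and blank lines
        if not line:
            continue
        if ":" not in line:
            problems.append(f"line {number}: missing ':' separator")
            continue
        field, value = (part.strip() for part in line.split(":", 1))
        field = field.lower()
        if field not in KNOWN:
            problems.append(f"line {number}: unknown directive '{field}'")
        elif field == "user-agent":
            seen_agent = True
        elif field in {"allow", "disallow"}:
            if not seen_agent:
                problems.append(f"line {number}: rule appears before any User-agent line")
            if value and not value.startswith(("/", "*")):
                problems.append(f"line {number}: path should start with '/' or '*'")
    return problems


sample = "User-agent: *\nDisalow: /search/\nAllow: uploads\n"
for problem in lint_robots(sample):
    print(problem)
```

For the sample text the sketch reports the misspelled Disalow directive and the Allow path that does not start with a slash or an asterisk.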
If your robots.txt file is configured correctly, it can significantly speed up the indexing of your site.