robots.txt in WordPress
IMPORTANT: there must be NO physical robots.txt file in the root of your site! If such a file exists, everything described below simply will not work, because the server will serve the content of that static file instead of passing the request to WordPress.
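If you are not sure whether such a file exists, a quick check from inside WordPress might look like the sketch below (ABSPATH is the core constant for the installation root; the error_log() call is just an example of reporting the result):

// Quick check: does a static robots.txt shadow the dynamic one?
// ABSPATH is defined by WordPress and ends with a trailing slash.
if ( file_exists( ABSPATH . 'robots.txt' ) ) {
	// A physical file exists, so the server serves it directly and
	// the hooks described below never run for /robots.txt.
	error_log( 'Static robots.txt found: ' . ABSPATH . 'robots.txt' );
}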
In WordPress, the request for /robots.txt is handled in a non-standard way: the content of the robots.txt file is generated on the fly (via PHP). Generating the /robots.txt content dynamically makes it convenient to modify through the admin panel, hooks, or SEO plugins.
You can modify the content of robots.txt through:
- The robots_txt hook.
- The do_robotstxt hook.
- A plugin such as https://wordpress.org/plugins/pc-robotstxt/ or similar.
Let's consider both hooks: how they differ and how to use them.
Read also: Configuring robots.txt for WordPress.
robots_txt
By default, WP 5.5 generates the following content for the /robots.txt page:
User-agent: *
Disallow: /wp-admin/
Allow: /wp-admin/admin-ajax.php

Sitemap: http://example.com/wp-sitemap.xml
See do_robots() for how the dynamic generation of the robots.txt file works.
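For reference, here is a condensed sketch of what do_robots() does (simplified from WP 5.5 core; the real function also accounts for the site path and the blog_public option):

// Condensed sketch of do_robots(), simplified from WP 5.5 core.
header( 'Content-Type: text/plain; charset=utf-8' );

// Fires first: a callback here can print its own content and exit.
do_action( 'do_robotstxt' );

// Default rules.
$output  = "User-agent: *\n";
$output .= "Disallow: /wp-admin/\n";
$output .= "Allow: /wp-admin/admin-ajax.php\n";

// Filters can modify or extend the output; since WP 5.5 the built-in
// sitemaps feature appends the Sitemap: line through this same filter.
echo apply_filters( 'robots_txt', $output, get_option( 'blog_public' ) );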
This hook lets you append to the existing content of the robots.txt file. The code below can be placed in the theme's functions.php file.
// Add to the base robots.txt
// -1 before wp-sitemap.xml
add_action( 'robots_txt', 'wp_kama_robots_txt_append', -1 );

function wp_kama_robots_txt_append( $output ){

	$str = '
	Disallow: /cgi-bin        # Standard hosting folder.
	Disallow: /?              # All query parameters on the main page.
	Disallow: *?s=            # Search.
	Disallow: *&s=            # Search.
	Disallow: /search         # Search.
	Disallow: /author/        # Author archive.
	Disallow: */embed         # All embeddings.
	Disallow: */page/         # All types of pagination.
	Disallow: */xmlrpc.php    # WordPress API file
	Disallow: *utm*=          # Links with utm tags
	Disallow: *openstat=      # Links with openstat tags
	';

	$str = trim( $str );
	$str = preg_replace( '/^[\t ]+(?!#)/mU', '', $str );
	$output .= "$str\n";

	return $output;
}
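The -1 priority matters here: since WP 5.5 the Sitemap: line is added through the same robots_txt filter by the built-in sitemaps feature (at priority 0), so hooking in at a lower priority keeps our rules above the Sitemap: line.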
As a result, when we visit the /robots.txt page, we see:
User-agent: *
Disallow: /wp-admin/
Allow: /wp-admin/admin-ajax.php
Disallow: /cgi-bin        # Standard hosting folder.
Disallow: /?              # All query parameters on the main page.
Disallow: *?s=            # Search.
Disallow: *&s=            # Search.
Disallow: /search         # Search.
Disallow: /author/        # Author archive.
Disallow: */embed         # All embeddings.
Disallow: */page/         # All types of pagination.
Disallow: */xmlrpc.php    # WordPress API file
Disallow: *utm*=          # Links with utm tags
Disallow: *openstat=      # Links with openstat tags

Sitemap: http://example.com/wp-sitemap.xml
Note that we have appended to WordPress's native output, not replaced it.
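If you need to change the default lines rather than only append, keep in mind that robots_txt is a filter: the callback receives the already generated text and can rewrite it. A minimal sketch (the rule removed here is purely illustrative):

// Example: adjust an existing default rule instead of appending new ones.
add_filter( 'robots_txt', 'wp_kama_robots_txt_modify' );

function wp_kama_robots_txt_modify( $output ){

	// Remove the default admin-ajax.php exception (purely for illustration).
	$output = str_replace( "Allow: /wp-admin/admin-ajax.php\n", '', $output );

	return $output;
}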
do_robotstxt
This hook allows you to completely replace the content of the /robots.txt page. It is an action that fires inside do_robots() before the default rules are built, so the callback must print its own content and stop PHP execution, otherwise the default content would be printed after it.
add_action( 'do_robotstxt', 'wp_kama_robots_txt' );

function wp_kama_robots_txt(){

	$lines = [
		'User-agent: *',
		'Disallow: /wp-admin/',
		'Disallow: /wp-includes/',
		'',
	];

	echo implode( "\r\n", $lines );

	die; // end PHP execution
}
Now, by visiting the link http://site.com/robots.txt, we will see:
User-agent: *
Disallow: /wp-admin/
Disallow: /wp-includes/