robots.txt in WordPress

Important: there must be NO physical robots.txt file in the root of your site. If one exists, nothing described below will work, because the web server serves that static file directly and the request never reaches WordPress.
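
A leftover static file is easy to miss, so here is a minimal sketch of a check. The helper name my_has_static_robots_txt() is hypothetical, and the admin-notice wiring assumes WordPress is installed in the web root (ABSPATH). If WordPress lives in a subdirectory, pass the real document root instead. The wiring is guarded by function_exists() so the helper can also be loaded and tested outside WordPress.

```php
<?php
/**
 * Hypothetical helper (not part of WordPress): returns true if a physical
 * robots.txt exists in $root. If it does, the web server serves that file
 * and WordPress never generates its own robots.txt.
 */
function my_has_static_robots_txt( string $root ): bool {
	return is_file( rtrim( $root, '/\\' ) . '/robots.txt' );
}

// Show a warning in the admin panel, only when loaded inside WordPress.
// Assumption: ABSPATH (the WordPress directory) is also the web root.
if ( function_exists( 'add_action' ) && defined( 'ABSPATH' ) ) {
	add_action( 'admin_notices', function () {
		if ( my_has_static_robots_txt( ABSPATH ) ) {
			echo '<div class="notice notice-warning"><p>'
				. esc_html( 'A static robots.txt file exists in the site root, so the WordPress robots.txt hooks have no effect.' )
				. '</p></div>';
		}
	} );
}
```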

WordPress handles requests to /robots.txt in a non-standard way: instead of reading a file from disk, it generates the content on the fly with PHP.

Because the content is generated dynamically, it is easy to change from the admin panel, with hooks, or through SEO plugins.

In code, you can modify the content of robots.txt through two hooks: robots_txt and do_robotstxt.

Let's look at both: how they differ and how to use them.

robots_txt

By default, WP 5.5 creates the following content for the /robots.txt page:

User-agent: *
Disallow: /wp-admin/
Allow: /wp-admin/admin-ajax.php

Sitemap: http://example.com/wp-sitemap.xml

See do_robots() for details on how the robots.txt content is generated dynamically.
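
The order in which callbacks run decides where your lines end up. Below is a simplified, self-contained model of that flow, written for illustration and not taken from WordPress core: the default rules are built, then passed through robots_txt callbacks in ascending priority order. In WP 5.5 the core sitemaps feature appends the Sitemap line from a callback at priority 0 (public sites only), so a callback registered with a negative priority places its rules above that line. The function name model_do_robots() is hypothetical, and arrow functions require PHP 7.4+.

```php
<?php
/**
 * Simplified model of do_robots() in WP 5.5 (NOT core code).
 * In core, the do_robotstxt action fires before this point.
 * $filters maps priority => list of callables that take and return a string.
 */
function model_do_robots( string $site_path, array $filters ): string {
	$output  = "User-agent: *\n";
	$output .= "Disallow: {$site_path}/wp-admin/\n";
	$output .= "Allow: {$site_path}/wp-admin/admin-ajax.php\n";

	ksort( $filters ); // lower priority numbers run first, as with WP hooks
	foreach ( $filters as $callbacks ) {
		foreach ( $callbacks as $callback ) {
			$output = $callback( $output );
		}
	}

	return $output;
}

$filters = [
	// Stand-in for the core sitemaps callback at priority 0.
	0  => [ fn( $out ) => $out . "\nSitemap: http://example.com/wp-sitemap.xml\n" ],
	// Priority -1 runs earlier, so this rule lands above the Sitemap line.
	-1 => [ fn( $out ) => $out . "Disallow: /cgi-bin\n" ],
];

echo model_do_robots( '', $filters );
```

Registering the same callback at priority 10 (the default) would instead put its rules after the Sitemap line.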

robots_txt is a filter: it receives the generated content and lets you add to (or change) it. The code can be placed in the theme's functions.php file.

// Append to the default robots.txt.
// Priority -1 runs this callback before WP's own one (priority 0), which adds the Sitemap line.
add_filter( 'robots_txt', 'wp_kama_robots_txt_append', -1 );

function wp_kama_robots_txt_append( $output ){

	$str = '
	Disallow: /cgi-bin             # Standard hosting folder.
	Disallow: /?                   # All query parameters on the main page.
	Disallow: *?s=                 # Search.
	Disallow: *&s=                 # Search.
	Disallow: /search              # Search.
	Disallow: /author/             # Author archive.
	Disallow: */embed              # All embeddings.
	Disallow: */page/              # All types of pagination.
	Disallow: */xmlrpc.php         # WordPress API file
	Disallow: *utm*=               # Links with utm tags
	Disallow: *openstat=           # Links with openstat tags
	';

	$str = trim( $str );
	// Strip the indentation (tabs or spaces) at the start of every line.
	$str = preg_replace( '/^[\t ]+/m', '', $str );
	$output .= "$str\n";

	return $output;
}

As a result, when we visit the page /robots.txt, we see:

User-agent: *
Disallow: /wp-admin/
Allow: /wp-admin/admin-ajax.php
Disallow: /cgi-bin             # Standard hosting folder.
Disallow: /?                   # All query parameters on the main page.
Disallow: *?s=                 # Search.
Disallow: *&s=                 # Search.
Disallow: /search              # Search.
Disallow: /author/             # Author archive.
Disallow: */embed              # All embeddings.
Disallow: */page/              # All types of pagination.
Disallow: */xmlrpc.php         # WordPress API file
Disallow: *utm*=               # Links with utm tags
Disallow: *openstat=           # Links with openstat tags

Sitemap: http://example.com/wp-sitemap.xml

Note that we have added to the native WP data, not replaced it.
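
The robots_txt filter also passes a second argument, $public: the value of the blog_public option ('1' normally, '0' when "Discourage search engines from indexing this site" is checked under Settings > Reading). A minimal sketch that appends rules only for public sites; the callback name my_robots_txt_public_only() is hypothetical, and the fourth add_filter() argument (2) is what makes WordPress pass $public to it.

```php
<?php
/**
 * Hypothetical robots_txt callback: appends rules only when the site is public.
 * $public is the blog_public option value, '1' or '0'.
 */
function my_robots_txt_public_only( $output, $public ) {
	if ( ! $public ) {
		return $output; // search engines are discouraged: leave output untouched
	}

	return $output . "Disallow: /cgi-bin\nDisallow: /author/\n";
}

// Register only when running inside WordPress.
if ( function_exists( 'add_filter' ) ) {
	// Priority -1 keeps the rules above the Sitemap line; 2 = number of accepted args.
	add_filter( 'robots_txt', 'my_robots_txt_public_only', -1, 2 );
}
```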

do_robotstxt

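do_robotstxt is an action that fires inside do_robots() before the default content is generated. It lets you replace the content of /robots.txt completely: print your own rules, then stop PHP execution so the default output never follows.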

add_action( 'do_robotstxt', 'wp_kama_robots_txt' );

function wp_kama_robots_txt(){

	$lines = [
		'User-agent: *',
		'Disallow: /wp-admin/',
		'Disallow: /wp-includes/',
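		'', // empty last element: the output ends with a line break
	];

	echo implode( "\r\n", $lines );

	die; // stop PHP so do_robots() does not print the default rules after ours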
}

Now, by visiting the link http://site.com/robots.txt, we will see:

User-agent: *
Disallow: /wp-admin/
Disallow: /wp-includes/
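
Because the callback stops execution before WordPress reaches the robots_txt filter, the Sitemap line that WP 5.5 adds is lost as well. A minimal sketch that writes the full content itself and re-adds the sitemap, assuming WP 5.5+ where get_sitemap_url() is available (it can return false when no sitemap is available, in which case the line is omitted). The helper name my_robots_txt_body() is hypothetical, and the rules are placeholders to adapt.

```php
<?php
/**
 * Hypothetical helper: builds the complete robots.txt body.
 * $sitemap_url is a URL string, or false/'' to omit the Sitemap line.
 */
function my_robots_txt_body( $sitemap_url ): string {
	$lines = [
		'User-agent: *',
		'Disallow: /wp-admin/',
		'Allow: /wp-admin/admin-ajax.php',
	];

	if ( $sitemap_url ) {
		$lines[] = '';                          // blank line before Sitemap
		$lines[] = 'Sitemap: ' . $sitemap_url;
	}

	return implode( "\n", $lines ) . "\n";
}

// Register the replacement only when running inside WordPress 5.5+.
if ( function_exists( 'add_action' ) && function_exists( 'get_sitemap_url' ) ) {
	add_action( 'do_robotstxt', function () {
		echo my_robots_txt_body( get_sitemap_url( 'index' ) );
		exit; // prevent do_robots() from appending the default rules
	} );
}
```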