Sitemap in WordPress

WordPress 5.5 finally added sitemap support to the core, so plugins such as Google XML Sitemaps are often no longer needed.

Introduction

The main (index) page of the WordPress sitemap is located at /wp-sitemap.xml; the URL /sitemap.xml redirects to it. The index page contains links to the individual sitemaps.

Each sitemap, in turn, contains links to the site pages (posts).
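The individual sitemap files are named by a fixed pattern: provider name, then the subtype (if the provider has one), then the page number. The helper below is a hypothetical illustration of that pattern (kama_sitemap_path() is not a WordPress function); inside WordPress, the core function get_sitemap_url() builds the real URLs.

```php
<?php
/**
 * Hypothetical helper that mimics how WordPress names sitemap files:
 * /wp-sitemap-{provider}[-{subtype}]-{page}.xml
 */
function kama_sitemap_path( $provider, $subtype = '', $page = 1 ) {
	// Empty parts (the users provider has no subtype) are dropped.
	$parts = array_filter( array( $provider, $subtype, (string) $page ), 'strlen' );

	return '/wp-sitemap-' . implode( '-', $parts ) . '.xml';
}

echo kama_sitemap_path( 'posts', 'post' ), "\n";          // /wp-sitemap-posts-post-1.xml
echo kama_sitemap_path( 'taxonomies', 'category' ), "\n"; // /wp-sitemap-taxonomies-category-1.xml
echo kama_sitemap_path( 'users' ), "\n";                  // /wp-sitemap-users-1.xml
```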

What is included in the Sitemap by default?

Only public post types, public taxonomies and author archive pages (pages listing a user's posts). Whether a type counts as public is determined by the public and publicly_queryable parameters passed when the post type or taxonomy is registered.

Maximum number of links in the Sitemap

The main (index) page can contain at most 50,000 sitemaps (links to sitemaps). This value canNOT be changed: it is stored in the private property WP_Sitemaps_Index::$max_sitemaps.

Each sitemap page can contain at most 2,000 links. This value can be changed through the wp_sitemaps_max_urls hook:

# Change the maximum number of links in each sitemap.
add_filter( 'wp_sitemaps_max_urls', 'kama_sitemap_max_urls', 10, 2 );

function kama_sitemap_max_urls( $num, $object_type ){
	// $object_type can be one of: post, term, user
	return 1000;
}

The maximum number of links per sitemap affects how many sitemaps are listed on the main (index) page. For example, if we have 500 posts, by default there will be only one sitemap for all posts. But if we change the maximum to 100, the main sitemap page will list 5 sitemaps for posts.
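The arithmetic behind this example is a ceiling division: the number of objects divided by the per-sitemap limit, rounded up. A small sketch (kama_sitemap_pages_count() is a hypothetical name; WordPress effectively does this inside each provider's get_max_num_pages()):

```php
<?php
/**
 * Number of sitemap pages needed for one object subtype:
 * total objects divided by the per-sitemap limit, rounded up.
 */
function kama_sitemap_pages_count( $total_items, $max_urls = 2000 ) {
	if ( $total_items <= 0 ) {
		return 0;
	}

	return (int) ceil( $total_items / $max_urls );
}

echo kama_sitemap_pages_count( 500 ), "\n";      // 1 (default limit of 2000)
echo kama_sitemap_pages_count( 500, 100 ), "\n"; // 5
echo kama_sitemap_pages_count( 501, 100 ), "\n"; // 6
```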

Sitemap doesn't work, 501 error?

The WP XML sitemap relies on the PHP extension SimpleXML. If it is not installed on your server, the sitemap won't work and you will see a 501 error on the sitemap page.
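To check whether this is the cause, you can test for the SimpleXMLElement class, which is presumably what the core sitemap renderer looks for before returning the 501 error. A minimal sketch (the function name is hypothetical); it runs in plain PHP as well as inside WordPress:

```php
<?php
/**
 * Check whether the SimpleXML extension used by the core sitemap renderer is available.
 */
function kama_sitemaps_can_render() {
	return class_exists( 'SimpleXMLElement' );
}

if ( kama_sitemaps_can_render() ) {
	echo "SimpleXML is available\n";
} else {
	echo "SimpleXML is missing: enable the SimpleXML extension on the server\n";
}
```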

Link to the sitemap in the robots.txt file

The link https://domain/wp-sitemap.xml is automatically added to the end of the robots.txt file, but only if your robots.txt is dynamic (there is no physical robots.txt file in the site root directory). For more information about the dynamic robots.txt file, see the do_robots() function.

An example of creating a correct dynamic robots.txt file for WordPress is given in a separate article.

Disabling WordPress Sitemap

If a plugin or custom code already creates a sitemap and the WordPress sitemap is not needed, you can disable it. To do this, place the following code in the theme's functions.php file (or somewhere else, e.g. in a plugin):

add_filter( 'wp_sitemaps_enabled', '__return_false' );

Now, if you open /wp-sitemap.xml, you will see a 404 error page.
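The same filter can also disable the sitemap conditionally. A sketch that keeps the sitemap only on the production site, assuming WordPress 5.5+ where wp_get_environment_type() is available; the function name kama_sitemaps_enabled_for_env() is hypothetical, and the function_exists() guard only lets the file be loaded (and tested) outside WordPress:

```php
<?php
/**
 * Keep the core sitemap only on production; disable it on local/dev/staging sites.
 */
function kama_sitemaps_enabled_for_env( $is_enabled, $environment ) {
	if ( 'production' !== $environment ) {
		return false;
	}

	return $is_enabled;
}

// Register the filter only when running inside WordPress.
if ( function_exists( 'add_filter' ) ) {
	add_filter( 'wp_sitemaps_enabled', function ( $is_enabled ) {
		return kama_sitemaps_enabled_for_env( $is_enabled, wp_get_environment_type() );
	} );
}
```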

Note: disabling the sitemap this way does not delete its rewrite rules. They are kept on purpose, so that the sitemap URLs still return the correct response (404) while the sitemap is disabled.

Note that the WordPress sitemap is also disabled automatically when the "Discourage search engines from indexing this site" checkbox is enabled on the Settings > Reading page.

Adding Elements to the Sitemap

To add a post type or taxonomy to the WordPress sitemap, it must be public: set the public and publicly_queryable parameters to true when registering it.
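For example, a custom post type registered like this should get its own sitemap. The post type name book and the function kama_book_post_type_args() are hypothetical; the function_exists() guard only allows the snippet to be loaded outside WordPress.

```php
<?php
/**
 * Registration arguments for a hypothetical "book" post type
 * that should appear in the core sitemap.
 */
function kama_book_post_type_args() {
	return array(
		'label'              => 'Books',
		'public'             => true, // public types are considered for the sitemap
		'publicly_queryable' => true, // the posts have front-end URLs that can be listed
		'has_archive'        => true,
	);
}

if ( function_exists( 'add_action' ) ) {
	add_action( 'init', function () {
		register_post_type( 'book', kama_book_post_type_args() );
	} );
}
```

Taxonomies work the same way: pass public and publicly_queryable set to true to register_taxonomy().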

To add custom links to the sitemap, you need to create your own Provider.
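Below is a minimal sketch of such a provider, assuming WordPress 5.5+. The class name Kama_Extra_Sitemap_Provider, the provider name extra, the helper kama_extra_sitemap_entries() and the about-us/ and contacts/ paths are all hypothetical. The class extends WP_Sitemaps_Provider, implements its two abstract methods get_url_list() and get_max_num_pages(), and is registered with wp_register_sitemap_provider(). It is declared only when WP_Sitemaps_Provider exists, so the file can also be loaded outside WordPress.

```php
<?php
/**
 * Build sitemap entries (arrays with a 'loc' key) from a base URL and relative paths.
 */
function kama_extra_sitemap_entries( $base_url, array $paths ) {
	$entries = array();

	foreach ( $paths as $path ) {
		$entries[] = array( 'loc' => rtrim( $base_url, '/' ) . '/' . ltrim( $path, '/' ) );
	}

	return $entries;
}

// Declare the provider only when WordPress has loaded its sitemap classes.
if ( class_exists( 'WP_Sitemaps_Provider' ) ) {

	class Kama_Extra_Sitemap_Provider extends WP_Sitemaps_Provider {

		public function __construct() {
			$this->name        = 'extra'; // lowercase letters and digits only
			$this->object_type = 'extra';
		}

		// All links fit on one page, so any later page is empty.
		public function get_url_list( $page_num, $object_subtype = '' ) {
			if ( $page_num > 1 ) {
				return array();
			}

			return kama_extra_sitemap_entries( home_url(), array( 'about-us/', 'contacts/' ) );
		}

		public function get_max_num_pages( $object_subtype = '' ) {
			return 1;
		}
	}

	add_action( 'init', function () {
		wp_register_sitemap_provider( 'extra', new Kama_Extra_Sitemap_Provider() );
	} );
}
```

With this in place, the index page should list one more sitemap, presumably at /wp-sitemap-extra-1.xml.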

Deleting Sitemap elements

By default, there are three sitemap providers for different object types: posts, taxonomies and users.

You can remove a whole sitemap provider, a single type (one sitemap "inside" the provider), or even a single element of a type (one link "inside" the sitemap). Take taxonomies as an example: you can disable the "taxonomies" provider and all taxonomies will disappear from the sitemap; you can disable a single taxonomy and keep the rest; or you can exclude a single category (a taxonomy element) from the list of links.

Disabling Entire Provider (all taxonomies, users, post types)

A provider is a general term that covers all of its types. For example:

  • A post type provider includes all types of posts (pages, posts).
  • A taxonomy provider includes all taxonomy types (category, tags).

If you disable a whole provider, all of its types are removed from the sitemap. For example, if you disable the taxonomy provider, all taxonomies (categories, tags, custom taxonomies) are excluded from the sitemap.

# Disabling the sitemap provider: users and taxonomies
add_filter( 'wp_sitemaps_add_provider', 'kama_remove_sitemap_provider', 10, 2 );

function kama_remove_sitemap_provider( $provider, $name ){

	$remove_providers = [ 'users', 'taxonomies' ];

	// disable the users (author archives) and taxonomies providers
	if( in_array( $name, $remove_providers, true ) ){
		return false;
	}

	return $provider;
}

Disable Post Type (post, page)

For example, we don't need the page post type in the sitemap.

# Remove post type from the Sitemap
add_filter( 'wp_sitemaps_post_types', 'wpkama_remove_sitemaps_post_types' );

function wpkama_remove_sitemaps_post_types( $post_types ){

	unset( $post_types['page'] );

	return $post_types;
}

Remove Taxonomy (category, tags)

For example, we don't need tags (the post_tag taxonomy) in the sitemap.

# Remove taxonomies from the sitemap
add_filter( 'wp_sitemaps_taxonomies', 'wpkama_remove_sitemaps_taxonomies' );

function wpkama_remove_sitemaps_taxonomies( $taxonomies ){

	unset( $taxonomies['post_tag'] );

	return $taxonomies;
}

Exclude Separate URLs

Single elements are excluded by changing the query parameters of WP_Query, WP_Term_Query (get_terms()) or WP_User_Query (get_users()) through special hooks.

Note: if no elements of a type or provider are left after excluding individual elements, that type or provider is removed from the sitemap entirely.

Exclude single Posts from the Sitemap

For example, we need to exclude posts with ID 12 and 24 from our sitemap (suppose they have the noindex meta tag but still appear in the sitemap).

See hook wp_sitemaps_posts_query_args and all parameters of WP_Query().

add_filter( 'wp_sitemaps_posts_query_args', 'kama_sitemaps_posts_query_args', 10, 2 );

function kama_sitemaps_posts_query_args( $args, $post_type ){

	if ( 'post' !== $post_type ){
		return $args;
	}

	// take into account that this parameter may already be set
	if( !isset( $args['post__not_in'] ) )
		$args['post__not_in'] = array();

	// exclude posts
	foreach( [ 12, 24 ] as $post_id ){
		$args['post__not_in'][] = $post_id;
	}

	return $args;
}
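Hard-coding IDs does not scale if noindex is set per post. A sketch of an alternative, assuming the flag is stored in a hypothetical post meta field _kama_noindex with the value 1: a meta_query keeps only posts where the field is missing or not 1. The function name is also hypothetical, and the function_exists() guard only lets the file load outside WordPress. On very large sites, such meta queries can be noticeably slower than post__not_in.

```php
<?php
/**
 * Exclude posts flagged with the hypothetical "_kama_noindex" meta field.
 */
function kama_sitemaps_exclude_noindex( $args ) {
	$noindex_clause = array(
		'relation' => 'OR',
		array(
			'key'     => '_kama_noindex',
			'compare' => 'NOT EXISTS',
		),
		array(
			'key'     => '_kama_noindex',
			'value'   => '1',
			'compare' => '!=',
		),
	);

	// Keep a meta_query that another plugin may have added already.
	if ( ! empty( $args['meta_query'] ) ) {
		$args['meta_query'] = array(
			'relation' => 'AND',
			$args['meta_query'],
			$noindex_clause,
		);
	} else {
		$args['meta_query'] = $noindex_clause;
	}

	return $args;
}

if ( function_exists( 'add_filter' ) ) {
	add_filter( 'wp_sitemaps_posts_query_args', 'kama_sitemaps_exclude_noindex' );
}
```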

Remove Terms from Sitemap

For example, terms with ID 12 and 24 from the taxonomy "cities" have the meta tag noindex, so we need to exclude them from the Sitemap.

See hook wp_sitemaps_taxonomies_query_args and all parameters of get_terms().

add_filter( 'wp_sitemaps_taxonomies_query_args', 'kama_sitemaps_taxonomies_query_args', 10, 2 );

function kama_sitemaps_taxonomies_query_args( $args, $taxonomy ){

	if ( 'cities' !== $taxonomy ){
		return $args;
	}

	// take into account that this parameter may already be set
	if( !isset( $args['exclude'] ) )
		$args['exclude'] = array();

	// exclude terms
	$args['exclude'] = array_merge( $args['exclude'], [ 12, 24 ] );

	return $args;
}

Remove single Users from Sitemap

For example, we do not need users with ID 12, 24.

See hook wp_sitemaps_users_query_args and all parameters of get_users().

add_filter( 'wp_sitemaps_users_query_args', 'kama_sitemaps_users_query_args' );

function kama_sitemaps_users_query_args( $args ){

	// take into account that this parameter may already be set
	if( !isset( $args['exclude'] ) )
		$args['exclude'] = array();

	// exclude users
	$args['exclude'] = array_merge( $args['exclude'], [ 12, 24 ] );

	return $args;
}

Is Sitemap Enabled?

To find out if the sitemap is enabled, use this check:

$is_sitemaps_enabled = wp_sitemaps_get_server()->sitemaps_enabled();

if( $is_sitemaps_enabled ){
	// WP Sitemaps is working
}
else {
	// WP Sitemaps is disabled
}

Additional fields (tags) for the Sitemap

Let's add all supported tags for Posts Sitemaps.

add_filter( 'wp_sitemaps_posts_entry', 'wpkama_sitemaps_posts_entry', 10, 2 );

function wpkama_sitemaps_posts_entry( $entry, $post ){

	$entry['lastmod']    = get_post_modified_time( 'c', true, $post ); // W3C Datetime in UTC
	$entry['priority']   = 0.8;
	$entry['changefreq'] = 'weekly';

	return $entry;
}


More details

The Sitemaps protocol supports four child tags for each <url> element (by default, WordPress outputs only <loc>). The rest can be added via filters.

Here is what WordPress outputs by default:

<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">

   <url>
	  <loc>http://www.example.com/</loc>
   </url>

</urlset>

Here is what the Protocol supports:

<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">

   <url>
	  <loc>http://www.example.com/</loc>
	  <lastmod>2005-01-01</lastmod>
	  <changefreq>monthly</changefreq>
	  <priority>0.8</priority>
   </url>

</urlset>
loc (required)
The URL of the page. This URL must start with a protocol (for example, http). Must be less than 2048 characters.
lastmod

The date when the page was last modified. The date must be in the W3C Datetime format (the time part may be omitted):

  • YYYY - 1997
  • YYYY-MM - 1997-07
  • YYYY-MM-DD - 1997-07-16
  • YYYY-MM-DDThh:mmTZD - 1997-07-16T19:20+01:00
  • YYYY-MM-DDThh:mm:ssTZD - 1997-07-16T19:20:30+01:00

Note that this tag is independent of the If-Modified-Since (304) header, which your server may also return. Search engines can use the information from these two sources in different ways.
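WordPress stores dates such as post_modified_gmt in the MySQL format Y-m-d H:i:s, which is not a valid W3C Datetime value. A sketch of the conversion in plain PHP (the function name is hypothetical); inside WordPress, get_post_modified_time( 'c', true, $post ) should give an equivalent result for a post.

```php
<?php
/**
 * Convert a MySQL GMT datetime ("Y-m-d H:i:s", as in post_modified_gmt)
 * to the W3C Datetime format expected in <lastmod>.
 */
function kama_to_w3c_datetime( $mysql_gmt ) {
	// Append " UTC" so the string is not interpreted in the server's time zone.
	$timestamp = strtotime( $mysql_gmt . ' UTC' );

	return gmdate( DATE_W3C, $timestamp );
}

echo kama_to_w3c_datetime( '1997-07-16 19:20:30' ), "\n"; // 1997-07-16T19:20:30+00:00
```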

changefreq

How often the page changes. This value provides general information for search engines. Search engines may ignore this information and visit the page more or less often. Possible values:

  • always - used for pages that change every time they are accessed.
  • hourly
  • daily
  • weekly
  • monthly
  • yearly
  • never - used for archived URLs.
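Since only these seven values are allowed, it can be useful to validate a value before putting it into $entry['changefreq'] in one of the *_entry filters. A small sketch with a hypothetical function name:

```php
<?php
/**
 * Return $value if it is a changefreq value allowed by the Sitemaps protocol, otherwise null.
 * The comparison is strict, matching the lowercase spelling used by the protocol.
 */
function kama_valid_changefreq( $value ) {
	$allowed = array( 'always', 'hourly', 'daily', 'weekly', 'monthly', 'yearly', 'never' );

	return in_array( $value, $allowed, true ) ? $value : null;
}

var_dump( kama_valid_changefreq( 'weekly' ) );   // string(6) "weekly"
var_dump( kama_valid_changefreq( 'biweekly' ) ); // NULL
```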
priority

The importance of this URL in relation to other URLs. The value can be from 0.0 to 1.0. This value doesn't affect how your pages compare to pages on other sites — it only lets search engines know which pages you think are most important. The default priority is 0.5.

Please note that the priority you assign does not affect the position of your URLs in search engine results. Search engines may use this information when choosing between URLs on the same site. Use this tag to increase the likelihood that your most important pages will be present in the search index.

Also note that assigning a high priority to all URLs is unlikely to improve anything. Since priority is relative, it is only used to choose between URLs on the same site (for example, which page the bot visits first).
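Because priority is relative, it makes more sense to vary it by content type than to set one value everywhere. A sketch using the wp_sitemaps_posts_entry filter, which passes the post type as its third argument; the function name and the page/post values are only an example, and the function_exists() guard only lets the file load outside WordPress.

```php
<?php
/**
 * Set different relative priorities per post type.
 * Post types missing from the map keep the protocol default (0.5).
 */
function kama_sitemaps_entry_priority( $entry, $post, $post_type ) {
	$priorities = array(
		'page' => 0.6,
		'post' => 0.8,
	);

	if ( isset( $priorities[ $post_type ] ) ) {
		$entry['priority'] = $priorities[ $post_type ];
	}

	return $entry;
}

if ( function_exists( 'add_filter' ) ) {
	add_filter( 'wp_sitemaps_posts_entry', 'kama_sitemaps_entry_priority', 10, 3 );
}
```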

Filters for additional XML tags
wp_sitemaps_posts_entry Extra tags (fields) for posts sitemap URLs.
wp_sitemaps_taxonomies_entry Extra tags (fields) for taxonomies sitemap URLs.
wp_sitemaps_users_entry Extra tags (fields) for users sitemap URLs.
wp_sitemaps_index_entry Extra tags (fields) for the sitemap URLs on the index page.

Classes, Functions, Hooks

There is a set of functions, classes, and hooks for managing and modifying the sitemap. Below is a list of all the functions, classes, and hooks associated with the WordPress sitemap.

Functions  
wp_sitemaps_get_server() Retrieves the current Sitemaps server instance.
wp_get_sitemap_providers() Gets an array of sitemap providers.
wp_register_sitemap_provider() Registers a new sitemap provider.
wp_sitemaps_get_max_urls() Returns the maximum number of URLs for a single sitemap.
Classes  
WP_Sitemaps{} The main class is responsible for setting up rewrites and registering all providers.
WP_Sitemaps_Index{} Creates the main sitemap page, which lists links to all Sitemaps.
WP_Sitemaps_Provider{} A base class for extending sitemap providers. It also contains basic functionality.
WP_Sitemaps_Registry{} Handles the registration of sitemap providers.
WP_Sitemaps_Renderer{} Responsible for rendering sitemap data in XML in accordance with the sitemap protocol.
WP_Sitemaps_Stylesheet{} Provides XSL stylesheets for styling all sitemaps.
WP_Sitemaps_Posts{} Provider. Creates sitemaps for posts, pages and custom post types.
WP_Sitemaps_Taxonomies{} Provider. Creates sitemaps for the taxonomy object type and its subtypes (custom taxonomies).
WP_Sitemaps_Users{} Provider. Creates sitemaps for the user object type.
Common hooks  
wp_sitemaps_enabled Allows disabling the WP sitemap.
wp_sitemaps_max_urls Filters the maximum number of URLs displayed in a sitemap.
wp_sitemaps_init Fires when the WP sitemap is initialized.
wp_sitemaps_index_entry Filters an entry (sitemap URL) of the sitemap index page.
Providers hooks  
wp_sitemaps_add_provider Filters the sitemap provider before adding it.
wp_sitemaps_post_types Filters the post types to include in the sitemap.
wp_sitemaps_posts_entry Filters <url> tags of posts.
wp_sitemaps_posts_show_on_front_entry Filters <url> tags of the front page (when it shows the latest posts).
wp_sitemaps_posts_query_args Filters WP_Query query parameters.
wp_sitemaps_posts_pre_url_list Short-circuits the list of post URLs before it is generated.
wp_sitemaps_posts_pre_max_num_pages Short-circuits the max number of posts sitemap pages.
wp_sitemaps_taxonomies Filters the taxonomies list.
wp_sitemaps_taxonomies_entry Filters <url> tags of a taxonomy element (term).
wp_sitemaps_taxonomies_query_args Filters the query parameters for getting taxonomy elements.
wp_sitemaps_taxonomies_pre_url_list Short-circuits the list of taxonomy URLs before it is generated.
wp_sitemaps_taxonomies_pre_max_num_pages Short-circuits the max number of taxonomy sitemap pages.
wp_sitemaps_users_entry Filters <url> tags of a user's posts archive page.
wp_sitemaps_users_query_args Filters the query parameters for getting users.
wp_sitemaps_users_pre_url_list Short-circuits the list of user archive URLs before it is generated.
wp_sitemaps_users_pre_max_num_pages Short-circuits the max number of users sitemap pages.

Notes

WP_Query optimization inside the Sitemap

Since WP 6.1, WP_Query results are cached in the object cache. Therefore, if you have a persistent object cache installed (for example, Redis), the optimization below is usually not necessary.

More details about the optimization: https://make.wordpress.org/core/2022/10/07/improvements-to-wp_query-performance-in-6-1/

By default, to get the post URLs of each post type, the WP_Sitemaps_Posts class uses WP_Query, which returns an array of full post objects, although only the post ID is needed to build a link. If you have a huge number of posts of some type, generating these sitemaps may be slow. The code below makes WP_Query return only an array of IDs, which speeds up sitemap generation and reduces the load. Keep in mind that the wp_sitemaps_posts_entry filter will then receive a post ID instead of a WP_Post object, so callbacks that read properties such as $post->post_modified_gmt need to call get_post() first.

// Make WP_Query return only an array of post IDs
add_filter( 'wp_sitemaps_posts_query_args', 'optimize_sitemap_posts_query', 10, 1 );

function optimize_sitemap_posts_query( $args ){
	$args['fields'] = 'ids';

	return $args;
}