Data Validation and Sanitization Functions (by Data Types)
Unsafe data can come from many sources: users, other websites, your own database, and so on. Such data must be validated or sanitized when it is received and escaped when it is displayed. For example, values must be sanitized before they are used in an SQL query. Beginners often treat this step as optional, since the code seems to work fine without it.
WordPress offers a plethora of functions for validating and sanitizing data. In this article, I have gathered the main validation and sanitization functions, both WordPress and PHP ones, grouped by the type of data they apply to.
Many of the functions listed below are useful both for sanitizing incoming data and for escaping output.
Also read about data sanitization principles
Validation Functions (by Data Types)
Numbers
- (int) $int
- intval( $int )
- Converts $int to an integer. (int) is a PHP type cast and intval() is a PHP function.
- (float) $int
- floatval( $int )
- Converts $int to a floating-point number. (float) is a PHP type cast and floatval() is a PHP function.
- absint( $int )
- Converts $int to a non-negative integer. For example, if -5 is passed, it returns 5.
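A quick way to see how these conversions treat typical form input. absint() is a WordPress function, so to keep the snippet runnable in plain PHP it uses my_absint(), a hypothetical stand-in written as abs( (int) $value ). That mirrors the behavior described above, but it is an assumption, not the core source.

```php
<?php
// Hypothetical stand-in for WordPress's absint(): cast to int, then drop the sign.
function my_absint( $value ): int {
    return abs( (int) $value );
}

$raw = [ '42', '-5', '12abc', '3.99', 'abc' ]; // typical strings from $_GET / $_POST

foreach ( $raw as $value ) {
    printf(
        "%-6s (int): %d  floatval: %s  my_absint: %d\n",
        $value,
        (int) $value,        // leading digits are kept, the rest is dropped
        floatval( $value ),  // '3.99' keeps its fractional part
        my_absint( $value )  // '-5' becomes 5; non-numeric strings become 0
    );
}
```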
Arrays
- array_map( 'absint', $array )
Converts all values of $array to non-negative integers. array_map() is a PHP function.
'absint' is the name of the callback applied to each value of $array. Replace 'absint' with the name of any other function, and every value of the array will be processed by that function.
- map_deep( $value, $callback ) (since version 4.4)
- Recursively applies the callback to every value of an array or every property of an object, including nested arrays and objects.
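Unlike array_map(), which only touches the top level, a recursive map also reaches nested arrays and object properties. The sketch below uses my_map_deep(), a hypothetical plain-PHP function that illustrates the idea. It is not the WordPress source of map_deep().

```php
<?php
// Hypothetical recursive map illustrating what map_deep() does.
function my_map_deep( $value, callable $callback ) {
    if ( is_array( $value ) ) {
        foreach ( $value as $key => $item ) {
            $value[ $key ] = my_map_deep( $item, $callback );
        }
        return $value;
    }
    if ( is_object( $value ) ) {
        // only public properties are visible from outside the class
        foreach ( get_object_vars( $value ) as $name => $item ) {
            $value->$name = my_map_deep( $item, $callback );
        }
        return $value;
    }
    // scalar: apply the callback
    return $callback( $value );
}

$input = [ '3', '-7', [ 'nested' => '-2.9' ] ];
$deep  = my_map_deep( $input, fn( $v ) => abs( (int) $v ) );
// $deep is now [ 3, 7, [ 'nested' => 2 ] ]
```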
Strings (output)
- esc_html( $text )
- Escapes text for output inside HTML blocks. Converts the characters <, >, &, ", ' to HTML entities.
- esc_attr( $text )
- Escapes text for use in HTML attributes. Converts the characters <, >, &, ", ' to HTML entities. Does not double-escape existing entities.
- esc_textarea( $text )
- Escapes text for use inside an HTML textarea element.
- esc_url( $url, $protocols, $_context )
- Sanitizes a URL for output in text: fixes malformed URLs and removes dangerous characters.
- esc_sql( $data )
- Escapes data for use in a MySQL query. This protects against SQL injection only when the escaped value is placed inside quotes in the query. Accepts a string or an array of strings.
- esc_js( $text )
- Escapes a string for safe use in inline JavaScript: escapes single quotes, converts the characters " < > & to HTML entities (as htmlspecialchars() does), and fixes line endings.
- esc_html__( $text, $domain )
- Translates specified string and escapes it for safe use in HTML output.
- esc_html_e( $text, $domain )
- Translates the string, escapes it (replaces special characters with HTML entities), and echoes it.
- esc_attr__( $text, $domain )
- Translates the string and escapes it with esc_attr() for safe use in an HTML attribute. Returns the result.
- esc_attr_e( $text, $domain )
- Displays translated text that has been escaped for safe use in an HTML attribute.
Full list of esc functions.
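To see what escaping for HTML output does to a hostile string, here is plain-PHP htmlspecialchars() with ENT_QUOTES and double encoding turned off. This only approximates esc_html() and esc_attr(). The WordPress functions also check UTF-8 and run filters, so escape_for_html() is a hypothetical illustration, not a replacement.

```php
<?php
// Approximation of esc_html()/esc_attr(): encode < > & " '
// without double-encoding entities that are already present.
function escape_for_html( string $text ): string {
    return htmlspecialchars( $text, ENT_QUOTES, 'UTF-8', false );
}

$comment = '<script>alert("hi")</script> Tom &amp; Jerry\'s';

// The same escaped string is safe both in an element and in a quoted attribute.
echo '<p>' . escape_for_html( $comment ) . '</p>', "\n";
echo '<input value="' . escape_for_html( $comment ) . '">', "\n";
```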
Strings (input)
WordPress functions for sanitization
WordPress has a number of functions for sanitizing strings that come in as data. Almost all of them start with the prefix sanitize_:
- sanitize_key( $key )
- Sanitizes a string for use as a key. Keys serve as various internal identifiers. The string is lowercased, and only the characters a-z0-9_- are kept.
- sanitize_text_field( $text )
- Sanitizes a string from an input field, usually before saving it to the database or after retrieving it from there. Strips HTML tags, line breaks, tabs, and extra whitespace, leaving plain text.
- sanitize_textarea_field( $text )
- Sanitizes a string from a textarea field (when saving to the database) or when retrieving it from the database. Works like sanitize_text_field() but preserves line breaks and other whitespace, which are legitimate in a textarea. Since WP 4.7.
- sanitize_html_class( $text )
- Prepares the text for use in the html class attribute: removes all inappropriate characters.
- sanitize_title( $title )
- Used to create slugs for posts/categories.
- sanitize_title_with_dashes( $title )
- Sanitizes the title by replacing spaces with hyphens (-).
- sanitize_user( $username, $strict )
- Sanitizes a username (login), removing unsafe characters.
With $strict = true, only the characters [a-zA-Z0-9 _.\-@] are allowed in the name.
- sanitize_file_name( $filename )
- Sanitizes a filename: replaces whitespace with dashes and removes invalid characters.
- sanitize_mime_type( $mime_type )
- Sanitizes a string for use as a MIME type. Keeps only the characters -+*.a-zA-Z0-9/ and removes everything else.
- sanitize_term_field( $field, $value, $term_id, $taxonomy, $context )
- Sanitizes the value of a term (taxonomy) field for the given context.
- sanitize_post_field( $field, $value, $post_id, $context )
- Sanitizes the specified value of the specified post field.
- sanitize_option( $option, $value )
- Sanitizes the values of various options, depending on the type of option.
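Most sanitize_* functions follow the same pattern: normalize, then remove everything outside an allowed character set. A minimal sketch of that pattern is my_sanitize_key(), a hypothetical plain-PHP stand-in modeled on sanitize_key(), which lowercases the string and keeps only a-z, 0-9, underscore and dash.

```php
<?php
// Hypothetical stand-in mimicking sanitize_key():
// lowercase the string, then keep only a-z, 0-9, underscore and dash.
function my_sanitize_key( $key ): string {
    if ( ! is_scalar( $key ) ) {
        return ''; // arrays/objects cannot be keys
    }
    return preg_replace( '/[^a-z0-9_\-]/', '', strtolower( (string) $key ) );
}

// e.g. an option name taken from a request
echo my_sanitize_key( 'My Plugin-Option_2!' ), "\n"; // myplugin-option_2
```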
PHP functions for validation
- ctype_alnum( $text )
Checks whether the passed string consists only of letters and digits.
ctype_alnum( 'AbCd1zyZ9' ); // true
ctype_alnum( 'foo!#$bar' ); // false
ctype_alnum( 'foo bar' );   // false
- strlen( $string )
- mb_strlen( $string )
- Check whether a string has the expected length. strlen() counts bytes, while mb_strlen() counts characters, which matters for multibyte (UTF-8) text.
- preg_match( $pattern, $subject, $match )
Checks whether a string matches a regular expression (pattern).
// allowed characters: only 0-9 . -
if ( preg_match( '/[^0-9.-]/', $data ) ) {
    wp_die( 'Invalid format' );
}
- strpos( $haystack, $needle )
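Checks whether a substring occurs in another string. Compare the result with false strictly, because strpos() returns 0 when the substring is found at the very beginning.
$mystring = 'abc';
$findme   = 'a';
if ( false !== strpos( $mystring, $findme ) ) {
    echo 'a found in abc';
}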
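These checks are usually combined into one validation function: a type check, a cheap length check, then an anchored regular expression. is_order_number() and its "AB-12345" format are hypothetical, chosen only for illustration.

```php
<?php
// Hypothetical format: two uppercase letters, a dash, five digits ("AB-12345").
function is_order_number( $value ): bool {
    // Form fields may arrive as arrays; reject anything that is not a string.
    if ( ! is_string( $value ) ) {
        return false;
    }
    // Cheap length check first; the format is ASCII, so strlen() counts characters.
    if ( strlen( $value ) !== 8 ) {
        return false;
    }
    // \A and \z anchor the whole string; "$" would also accept a trailing "\n".
    return preg_match( '/\A[A-Z]{2}-\d{5}\z/', $value ) === 1;
}

var_dump( is_order_number( 'AB-12345' ) ); // bool(true)
var_dump( is_order_number( 'ab-12345' ) ); // bool(false)
var_dump( is_order_number( 'AB-1234' ) );  // bool(false)
```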
Strings (HTML)
- wp_kses( $string, $allowed_html )
Cleans the string, leaving only the specified/allowed HTML tags, their attributes, and attribute values. It should be used when displaying text where unsafe HTML tags may be present.
wp_kses() has convenience wrappers that use predefined sets of allowed tags, so you don't have to pass the array yourself:
wp_kses_post( $text ): keeps only the tags allowed in post content.
wp_filter_post_kses( $text ): same as wp_kses_post(), but expects slashed data (with backslashes added, as in $_POST) and returns it slashed.
wp_kses_data( $text ): limits the allowed tags to a minimal set, as used for comments. Expects unslashed text.
wp_filter_kses( $text ): same as wp_kses_data(), but expects a slashed string.
wp_filter_nohtml_kses( $text ): removes all HTML tags from the text. Expects a slashed string and returns the cleaned text.
wp_kses() filters are slow and resource-intensive, so avoid running them every time data is displayed. It is better to sanitize incoming data with them, for example before saving text to the database. WordPress itself applies this sanitization when comments or posts are added.
- wp_rel_nofollow( $html )
- Adds rel="nofollow" to all <a> elements in the passed HTML text.
- wp_kses_allowed_html( $context )
- Returns an array of allowed HTML elements for the specified context. See the function's description.
- balanceTags( $html )
- force_balance_tags( $html )
- Closes unclosed HTML tags so that the output does not break the page markup.
balanceTags() only takes effect if the site setting "WordPress should correct invalidly nested XHTML automatically" is enabled (or its second parameter $force is true), whereas force_balance_tags() always does.
- tag_escape( $tag_name )
- Sanitizes an HTML tag name. Removes all characters except a-zA-Z0-9_: and converts the string to lowercase.
- sanitize_html_class( $class, $fallback )
- Prepares text for use in an HTML class attribute by removing all invalid characters. Keeps only A-Za-z0-9_-. If the result is empty, the value passed in $fallback is returned.
- wp_strip_all_tags( $string, $remove_breaks )
Removes all HTML tags from the string. Unlike strip_tags(), it removes the <script> and <style> elements together with their content. For example:
strip_tags( '<script>something</script>' );        // 'something'
wp_strip_all_tags( '<script>something</script>' ); // '' (empty string)
- strip_tags()
- Removes all HTML tags from the string.
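The difference between the two functions comes down to one extra step before strip_tags(). my_strip_all_tags() below is a hypothetical plain-PHP stand-in that follows the behavior described for wp_strip_all_tags(): drop script/style elements with their content, strip the remaining tags, optionally collapse line breaks, then trim. It is not the core source.

```php
<?php
// Hypothetical stand-in for wp_strip_all_tags().
function my_strip_all_tags( string $text, bool $remove_breaks = false ): string {
    // remove <script> and <style> elements together with their content
    $text = preg_replace( '@<(script|style)[^>]*?>.*?</\1>@si', '', $text );
    $text = strip_tags( $text );
    if ( $remove_breaks ) {
        // collapse line breaks, tabs and runs of spaces into single spaces
        $text = preg_replace( '/[\r\n\t ]+/', ' ', $text );
    }
    return trim( $text );
}

$html = "<p>Hello</p>\n<script>alert('x')</script><style>p{}</style>";

var_dump( strip_tags( $html ) );        // still contains "alert('x')" and "p{}"
var_dump( my_strip_all_tags( $html ) ); // string(5) "Hello"
```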
Boolean (logical)
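- wp_validate_boolean( $val )
Converts the value of the specified variable to boolean true or false. Unlike a plain (bool) cast, it converts the string 'false' (in any case) to false.
wp_validate_boolean( 'false' ); // bool(false)
An alternative construction is filter_var( $var, FILTER_VALIDATE_BOOLEAN ). Note that the two differ: filter_var() also treats 'no', 'off' and '0' as false, while wp_validate_boolean() converts 'no' and 'off' to true.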
- bool_from_yn( $val )
- Converts a "y"/"n" value to a boolean: returns true only for 'y' or 'Y', and false for anything else (including 'yes').
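The two boolean conversions are easiest to compare side by side. my_validate_boolean() is a hypothetical stand-in written on the assumption that wp_validate_boolean() treats only the string 'false' (any case) specially and otherwise applies a (bool) cast. The filter_var() calls are plain PHP. FILTER_NULL_ON_FAILURE lets you tell "false" apart from "not a boolean at all".

```php
<?php
// Hypothetical stand-in for wp_validate_boolean() (see the assumption above).
function my_validate_boolean( $value ): bool {
    if ( is_bool( $value ) ) {
        return $value;
    }
    if ( is_string( $value ) && 'false' === strtolower( $value ) ) {
        return false;
    }
    return (bool) $value;
}

foreach ( [ 'true', 'false', 'yes', 'no', 'off', '0', '' ] as $input ) {
    printf(
        "%-7s my_validate_boolean: %-5s filter_var: %s\n",
        var_export( $input, true ),
        var_export( my_validate_boolean( $input ), true ),
        var_export( filter_var( $input, FILTER_VALIDATE_BOOLEAN ), true )
    );
}

// With FILTER_NULL_ON_FAILURE, unrecognized input yields null instead of false.
var_dump( filter_var( 'maybe', FILTER_VALIDATE_BOOLEAN, FILTER_NULL_ON_FAILURE ) ); // NULL
```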
Email
- is_email( $email )
- Checks whether the passed string is a valid email address. Returns the email address on success and false otherwise.
- sanitize_email( $email )
- Sanitizes an email address by removing characters that are not allowed in it.
URL (links)
- esc_url( $url )
Sanitizes a URL for use in text, fixing malformed URLs and removing dangerous characters. Rejects URLs whose protocol is not in the whitelist (by default, the protocols returned by wp_allowed_protocols(), such as http, https, ftp, ftps, mailto, news, irc, gopher, nntp, feed and telnet).
Use this function whenever a URL is displayed on the page (in text, in attributes, or anywhere else).
It also encodes some special characters, so it is the right choice when generating strings for (X)HTML or XML documents: ampersands (&) and single quotes are converted to the numeric entities &#038; and &#039;.
- sanitize_url( $url )
- esc_url_raw( $url )
Sanitizes a URL for safe use. Unlike esc_url(), it does not encode the URL for display. Use it when you need an unencoded URL, for example in database queries, redirects, or HTTP requests.
sanitize_url() is an alias of esc_url_raw().
- urlencode( $url )
A PHP function that encodes a string so it can be used as a query parameter. It replaces characters that have a special meaning in URLs (&, /, spaces, etc.) with percent-encoded sequences (a space becomes +). To convert such a string back, use urldecode().
This function is not for displaying a URL on the page. It is for passing a URL inside another URL's query, so that it is no longer interpreted as a URL. For example, http://example.com/one becomes http%3A%2F%2Fexample.com%2Fone, which is no longer a URL but just a string of characters.
- urlencode_deep( $array )
- Processes all elements of the passed array with the urlencode() function.
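A typical case is a "return to" link that carries another URL as a parameter. The example below uses only plain PHP. The login URL and the redirect_to parameter name are illustrative assumptions. http_build_query() applies the same encoding to every value automatically.

```php
<?php
// A URL that will be passed as the value of a query parameter.
$target = 'http://example.com/one?a=1&b=2';

// Encode it so its "?", "=" and "&" are not read as part of the outer URL.
$link = 'https://example.com/login?redirect_to=' . urlencode( $target );
echo $link, "\n";
// https://example.com/login?redirect_to=http%3A%2F%2Fexample.com%2Fone%3Fa%3D1%26b%3D2

// http_build_query() encodes every parameter value for you.
echo 'https://example.com/login?' . http_build_query( [ 'redirect_to' => $target, 'lang' => 'en' ] ), "\n";

// urldecode() restores the original string.
var_dump( urldecode( urlencode( $target ) ) === $target ); // bool(true)
```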
XML
XML documents, unlike HTML, recognize only five named entities, for the characters ', &, >, < and ". Other named entities, such as &rsquo;, are undefined in XML. Therefore, to output text for XML documents, WordPress has the function:
- ent2ncr( $text )
- Converts named entities to their numeric references, for example &rsquo; becomes &#8217;.
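The conversion is essentially a lookup table from named entities to numeric references. my_ent2ncr() is a hypothetical, heavily trimmed sketch of that idea. The real function covers far more entities, and the five entries here are only examples. The XML-safe entities such as &amp; are left untouched.

```php
<?php
// Hypothetical, trimmed sketch: map a few named entities to numeric references.
function my_ent2ncr( string $text ): string {
    $to_ncr = [
        '&nbsp;'   => '&#160;',
        '&copy;'   => '&#169;',
        '&mdash;'  => '&#8212;',
        '&rsquo;'  => '&#8217;',
        '&hellip;' => '&#8230;',
    ];
    return strtr( $text, $to_ncr );
}

echo my_ent2ncr( 'It&rsquo;s&nbsp;done&hellip;' ), "\n"; // It&#8217;s&#160;done&#8230;
```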
JavaScript
- esc_js( $text )
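Prepares a string for use in JavaScript. Useful for single-line inline JS code, for example in HTML attributes such as onclick='…'. The escaped string must be enclosed in single quotes. Another example:
<script>
var message = '<?php echo esc_js( $js ); ?>';
</script>
Keep in mind that esc_js() converts " < > & to HTML entities, and a browser does not decode entities inside a <script> element, so such characters will appear in the string as &quot;, &lt; and so on.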
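For values placed inside a <script> element, one commonly used alternative (my suggestion, not something the text above prescribes) is JSON encoding. WordPress's wp_json_encode() wraps PHP's json_encode(). This sketch calls json_encode() directly so it runs in plain PHP, and the choice of JSON_HEX_* flags is an assumption. These flags turn < > & ' " into \u escapes, so the value can neither close the script tag nor break out of the string.

```php
<?php
// Untrusted value that tries to close the <script> element early.
$user_name = "</script><script>alert('x')</script>";

// JSON_HEX_* flags replace < > & ' " with \u003C-style escapes.
$flags    = JSON_HEX_TAG | JSON_HEX_AMP | JSON_HEX_APOS | JSON_HEX_QUOT;
$js_value = json_encode( $user_name, $flags );

// json_encode() output already includes the surrounding double quotes.
echo "<script>var userName = {$js_value};</script>\n";
```

In the browser, userName holds the original text unchanged, because the \u escapes are decoded by the JavaScript parser, not the HTML parser.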
File System
- validate_file( $filename, $allowed_files )
Used to protect against directory traversal attacks. Or when you need to check if a file is in the whitelist (parameter $allowed_files).
Returns 0 if the check passes, or an error number (> 0) if it fails. This is the opposite of the usual convention, so be careful when writing code: if ( validate_file( $file ) ) means the file failed the check.
Only absolute paths without ../ or ./ should be passed to the function. For example, /etc/hosts will pass the check, while ./hosts will not.
A directory traversal attack looks something like this: a request such as http://test.ru/?dir=../../Windows/system.ini is sent, and if the server or the code is not protected, the contents of the system.ini file can be obtained.
According to Google's data, such attacks are infrequent and of medium risk. However, they should not be ignored.
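The same kind of check can be sketched in plain PHP when a path comes from a request. check_relative_path() is hypothetical. It is loosely modeled on the checks and error codes attributed to validate_file() (1 = traversal, 2 = Windows drive path, 3 = not in the whitelist) and is not the core implementation. Like validate_file(), it returns 0 on success.

```php
<?php
// Hypothetical helper: returns 0 when the path looks safe, or a positive error code.
function check_relative_path( string $path, array $allowed = [] ): int {
    // Normalize Windows separators so "..\" is caught as well.
    $path = str_replace( '\\', '/', $path );

    // Any ".." segment could climb out of the intended directory.
    if ( in_array( '..', explode( '/', $path ), true ) ) {
        return 1;
    }
    // A drive letter such as "C:" means an absolute Windows path.
    if ( preg_match( '/^[A-Za-z]:/', $path ) ) {
        return 2;
    }
    // When a whitelist is given, only the listed files pass.
    if ( $allowed && ! in_array( $path, $allowed, true ) ) {
        return 3;
    }
    return 0;
}

// Note the inverted logic: 0 means "passed".
$dir = '../../Windows/system.ini';
if ( 0 !== check_relative_path( $dir ) ) {
    echo "Rejected: $dir\n";
}
```

Checking whole path segments rather than searching for ".." anywhere avoids rejecting harmless names such as file..txt.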
HTTP Headers (URLs, redirects)
The effects of HTTP header splitting attacks show up on the client side rather than on the server, which makes them very difficult to detect. WordPress sometimes adds user-submitted data to HTTP headers. To prevent such attacks, the headers are checked against a whitelist and all dangerous data is removed.
To sanitize potentially dangerous header values, WordPress provides the following redirect functions:
- wp_redirect( $location, $status = 302 )
- A safe way to redirect a user to any URL. The location is passed through wp_sanitize_redirect(), so the resulting header contains no dangerous characters. The function does not stop script execution, so call exit right after it.
- wp_safe_redirect( $location, $status = 302 )
- An even more secure way to redirect: only whitelisted hosts are allowed (other locations are replaced with the admin URL by default).
- wp_sanitize_redirect( $location )
- Sanitizes the specified URL for use in redirects.
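The core idea behind these checks fits in a few lines of plain PHP: never let a raw or percent-encoded CR/LF reach a header value. strip_header_breaks() is a hypothetical helper, not a WordPress function. It handles only the line-break part, whereas wp_sanitize_redirect() also restricts the allowed character set, so in real code prefer wp_safe_redirect().

```php
<?php
// Hypothetical helper: remove raw and percent-encoded line breaks from a header value.
function strip_header_breaks( string $location ): string {
    // Remove raw carriage returns, line feeds and NUL bytes.
    $location = preg_replace( '/[\r\n\0]+/', '', $location );
    // Remove encoded forms (%0d, %0a) until none remain, because a single pass
    // can create new ones: "%0%0dd" becomes "%0d" after the first removal.
    do {
        $before   = $location;
        $location = str_ireplace( [ '%0d', '%0a' ], '', $location );
    } while ( $location !== $before );
    return $location;
}

// The injected "header" is now just part of the path, not a separate line.
echo strip_header_breaks( "/thanks\r\nSet-Cookie: admin=1" ), "\n"; // /thanksSet-Cookie: admin=1
```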
A brief note on HTTP header splitting attacks
In this type of attack, also known as CRLF injection, a vulnerable web server responds to a specially crafted malicious request with an HTTP response that is interpreted as two separate responses instead of one. This becomes possible when user-submitted data is used in the HTTP response headers without additional checks. An attacker can create a situation where the victim's browser interprets the added header as a response to the second request, while specially crafted data will be displayed and possibly cached in the browser.
To implement HTTP header splitting using a vulnerable server, an attacker takes the following steps:
- Identifies places where user input is added to an HTTP header.
- Crafts a malicious string that terminates the application's response and appends a second response of their own, with the headers they need.
- Forces the victim to send two requests to the server. The first request contains specially crafted malicious data in the HTTP header, while the second is the application request, causing the victim's browser to interpret the split response as belonging to the second request.
For more details read here.
Database
- $wpdb->prepare( $format, $value1, $value2, ... )
Sanitizes the query: safely replaces the placeholders in $format with the values $value1, $value2, and so on.
$format is a string similar to the one used by sprintf(), where only the placeholders %s, %d and %f can be used (since WordPress 6.2, also %i for identifiers such as table and column names). Strings (%s) don't need quotes: prepare() adds them itself, so foo=%s becomes foo='value'. Quotes that are already present are handled correctly too. Internally, it relies on wpdb::_real_escape( $string ) after preprocessing the format string.
$wpdb->get_var( $wpdb->prepare(
    "SELECT something FROM table WHERE foo = %s AND status = %d",
    $_GET['name'],   // 'not a \' safe string' (prepare() will escape it)
    $_GET['status']  // 'not a safe number' (prepare() will cast it to an integer)
) );
- esc_sql( $sql )
- Escapes a string or an array of strings for use in an SQL query. Works based on wpdb::_real_escape( $string ), which uses mysqli_real_escape_string() (or addslashes() when there is no database connection).
$wpdb->prepare() is preferable because it also adds quotes and formats each value according to its placeholder.
- $wpdb->escape_by_ref( &$text )
- Returns nothing: the variable is passed by reference and escaped in place.
Works based on wpdb::_real_escape( $string ).
- $wpdb->esc_like( $text )
Prepares a string for use in the LIKE part of an SQL query by escaping the wildcard characters % and _. The processed string must then be sanitized with one of the SQL sanitization functions!
$link = $wpdb->esc_like( $link ); // prepare the string for the LIKE argument
$link = esc_sql( $link );          // escape it for SQL
$link = '%' . $link . '%';         // build the full LIKE search pattern
// $link is ready for use in the SQL query.
- sanitize_sql_orderby( $orderby )
- Checks whether the passed string can be used in the ORDER BY part of an SQL query. Returns the string if it is valid, and false otherwise.
- sanitize_title_for_query( $title )
- Prepares the string for use as a slug in an SQL query. Injection sanitization is done separately. It is implied that this is the name of something: a title, a filename, etc.
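Why esc_like() is needed: in a LIKE pattern, % and _ are wildcards, so a user searching for "50%_off" would otherwise match far more rows than intended. my_esc_like() is a hypothetical plain-PHP stand-in that escapes %, _ and the backslash itself with a backslash. The resulting pattern must still be passed through $wpdb->prepare() with a %s placeholder (or esc_sql()). This sketch covers only the LIKE step.

```php
<?php
// Hypothetical stand-in for $wpdb->esc_like(): escape the LIKE wildcards
// "%" and "_" (and the escape character "\" itself) with a backslash.
function my_esc_like( string $text ): string {
    return addcslashes( $text, '_%\\' );
}

$search  = '50%_off';
$pattern = '%' . my_esc_like( $search ) . '%'; // only the outer % remain wildcards

echo $pattern, "\n"; // %50\%\_off%
```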
Functions Not Related to Data Types
Other validation functions that did not fit into the previous lists because they apply to many data types.
- isset( $var )
- Checks whether a variable is set and is not null.
- empty( $var )
- Checks whether a variable is empty: false, 0, '0', '', null, an empty array, or not set at all. Does not raise a warning if the variable does not exist.
- in_array( $needle, $haystack, true )
- Checks whether the specified element is in the specified array. With true as the third argument, the types must match too.
- count( $array )
- To check the number of elements in an array.
- sanitize_meta( $meta_key, $meta_value, $meta_type )
Sanitizes a metadata value. The function itself does nothing except apply the filter "sanitize_{$meta_type}_meta_{$meta_key}", through which different metadata can be sanitized differently.
Notably, this function is called by all the functions that add or update WordPress metadata: update_*_meta() and add_*_meta(). So there is usually no point in calling it directly, but the hook it fires is very convenient for sanitizing any metadata on update.
- sanitize_term( $term, $taxonomy, $context )
- Sanitizes all fields of a taxonomy term using sanitize_term_field().
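The difference between isset() and empty() matters for form data, where a legitimate "0" is easy to reject by accident. The array below is made-up sample data. array_key_exists() is a plain-PHP function not mentioned above. It is included because it is the way to tell a null value apart from a missing key.

```php
<?php
// Sample request data: 'qty' is the string "0", 'note' is null, 'tags' is empty.
$data = [ 'qty' => '0', 'note' => null, 'tags' => [] ];

var_dump( isset( $data['qty'] ) );             // bool(true):  key exists and is not null
var_dump( empty( $data['qty'] ) );             // bool(true):  the string "0" counts as empty!
var_dump( isset( $data['note'] ) );            // bool(false): null is treated as not set
var_dump( array_key_exists( 'note', $data ) ); // bool(true):  the key itself exists
var_dump( empty( $data['missing'] ) );         // bool(true):  no warning for a missing key
var_dump( count( $data['tags'] ) );            // int(0)
```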
Also see all core functions containing exists, validate or is_:
The filter_var() Function
filter_var() is a versatile PHP function for validating and sanitizing data.
Checks or sanitizes the value of the specified variable according to the specified parameters.
Returns
Filtered data or FALSE in case of failed validation or filtering.
Usage
$var = filter_var( $var, $filter, $options );
- $var(various) (required)
- The variable to be sanitized or validated.
- $filter(number)
ID of the filter to be used for validation or sanitization. Such IDs are stored as numbers in predefined PHP constants. The complete list can be found here:
By default, the constant FILTER_DEFAULT is used, which means no filtering is applied.
Default: FILTER_DEFAULT
- $options(array/constant)
Various options or flags for filtering: a constant, or a combination of constants joined with | (bitwise OR). Alternatively, an array with only two supported keys: array( 'options' => ..., 'flags' => ... ). All possible flags can be found here: flags used by filter_var.
Default: 0 (no options or flags)
Example of filtering options, specifying constants:
// return null on failed validation
filter_var( 'example.com', FILTER_VALIDATE_URL, FILTER_NULL_ON_FAILURE ); //> null
// or the flags can be specified in the 'flags' array element
filter_var( 'example.com', FILTER_VALIDATE_URL, array( 'flags' => FILTER_NULL_ON_FAILURE ) ); //> null
Example of filtering options, specifying an array:
// array of filtering parameters
$options = array(
    'options' => array(
        'default' => 'http://example.com/info', // returned on failed validation
    ),
    'flags'   => FILTER_FLAG_PATH_REQUIRED, // the URL must contain a path
);
$var = filter_var( 'http://example.com', FILTER_VALIDATE_URL, $options );
echo $var; //> http://example.com/info
// i.e., the default value was returned,
// because the flag requires the URL to have a path
Examples of filter_var()
#1 Demonstration
// email
$email = filter_var( 'bob@example.com', FILTER_VALIDATE_EMAIL ); //> bob@example.com
$email = filter_var( 'bob@example', FILTER_VALIDATE_EMAIL );     //> false
// url
$url = filter_var( 'http://example.com', FILTER_VALIDATE_URL ); //> http://example.com
$url = filter_var( 'example.com', FILTER_VALIDATE_URL );        //> false
$url = filter_var( 'http://example.com/path', FILTER_VALIDATE_URL, FILTER_FLAG_PATH_REQUIRED ); //> http://example.com/path
$url = filter_var( 'http://example.com', FILTER_VALIDATE_URL, FILTER_FLAG_PATH_REQUIRED );      //> false
#2 Check if the string is an IP address
if ( filter_var( 'foo', FILTER_VALIDATE_IP ) ) echo 'this is an IP'; else echo 'this is not an IP';
// will output: 'this is not an IP'

if ( filter_var( '123.111.222.123', FILTER_VALIDATE_IP ) ) echo 'this is an IP'; else echo 'this is not an IP';
// will output: 'this is an IP'

// note that on success filter_var() returns the filtered value itself, not true:
filter_var( 'foo', FILTER_VALIDATE_IP );             //> false
filter_var( '123.111.222.123', FILTER_VALIDATE_IP ); //> '123.111.222.123'
Your Data Validation Functions
You can write your own PHP functions and include them in your theme or plugin.
When writing a validation function, it is recommended to name the functions as questions, for example: is_phone(), is_available(), is_us_zipcode().
Always return only true or false at the end of the function!
Example 1
Example of a PHP function that checks if the value is a valid US ZIP code (outside the US, this is called a postal code).
function is_us_zipcode( $zipcode ) {
    if ( empty( $zipcode ) || ! is_string( $zipcode ) ) {
        return false;
    }
    $zipcode = trim( $zipcode );

    // a ZIP code never has more than 10 characters (e.g. 12345-6789)
    if ( strlen( $zipcode ) > 10 ) {
        return false;
    }

    // use a regex to check whether this ZIP code is correct
    return 1 === preg_match( '/^\d{5}(-?\d{4})?$/', $zipcode );
}
Now when receiving data, the field can be checked like this:
if ( isset( $_POST['wporg_zip_code'] ) && is_us_zipcode( $_POST['wporg_zip_code'] ) ) {
    // field validated, do something
}
Example 2
Suppose you are going to create a query to retrieve posts, and you want to allow users to choose the field by which sorting will occur.
This example checks the sorting field passed in the orderby input field. The allowed sorting fields can be listed in advance. This list is our whitelist for validation, and we check against it using the PHP function in_array().
<?php
$allowed_keys = [ 'author', 'post_author', 'date', 'post_date' ];

// the whitelist is in lowercase (author),
// so normalize the data in case it comes in uppercase (AUTHOR)
$orderby = isset( $_POST['orderby'] ) && is_string( $_POST['orderby'] )
    ? strtolower( $_POST['orderby'] )
    : '';

if ( in_array( $orderby, $allowed_keys, true ) ) {
    // validation passed, modify the query
}
Such whitelist validation rejects every value except the allowed ones. It is one of the most reliable checks, so whenever this kind of validation is possible, make sure to use it.
The third parameter of in_array() is set to true, which means the data type must match as well: the value must be a string. Without strict comparison, if the checked value is boolean true, the check always passes and the code inside the condition becomes reachable. Values in $_POST are always strings or arrays, but data decoded from JSON, for example, can contain booleans:
// example of poor validation
if ( in_array( true, [ 'date', 'author' ] ) ) {
    // this code will always run!
}
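A runnable illustration of the loose-comparison trap, using JSON input as the source of the boolean. The request body is made up for the example.

```php
<?php
$allowed_keys = [ 'author', 'post_author', 'date', 'post_date' ];

// Unlike form fields, JSON input can carry a real boolean.
$request = json_decode( '{"orderby": true}', true );
$orderby = $request['orderby'];

// Loose comparison: true == 'author' is true, so the check passes.
var_dump( in_array( $orderby, $allowed_keys ) );       // bool(true)
// Strict comparison: a bool never equals a string.
var_dump( in_array( $orderby, $allowed_keys, true ) ); // bool(false)
```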
--
The information in this guide is taken from official sources and personal experience.