WP_REST_URL_Details_Controller::get_meta_with_content_elements()
Gets all the meta tag elements that have a 'content' attribute.
Method of the class: WP_REST_URL_Details_Controller{}
No Hooks.
Return
Array
. A multidimensional indexed array on success, else empty array.
Usage
// private - for code of main (parent) class only $result = $this->get_meta_with_content_elements( $html );
- $html(string) (required)
- The string of HTML to be parsed.
Changelog
Since 5.9.0 | Introduced. |
WP_REST_URL_Details_Controller::get_meta_with_content_elements() WP REST URL Details Controller::get meta with content elements code WP 6.8
private function get_meta_with_content_elements( $html ) { /* * Parse all meta elements with a content attribute. * * Why first search for the content attribute rather than directly searching for name=description element? * tl;dr The content attribute's value will be truncated when it contains a > symbol. * * The content attribute's value (i.e. the description to get) can have HTML in it and be well-formed as * it's a string to the browser. Imagine what happens when attempting to match for the name=description * first. Hmm, if a > or /> symbol is in the content attribute's value, then it terminates the match * as the element's closing symbol. But wait, it's in the content attribute and is not the end of the * element. This is a limitation of using regex. It can't determine "wait a minute this is inside of quotation". * If this happens, what gets matched is not the entire element or all of the content. * * Why not search for the name=description and then content="(.*)"? * The attribute order could be opposite. Plus, additional attributes may exist including being between * the name and content attributes. * * Why not lookahead? * Lookahead is not constrained to stay within the element. The first <meta it finds may not include * the name or content, but rather could be from a different element downstream. */ $pattern = '#<meta\s' . /* * Allows for additional attributes before the content attribute. * Searches for anything other than > symbol. */ '[^>]*' . /* * Find the content attribute. When found, capture its value (.*). * * Allows for (a) single or double quotes and (b) whitespace in the value. * * Why capture the opening quotation mark, i.e. (["\']), and then backreference, * i.e \1, for the closing quotation mark? * To ensure the closing quotation mark matches the opening one. Why? Attribute values * can contain quotation marks, such as an apostrophe in the content. */ 'content=(["\']??)(.*)\1' . /* * Allows for additional attributes after the content attribute. * Searches for anything other than > symbol. */ '[^>]*' . /* * \/?> searches for the closing > symbol, which can be in either /> or > format. * # ends the pattern. */ '\/?>#' . /* * These are the options: * - i : case-insensitive * - s : allows newline characters for the . match (needed for multiline elements) * - U means non-greedy matching */ 'isU'; preg_match_all( $pattern, $html, $elements ); return $elements; }