htmlentities
This function is identical to htmlspecialchars() in all ways, except with htmlentities() , all characters which have HTML character entity equivalents are translated into these entities. The get_html_translation_table() function can be used to return the translation table used dependent upon the provided flags constants.
If you want to decode instead (the reverse) you can use html_entity_decode() .
Parameters
A bitmask of one or more of the following flags, which specify how to handle quotes, invalid code unit sequences and the used document type. The default is ENT_QUOTES | ENT_SUBSTITUTE | ENT_HTML401 .
Constant Name | Description |
---|---|
ENT_COMPAT | Will convert double-quotes and leave single-quotes alone. |
ENT_QUOTES | Will convert both double and single quotes. |
ENT_NOQUOTES | Will leave both double and single quotes unconverted. |
ENT_IGNORE | Silently discard invalid code unit sequences instead of returning an empty string. Using this flag is discouraged as it » may have security implications. |
ENT_SUBSTITUTE | Replace invalid code unit sequences with a Unicode Replacement Character U+FFFD (UTF-8) or &#FFFD; (otherwise) instead of returning an empty string. |
ENT_DISALLOWED | Replace invalid code points for the given document type with a Unicode Replacement Character U+FFFD (UTF-8) or &#FFFD; (otherwise) instead of leaving them as is. This may be useful, for instance, to ensure the well-formedness of XML documents with embedded external content. |
ENT_HTML401 | Handle code as HTML 4.01. |
ENT_XML1 | Handle code as XML 1. |
ENT_XHTML | Handle code as XHTML. |
ENT_HTML5 | Handle code as HTML 5. |
An optional argument defining the encoding used when converting characters.
If omitted, encoding defaults to the value of the default_charset configuration option.
Although this argument is technically optional, you are highly encouraged to specify the correct value for your code if the default_charset configuration option may be set incorrectly for the given input.
The following character sets are supported:
Charset | Aliases | Description |
---|---|---|
ISO-8859-1 | ISO8859-1 | Western European, Latin-1. |
ISO-8859-5 | ISO8859-5 | Little used cyrillic charset (Latin/Cyrillic). |
ISO-8859-15 | ISO8859-15 | Western European, Latin-9. Adds the Euro sign, French and Finnish letters missing in Latin-1 (ISO-8859-1). |
UTF-8 | ASCII compatible multi-byte 8-bit Unicode. | |
cp866 | ibm866, 866 | DOS-specific Cyrillic charset. |
cp1251 | Windows-1251, win-1251, 1251 | Windows-specific Cyrillic charset. |
cp1252 | Windows-1252, 1252 | Windows specific charset for Western European. |
KOI8-R | koi8-ru, koi8r | Russian. |
BIG5 | 950 | Traditional Chinese, mainly used in Taiwan. |
GB2312 | 936 | Simplified Chinese, national standard character set. |
BIG5-HKSCS | Big5 with Hong Kong extensions, Traditional Chinese. | |
Shift_JIS | SJIS, SJIS-win, cp932, 932 | Japanese |
EUC-JP | EUCJP, eucJP-win | Japanese |
MacRoman | Charset that was used by Mac OS. | |
» | An empty string activates detection from script encoding (Zend multibyte), default_charset and current locale (see nl_langinfo() and setlocale() ), in this order. Not recommended. |
Note: Any other character sets are not recognized. The default encoding will be used instead and a warning will be emitted.
When double_encode is turned off PHP will not encode existing html entities. The default is to convert everything.
Return Values
Returns the encoded string.
If the input string contains an invalid code unit sequence within the given encoding an empty string will be returned, unless either the ENT_IGNORE or ENT_SUBSTITUTE flags are set.
Changelog
Version | Description |
---|---|
8.1.0 | flags changed from ENT_COMPAT to ENT_QUOTES | ENT_SUBSTITUTE | ENT_HTML401 . |
8.0.0 | encoding is nullable now. |
Examples
Example #1 A htmlentities() example
// Outputs: A ‘quote’ is <b>bold</b>
echo htmlentities ( $str );
// Outputs: A 'quote' is <b>bold</b>
echo htmlentities ( $str , ENT_QUOTES );
?>
Example #2 Usage of ENT_IGNORE
// Outputs an empty string
echo htmlentities ( $str , ENT_QUOTES , «UTF-8» );
// Outputs «. »
echo htmlentities ( $str , ENT_QUOTES | ENT_IGNORE , «UTF-8» );
?>
See Also
- html_entity_decode() — Convert HTML entities to their corresponding characters
- get_html_translation_table() — Returns the translation table used by htmlspecialchars and htmlentities
- htmlspecialchars() — Convert special characters to HTML entities
- nl2br() — Inserts HTML line breaks before all newlines in a string
- urlencode() — URL-encodes string
User Contributed Notes 22 notes
An important note below about using this function to secure your application against Cross Site Scripting (XSS) vulnerabilities.
When printing user input in an attribute of an HTML tag, the default configuration of htmlEntities() doesn’t protect you against XSS, when using single quotes to define the border of the tag’s attribute-value. XSS is then possible by injecting a single quote:
$_GET [ ‘a’ ] = «#000′ onload=’alert(document.cookie)» ;
?>
XSS possible (insecure):
$href = htmlEntities ( $_GET [ ‘a’ ]);
print «» ; # results in:
?>
Use the ‘ENT_QUOTES’ quote style option, to ensure no XSS is possible and your application is secure:
$href = htmlEntities ( $_GET [ ‘a’ ], ENT_QUOTES );
print «» ; # results in:
?>
The ‘ENT_QUOTES’ option doesn’t protect you against javascript evaluation in certain tag’s attributes, like the ‘href’ attribute of the ‘a’ tag. When clicked on the link below, the given JavaScript will get executed:
I’ve seen lots of functions to convert all the entities, but I needed to do a fulltext search in a db field that had named entities instead of numeric entities (edited by tinymce), so I searched the tinymce source and found a string with the value->entity mapping. So, i wrote the following function to encode the user’s query with named entities.
The string I used is different of the original, because i didn’t want to convert ‘ or «. The string is too long, so I had to cut it. To get the original check TinyMCE source and search for nbsp or other entity 😉
$entities_unmatched = explode ( ‘,’ , ‘160,nbsp,161,iexcl,162,cent, [. ] ‘ );
$even = 1 ;
foreach( $entities_unmatched as $c ) if( $even ) $ord = $c ;
> else $entities_table [ $ord ] = $c ;
>
$even = 1 — $even ;
>
function encode_named_entities ( $str ) global $entities_table ;
$encoded_str = » ;
for( $i = 0 ; $i < strlen ( $str ); $i ++) $ent = @ $entities_table [ ord ( $str < $i >)];
if( $ent ) $encoded_str .= «& $ent ;» ;
> else $encoded_str .= $str < $i >;
>
>
return $encoded_str ;
>
If you are building a loadvars page for Flash and have problems with special chars such as » & «, » ‘ » etc, you should escape them for flash:
Try trace(escape(«&»)); in flash’ actionscript to see the escape code for &;
function flashentities ( $string )<
return str_replace (array( «&» , «‘» ),array( «%26» , «%27» ), $string );
>
?>
Those are the two that concerned me. YMMV.
The flag ENT_HTML5 also strips newline chars like \n with htmlentities while htmlspecialchars is not affected by that.
If you want to use nl2br on that string afterwards you might end up searching the problem like i did. This does not apply to other flags like e.g. ENT_XHTML which confused me.
Tested this with PHP 5.4 / 5.5 / 5.6-dev with same results, so it seems that this is an intended «feature».
For those Spanish (and not only) folks, that want their national letters back after htmlentities 🙂
protected function _decodeAccented ( $encodedValue , $options = array()) $options += array(
‘quote’ => ENT_NOQUOTES ,
‘encoding’ => ‘UTF-8’ ,
);
return preg_replace_callback (
‘/&\w(acute|uml|tilde);/’ ,
create_function (
‘$m’ ,
‘return html_entity_decode($m[0], ‘ . $options [ ‘quote’ ] . ‘, «‘ .
$options [ ‘encoding’ ] . ‘»);’
),
$encodedValue
);
>
?>
The following will make a string completely safe for XML:
function philsXMLClean ( $strin ) $strout = null ;
Экранирование html кода php
Everything outside of a pair of opening and closing tags is ignored by the PHP parser which allows PHP files to have mixed content. This allows PHP to be embedded in HTML documents, for example to create templates.
This is going to be ignored by PHP and displayed by the browser.
This will also be ignored by PHP and displayed by the browser.
This works as expected, because when the PHP interpreter hits the ?> closing tags, it simply starts outputting whatever it finds (except for the immediately following newline — see instruction separation) until it hits another opening tag unless in the middle of a conditional statement in which case the interpreter will determine the outcome of the conditional before making a decision of what to skip over. See the next example.
Using structures with conditions
Example #1 Advanced escaping using conditions
In this example PHP will skip the blocks where the condition is not met, even though they are outside of the PHP open/close tags; PHP skips them according to the condition since the PHP interpreter will jump over blocks contained within a condition that is not met.
For outputting large blocks of text, dropping out of PHP parsing mode is generally more efficient than sending all of the text through echo or print .
Note:
If PHP is embeded within XML or XHTML the normal PHP must be used to remain compliant with the standards.