Escaping HTML code in PHP

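htmlentities

Convert all applicable characters to HTML entities.

htmlentities(string $string, int $flags = ENT_QUOTES | ENT_SUBSTITUTE | ENT_HTML401, ?string $encoding = null, bool $double_encode = true): string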

This function is identical to htmlspecialchars() in all ways, except that with htmlentities(), all characters which have HTML character entity equivalents are translated into these entities. The get_html_translation_table() function can be used to return the translation table used, depending on the flags constants provided.

If you want to decode instead (the reverse), you can use html_entity_decode().
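A minimal sketch of the difference, assuming a UTF-8 source file and explicit flags: htmlspecialchars() converts only the HTML-special characters, htmlentities() additionally turns characters such as é into named entities, and html_entity_decode() with the same flags reverses the conversion.

```php
<?php
// Assumes this file is saved as UTF-8.
$text = "Café <b>crème</b> & 'tea'";

// Only &, <, >, " and ' are converted (' because ENT_QUOTES is set).
$special = htmlspecialchars($text, ENT_QUOTES, 'UTF-8');

// Additionally converts every character that has a named entity, e.g. é and è.
$entities = htmlentities($text, ENT_QUOTES, 'UTF-8');

echo $special, "\n";  // Café &lt;b&gt;crème&lt;/b&gt; &amp; &#039;tea&#039;
echo $entities, "\n"; // Caf&eacute; &lt;b&gt;cr&egrave;me&lt;/b&gt; &amp; &#039;tea&#039;

// html_entity_decode() with the same flags restores the original string.
var_dump(html_entity_decode($entities, ENT_QUOTES, 'UTF-8') === $text); // bool(true)
```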

Parameters

string

The input string.

flags

A bitmask of one or more of the following flags, which specify how to handle quotes, invalid code unit sequences and the used document type. The default is ENT_QUOTES | ENT_SUBSTITUTE | ENT_HTML401.

Available flags constants

ENT_COMPAT: Will convert double quotes and leave single quotes alone.
ENT_QUOTES: Will convert both double and single quotes.
ENT_NOQUOTES: Will leave both double and single quotes unconverted.
ENT_IGNORE: Silently discard invalid code unit sequences instead of returning an empty string. Using this flag is discouraged as it may have security implications.
ENT_SUBSTITUTE: Replace invalid code unit sequences with a Unicode Replacement Character U+FFFD (UTF-8) or &#xFFFD; (otherwise) instead of returning an empty string.
ENT_DISALLOWED: Replace invalid code points for the given document type with a Unicode Replacement Character U+FFFD (UTF-8) or &#xFFFD; (otherwise) instead of leaving them as is. This may be useful, for instance, to ensure the well-formedness of XML documents with embedded external content.
ENT_HTML401: Handle code as HTML 4.01.
ENT_XML1: Handle code as XML 1.
ENT_XHTML: Handle code as XHTML.
ENT_HTML5: Handle code as HTML 5.
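The sketch below shows how some of these flags combine, assuming UTF-8 input. It uses htmlspecialchars(), which accepts the same flags. The document type flag decides which entity is used for the single quote, and ENT_XML1 | ENT_DISALLOWED | ENT_SUBSTITUTE is one plausible combination for producing well-formed XML from untrusted text.

```php
<?php
$s = "It's <ok>";

// The document type flag changes the entity used for the single quote.
echo htmlspecialchars($s, ENT_QUOTES | ENT_HTML401, 'UTF-8'), "\n"; // It&#039;s &lt;ok&gt;
echo htmlspecialchars($s, ENT_QUOTES | ENT_XML1, 'UTF-8'), "\n";    // It&apos;s &lt;ok&gt;
echo htmlspecialchars($s, ENT_NOQUOTES, 'UTF-8'), "\n";             // It's &lt;ok&gt;

// For XML output: U+0001 is not an allowed XML 1.0 character, so
// ENT_DISALLOWED replaces it with U+FFFD; ENT_SUBSTITUTE replaces the
// invalid UTF-8 byte \x8F with U+FFFD instead of returning "".
$xml = htmlspecialchars("a\x01b\x8F", ENT_QUOTES | ENT_XML1 | ENT_DISALLOWED | ENT_SUBSTITUTE, 'UTF-8');
echo bin2hex($xml), "\n";
```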

encoding

An optional argument defining the encoding used when converting characters.

If omitted, encoding defaults to the value of the default_charset configuration option.

Although this argument is technically optional, you are highly encouraged to specify the correct value for your code if the default_charset configuration option may be set incorrectly for the given input.

The following character sets are supported:

Supported charsets

Charset (aliases): Description
ISO-8859-1 (ISO8859-1): Western European, Latin-1.
ISO-8859-5 (ISO8859-5): Little-used Cyrillic charset (Latin/Cyrillic).
ISO-8859-15 (ISO8859-15): Western European, Latin-9. Adds the Euro sign, French and Finnish letters missing in Latin-1 (ISO-8859-1).
UTF-8: ASCII-compatible multi-byte 8-bit Unicode.
cp866 (ibm866, 866): DOS-specific Cyrillic charset.
cp1251 (Windows-1251, win-1251, 1251): Windows-specific Cyrillic charset.
cp1252 (Windows-1252, 1252): Windows-specific charset for Western European.
KOI8-R (koi8-ru, koi8r): Russian.
BIG5 (950): Traditional Chinese, mainly used in Taiwan.
GB2312 (936): Simplified Chinese, national standard character set.
BIG5-HKSCS: Big5 with Hong Kong extensions, Traditional Chinese.
Shift_JIS (SJIS, SJIS-win, cp932, 932): Japanese.
EUC-JP (EUCJP, eucJP-win): Japanese.
MacRoman: Charset that was used by Mac OS.
'' (empty string): Activates detection from script encoding (Zend multibyte), default_charset and current locale (see nl_langinfo() and setlocale()), in this order. Not recommended.

Note: Any other character sets are not recognized. The default encoding will be used instead and a warning will be emitted.
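To illustrate why the encoding argument matters, here is a small sketch with a Latin-1 string (the byte 0xE9 is é in ISO-8859-1). Declared correctly, the byte becomes &eacute;; declared as UTF-8, the same byte is an invalid sequence and, without ENT_IGNORE or ENT_SUBSTITUTE, the whole result is an empty string.

```php
<?php
// "café" encoded as ISO-8859-1 (Latin-1): 0xE9 is é.
$latin1 = "caf\xE9";

// Correct encoding declared: é becomes &eacute;.
echo htmlentities($latin1, ENT_QUOTES, 'ISO-8859-1'), "\n"; // caf&eacute;

// Wrong encoding declared: a lone 0xE9 is not valid UTF-8, and with
// neither ENT_IGNORE nor ENT_SUBSTITUTE the function returns "".
var_dump(htmlentities($latin1, ENT_QUOTES, 'UTF-8')); // string(0) ""

// The same word as real UTF-8 works with the UTF-8 setting.
echo htmlentities("caf\u{E9}", ENT_QUOTES, 'UTF-8'), "\n"; // caf&eacute;
```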

double_encode

When double_encode is turned off, PHP will not encode existing HTML entities. The default is to convert everything.
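A short sketch of the double_encode parameter (the fourth argument), using an input that already contains the entity &amp;:

```php
<?php
$s = 'Tom &amp; Jerry <3';

// double_encode = true (default): the existing &amp; is encoded again.
echo htmlentities($s, ENT_QUOTES, 'UTF-8', true), "\n";  // Tom &amp;amp; Jerry &lt;3

// double_encode = false: existing entities are left as they are.
echo htmlentities($s, ENT_QUOTES, 'UTF-8', false), "\n"; // Tom &amp; Jerry &lt;3
```

Turning double_encode off is convenient for text that may already be partly escaped, but it also means a literal "&amp;" typed by a user will be shown as "&" rather than as the five characters they entered.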


Return Values

Returns the encoded string.

If the input string contains an invalid code unit sequence within the given encoding, an empty string will be returned, unless either the ENT_IGNORE or ENT_SUBSTITUTE flag is set.

Changelog

Version: Description
8.1.0: flags changed from ENT_COMPAT to ENT_QUOTES | ENT_SUBSTITUTE | ENT_HTML401.
8.0.0: encoding is nullable now.

Examples

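Example #1 An htmlentities() example

<?php
$str = "A 'quote' is <b>bold</b>";

// Before PHP 8.1 (default ENT_COMPAT) outputs: A 'quote' is &lt;b&gt;bold&lt;/b&gt;
// Since PHP 8.1 (default includes ENT_QUOTES) outputs: A &#039;quote&#039; is &lt;b&gt;bold&lt;/b&gt;
echo htmlentities($str);

// Outputs: A &#039;quote&#039; is &lt;b&gt;bold&lt;/b&gt;
echo htmlentities($str, ENT_QUOTES);
?>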

Example #2 Usage of ENT_IGNORE

<?php
$str = "\x8F!!!"; // the byte \x8F is not a valid UTF-8 sequence

// Outputs an empty string
echo htmlentities($str, ENT_QUOTES, "UTF-8");

// Outputs "!!!"
echo htmlentities($str, ENT_QUOTES | ENT_IGNORE, "UTF-8");
?>

See Also

  • html_entity_decode() — Convert HTML entities to their corresponding characters
  • get_html_translation_table() — Returns the translation table used by htmlspecialchars and htmlentities
  • htmlspecialchars() — Convert special characters to HTML entities
  • nl2br() — Inserts HTML line breaks before all newlines in a string
  • urlencode() — URL-encodes string

User Contributed Notes

An important note about using this function to secure your application against cross-site scripting (XSS) vulnerabilities.

When printing user input in an attribute of an HTML tag, the default configuration of htmlentities() before PHP 8.1 (ENT_COMPAT) doesn't protect you against XSS if the attribute value is delimited by single quotes. XSS is then possible by injecting a single quote:

<?php
$_GET['a'] = "#000' onload='alert(document.cookie)";
?>

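XSS possible (insecure):

<?php
// With ENT_COMPAT (the default before PHP 8.1) single quotes are not converted
$href = htmlentities($_GET['a']);
print "<body bgcolor='$href'>"; # results in: <body bgcolor='#000' onload='alert(document.cookie)'>
?>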

Use the ENT_QUOTES option so that an injected single quote cannot break out of the attribute value:


<?php
$href = htmlentities($_GET['a'], ENT_QUOTES);
print "<body bgcolor='$href'>"; # results in: <body bgcolor='#000&#039; onload=&#039;alert(document.cookie)'>
?>

The ENT_QUOTES option doesn't protect you against JavaScript execution in certain attributes, such as the href attribute of the a tag. A value like javascript:alert(document.cookie) contains no characters that htmlentities() converts, so it passes through unchanged, and clicking the resulting link executes the JavaScript.
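A minimal sketch of one common mitigation, assuming links should only ever point to absolute http:// or https:// URLs. safe_href() is a hypothetical helper, not a PHP function: the allowlist check is what blocks javascript: URLs, while escaping only keeps the value inside its quotes.

```php
<?php
// Hypothetical helper (not part of PHP): returns an attribute-safe href,
// or '#' when the URL does not start with an allowed scheme.
function safe_href(string $url): string
{
    // Allowlist: only absolute http:// or https:// URLs are accepted.
    // javascript:, data:, relative paths and values with leading
    // whitespace or control characters all fall back to '#'.
    if (preg_match('~^https?://~i', $url) !== 1) {
        return '#';
    }
    // ENT_QUOTES so the value is also safe inside single-quoted attributes.
    return htmlspecialchars($url, ENT_QUOTES, 'UTF-8');
}

$input = 'javascript:alert(document.cookie)';

// Escaping alone leaves this string untouched.
echo htmlentities($input, ENT_QUOTES, 'UTF-8'), "\n"; // javascript:alert(document.cookie)

echo "<a href='" . safe_href($input) . "'>link</a>\n";
// <a href='#'>link</a>
echo "<a href='" . safe_href('https://example.com/?a=1&b=2') . "'>link</a>\n";
// <a href='https://example.com/?a=1&amp;b=2'>link</a>
```

Rejecting everything that is not an absolute http(s) URL is deliberately strict; relaxing it (for relative links, say) needs care, because browsers ignore some whitespace and control characters inside a scheme.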

I've seen many functions that convert all entities, but I needed to run a full-text search on a DB field that contained named entities instead of numeric ones (the content was edited with TinyMCE). I searched the TinyMCE source and found a string holding the value-to-entity mapping, so I wrote the following function to encode the user's query with named entities.

The string I used differs from the original because I didn't want to convert ' or ". The string is too long to include, so I had to cut it. To get the original, check the TinyMCE source and search for nbsp or another entity name 😉

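<?php
// "code,name,code,name,..." pairs taken from the TinyMCE source (shortened).
$entities_unmatched = explode(',', '160,nbsp,161,iexcl,162,cent, [...] ');
$entities_table = array();
$even = 1;
foreach ($entities_unmatched as $c) {
    if ($even) {
        $ord = $c;                  // odd positions hold the character code
    } else {
        $entities_table[$ord] = $c; // even positions hold the entity name
    }
    $even = 1 - $even;
}

// Works byte by byte via ord(), so it expects a single-byte
// (e.g. ISO-8859-1) string, not UTF-8.
function encode_named_entities($str) {
    global $entities_table;

    $encoded_str = '';
    for ($i = 0; $i < strlen($str); $i++) {
        $ent = $entities_table[ord($str[$i])] ?? null;
        if ($ent) {
            $encoded_str .= "&$ent;";
        } else {
            $encoded_str .= $str[$i];
        }
    }
    return $encoded_str;
}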
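As an alternative sketch (not the note author's approach), PHP can build a comparable character-to-named-entity table itself with get_html_translation_table(). This assumes UTF-8 input and a UTF-8 source file, unlike the byte-by-byte function above.

```php
<?php
// ENT_NOQUOTES leaves ' and " out of the table, like the trimmed TinyMCE list.
$table = get_html_translation_table(HTML_ENTITIES, ENT_NOQUOTES | ENT_HTML401, 'UTF-8');

echo $table['é'], "\n"; // &eacute;

// strtr() replaces every character that has an entry with its entity.
$query = 'café & crème';
echo strtr($query, $table), "\n"; // caf&eacute; &amp; cr&egrave;me

// For this input, htmlentities() with the same flags gives the same result.
var_dump(strtr($query, $table) === htmlentities($query, ENT_NOQUOTES | ENT_HTML401, 'UTF-8')); // bool(true)
```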

If you are building a LoadVars page for Flash and have problems with special characters such as "&" and "'", you should escape them for Flash:

Try trace(escape("&")); in Flash ActionScript to see the escape code for &.

<?php
// Percent-encode & and ' the way ActionScript's escape() does
function flashentities($string) {
    return str_replace(array('&', "'"), array('%26', '%27'), $string);
}
?>

Those are the two that concerned me. YMMV.

With htmlentities(), the ENT_HTML5 flag also replaces newline characters such as \n with an entity (&NewLine;), so they no longer appear literally in the output; htmlspecialchars() is not affected by this.

If you want to use nl2br() on that string afterwards, you might end up hunting for the problem like I did. This does not happen with other flags such as ENT_XHTML, which confused me.


Tested with PHP 5.4, 5.5 and 5.6-dev with the same results, so it seems to be an intended "feature".
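One way around this, assuming you do not need named entities for non-ASCII characters, is to escape with htmlspecialchars() (which, as noted above, keeps the literal newline) and call nl2br() afterwards. The order matters: calling nl2br() first would let the escaping step turn the inserted <br /> tags into text.

```php
<?php
$text = "line one\nline <two>";

// htmlspecialchars() keeps the literal "\n", so nl2br() still finds it.
$html = nl2br(htmlspecialchars($text, ENT_QUOTES | ENT_HTML5, 'UTF-8'));

echo $html, "\n";
// line one<br />
// line &lt;two&gt;
```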

For Spanish speakers (and others) who want their national letters back after htmlentities() 🙂

// Method excerpt: this belongs inside a class.
protected function _decodeAccented($encodedValue, $options = array()) {
    $options += array(
        'quote'    => ENT_NOQUOTES,
        'encoding' => 'UTF-8',
    );
    // Decode only entities such as &aacute;, &uuml; and &ntilde;.
    // create_function() was removed in PHP 8.0, so a closure is used instead.
    return preg_replace_callback(
        '/&\w(acute|uml|tilde);/',
        function ($m) use ($options) {
            return html_entity_decode($m[0], $options['quote'], $options['encoding']);
        },
        $encodedValue
    );
}

The following will make a string completely safe for XML:

function philsXMLClean($strin) {
    $strout = null;
    [...]


Everything outside of a pair of opening and closing tags is ignored by the PHP parser, which allows PHP files to have mixed content. This allows PHP to be embedded in HTML documents, for example to create templates.

This is going to be ignored by PHP and displayed by the browser.

This will also be ignored by PHP and displayed by the browser.

This works as expected because when the PHP interpreter hits the ?> closing tag, it simply starts outputting whatever it finds (except for an immediately following newline; see instruction separation) until it hits another opening tag. The exception is the middle of a conditional statement: there the interpreter determines the outcome of the conditional before deciding what to skip over. See the next example.
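A small sketch of this behaviour, assuming output buffering is used to capture the result as a string. render_page() is a hypothetical template function: the HTML lines sit outside the PHP tags and are sent as-is, the newline right after a closing tag is dropped, and the short echo tag prints an escaped value.

```php
<?php
// Hypothetical template function: text outside the PHP tags is emitted
// verbatim; ob_start() and ob_get_clean() capture it into a string.
function render_page(string $name): string
{
    ob_start(); ?>
<p>This is ignored by PHP and sent as-is.</p>
<p>Hello, <?= htmlspecialchars($name, ENT_QUOTES, 'UTF-8') ?>!</p>
<?php
    return ob_get_clean();
}

echo render_page('<Ann>');
// <p>This is ignored by PHP and sent as-is.</p>
// <p>Hello, &lt;Ann&gt;!</p>
```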

Using structures with conditions

Example #1 Advanced escaping using conditions

In this example, PHP skips the blocks whose condition is not met, even though they are outside the PHP open/close tags: the interpreter jumps over any block contained within a condition that is not met.
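A minimal sketch of escaping with a condition, using the alternative if/else/endif syntax that is common in templates. render_greeting() is a hypothetical function; output buffering only serves to return the emitted text as a string.

```php
<?php
// Hypothetical template function: only the branch whose condition holds
// is output, even though both branches are plain text outside PHP mode.
function render_greeting(bool $loggedIn): string
{
    ob_start();
    if ($loggedIn): ?>
<p>Welcome back!</p>
<?php else: ?>
<p>Please log in.</p>
<?php endif;
    return ob_get_clean();
}

echo render_greeting(true);  // <p>Welcome back!</p>
echo render_greeting(false); // <p>Please log in.</p>
```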

For outputting large blocks of text, dropping out of PHP parsing mode is generally more efficient than sending all of the text through echo or print .

Note:

If PHP is embedded within XML or XHTML, the normal PHP tags <?php ?> must be used to remain compliant with the standards.
