PHP string to HTML-safe

htmlentities

This function is identical to htmlspecialchars() in all ways, except that with htmlentities(), all characters that have HTML character entity equivalents are translated into these entities. The get_html_translation_table() function can be used to return the translation table used, depending on the provided flags constants.

If you want to decode instead (the reverse), you can use html_entity_decode().

Parameters

string

The input string.

flags

A bitmask of one or more of the following flags, which specify how to handle quotes, invalid code unit sequences and the document type used. The default is ENT_QUOTES | ENT_SUBSTITUTE | ENT_HTML401.

Available flags constants
ENT_COMPAT: Will convert double quotes and leave single quotes alone.
ENT_QUOTES: Will convert both double and single quotes.
ENT_NOQUOTES: Will leave both double and single quotes unconverted.
ENT_IGNORE: Silently discard invalid code unit sequences instead of returning an empty string. Using this flag is discouraged as it may have security implications.
ENT_SUBSTITUTE: Replace invalid code unit sequences with a Unicode Replacement Character U+FFFD (UTF-8) or &#xFFFD; (otherwise) instead of returning an empty string.
ENT_DISALLOWED: Replace invalid code points for the given document type with a Unicode Replacement Character U+FFFD (UTF-8) or &#xFFFD; (otherwise) instead of leaving them as is. This may be useful, for instance, to ensure the well-formedness of XML documents with embedded external content.
ENT_HTML401: Handle code as HTML 4.01.
ENT_XML1: Handle code as XML 1.
ENT_XHTML: Handle code as XHTML.
ENT_HTML5: Handle code as HTML 5.
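The sketch below shows how the quote flags and one document-type flag from the list above change the output for a string that contains both kinds of quote. It is a minimal example that assumes UTF-8 input. The encoding is passed explicitly so the result does not depend on default_charset.

```php
<?php
// Compare how the quote flags treat " and ' (input assumed to be UTF-8).
$input = "\"double\" and 'single'";

$variants = [
    'ENT_COMPAT'           => ENT_COMPAT,
    'ENT_QUOTES'           => ENT_QUOTES,
    'ENT_NOQUOTES'         => ENT_NOQUOTES,
    'ENT_QUOTES|ENT_HTML5' => ENT_QUOTES | ENT_HTML5,
];

foreach ($variants as $name => $flags) {
    // Passing the encoding explicitly avoids depending on default_charset.
    echo str_pad($name, 22), htmlentities($input, $flags, 'UTF-8'), PHP_EOL;
}
```

With ENT_HTML401 (the default document type), a converted single quote becomes the numeric &#039;. With ENT_HTML5 it becomes the named &apos;.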

encoding

An optional argument defining the encoding used when converting characters.

If omitted, encoding defaults to the value of the default_charset configuration option.

Although this argument is technically optional, you are highly encouraged to specify the correct value for your code if the default_charset configuration option may be set incorrectly for the given input.

The following character sets are supported:

Supported charsets
ISO-8859-1 (alias ISO8859-1): Western European, Latin-1.
ISO-8859-5 (alias ISO8859-5): Little-used Cyrillic charset (Latin/Cyrillic).
ISO-8859-15 (alias ISO8859-15): Western European, Latin-9. Adds the Euro sign and the French and Finnish letters missing in Latin-1 (ISO-8859-1).
UTF-8: ASCII-compatible multi-byte 8-bit Unicode.
cp866 (aliases ibm866, 866): DOS-specific Cyrillic charset.
cp1251 (aliases Windows-1251, win-1251, 1251): Windows-specific Cyrillic charset.
cp1252 (aliases Windows-1252, 1252): Windows-specific charset for Western European.
KOI8-R (aliases koi8-ru, koi8r): Russian.
BIG5 (alias 950): Traditional Chinese, mainly used in Taiwan.
GB2312 (alias 936): Simplified Chinese, national standard character set.
BIG5-HKSCS: Big5 with Hong Kong extensions, Traditional Chinese.
Shift_JIS (aliases SJIS, SJIS-win, cp932, 932): Japanese.
EUC-JP (aliases EUCJP, eucJP-win): Japanese.
MacRoman: Charset that was used by Mac OS.
'' (empty string): Activates detection from script encoding (Zend multibyte), default_charset and current locale (see nl_langinfo() and setlocale()), in this order. Not recommended.

Note: Any other character sets are not recognized. The default encoding will be used instead and a warning will be emitted.
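The following minimal sketch shows why the encoding argument matters. It feeds the same byte string to htmlentities() under two encodings. The input "caf\xE9" is made up for illustration: it is "café" in ISO-8859-1, but the lone byte \xE9 is not a valid UTF-8 sequence.

```php
<?php
// "café" encoded as ISO-8859-1: \xE9 is é in Latin-1, but an incomplete sequence in UTF-8.
$latin1 = "caf\xE9";

// Correct encoding: é is recognised and turned into its named entity.
var_dump(htmlentities($latin1, ENT_QUOTES, 'ISO-8859-1'));

// Wrong encoding and neither ENT_IGNORE nor ENT_SUBSTITUTE: the result is an empty string.
var_dump(htmlentities($latin1, ENT_QUOTES, 'UTF-8'));
```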

double_encode

When double_encode is turned off, PHP will not encode existing HTML entities. The default is to convert everything.
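As a short illustration of double_encode, the sketch below escapes a made-up string that already contains one entity next to a bare ampersand. It uses the default setting first and then passes false as the fourth argument.

```php
<?php
// Input that already contains one entity (&amp;) next to a bare ampersand.
$input = 'Fish &amp; Chips & Peas';

// Default (double_encode = true): the existing entity is encoded again.
echo htmlentities($input, ENT_QUOTES, 'UTF-8'), PHP_EOL;

// double_encode = false: &amp; is left alone; only the bare & is converted.
echo htmlentities($input, ENT_QUOTES, 'UTF-8', false), PHP_EOL;
```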


Return Values

Returns the encoded string.

If the input string contains an invalid code unit sequence within the given encoding, an empty string will be returned, unless either the ENT_IGNORE or the ENT_SUBSTITUTE flag is set.
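The minimal sketch below contrasts the two outcomes for an invalid UTF-8 input. The input "\x8F!!!" starts with a stray continuation byte. The first call uses neither error flag. The second sets ENT_SUBSTITUTE, which has been part of the default flags since PHP 8.1.0.

```php
<?php
// "\x8F" is a lone continuation byte, so this string is not valid UTF-8.
$broken = "\x8F!!!";

// No ENT_IGNORE / ENT_SUBSTITUTE: the whole result is an empty string.
$strict = htmlentities($broken, ENT_QUOTES, 'UTF-8');

// ENT_SUBSTITUTE replaces the bad byte with U+FFFD and keeps the rest.
$substituted = htmlentities($broken, ENT_QUOTES | ENT_SUBSTITUTE, 'UTF-8');

var_dump($strict, $substituted);
```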

Changelog

8.1.0: The default value of flags changed from ENT_COMPAT to ENT_QUOTES | ENT_SUBSTITUTE | ENT_HTML401.
8.0.0: encoding is nullable now.

Examples

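Example #1 An htmlentities() example

<?php
$str = "A 'quote' is <b>bold</b>";

// Outputs: A &#039;quote&#039; is &lt;b&gt;bold&lt;/b&gt;
// (before PHP 8.1.0 the default was ENT_COMPAT, which leaves the single quotes as they are)
echo htmlentities($str);

// Outputs: A &#039;quote&#039; is &lt;b&gt;bold&lt;/b&gt;
echo htmlentities($str, ENT_QUOTES);
?>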

Example #2 Usage of ENT_IGNORE

<?php
$str = "\x8F!!!";

// Outputs an empty string
echo htmlentities($str, ENT_QUOTES, "UTF-8");

// Outputs "!!!"
echo htmlentities($str, ENT_QUOTES | ENT_IGNORE, "UTF-8");
?>

See Also

  • html_entity_decode() — Convert HTML entities to their corresponding characters
  • get_html_translation_table() — Returns the translation table used by htmlspecialchars and htmlentities
  • htmlspecialchars() — Convert special characters to HTML entities
  • nl2br() — Inserts HTML line breaks before all newlines in a string
  • urlencode() — URL-encodes string

User Contributed Notes 22 notes

An important note below about using this function to secure your application against Cross Site Scripting (XSS) vulnerabilities.

Before PHP 8.1.0, when the default flags were ENT_COMPAT, the default configuration of htmlentities() did not protect you against XSS when printing user input inside an HTML attribute whose value is delimited by single quotes. XSS was then possible by injecting a single quote:

<?php
$_GET['a'] = "#000' onload='alert(document.cookie)";
?>

XSS possible (insecure):

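<?php
$href = htmlentities($_GET['a'], ENT_COMPAT); // ENT_COMPAT was the default before PHP 8.1.0
print "<body bgcolor='$href'>"; # results in: <body bgcolor='#000' onload='alert(document.cookie)'>
?>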

Use the ENT_QUOTES quote style option to ensure that this single-quote injection is not possible:

<?php
$href = htmlentities($_GET['a'], ENT_QUOTES);
print "<body bgcolor='$href'>"; # results in: <body bgcolor='#000&#039; onload=&#039;alert(document.cookie)'>
?>


The ENT_QUOTES option does not protect you against JavaScript evaluation in certain attributes, such as the href attribute of an a tag. A value like javascript:alert(document.cookie) contains no characters that htmlentities() converts, so it passes through unchanged, and the JavaScript runs when the link is clicked.
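Because escaping cannot neutralise a dangerous URL scheme, a common mitigation is to check the scheme against an allow-list before escaping. What follows is a minimal sketch under stated assumptions, not a complete URL sanitiser. The helper name safe_href, the allowed schemes, and the decision to reject any whitespace or control characters are all choices made for this example. The whitespace rule exists because browsers ignore tabs and newlines inside URLs, so "java\tscript:" would still run.

```php
<?php
// Hypothetical helper: return an attribute-safe href, or '#' if the URL is not allowed.
function safe_href(string $url, array $allowedSchemes = ['http', 'https', 'mailto']): string
{
    // Reject control characters and spaces outright (browsers strip some of them,
    // which would let "java\tscript:" slip past the scheme check).
    if (preg_match('/[\x00-\x20\x7F]/', $url)) {
        return '#';
    }

    $scheme = parse_url($url, PHP_URL_SCHEME);
    if ($scheme === false) {
        return '#'; // seriously malformed URL
    }

    if ($scheme === null) {
        // Relative URL: make sure no ':' appears before the first '/', '?' or '#',
        // otherwise a browser might still read part of it as a scheme.
        $head = substr($url, 0, strcspn($url, '/?#'));
        if (strpos($head, ':') !== false) {
            return '#';
        }
    } elseif (!in_array(strtolower($scheme), $allowedSchemes, true)) {
        return '#';
    }

    // Finally escape for use inside a quoted attribute value.
    return htmlspecialchars($url, ENT_QUOTES | ENT_SUBSTITUTE, 'UTF-8');
}

echo '<a href="' . safe_href('javascript:alert(document.cookie)') . '">link</a>', PHP_EOL;
echo '<a href="' . safe_href("https://example.com/?a=1&b='x'") . '">link</a>', PHP_EOL;
```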

I've seen lots of functions that convert all the entities, but I needed to do a full-text search in a DB field that contained named entities instead of numeric ones (the content was edited with TinyMCE). So I searched the TinyMCE source, found a string with the value-to-entity mapping, and wrote the following function to encode the user's query with named entities.

The string I used is different from the original, because I didn't want to convert ' or ". The string is too long, so I had to cut it. To get the original, check the TinyMCE source and search for nbsp or another entity 😉

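<?php
// Build a code => entity-name table from the comma-separated TinyMCE list
// (pairs of "number,name"; the list is cut short here).
$entities_unmatched = explode(',', '160,nbsp,161,iexcl,162,cent, [...] ');
$entities_table = array();
$even = 1;
foreach ($entities_unmatched as $c) {
    if ($even) {
        $ord = $c;                   // even position: the character code
    } else {
        $entities_table[$ord] = $c;  // odd position: the entity name
    }
    $even = 1 - $even;
}

// Note: ord() works on single bytes, so this only handles single-byte
// encodings such as ISO-8859-1, not multi-byte UTF-8 input.
function encode_named_entities($str) {
    global $entities_table;

    $encoded_str = '';
    for ($i = 0; $i < strlen($str); $i++) {
        $ent = $entities_table[ord($str[$i])] ?? null;
        if ($ent) {
            $encoded_str .= "&$ent;";
        } else {
            $encoded_str .= $str[$i];
        }
    }
    return $encoded_str;
}
?>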
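For UTF-8 input, htmlentities() itself already emits named entities wherever the HTML 4.01 table has one. ENT_NOQUOTES leaves ' and " unconverted, which matches the requirement in this note. The sketch below assumes the query arrives as UTF-8. The helper name encode_query_named is hypothetical. Unlike the byte-based loop above, it handles multi-byte characters.

```php
<?php
// Hypothetical helper: encode a UTF-8 search query with named entities, leaving quotes alone.
function encode_query_named(string $query): string
{
    return htmlentities($query, ENT_NOQUOTES | ENT_HTML401, 'UTF-8');
}

echo encode_query_named("crème brûlée \"à la\" café"), PHP_EOL;
```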

If you are building a LoadVars page for Flash and have problems with special characters such as & and ', you should escape them for Flash:

Try trace(escape("&")); in Flash ActionScript to see the escape code for &.

<?php
function flashentities($string) {
    return str_replace(array("&", "'"), array("%26", "%27"), $string);
}
?>

Those are the two that concerned me. YMMV.

With htmlentities(), the ENT_HTML5 flag also affects newline characters such as \n: they are replaced by the HTML5 named entity &NewLine;, so no raw newline remains. htmlspecialchars() is not affected by this.

If you want to use nl2br() on that string afterwards, you might end up hunting for the problem as I did. This does not apply to other flags such as ENT_XHTML, which confused me.

I tested this with PHP 5.4, 5.5 and 5.6-dev with the same results, so it seems to be an intended "feature".
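If you only need the HTML-special characters escaped, one workaround is to escape with htmlspecialchars(), which leaves \n in place, and call nl2br() afterwards. The sketch below assumes UTF-8 input and HTML5 output. The helper name escape_multiline is hypothetical.

```php
<?php
// Hypothetical helper: escape user text for HTML5 output and keep its line breaks.
function escape_multiline(string $text): string
{
    // htmlspecialchars() only touches &, <, >, " and ', so "\n" survives
    // and nl2br() can still find it afterwards.
    $escaped = htmlspecialchars($text, ENT_QUOTES | ENT_SUBSTITUTE | ENT_HTML5, 'UTF-8');

    // nl2br() must run after escaping; otherwise its <br /> tags would be escaped too.
    return nl2br($escaped);
}

echo escape_multiline("line one\n<b>line two</b>"), PHP_EOL;
```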

For Spanish speakers (and others) who want their national letters back after htmlentities() 🙂

<?php
// Method excerpt from a class.
protected function _decodeAccented($encodedValue, $options = array()) {
    $options += array(
        'quote'    => ENT_NOQUOTES,
        'encoding' => 'UTF-8',
    );
    return preg_replace_callback(
        '/&\w(acute|uml|tilde);/',
        // Decode just the matched entity. create_function() was removed in
        // PHP 8.0, so a closure is used instead.
        function ($m) use ($options) {
            return html_entity_decode($m[0], $options['quote'], $options['encoding']);
        },
        $encodedValue
    );
}
?>
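The method above only runs inside a class. Here is a standalone sketch of the same idea so it can be tried directly. The function name decode_accented and its parameter defaults are hypothetical. An arrow function requires PHP 7.4 or later.

```php
<?php
// Hypothetical standalone version: decode only acute/umlaut/tilde entities.
function decode_accented(string $encoded, int $quote = ENT_NOQUOTES, string $encoding = 'UTF-8'): string
{
    return preg_replace_callback(
        '/&\w(acute|uml|tilde);/',
        // Decode just the matched entity, e.g. &ntilde; becomes ñ.
        fn(array $m): string => html_entity_decode($m[0], $quote, $encoding),
        $encoded
    );
}

echo decode_accented('Espa&ntilde;a &amp; Portugal: &aacute;rbol, ping&uuml;ino'), PHP_EOL;
```

Other entities such as &amp; are deliberately left encoded.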


The following will make a string completely safe for XML:

<?php
function philsXMLClean($strin) {
    $strout = null;
    [...]
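The rest of that function is missing from the source. As an alternative sketch, and not the note author's code, htmlspecialchars() with ENT_XML1 escapes the five XML-significant characters. ENT_DISALLOWED, described in the flags list, replaces code points that are invalid for the document type. The function name xml_escape is hypothetical, and the input is assumed to be UTF-8.

```php
<?php
// Hypothetical helper: escape a UTF-8 string for XML 1.0 text or attribute values.
function xml_escape(string $value): string
{
    return htmlspecialchars(
        $value,
        ENT_QUOTES           // escape both " and '
            | ENT_XML1       // use XML entities (&apos; rather than &#039;)
            | ENT_SUBSTITUTE // replace invalid UTF-8 with U+FFFD instead of returning ''
            | ENT_DISALLOWED, // replace code points not allowed in XML 1.0 with U+FFFD
        'UTF-8'
    );
}

echo xml_escape("Tom & Jerry's <\"show\">"), PHP_EOL;
```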


Converting Strings into HTML

A commonly used web attack is called Cross-Site Scripting (XSS). For example, a user enters some malicious data, such as JavaScript code, into a web form; the web page then at some point outputs this information verbatim, without proper escaping. Standard examples of this are your blog's comments section or discussion forums.

Escaping Strings for HTML

<?php
$input = '<script>alert("I have a bad Föhnwelle.");</script>';
echo htmlspecialchars($input);
/* Prints:
   &lt;script&gt;alert(&quot;I have a bad Föhnwelle.&quot;);&lt;/script&gt; */
echo htmlentities($input);
/* Prints:
   &lt;script&gt;alert(&quot;I have a bad F&ouml;hnwelle.&quot;);&lt;/script&gt; */
?>

Here, it is important to remove certain HTML markup. To make a long story short: it is almost impossible to reliably catch all attempts to inject JavaScript into data. Injection is not always done with the <script> tag; other HTML elements can be abused as well. Therefore, in most cases, all HTML must be removed.

The easiest way to do so is to call htmlspecialchars(). It converts the string into HTML, including replacing all < and > characters with &lt; and &gt;. Another option is to call htmlentities(), which uses HTML entities for characters wherever they are available. The preceding code shows the differences between these two methods. The German ö (o umlaut) is not converted by htmlspecialchars(); htmlentities(), however, replaces it with its entity &ouml;.

Using htmlspecialchars() or htmlentities() simply shows in the browser what the user entered: if the user entered HTML markup, that very markup is displayed as text. So htmlspecialchars() and htmlentities() please the browser, but might not please the user.

If, however, you want to prepare strings to be used within URLs, you have to use urlencode() to properly encode special characters, such as the space character, that cannot appear in URLs as they are.
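A short sketch of URL encoding in practice follows. The search term and the URL paths are made up for illustration. urlencode() encodes spaces as "+", the form used in query strings. rawurlencode() produces "%20", which suits path segments. If the URL is then placed in an HTML attribute, it still needs HTML escaping.

```php
<?php
$term = 'Föhn & Co'; // made-up search term (UTF-8)

// urlencode(): spaces become "+", as expected in query strings.
$query = 'search.php?q=' . urlencode($term);

// rawurlencode(): spaces become "%20" (RFC 3986), suitable for path segments.
$path = '/tags/' . rawurlencode($term);

// When the URL ends up inside an HTML attribute, escape it for HTML as well.
echo '<a href="' . htmlspecialchars($query, ENT_QUOTES, 'UTF-8') . '">search</a>', PHP_EOL;
echo '<a href="' . htmlspecialchars($path, ENT_QUOTES, 'UTF-8') . '">tag</a>', PHP_EOL;
```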

Removing All HTML Tags

The function strip_tags() removes all HTML tags completely. If you want to keep some elements (for example, limited formatting with tags such as <b>, <i> and <br />), you provide a list of allowed tags in the second parameter of strip_tags().

The following script shows this; the figure depicts its output. As you can see, all unwanted HTML tags have been removed; however, their contents are still there.

<?php
$text = '[...] attack is called
Cross-Site Scripting XSS.
For example:
[...]';
echo strip_tags($text, '[...]');
?>
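The sketch below shows strip_tags() with and without an allow-list. The input string is made up for illustration. The array form of the second argument requires PHP 7.4 or later. As the text notes, tags are removed but their contents stay, including the body of a <script> element.

```php
<?php
$text = 'A <b>bold</b> and <i>italic</i> claim.<br /><script>alert("x")</script>';

// Remove every tag; the text content (including the script body) remains.
echo strip_tags($text), PHP_EOL;

// Keep only <b>, <i> and <br> as a string allow-list...
echo strip_tags($text, '<b><i><br>'), PHP_EOL;

// ...or, since PHP 7.4, as an array of tag names.
echo strip_tags($text, ['b', 'i', 'br']), PHP_EOL;
```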

Working with Strings:

