PHP: remove HTML entities

Contents

Regex to remove an HTML tag and its content from PHP string
Example
How to retain only specified tags
Example
How to remove certain tags with all their content
Example
Example
html_entity_decode
Parameters
Return Values
html_entity_decode
Parameters
Return Values
Changelog
Examples
Notes
See Also

Regex to remove an HTML tag and its content from PHP string

The built-in PHP strip_tags() function removes HTML, XML, and PHP tags from a string.

Example

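<?php
// Sample markup; the tag names (<h1>, <p>, <b>, <i>) are assumed for illustration.
$mystring = "<h1>Lorem Ipsum</h1><p>Lorem <b>ipsum dolor</b> sit amet, <i>consectetur</i> adipiscing elit. Donec nec volutpat ligula.</p>";
echo strip_tags($mystring);

Output:

Lorem IpsumLorem ipsum dolor sit amet, consectetur adipiscing elit. Donec nec volutpat ligula.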

As you can see, it removes all the HTML tags and their attributes but retains all the content of those tags.

How to retain only specified tags

strip_tags() accepts an optional second argument listing the tags to keep; every other tag is stripped. This lets you retain some tags while removing the rest.

Example

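<?php
// Sample markup; the tag names (<h1>, <p>, <b>, <i>) are assumed for illustration.
$mystring = "<h1>Lorem Ipsum</h1><p>Lorem <b>ipsum dolor</b> sit amet, <i>consectetur</i> adipiscing elit. Donec nec volutpat ligula.</p>";
echo strip_tags($mystring, "<h1><p>");

Output (as rendered by a browser):

Lorem Ipsum

Lorem ipsum dolor sit amet, consectetur adipiscing elit. Donec nec volutpat ligula.

As you can see, the other tags have been removed, leaving the string with only the <h1> and <p> tags, which were specified in the second argument.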
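Two details of the second argument can be sketched as follows: since PHP 7.4 it may also be given as an array of tag names, and attributes on the tags you keep are left untouched. The sample markup and variable names here are made up for illustration.

```php
<?php
// Hypothetical input: a paragraph with an inline event handler and a nested <b>.
$html = '<p onclick="x()">Hi <b>there</b></p>';

// String form: allowed tags written with angle brackets.
$keptByString = strip_tags($html, '<p>');

// Array form (PHP 7.4+): bare tag names.
$keptByArray = strip_tags($html, ['p']);

echo $keptByString, PHP_EOL; // <p onclick="x()">Hi there</p>
echo $keptByArray, PHP_EOL;  // <p onclick="x()">Hi there</p>
```

Because the onclick attribute survives, strip_tags() with allowed tags should not be treated as a sanitizer for untrusted markup.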

How to remove certain tags with all their content

As opposed to the above examples where only tags are removed but their content remains intact, let’s see how we can do away with specific tags together with their content.

To achieve this, we use the PHP preg_replace() function.

Its first argument is the regular expression that matches the tag(s) to remove or replace, the second is the replacement text, and the third is the string to modify. A call of roughly this shape does the job:

preg_replace('~<tag\b[^>]*>.*?</tag>~si', "", $str);

Replace "tag" with the name of the tag you wish to remove and $str with your string. Every match is replaced with whatever you pass as the second argument; here it is removed, since we pass an empty string "".

Example

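<?php
// Sample markup; the tag names (<h1>, <p>, <b>, <i>) are assumed for illustration.
$mystring = "<h1>Lorem Ipsum</h1><p>Lorem <b>ipsum dolor</b> sit amet, <i>consectetur</i> adipiscing elit. Donec nec volutpat ligula.</p>";
echo preg_replace('~<h1\b[^>]*>.*?</h1>~si', "", $mystring);

Output (as rendered by a browser):

Lorem ipsum dolor sit amet, consectetur adipiscing elit. Donec nec volutpat ligula.

We have removed the <h1> tag and its content, as specified in the pattern.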

If you would like to strip several tags together with their content in one go, pass an array of regular expressions as the first argument of the function.

Example

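<?php
// Sample markup; the tag names (<h1>, <p>, <b>, <i>) are assumed for illustration.
$mystring = "<h1>Lorem Ipsum</h1><p>Lorem <b>ipsum dolor</b> sit amet, <i>consectetur</i> adipiscing elit. Donec nec volutpat ligula.</p>";
echo preg_replace(array('~<h1\b[^>]*>.*?</h1>~si', '~<b\b[^>]*>.*?</b>~si', '~<i\b[^>]*>.*?</i>~si'), "", $mystring);

Output (as rendered by a browser):

Lorem sit amet, adipiscing elit. Donec nec volutpat ligula.

We have specified an array of patterns for <h1>, <b>, and <i>, all of which have been stripped off together with their content.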
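The same idea can be wrapped in a small helper so that the patterns are built from plain tag names. This is a minimal sketch, not a full HTML parser: the function name remove_tags_with_content() and the sample markup are hypothetical, and the regex approach does not handle nested tags of the same name.

```php
<?php
/**
 * Remove each listed tag together with everything between its
 * opening and closing tag. Hypothetical helper for illustration.
 */
function remove_tags_with_content(string $html, array $tags): string
{
    foreach ($tags as $tag) {
        // Escape the tag name so it is safe inside the pattern.
        $name = preg_quote($tag, '~');
        // \b stops "b" from also matching "<br>"; .*? is non-greedy;
        // the s flag lets . span newlines, i ignores case.
        $pattern = '~<' . $name . '\b[^>]*>.*?</' . $name . '\s*>~si';
        // preg_replace() returns null on error; keep the input in that case.
        $html = preg_replace($pattern, '', $html) ?? $html;
    }
    return $html;
}

$html = '<h1>Title</h1><p>Keep <b>drop me</b> this <i>and me</i> text.</p>';
$result = remove_tags_with_content($html, ['h1', 'b', 'i']);
echo $result, PHP_EOL; // <p>Keep  this  text.</p>
```

Note the doubled spaces left where the inline tags were; collapse them separately if that matters for your output.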

That’s all for this article.


html_entity_decode

html_entity_decode() is the opposite of htmlentities() in that it converts HTML entities in the string to their corresponding characters.

More precisely, this function decodes all the entities (including all numeric entities) that a) are necessarily valid for the chosen document type — i.e., for XML, this function does not decode named entities that might be defined in some DTD — and b) whose character or characters are in the coded character set associated with the chosen encoding and are permitted in the chosen document type. All other entities are left as is.
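The effect of the document type can be illustrated with a short sketch. The inputs are chosen for illustration: &apos; is not an HTML 4.01 entity, and XML 1 predefines only a handful of named entities, so &eacute; is left alone there while its numeric form is decoded.

```php
<?php
// &apos; is unknown to HTML 4.01, so it is left as is.
$html401 = html_entity_decode('&apos;', ENT_QUOTES | ENT_HTML401, 'UTF-8');

// HTML 5 defines &apos;, so it becomes a single quote.
$html5 = html_entity_decode('&apos;', ENT_QUOTES | ENT_HTML5, 'UTF-8');

// XML 1 does not predefine &eacute;, so the named entity is kept...
$xmlNamed = html_entity_decode('&eacute;', ENT_QUOTES | ENT_XML1, 'UTF-8');

// ...but the numeric entity for the same character is decoded.
$xmlNumeric = html_entity_decode('&#233;', ENT_QUOTES | ENT_XML1, 'UTF-8');

var_dump($html401, $html5, $xmlNamed, bin2hex($xmlNumeric));
```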

Parameters

string

The input string.

flags

A bitmask of one or more of the following flags, which specify how to handle quotes and which document type to use. The default is ENT_QUOTES | ENT_SUBSTITUTE | ENT_HTML401.

Available flags constants

Constant Name	Description
ENT_COMPAT	Will convert double-quotes and leave single-quotes alone.
ENT_QUOTES	Will convert both double and single quotes.
ENT_NOQUOTES	Will leave both double and single quotes unconverted.
ENT_SUBSTITUTE	Replace invalid code unit sequences with a Unicode Replacement Character U+FFFD (UTF-8) or &#xFFFD; (otherwise) instead of returning an empty string.
ENT_HTML401	Handle code as HTML 4.01.
ENT_XML1	Handle code as XML 1.
ENT_XHTML	Handle code as XHTML.
ENT_HTML5	Handle code as HTML 5.
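As a quick illustration of the three quote flags, the sketch below decodes the same string with each of them. The sample string and variable names are made up; the encoding is passed explicitly as UTF-8.

```php
<?php
$encoded = '&quot;double&quot; and &#039;single&#039;';

// ENT_COMPAT: only double quotes are decoded.
$compat = html_entity_decode($encoded, ENT_COMPAT | ENT_HTML401, 'UTF-8');

// ENT_QUOTES: both kinds of quotes are decoded.
$quotes = html_entity_decode($encoded, ENT_QUOTES | ENT_HTML401, 'UTF-8');

// ENT_NOQUOTES: neither kind is decoded.
$noquotes = html_entity_decode($encoded, ENT_NOQUOTES | ENT_HTML401, 'UTF-8');

echo $compat, PHP_EOL;   // "double" and &#039;single&#039;
echo $quotes, PHP_EOL;   // "double" and 'single'
echo $noquotes, PHP_EOL; // &quot;double&quot; and &#039;single&#039;
```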

encoding

An optional argument defining the encoding used when converting characters.

If omitted, encoding defaults to the value of the default_charset configuration option.

Although this argument is technically optional, you are highly encouraged to specify the correct value for your code if the default_charset configuration option may be set incorrectly for the given input.

The following character sets are supported:

Supported charsets

Charset	Aliases	Description
ISO-8859-1	ISO8859-1	Western European, Latin-1.
ISO-8859-5	ISO8859-5	Little used cyrillic charset (Latin/Cyrillic).
ISO-8859-15	ISO8859-15	Western European, Latin-9. Adds the Euro sign, French and Finnish letters missing in Latin-1 (ISO-8859-1).
UTF-8		ASCII compatible multi-byte 8-bit Unicode.
cp866	ibm866, 866	DOS-specific Cyrillic charset.
cp1251	Windows-1251, win-1251, 1251	Windows-specific Cyrillic charset.
cp1252	Windows-1252, 1252	Windows specific charset for Western European.
KOI8-R	koi8-ru, koi8r	Russian.
BIG5	950	Traditional Chinese, mainly used in Taiwan.
GB2312	936	Simplified Chinese, national standard character set.
BIG5-HKSCS		Big5 with Hong Kong extensions, Traditional Chinese.
Shift_JIS	SJIS, SJIS-win, cp932, 932	Japanese
EUC-JP	EUCJP, eucJP-win	Japanese
MacRoman		Charset that was used by Mac OS.
''		An empty string activates detection from script encoding (Zend multibyte), default_charset and current locale (see nl_langinfo() and setlocale()), in this order. Not recommended.

Note: Any other character sets are not recognized. The default encoding will be used instead and a warning will be emitted.
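The role of the encoding argument can be sketched like this: the same entities produce different bytes depending on the target charset, and an entity whose character does not exist in that charset is left undecoded. The inputs are chosen for illustration (the euro sign is not part of ISO-8859-1).

```php
<?php
$encoded = '&eacute; &euro;';

// UTF-8 can represent both characters.
$utf8 = html_entity_decode($encoded, ENT_QUOTES | ENT_HTML401, 'UTF-8');

// ISO-8859-1 has e-acute (byte 0xE9) but no euro sign,
// so &euro; stays as it is.
$latin1 = html_entity_decode($encoded, ENT_QUOTES | ENT_HTML401, 'ISO-8859-1');

echo bin2hex($utf8), PHP_EOL; // c3a920e282ac
var_dump($latin1 === "\xE9 &euro;");
```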

Return Values

Returns the decoded string.


html_entity_decode

html_entity_decode() is the opposite of htmlentities(). It converts all HTML entities in string to their corresponding characters.

More precisely, this function decodes all the entities (including all numeric entities) that a) are necessarily valid for the chosen document type (for XML, for example, this function does not decode named entities that might be defined in some DTD) and b) whose characters are in the coded character set associated with the chosen encoding and are permitted in the chosen document type. All other entities are left as is.

Parameters

string

The input string.

flags

A bitmask of one or more of the following flags, which specify how to handle quotes and which document type to use. The default is ENT_COMPAT | ENT_HTML401.

Available flags constants

Constant Name	Description
ENT_COMPAT	Will convert double-quotes and leave single-quotes alone.
ENT_QUOTES	Will convert both double and single quotes.
ENT_NOQUOTES	Will leave both double and single quotes unconverted.
ENT_HTML401	Handle code as HTML 4.01.
ENT_XML1	Handle code as XML 1.
ENT_XHTML	Handle code as XHTML.
ENT_HTML5	Handle code as HTML 5.

encoding

An optional argument defining the encoding used when converting characters.

If omitted, the default value of encoding depends on the PHP version in use. In PHP 5.6 and later, the default_charset configuration option is used as the default value. PHP 5.4 and 5.5 use UTF-8 as the default. Earlier versions of PHP use ISO-8859-1.

Although this argument is technically optional, you are highly encouraged to specify the correct value for your code if you are using PHP 5.5 or earlier, or if your default_charset configuration option may be set incorrectly for the given input.

The following character sets are supported:

Supported charsets

Charset	Aliases	Description
ISO-8859-1	ISO8859-1	Western European, Latin-1.
ISO-8859-5	ISO8859-5	Little used Cyrillic charset (Latin/Cyrillic).
ISO-8859-15	ISO8859-15	Western European, Latin-9. Adds the Euro sign, French and Finnish letters missing in Latin-1 (ISO-8859-1).
UTF-8		ASCII compatible 8-bit Unicode.
cp866	ibm866, 866	DOS-specific Cyrillic charset.
cp1251	Windows-1251, win-1251, 1251	Windows-specific Cyrillic charset.
cp1252	Windows-1252, 1252	Windows-specific charset for Western European.
KOI8-R	koi8-ru, koi8r	Russian.
BIG5	950	Traditional Chinese, mainly used in Taiwan.
GB2312	936	Simplified Chinese, national standard character set.
BIG5-HKSCS		Big5 with Hong Kong extensions.
Shift_JIS	SJIS, SJIS-win, cp932, 932	Japanese.
EUC-JP	EUCJP, eucJP-win	Japanese.
MacRoman		Charset that was used by Mac OS.
''		An empty string activates detection from script encoding (Zend multibyte), default_charset and current locale (see nl_langinfo() and setlocale()), in this order. Not recommended.

Note: Any other character sets are not recognized. The default encoding will be used instead and a warning will be emitted.

Return Values

Returns the decoded string.

Changelog

Version	Description
5.6.0	The default value for the encoding parameter was changed to be the value of the default_charset configuration option.
5.4.0	The default encoding changed from ISO-8859-1 to UTF-8.
5.4.0	The constants ENT_HTML401, ENT_XML1, ENT_XHTML and ENT_HTML5 were added.

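Examples

Example #1 Decoding HTML entities

<?php
$orig = "I'll \"walk\" the <b>dog</b> now";
$a = htmlentities($orig);
$b = html_entity_decode($a);
echo $a; // I'll &quot;walk&quot; the &lt;b&gt;dog&lt;/b&gt; now
echo $b; // I'll "walk" the <b>dog</b> now
?>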

Notes

Note:

It may seem odd that trim(html_entity_decode('&nbsp;')); does not produce an empty string. The reason is that '&nbsp;' is converted not to the character with ASCII code 32 (which trim() strips) but to the character with code 160 (0xa0) in the default ISO-8859-1 encoding.
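A short sketch of this behaviour and one possible workaround, assuming UTF-8 as the target encoding (where the no-break space is the two bytes 0xC2 0xA0). Replacing the no-break space with a regular space before trimming is just one option, not something the manual prescribes.

```php
<?php
$decoded = html_entity_decode('&nbsp;', ENT_QUOTES | ENT_HTML401, 'UTF-8');

// trim() does not treat the no-break space as whitespace.
var_dump(trim($decoded) === ''); // bool(false)
var_dump(bin2hex($decoded));     // string(4) "c2a0"

// Workaround: turn U+00A0 into an ordinary space first, then trim.
$clean = trim(str_replace("\u{00A0}", ' ', $decoded));
var_dump($clean === '');         // bool(true)
```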

See Also

htmlentities() - Convert all applicable characters to HTML entities
htmlspecialchars() - Convert special characters to HTML entities
get_html_translation_table() - Returns the translation table used by htmlspecialchars and htmlentities
urldecode() - Decodes URL-encoded string
