masroore/php-html2text
A PHP package to convert HTML into a plain text format
A PHP package to convert HTML into plain text — no HTML tags allowed in the output.
masroore/html2text is a PHP package that converts a page of HTML into clean, easy-to-read plain ASCII text.
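You can install the package via Composer:

```shell
composer require masroore/html2text
```

Then convert HTML to text:

```php
<?php
require __DIR__ . '/vendor/autoload.php'; // Composer autoloader

use Kaiju\Html2Text\Html2Text;

$html = '<p>Hello, <b>world</b>!</p>'; // example input
$converter = new Html2Text();
echo $converter->convert($html);
```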
Callback functions
You can customize the formatting process by providing callbacks for the pre-processing, tag-replacement and post-processing stages.
Please see CHANGELOG for more information on what has changed recently.
Thank you for considering contributing to Html2Text. Please follow the project's contribution guidelines.
Please review our security policy on how to report security vulnerabilities.
Html2Text is open-source software licensed under the MIT license.
html_entity_decode
html_entity_decode() is the opposite of htmlentities() in that it converts HTML entities in the string to their corresponding characters.
More precisely, this function decodes all the entities (including all numeric entities) that a) are necessarily valid for the chosen document type (i.e., for XML, this function does not decode named entities that might be defined in some DTD), and b) whose character or characters are in the coded character set associated with the chosen encoding and are permitted in the chosen document type. All other entities are left as is.
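A small sketch of rule a): with the same input, the HTML 4.01 document type decodes the named entity &eacute;, while XML 1 only recognizes its predefined named entities (such as &amp;) and leaves &eacute; as is; the numeric entity &#233; is decoded in both. The flags and encoding are passed explicitly so the result does not depend on PHP-version defaults or on default_charset; the variable names are illustrative.

```php
<?php
// Same input, two document types: only the doctype flag changes.
$input = '&eacute; &amp; &#233;';

$asHtml401 = html_entity_decode($input, ENT_QUOTES | ENT_HTML401, 'UTF-8');
$asXml1    = html_entity_decode($input, ENT_QUOTES | ENT_XML1, 'UTF-8');

echo $asHtml401, "\n"; // é & é
echo $asXml1, "\n";    // &eacute; & é  (XML 1 has no named &eacute; entity)
```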
Parameters
string: The input string.
flags: A bitmask of one or more of the following flags, which specify how to handle quotes and which document type to use. The default is ENT_QUOTES | ENT_SUBSTITUTE | ENT_HTML401.
Constant Name | Description |
---|---|
ENT_COMPAT | Will convert double-quotes and leave single-quotes alone. |
ENT_QUOTES | Will convert both double and single quotes. |
ENT_NOQUOTES | Will leave both double and single quotes unconverted. |
ENT_SUBSTITUTE | Replace invalid code unit sequences with a Unicode Replacement Character U+FFFD (UTF-8) or &#xFFFD; (otherwise) instead of returning an empty string. |
ENT_HTML401 | Handle code as HTML 4.01. |
ENT_XML1 | Handle code as XML 1. |
ENT_XHTML | Handle code as XHTML. |
ENT_HTML5 | Handle code as HTML 5. |
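The quote flags in the table are easiest to compare on one input that contains both an encoded double quote (&quot;) and an encoded single quote (&#039;). This sketch combines each quote flag with ENT_HTML401 and an explicit UTF-8 encoding; the variable names are illustrative.

```php
<?php
// One input containing an encoded double quote and an encoded single quote.
$input = '&quot;double&quot; and &#039;single&#039;';

$results = [];
foreach (['ENT_COMPAT' => ENT_COMPAT, 'ENT_QUOTES' => ENT_QUOTES, 'ENT_NOQUOTES' => ENT_NOQUOTES] as $name => $quoteFlag) {
    // Combine the quote flag with an explicit document type and encoding.
    $results[$name] = html_entity_decode($input, $quoteFlag | ENT_HTML401, 'UTF-8');
    echo $name, ': ', $results[$name], "\n";
}
// ENT_COMPAT: "double" and &#039;single&#039;
// ENT_QUOTES: "double" and 'single'
// ENT_NOQUOTES: &quot;double&quot; and &#039;single&#039;
```

Entities that are not decoded because of the quote flag are simply left in the output unchanged.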
encoding: An optional argument defining the encoding used when converting characters.
If omitted, encoding defaults to the value of the default_charset configuration option.
Although this argument is technically optional, you are highly encouraged to specify the correct value for your code if the default_charset configuration option may be set incorrectly for the given input.
The following character sets are supported:
Charset | Aliases | Description |
---|---|---|
ISO-8859-1 | ISO8859-1 | Western European, Latin-1. |
ISO-8859-5 | ISO8859-5 | Little used cyrillic charset (Latin/Cyrillic). |
ISO-8859-15 | ISO8859-15 | Western European, Latin-9. Adds the Euro sign, French and Finnish letters missing in Latin-1 (ISO-8859-1). |
UTF-8 | | ASCII compatible multi-byte 8-bit Unicode. |
cp866 | ibm866, 866 | DOS-specific Cyrillic charset. |
cp1251 | Windows-1251, win-1251, 1251 | Windows-specific Cyrillic charset. |
cp1252 | Windows-1252, 1252 | Windows specific charset for Western European. |
KOI8-R | koi8-ru, koi8r | Russian. |
BIG5 | 950 | Traditional Chinese, mainly used in Taiwan. |
GB2312 | 936 | Simplified Chinese, national standard character set. |
BIG5-HKSCS | | Big5 with Hong Kong extensions, Traditional Chinese. |
Shift_JIS | SJIS, SJIS-win, cp932, 932 | Japanese |
EUC-JP | EUCJP, eucJP-win | Japanese |
MacRoman | | Charset that was used by Mac OS. |
"" | | An empty string activates detection from script encoding (Zend multibyte), default_charset and current locale (see nl_langinfo() and setlocale() ), in this order. Not recommended. |
Note: Any other character sets are not recognized. The default encoding will be used instead and a warning will be emitted.
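The encoding argument decides both the bytes that are produced and, per rule b) above, which entities can be decoded at all. In this sketch &eacute; (U+00E9) exists in all three target charsets, while &euro; (U+20AC) has no ISO-8859-1 byte and is therefore left as is. Results are printed as hex because they are in different encodings; the variable names are illustrative.

```php
<?php
// Decode the same two entities into three different target encodings.
$input = '&eacute; &euro;';
$flags = ENT_QUOTES | ENT_HTML401;

$latin1 = html_entity_decode($input, $flags, 'ISO-8859-1');
$latin9 = html_entity_decode($input, $flags, 'ISO-8859-15');
$utf8   = html_entity_decode($input, $flags, 'UTF-8');

// Show raw bytes, since the three results use different encodings.
echo bin2hex($latin1), "\n"; // e920266575726f3b  (é as 0xE9, "&euro;" untouched)
echo bin2hex($latin9), "\n"; // e920a4            (the euro sign is 0xA4 in Latin-9)
echo bin2hex($utf8), "\n";   // c3a920e282ac
```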
Return Values
Returns the decoded string.
soundasleep/html2text
A PHP component to convert HTML into a plain text format
html2text is a very simple script that uses DOM methods to convert HTML into a format similar to what would be rendered by a browser — perfect for places where you need a quick text representation. For example:
```html
<html>
<title>Ignored Title</title>
<body>
  <h1>Hello, World!</h1>

  <p>This is some e-mail content.
  Even though it has whitespace and newlines, the e-mail converter
  will handle it correctly.

  <p>Even mismatched tags.</p>

  <div>A div</div>
  <div>Another div</div>
  <div>A div<div>within a div</div></div>

  <a href="http://foo.com">A link</a>

</body>
</html>
```

Will be converted into:

```
Hello, World!

This is some e-mail content. Even though it has whitespace and newlines, the e-mail converter will handle it correctly.

Even mismatched tags.

A div
Another div
A div
within a div

[A link](http://foo.com)
```
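To illustrate the general "DOM methods" approach (not html2text's actual code), here is a minimal, hypothetical sketch built on PHP's DOMDocument. The function names sketch_html_to_text() and sketch_walk() are invented for this example; it only handles headings, paragraphs, <br> and links, drops <head>, <title>, <script> and <style>, and assumes ASCII input.

```php
<?php
// Hypothetical sketch of DOM-based HTML-to-text conversion.
// NOT soundasleep/html2text's implementation. Needs the dom extension.
// Input is assumed to be ASCII: DOMDocument::loadHTML() treats input
// without a declared charset as ISO-8859-1.

function sketch_html_to_text(string $html): string
{
    if (trim($html) === '') {
        return ''; // loadHTML() rejects an empty string
    }

    $doc = new DOMDocument();
    $previous = libxml_use_internal_errors(true); // tolerate mismatched tags
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $text = sketch_walk($doc);
    $text = preg_replace('/ *\n */', "\n", $text);   // drop spaces around line breaks
    $text = preg_replace('/\n{3,}/', "\n\n", $text); // keep at most one blank line
    return trim($text);
}

function sketch_walk(DOMNode $node): string
{
    if ($node instanceof DOMText) {
        // Collapse whitespace runs to one space, roughly as a browser does.
        return preg_replace('/[ \t\r\n]+/', ' ', $node->nodeValue);
    }

    $name = strtolower($node->nodeName);
    if (in_array($name, ['head', 'title', 'script', 'style'], true)) {
        return ''; // not rendered as visible text
    }

    $inner = '';
    if ($node->hasChildNodes()) {
        foreach ($node->childNodes as $child) {
            $inner .= sketch_walk($child);
        }
    }

    if (in_array($name, ['h1', 'h2', 'h3', 'h4', 'h5', 'h6', 'p'], true)) {
        return "\n\n" . trim($inner) . "\n\n"; // block element: blank line around it
    }
    if ($name === 'br') {
        return "\n";
    }
    if ($name === 'a' && $node instanceof DOMElement && $node->getAttribute('href') !== '') {
        return '[' . trim($inner) . '](' . $node->getAttribute('href') . ')'; // Markdown-style link
    }
    return $inner; // any other element: just its text
}

echo sketch_html_to_text('<h1>Hello</h1><p>World <a href="http://foo.com">link</a></p>'), "\n";
// prints "Hello", a blank line, then "World [link](http://foo.com)"
```

Recursing over the tree and deciding per element whether to add line breaks is what lets mismatched tags work: libxml has already repaired the markup into a well-formed tree before the walk starts.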
You can use Composer to add the package to your project:

```json
{
    "require": {
        "soundasleep/html2text": "~1.1"
    }
}
```

And then use it quite simply:

```php
$text = \Soundasleep\Html2Text::convert($html);
```
You can also include the supplied html2text.php and use $text = convert_html_to_text($html); instead.
Option | Default | Description |
---|---|---|
ignore_errors | false | Set to true to ignore any XML parsing errors. |
drop_links | false | Set to true to not render links as [My Link](http://foo.com), but rather just My Link. |
char_set | 'auto' | Specify the character set of the input. Pass multiple character sets (comma-separated) to detect the encoding; the default is ASCII,UTF-8. |
Pass along options as a second argument to convert, for example:

```php
$options = array(
    'ignore_errors' => true,
    // other options go here
);
$text = \Soundasleep\Html2Text::convert($html, $options);
```
Some very basic tests are provided in the tests/ directory. Run them with:

```shell
composer install && vendor/bin/phpunit
```
Class 'DOMDocument' not found
You need to install the PHP XML extension for your PHP version, e.g. apt-get install php7.4-xml.
html2text is licensed under MIT, making it suitable for both Eclipse and GPL projects.
Also see html2text_ruby, a Ruby implementation.
PHP html_entity_decode() Function
Definition and Usage
The html_entity_decode() function converts HTML entities to characters.
The html_entity_decode() function is the opposite of htmlentities().
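A minimal round trip showing the "opposite" relationship: encode a string with htmlentities(), then decode it again with the same flags and encoding. Both are passed explicitly so the two calls agree regardless of PHP-version defaults; the sample string is illustrative.

```php
<?php
// Encode, then decode with identical flags and encoding.
$flags    = ENT_QUOTES | ENT_HTML401;
$original = "Caf\u{e9} & \"Chips\" <b>It's</b>";

$encoded = htmlentities($original, $flags, 'UTF-8');
$decoded = html_entity_decode($encoded, $flags, 'UTF-8');

echo $encoded, "\n";              // Caf&eacute; &amp; &quot;Chips&quot; &lt;b&gt;It&#039;s&lt;/b&gt;
var_dump($decoded === $original); // bool(true)
```

If the decoding call used a narrower quote flag such as ENT_COMPAT, the &#039; would survive and the round trip would no longer be exact.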
Syntax
html_entity_decode(string,flags,character-set)
Parameter Values
Parameter | Description |
---|---|
string | Required. Specifies the string to decode |
flags | Optional. Specifies how to handle quotes and which document type to use. |
character-set | Optional. A string that specifies which character set to use. |
The available quote styles are:
- ENT_COMPAT — Default before PHP 8.1. Decodes only double quotes
- ENT_QUOTES — Default as of PHP 8.1. Decodes double and single quotes
- ENT_NOQUOTES — Does not decode any quotes
Additional flags for specifying the used doctype:
- ENT_HTML401 — Default. Handle code as HTML 4.01
- ENT_HTML5 — Handle code as HTML 5
- ENT_XML1 — Handle code as XML 1
- ENT_XHTML — Handle code as XHTML
Allowed values for character-set:
- UTF-8 — Default. ASCII compatible multi-byte 8-bit Unicode
- ISO-8859-1 — Western European
- ISO-8859-15 — Western European (adds the Euro sign + French and Finnish letters missing in ISO-8859-1)
- cp866 — DOS-specific Cyrillic charset
- cp1251 — Windows-specific Cyrillic charset
- cp1252 — Windows specific charset for Western European
- KOI8-R — Russian
- BIG5 — Traditional Chinese, mainly used in Taiwan
- GB2312 — Simplified Chinese, national standard character set
- BIG5-HKSCS — Big5 with Hong Kong extensions
- Shift_JIS — Japanese
- EUC-JP — Japanese
- MacRoman — Character-set that was used by Mac OS
Note: Unrecognized character sets are ignored and replaced by ISO-8859-1 in versions prior to PHP 5.4. As of PHP 5.4, they are ignored and replaced by UTF-8.
Technical Details
Return Value: | Returns the converted string |
---|---|
PHP Version: | 4.3.0+ |
Changelog: | PHP 8.1: Changed the default flags from ENT_COMPAT to ENT_QUOTES, ENT_SUBSTITUTE and ENT_HTML401 combined. PHP 5.6: Changed the default value for the character-set parameter to the value of the default charset (in configuration). PHP 5.4: Changed the default value for the character-set parameter to UTF-8. PHP 5.4: Added ENT_HTML401, ENT_HTML5, ENT_XML1 and ENT_XHTML. PHP 5.0: Added support for multi-byte encodings |
More Examples
Example
Convert some HTML entities to characters:
```php
<?php
$str = "Albert Einstein said: &#039;E=MC&sup2;&#039;";
echo html_entity_decode($str, ENT_COMPAT); // Will only convert double quotes
echo "<br>";
echo html_entity_decode($str, ENT_QUOTES); // Converts double and single quotes
echo "<br>";
echo html_entity_decode($str, ENT_NOQUOTES); // Does not convert any quotes
?>
```
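The HTML output of the code above will be (View Source):

```
Albert Einstein said: &#039;E=MC²&#039;<br>Albert Einstein said: 'E=MC²'<br>Albert Einstein said: &#039;E=MC²&#039;
```

The browser output of the code above will be:

```
Albert Einstein said: 'E=MC²'
Albert Einstein said: 'E=MC²'
Albert Einstein said: 'E=MC²'
```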
Example
Convert HTML entities for Western European letters back to characters, using the UTF-8 character set:
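```php
<?php
$str = "My name is &Oslash;yvind &Aring;sane. I&#039;m Norwegian.";
echo html_entity_decode($str, ENT_QUOTES, "UTF-8");
?>
```

The HTML output of the code above will be (View Source):

```
My name is Øyvind Åsane. I'm Norwegian.
```

The browser output of the code above will be:

```
My name is Øyvind Åsane. I'm Norwegian.
```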