Php convert html to text

Saved searches

Use saved searches to filter your results more quickly

You signed in with another tab or window. Reload to refresh your session. You signed out in another tab or window. Reload to refresh your session. You switched accounts on another tab or window. Reload to refresh your session.

A PHP package to convert HTML into a plain text format

License

masroore/php-html2text

This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.

Name already in use

A tag already exists with the provided branch name. Many Git commands accept both tag and branch names, so creating this branch may cause unexpected behavior. Are you sure you want to create this branch?

Sign In Required

Please sign in to use Codespaces.

Launching GitHub Desktop

If nothing happens, download GitHub Desktop and try again.

Launching GitHub Desktop

If nothing happens, download GitHub Desktop and try again.

Launching Xcode

If nothing happens, download Xcode and try again.

Launching Visual Studio Code

Your codespace will open once ready.

There was a problem preparing your codespace, please try again.

Latest commit

Git stats

Files

Failed to load latest commit information.

README.md

A PHP package to convert HTML into plain text — no HTML tags allowed in the output.

masroore/html2text is a PHP package that converts a page of HTML into clean, easy-to-read plain ASCII text.

You can install the package via composer:

composer require masroore/html2text
use Kaiju\Html2Text\Html2Text; $converter = new Html2Text(); echo $converter->convert($html);

Callback functions

You are able to change process of formatting by providing callbacks in pre-processing, tag-replacing and post-processing:

Please see CHANGELOG for more information on what has changed recently.

Thank you for considering to contribute to Html2Text. All the contribution guidelines are mentioned here.

Please review our security policy on how to report security vulnerabilities.

Html2Text is an open-sourced software licensed under the MIT license.

About

A PHP package to convert HTML into a plain text format

Источник

html_entity_decode

html_entity_decode() is the opposite of htmlentities() in that it converts HTML entities in the string to their corresponding characters.

More precisely, this function decodes all the entities (including all numeric entities) that a) are necessarily valid for the chosen document type — i.e., for XML, this function does not decode named entities that might be defined in some DTD — and b) whose character or characters are in the coded character set associated with the chosen encoding and are permitted in the chosen document type. All other entities are left as is.

Читайте также:  Javascript scroll to top no jquery

Parameters

A bitmask of one or more of the following flags, which specify how to handle quotes and which document type to use. The default is ENT_QUOTES | ENT_SUBSTITUTE | ENT_HTML401 .

Available flags constants
Constant Name Description
ENT_COMPAT Will convert double-quotes and leave single-quotes alone.
ENT_QUOTES Will convert both double and single quotes.
ENT_NOQUOTES Will leave both double and single quotes unconverted.
ENT_SUBSTITUTE Replace invalid code unit sequences with a Unicode Replacement Character U+FFFD (UTF-8) or � (otherwise) instead of returning an empty string.
ENT_HTML401 Handle code as HTML 4.01.
ENT_XML1 Handle code as XML 1.
ENT_XHTML Handle code as XHTML.
ENT_HTML5 Handle code as HTML 5.

An optional argument defining the encoding used when converting characters.

If omitted, encoding defaults to the value of the default_charset configuration option.

Although this argument is technically optional, you are highly encouraged to specify the correct value for your code if the default_charset configuration option may be set incorrectly for the given input.

The following character sets are supported:

Supported charsets
Charset Aliases Description
ISO-8859-1 ISO8859-1 Western European, Latin-1.
ISO-8859-5 ISO8859-5 Little used cyrillic charset (Latin/Cyrillic).
ISO-8859-15 ISO8859-15 Western European, Latin-9. Adds the Euro sign, French and Finnish letters missing in Latin-1 (ISO-8859-1).
UTF-8 ASCII compatible multi-byte 8-bit Unicode.
cp866 ibm866, 866 DOS-specific Cyrillic charset.
cp1251 Windows-1251, win-1251, 1251 Windows-specific Cyrillic charset.
cp1252 Windows-1252, 1252 Windows specific charset for Western European.
KOI8-R koi8-ru, koi8r Russian.
BIG5 950 Traditional Chinese, mainly used in Taiwan.
GB2312 936 Simplified Chinese, national standard character set.
BIG5-HKSCS Big5 with Hong Kong extensions, Traditional Chinese.
Shift_JIS SJIS, SJIS-win, cp932, 932 Japanese
EUC-JP EUCJP, eucJP-win Japanese
MacRoman Charset that was used by Mac OS.
» An empty string activates detection from script encoding (Zend multibyte), default_charset and current locale (see nl_langinfo() and setlocale() ), in this order. Not recommended.

Note: Any other character sets are not recognized. The default encoding will be used instead and a warning will be emitted.

Return Values

Returns the decoded string.

Источник

Saved searches

Use saved searches to filter your results more quickly

You signed in with another tab or window. Reload to refresh your session. You signed out in another tab or window. Reload to refresh your session. You switched accounts on another tab or window. Reload to refresh your session.

Читайте также:  Outline text box html

A PHP component to convert HTML into a plain text format

License

soundasleep/html2text

This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.

Name already in use

A tag already exists with the provided branch name. Many Git commands accept both tag and branch names, so creating this branch may cause unexpected behavior. Are you sure you want to create this branch?

Sign In Required

Please sign in to use Codespaces.

Launching GitHub Desktop

If nothing happens, download GitHub Desktop and try again.

Launching GitHub Desktop

If nothing happens, download GitHub Desktop and try again.

Launching Xcode

If nothing happens, download Xcode and try again.

Launching Visual Studio Code

Your codespace will open once ready.

There was a problem preparing your codespace, please try again.

Latest commit

Git stats

Files

Failed to load latest commit information.

README.md

Total Downloads

html2text is a very simple script that uses DOM methods to convert HTML into a format similar to what would be rendered by a browser — perfect for places where you need a quick text representation. For example:

html> title>Ignored Titletitle> body> h1>Hello, World!h1> p>This is some e-mail content. Even though it has whitespace and newlines, the e-mail converter will handle it correctly. p>Even mismatched tags.p> div>A divdiv> div>Another divdiv> div>A divdiv>within a divdiv>div> a href pl-s">http://foo.com">A linka> body> html>
Hello, World! This is some e-mail content. Even though it has whitespace and newlines, the e-mail converter will handle it correctly. Even mismatched tags. A div Another div A div within a div [A link](http://foo.com) 

You can use Composer to add the package to your project:

< "require": < "soundasleep/html2text": "~1.1" > >

And then use it quite simply:

$text = \Soundasleep\Html2Text::convert($html);

You can also include the supplied html2text.php and use $text = convert_html_to_text($html); instead.

Option Default Description
ignore_errors false Set to true to ignore any XML parsing errors.
drop_links false Set to true to not render links as [http://foo.com](My Link) , but rather just My Link .
char_set ‘auto’ Specify a specific character set. Pass multiple character sets (comma separated) to detect encoding, default is ASCII,UTF-8

Pass along options as a second argument to convert , for example:

$options = array( 'ignore_errors' => true, // other options go here ); $text = \Soundasleep\Html2Text::convert($html, $options);

Some very basic tests are provided in the tests/ directory. Run them with composer install && vendor/bin/phpunit .

Class ‘DOMDocument’ not found

You need to install the PHP XML extension for your PHP version. e.g. apt-get install php7.4-xml

html2text is licensed under MIT, making it suitable for both Eclipse and GPL projects.

Also see html2text_ruby, a Ruby implementation.

About

A PHP component to convert HTML into a plain text format

Источник

PHP html_entity_decode() Function

The HTML output of the code above will be (View Source):

The browser output of the code above will be:

Definition and Usage

The html_entity_decode() function converts HTML entities to characters.

The html_entity_decode() function is the opposite of htmlentities().

Syntax

Parameter Values

Parameter Description
string Required. Specifies the string to decode
flags Optional. Specifies how to handle quotes and which document type to use.

The available quote styles are:

  • ENT_COMPAT — Default. Decodes only double quotes
  • ENT_QUOTES — Decodes double and single quotes
  • ENT_NOQUOTES — Does not decode any quotes

Additional flags for specifying the used doctype:

  • ENT_HTML401 — Default. Handle code as HTML 4.01
  • ENT_HTML5 — Handle code as HTML 5
  • ENT_XML1 — Handle code as XML 1
  • ENT_XHTML — Handle code as XHTML
  • UTF-8 — Default. ASCII compatible multi-byte 8-bit Unicode
  • ISO-8859-1 — Western European
  • ISO-8859-15 — Western European (adds the Euro sign + French and Finnish letters missing in ISO-8859-1)
  • cp866 — DOS-specific Cyrillic charset
  • cp1251 — Windows-specific Cyrillic charset
  • cp1252 — Windows specific charset for Western European
  • KOI8-R — Russian
  • BIG5 — Traditional Chinese, mainly used in Taiwan
  • GB2312 — Simplified Chinese, national standard character set
  • BIG5-HKSCS — Big5 with Hong Kong extensions
  • Shift_JIS — Japanese
  • EUC-JP — Japanese
  • MacRoman — Character-set that was used by Mac OS

Note: Unrecognized character-sets will be ignored and replaced by ISO-8859-1 in versions prior to PHP 5.4. As of PHP 5.4, it will be ignored an replaced by UTF-8.

Technical Details

Return Value: Returns the converted string
PHP Version: 4.3.0+
Changelog: PHP 5.6 — Changed the default value for the character-set parameter to the value of the default charset (in configuration).
PHP 5.4 — Changed the default value for the character-set parameter to UTF-8.
PHP 5.4 — Added ENT_HTML401, ENT_HTML5, ENT_XML1 and ENT_XHTML.
PHP 5.0 — Added support for multi-byte encodings

More Examples

Example

Convert some HTML entities to characters:

$str = «Albert Einstein said: 'E=MC²'»;
echo html_entity_decode($str, ENT_COMPAT); // Will only convert double quotes
echo «
«;
echo html_entity_decode($str, ENT_QUOTES); // Converts double and single quotes
echo «
«;
echo html_entity_decode($str, ENT_NOQUOTES); // Does not convert any quotes
?>

The HTML output of the code above will be (View Source):

Albert Einstein said: 'E=MC²'

Albert Einstein said: ‘E=MC²’

Albert Einstein said: 'E=MC²'

The browser output of the code above will be:

Example

Convert some HTML entities to characters, using the Western European character-set:

$str = «My name is Øyvind Åsane. I'm Norwegian.»;
echo html_entity_decode($str, ENT_QUOTES, «UTF-8»);
?>

The HTML output of the code above will be (View Source):

The browser output of the code above will be:

Источник

Оцените статью