HTML Encoding with JavaScript

How to HTML-encode a String

This tutorial provides some methods that are used for HTML-encoding a string without an XSS vulnerability.

Here is an example that reduces the chance of XSS:

<!DOCTYPE html>
<html>
  <head>
    <title>Title of the document</title>
  </head>
  <body>
    <div id="encoded"></div>
    <div id="decoded"></div>
    <script>
      let string1 = "Html & Css & Javascript";
      let string2 = "Html &amp; Css &amp; Javascript";

      // Decode: let the browser parse the entities, then read the plain text.
      function htmlDecode(input) {
        const textArea = document.createElement("textarea");
        textArea.innerHTML = input;
        return textArea.value;
      }

      // Encode: set plain text, then read the escaped markup back.
      function htmlEncode(input) {
        const textArea = document.createElement("textarea");
        textArea.innerText = input;
        return textArea.innerHTML.split("<br>").join("\n");
      }

      document.getElementById("encoded").innerText = htmlEncode(string1);
      document.getElementById("decoded").innerText = htmlDecode(string2);
    </script>
  </body>
</html>

In the htmlEncode function, the text is assigned to the element's innerText and the encoded markup is read back from innerHTML. In the htmlDecode function, the encoded markup is assigned to innerHTML and the decoded text is read back from the element's value.

In the following HTML code, we use the functions defined above to take user input from a textarea and encode it to prevent XSS.

<!DOCTYPE html>
<html>
  <body>
    <textarea rows="6" cols="50" name="normalTXT" id="textId"></textarea>
    <button onclick="convert()">Convert</button>
    <br />
    <label>Encoding in URL:
      <input width="500" type="text" name="URL-ENCODE" id="URL-ENCODE" />
    </label>
    <br />
    <label>Encoding in HTML:
      <input type="text" name="HTML-ENCODE" id="HTML-ENCODE" />
    </label>
    <br />
    <script>
      function htmlDecode(input) {
        const textArea = document.createElement("textarea");
        textArea.innerHTML = input;
        return textArea.value;
      }

      function htmlEncode(input) {
        const textArea = document.createElement("textarea");
        textArea.innerText = input;
        return textArea.innerHTML.split("<br>").join("\n");
      }

      function convert() {
        const textArea = document.getElementById("textId");
        // HTML-encode the input for the "Encoding in HTML" field.
        const htmlEncoded = htmlEncode(textArea.value);
        document.getElementById("HTML-ENCODE").value = htmlEncoded;
        // Percent-encode the input for the "Encoding in URL" field.
        const urlEncoded = encodeURIComponent(textArea.value);
        document.getElementById("URL-ENCODE").value = urlEncoded;
      }
    </script>
  </body>
</html>

This method works fine in many scenarios, but in some cases you will end up with an XSS vulnerability.

For the function above, consider the following string:

htmlDecode("");

The string contains an unescaped HTML tag, so instead of decoding it, the htmlDecode function can run the JavaScript code embedded in the string. To avoid this, you can use DOMParser, which is supported in all major browsers:


Javascript decoding the HTML

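function htmlDecode(input) {
  let doc = new DOMParser().parseFromString(input, "text/html");
  return doc.documentElement.textContent;
}

// Entities are decoded to text:
alert(htmlDecode("&lt;img src='img.jpg'&gt;")); // "<img src='img.jpg'>"
// A real tag is parsed but contributes no text content:
alert(htmlDecode("<img src='img.jpg'>")); // ""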


The function won't run any JavaScript code as a side effect. HTML tags are ignored because only the text content is returned.

Another useful and fast method exists which also encodes quote marks:

function htmlEscape(str) {
  return str
    .replace(/&/g, '&amp;') // first, so later entities are not double-escaped
    .replace(/'/g, '&apos;')
    .replace(/"/g, '&quot;')
    .replace(/>/g, '&gt;')
    .replace(/</g, '&lt;');
}

// The opposite function:
function htmlUnescape(str) {
  return str
    .replace(/&apos;/g, "'")
    .replace(/&quot;/g, '"')
    .replace(/&gt;/g, '>')
    .replace(/&lt;/g, '<')
    .replace(/&amp;/g, '&'); // last, so "&amp;lt;" becomes "&lt;" and not "<"
}
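The order of the replacements matters. The following is a minimal sketch, with hypothetical function names that are not part of the tutorial, showing that a round trip only preserves the input when the ampersand is handled first while escaping and last while unescaping:

```javascript
// Hypothetical helpers for illustration only.
function escapeHtmlSketch(str) {
  return str
    .replace(/&/g, "&amp;") // must run first
    .replace(/'/g, "&apos;")
    .replace(/"/g, "&quot;")
    .replace(/>/g, "&gt;")
    .replace(/</g, "&lt;");
}

function unescapeHtmlSketch(str) {
  return str
    .replace(/&apos;/g, "'")
    .replace(/&quot;/g, '"')
    .replace(/&gt;/g, ">")
    .replace(/&lt;/g, "<")
    .replace(/&amp;/g, "&"); // must run last
}

// The input already contains text that looks like an entity.
const original = 'The entity &lt; means "<"';
const escaped = escapeHtmlSketch(original);

console.log(escaped); // The entity &amp;lt; means &quot;&lt;&quot;
console.log(unescapeHtmlSketch(escaped) === original); // true
```

If the ampersand were unescaped first, the literal text &lt; in the input would come back as < and the round trip would no longer match.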

To escape forward-slash / for anti-XSS safety purposes use the following:
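A minimal sketch of that step, assuming the hexadecimal character reference &#x2F; as the replacement (the function name escapeSlash is hypothetical):

```javascript
// Hypothetical helper: replaces every forward slash with a character reference.
function escapeSlash(str) {
  // The g flag makes replace() handle all slashes, not only the first one.
  return str.replace(/\//g, "&#x2F;");
}

console.log(escapeSlash("</script>")); // <&#x2F;script>
```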

The replace() Method

The replace() method of String replaces the matched text with another string. It takes two parameters: the first is the string or regular expression to search for, and the second is the replacement. A string pattern replaces only the first match, while a regular expression with the g flag replaces every match. Passing an empty string as the second parameter removes the matched text.
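A short sketch of these behaviors, using the sample string from earlier in this tutorial (the variable names are only illustrative):

```javascript
const text = "Html & Css & Javascript";

// A string pattern replaces only the first match.
const firstOnly = text.replace("&", "and");
console.log(firstOnly); // Html and Css & Javascript

// A regular expression with the g flag replaces every match.
const allMatches = text.replace(/&/g, "and");
console.log(allMatches); // Html and Css and Javascript

// An empty replacement string removes the matched text.
const removed = text.replace(/ & /g, "");
console.log(removed); // HtmlCssJavascript
```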


Encode HTML With JavaScript


  1. Encode HTML With String Replacement in JavaScript
  2. Encode HTML With the charCodeAt Function in JavaScript
  3. Encode HTML With createTextNode in JavaScript
  4. Encode HTML With He.js in JavaScript

This article will introduce how to encode an HTML string in JavaScript. We’ll use four different methods, which have string replacement in common.

The purpose of the string replacement is to replace potentially dangerous characters.

Encode HTML With String Replacement in JavaScript

The replace() method takes a pattern and a replacement as arguments and replaces the text that matches the pattern. Let's have a look at an example to see how this works.

In our example code below, we define a function that will take an HTML string as an argument. This function will return the encoded HTML.

function htmlEncode(string) {
  return string.replace(/&/g, '&amp;')
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;')
    .replace(/'/g, '&#39;')
    .replace(/"/g, '&quot;')
    .replace(/\//g, '&#x2F;'); // g flag so every slash is replaced
}

console.log(htmlEncode("<h1>Hello world</h1>"));
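As a variation on the chained calls, the same characters can be handled in a single pass with a lookup table. This is only a sketch; ENTITY_MAP and htmlEncodeOnePass are hypothetical names, and the chosen references are one reasonable option among several:

```javascript
// Hypothetical lookup table: character -> character reference.
const ENTITY_MAP = {
  "&": "&amp;",
  "<": "&lt;",
  ">": "&gt;",
  '"': "&quot;",
  "'": "&#39;",
  "/": "&#x2F;",
};

function htmlEncodeOnePass(string) {
  // One regex pass, so the order of replacements no longer matters
  // and an inserted "&" is never escaped a second time.
  return string.replace(/[&<>"'\/]/g, (ch) => ENTITY_MAP[ch]);
}

console.log(htmlEncodeOnePass("<a href='/x'>Tom & Jerry</a>"));
// &lt;a href=&#39;&#x2F;x&#39;&gt;Tom &amp; Jerry&lt;&#x2F;a&gt;
```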

Encode HTML With the charCodeAt Function in JavaScript

The charCodeAt method returns an integer representing the UTF-16 code unit at an index. This makes it perfect for encoding some characters in your HTML.

We present an example code below where we’ve defined a function that takes a string as an argument.

We set up an array that acts as a buffer in the function definition. Then we loop through the string, from its last character to its first, with a typical for loop.

During the loop, we use the unshift method to add each encoded character to the beginning of the array. Each entry is built from &#, the integer returned by charCodeAt, and a semicolon.

We join those three parts with the join function inside the loop, and join the whole buffer when the function returns.

function encodeWithCharCode(string) {
  let buffer = [];
  for (let i = string.length - 1; i >= 0; i--) {
    buffer.unshift(['&#', string[i].charCodeAt(), ';'].join(''));
  }
  return buffer.join('');
}

console.log(encodeWithCharCode("<h1>Hello world</h1>"));
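Because charCodeAt works on UTF-16 code units, a symbol outside the Basic Multilingual Plane (many emoji, for example) is emitted as two surrogate halves, which are not valid character references. A sketch of one way around this, assuming ES2015 string iteration and codePointAt are available (encodeWithCodePoint is a hypothetical name):

```javascript
function encodeWithCodePoint(string) {
  const buffer = [];
  // for...of iterates by code point, so surrogate pairs stay together.
  for (const symbol of string) {
    buffer.push("&#" + symbol.codePointAt(0) + ";");
  }
  return buffer.join("");
}

console.log(encodeWithCodePoint("<b>")); // &#60;&#98;&#62;
console.log(encodeWithCodePoint("\u{1D306}")); // &#119558;
```

Using push while iterating forward gives the same order as unshift while iterating backward, without shifting the array on every step.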

Encode HTML With createTextNode in JavaScript

You can use the createTextNode method to encode a given HTML string. The method accepts a string and creates a text node from it; the browser escapes that text when the node is serialized back to markup.

Afterward, you can grab this encoded data. You can do all this with procedural programming or a function.

The function will accept an HTML string as an argument. Afterward, it creates an element with createElement and a text node with createTextNode .

In the end, the function appends this text node as the child of the created element and returns the element's innerHTML, in which the &, <, and > characters appear as character references.

function encodeWithTextNode(htmlstring) {
  let textarea = document.createElement('textarea');
  let text = document.createTextNode(htmlstring);
  textarea.appendChild(text);
  return textarea.innerHTML;
}

console.log(encodeWithTextNode("<h1>Hello world</h1>"));

Encode HTML With He.js in JavaScript

He.js is an open-source entity encoder and decoder created by Mathias Bynens. To get started with the He.js library, visit the he.js GitHub repository.

Once you are there, select your preferred download option. You can embed He.js in your HTML code as an alternative option.

All you have to do is visit the cdnjs page for he.js and grab your preferred CDN link.

The next code block shows how to encode an HTML string with he.js .

<body>
  <script
    src="https://cdnjs.cloudflare.com/ajax/libs/he/1.2.0/he.min.js"
    integrity="sha512-PEsccDx9jqX6Dh4wZDCnWMaIO3gAaU0j46W//sSqQhUQxky6/eHZyeB3NrXD2xsyugAKd4KPiDANkcuoEa2JuA=="
    crossorigin="anonymous"
    referrerpolicy="no-referrer">
  </script>
  <script type="text/javascript">
    console.log(he.encode("<h1>Hello world</h1>"));
  </script>
</body>

Habdul Hazeez is a technical writer with amazing research skills. He can connect the dots, and make sense of data that are scattered across different media.


Copyright © 2023. All right reserved


Encode and Decode HTML entities using pure Javascript

Carlos Delgado

Learn how to encode a string to HTML entities, and decode it again, using JavaScript.

Invalid HTML, broken markup, and other undesirable side effects of working with HTML strings that are not escaped properly in JavaScript are problems that at least 1 in every 5 web developers (who work with dynamic apps) has faced.

JavaScript itself doesn't provide native methods to deal with it, unlike PHP (our beautiful server-side language), which offers the htmlentities and html_entity_decode functions ready to use.

Encode and decode everything

If you're one of those psychotic developers (just like me) who don't like to add huge portions of code to their projects, you may want to use the following snippet.

This piece of code works like a charm both ways, encoding and decoding. Each method expects the string as its first parameter (decoded or encoded, according to the method) and returns the processed string.

It doesn't offer much customization, but it works fine (at least for only a couple of lines). Note that the encode method converts every single character into its numeric HTML entity.

If you want to replace only those weird characters that break your HTML (<, >, /, \, etc.), keep reading and don't use this method; otherwise, this snippet comes in handy.

(function(window) {
  window.htmlentities = {
    /**
     * Converts a string to its html characters completely.
     *
     * @param str String with unescaped HTML characters
     **/
    encode: function(str) {
      var buf = [];

      for (var i = str.length - 1; i >= 0; i--) {
        buf.unshift(['&#', str[i].charCodeAt(), ';'].join(''));
      }

      return buf.join('');
    },
    /**
     * Converts an html characterSet into its original character.
     *
     * @param str htmlSet entities
     **/
    decode: function(str) {
      return str.replace(/&#(\d+);/g, function(match, dec) {
        return String.fromCharCode(dec);
      });
    }
  };
})(window);

The previous code creates a global variable (in the window) named htmlentities. This object contains the 2 methods encode and decode.

To convert a normal string to its html characters use the encode method :

htmlentities.encode("Hello, this is a test stríng > < with characters that could break html. Therefore we convert it to its html characters.");
// Output starts with "&#72;&#101;&#108;&#108;&#111;&#44;&#32;" and continues with one numeric entity per character

To convert an encoded html string to readable characters, use the decode method :

var encoded = htmlentities.encode("Hello, this is a test stríng > < with characters that could break html. Therefore we convert it to its html characters.");
htmlentities.decode(encoded);
// Output "Hello, this is a test stríng > < with characters that could break html. Therefore we convert it to its html characters."
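The decode method only recognizes decimal references such as &#60;. If the input may also contain hexadecimal references such as &#x3C;, or symbols above U+FFFF, a slightly broader decoder is needed. The following is a sketch under those assumptions; decodeNumericRefs is a hypothetical name and is not part of the snippet above:

```javascript
function decodeNumericRefs(str) {
  // Matches decimal (&#60;) and hexadecimal (&#x3C;) references.
  return str.replace(/&#(?:x([0-9a-f]+)|(\d+));/gi, (match, hex, dec) => {
    const codePoint = hex !== undefined ? parseInt(hex, 16) : parseInt(dec, 10);
    // Leave out-of-range values untouched instead of throwing a RangeError.
    return codePoint <= 0x10ffff ? String.fromCodePoint(codePoint) : match;
  });
}

console.log(decodeNumericRefs("&#60;b&#x3E; &#x1D306;")); // <b> 𝌆
```

String.fromCodePoint is used instead of String.fromCharCode so that code points above U+FFFF decode to a single symbol.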

Note : feel free to copy every single function and include it in your project as you wish.

Using a library

Because this task is not easy to get right, there is an awesome library that solves it for you.

He.js (for “HTML entities”) is a robust HTML entity encoder/decoder written in JavaScript. It supports all standardized named character references as per HTML, handles ambiguous ampersands and other edge cases just like a browser would, has an extensive test suite, and contrary to many other JavaScript solutions, he handles astral Unicode symbols just fine. An online demo is available.

Encode

This function takes a string of text and encodes (by default) any symbols that aren't printable ASCII symbols, as well as &, <, >, ", ', and `, replacing them with character references.

// Using the global default setting (defaults to `false`):
he.encode('foo © bar ≠ baz 𝌆 qux');
// → 'foo &#xA9; bar &#x2260; baz &#x1D306; qux'

// Passing an `options` object to `encode`, to explicitly encode all symbols:
he.encode('foo © bar ≠ baz 𝌆 qux', {
  'encodeEverything': true
});
// → '&#x66;&#x6F;&#x6F;&#x20;&#xA9;&#x20;&#x62;&#x61;&#x72;&#x20;&#x2260;&#x20;&#x62;&#x61;&#x7A;&#x20;&#x1D306;&#x20;&#x71;&#x75;&#x78;'

// This setting can be combined with the `useNamedReferences` option:
he.encode('foo © bar ≠ baz 𝌆 qux', {
  'encodeEverything': true,
  'useNamedReferences': true
});
// → '&#x66;&#x6F;&#x6F;&#x20;&copy;&#x20;&#x62;&#x61;&#x72;&#x20;&ne;&#x20;&#x62;&#x61;&#x7A;&#x20;&#x1D306;&#x20;&#x71;&#x75;&#x78;'
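To make the default behavior more concrete, here is a rough plain-JavaScript approximation of the rule described above. It is a sketch only and not he's actual implementation; encodeNonAsciiSketch is a hypothetical name, and edge cases such as whitespace control characters may be treated differently by the library:

```javascript
function encodeNonAsciiSketch(text) {
  let out = "";
  // for...of iterates by code point, so astral symbols are handled whole.
  for (const symbol of text) {
    const cp = symbol.codePointAt(0);
    const printableAscii = cp >= 0x20 && cp <= 0x7e;
    const special = "&<>\"'`".includes(symbol);
    out += printableAscii && !special
      ? symbol
      : "&#x" + cp.toString(16).toUpperCase() + ";";
  }
  return out;
}

console.log(encodeNonAsciiSketch("foo \u00A9 bar \u2260 baz \u{1D306} qux"));
// foo &#xA9; bar &#x2260; baz &#x1D306; qux
```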

Decode

This function takes a string of HTML and decodes any named and numerical character references in it using the algorithm described in section 12.2.4.69 of the HTML spec.

he.decode('foo &copy; bar &ne; baz &#x1D306; qux');
// → 'foo © bar ≠ baz 𝌆 qux'
