Stripping HTML is the process of removing every HTML tag from HTML code and keeping only the text content inside the elements.
For example, stripping the HTML tags from the HTML code below results in the text in the block that follows it. In this case, the
h1 and h2 tags are completely removed, so only the raw text remains.
<h1>HTML Stripper</h1> <h2>You can easily strip all the HTML tags using HTML Stripper</h2>
HTML Stripper You can easily strip all the HTML tags using HTML Stripper
You can strip HTML tags from HTML code programmatically with a regular expression, assuming the text in the input HTML is safely escaped, i.e. there are no raw
< or > characters inside any HTML element.
The following example does this in JavaScript with the string replace method. A similar regular expression can be used in other programming languages as well.
const html = '<h1>HTML Stripper</h1>';
// Replace everything matching an HTML tag with an empty string, i.e. strip it.
const text = html.replace(/<[^>]*>/g, '');
console.log(text); // HTML Stripper
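The escaping assumption matters in practice. The sketch below wraps the same regular expression in a helper (the name stripTags is hypothetical, chosen only for illustration) and compares an input whose special characters are escaped with one that has a raw > inside an attribute value.

```javascript
// Hypothetical helper wrapping the regular expression from the article.
function stripTags(html) {
  return html.replace(/<[^>]*>/g, '');
}

// Escaped input: the entities are left alone and only the tags are removed.
const escaped = '<p>1 &lt; 2 and 3 &gt; 2</p>';
console.log(stripTags(escaped)); // 1 &lt; 2 and 3 &gt; 2

// Unescaped input: the raw > inside the attribute value ends the match early,
// so part of the tag leaks into the output.
const unescaped = '<a title="a > b">link</a>';
console.log(JSON.stringify(stripTags(unescaped))); // " b\">link"
```

For input that cannot be trusted to be escaped this way, an HTML parser is usually a safer choice than a regular expression.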
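Sometimes, stripped text can contain HTML entities, which represent HTML special characters, also known as reserved characters. An HTML entity begins with an ampersand
(&) and ends with a semicolon
(;). For example,
&copy; is the HTML entity for the copyright symbol (©). The example below decodes the entities in a piece of text in JavaScript using the decode method of the he library.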
const he = require('he');
const text = 'The Euro (&euro;) is the currency of the EU countries.';
// Decode the HTML entities in the text using the decode method from the he library.
const decodedText = he.decode(text);
console.log(decodedText); // The Euro (€) is the currency of the EU countries.
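Where adding a dependency is not an option, the two steps can be combined in a small dependency-free sketch. The helpers decodeBasicEntities and stripAndDecode below are hypothetical names, not part of the he library, and the lookup table covers only a handful of named entities plus numeric references, so this is an illustration rather than a replacement for a full decoder.

```javascript
// Hypothetical, deliberately incomplete table of named entities.
const NAMED_ENTITIES = {
  amp: '&', lt: '<', gt: '>', quot: '"', apos: "'",
  copy: '©', euro: '€', nbsp: '\u00A0',
};

// Decode the named entities listed above plus decimal (&#169;) and
// hexadecimal (&#xA9;) numeric references. Anything else is left unchanged.
function decodeBasicEntities(text) {
  return text.replace(/&(#x[0-9a-f]+|#[0-9]+|[a-z]+);/gi, (match, body) => {
    if (body[0] === '#') {
      const isHex = body[1] === 'x' || body[1] === 'X';
      const codePoint = parseInt(body.slice(isHex ? 2 : 1), isHex ? 16 : 10);
      // Leave out-of-range references untouched instead of throwing.
      return codePoint <= 0x10ffff ? String.fromCodePoint(codePoint) : match;
    }
    return Object.prototype.hasOwnProperty.call(NAMED_ENTITIES, body)
      ? NAMED_ENTITIES[body]
      : match;
  });
}

// Strip tags first, then decode, so that text such as &lt;b&gt; is not
// turned into a tag and removed by the stripping step.
function stripAndDecode(html) {
  return decodeBasicEntities(html.replace(/<[^>]*>/g, ''));
}

console.log(stripAndDecode('<p>Price: 5 &euro; &amp; up &#169; &#x41;</p>'));
// Price: 5 € & up © A
```

The order of the two steps is the main design choice here: decoding before stripping would let escaped text like &lt;b&gt; become a real-looking tag and be deleted.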