clean_html_text

pyhelpers.text.clean_html_text(input_text)

Clean and normalise text extracted from HTML content.

Performs several cleaning operations on HTML text, including:

- Decoding HTML entities (including double-encoded entities)
- Converting non-breaking spaces to regular spaces
- Removing all HTML tags
- Normalising whitespace and trimming the result

Parameters: input_text (str) – Raw text containing HTML markup and entities.
Returns: Cleaned text with all HTML artifacts removed and whitespace normalised.
Return type: str

Examples:

>>> from pyhelpers.text import clean_html_text
>>> clean_html_text('&lt;p&gt;Hello&nbsp;world!&lt;/p&gt;')
'Hello world!'
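
The cleaning steps listed above can be approximated with the standard library alone. The following is a minimal sketch, not the actual pyhelpers implementation: the function name `clean_html_text_sketch` and the `max_passes` parameter are hypothetical, and the tag pattern is a simple regular expression assumed for illustration rather than a full HTML parser.

```python
import html
import re


def clean_html_text_sketch(input_text: str, max_passes: int = 3) -> str:
    """Hypothetical stand-in illustrating the documented cleaning steps."""
    text = input_text

    # Decode entities repeatedly so that double-encoded input such as
    # '&amp;lt;' is fully resolved; stop once the text no longer changes.
    for _ in range(max_passes):
        decoded = html.unescape(text)
        if decoded == text:
            break
        text = decoded

    # Convert non-breaking spaces (U+00A0) to regular spaces.
    text = text.replace("\xa0", " ")

    # Remove anything that looks like an HTML tag (assumed pattern).
    text = re.sub(r"<[^>]+>", "", text)

    # Collapse runs of whitespace into single spaces and trim the ends.
    return re.sub(r"\s+", " ", text).strip()


print(clean_html_text_sketch('&lt;p&gt;Hello&nbsp;world!&lt;/p&gt;'))
print(clean_html_text_sketch('&amp;lt;b&amp;gt;Hi&amp;lt;/b&amp;gt;'))
```

Because the tag pattern is purely textual, literal angle brackets in ordinary prose (for example `a < b > c`) would also be stripped by this sketch; the real function may treat such input differently.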