clean_html_text

pyhelpers.text.clean_html_text(input_text)

Clean and normalise text extracted from HTML content.

Performs several cleaning operations on HTML text, including:

  • Decoding HTML entities (including double-encoded entities)

  • Converting non-breaking spaces to regular spaces

  • Removing all HTML tags

  • Normalising whitespace and trimming the result

Parameters:

input_text (str) – Raw text containing HTML markup and entities.

Returns:

Cleaned text with all HTML artifacts removed and whitespace normalised.

Return type:

str

Examples:

>>> from pyhelpers.text import clean_html_text
>>> clean_html_text('<p>Hello world!</p>')
'Hello world!'
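
The cleaning operations listed in the description can be approximated with the standard library alone. The sketch below is illustrative only and is not the pyhelpers implementation: the function name clean_html_sketch, the cap of five decoding passes, and the regular expressions are all assumptions chosen for this example. It decodes entities first, so escaped markup such as '&lt;b&gt;' is turned into a tag and then stripped.

```python
import html
import re


def clean_html_sketch(input_text: str) -> str:
    """Hypothetical stand-in for illustration; not the pyhelpers code."""
    text = input_text
    # Decode entities repeatedly so double-encoded input such as
    # '&amp;nbsp;' is fully resolved; stop once a pass changes nothing.
    # The limit of five passes is an arbitrary safety bound.
    for _ in range(5):
        decoded = html.unescape(text)
        if decoded == text:
            break
        text = decoded
    # Replace non-breaking spaces (U+00A0) with regular spaces.
    text = text.replace('\xa0', ' ')
    # Drop anything that looks like an HTML tag (a simple heuristic,
    # not a full HTML parser).
    text = re.sub(r'<[^>]+>', '', text)
    # Collapse runs of whitespace into one space and trim both ends.
    return re.sub(r'\s+', ' ', text).strip()


print(clean_html_sketch('<p>Hello&amp;nbsp;<b>world</b>!</p>'))  # Hello world!
```

A regular expression is enough for simple fragments like the one above; markup containing '>' inside attribute values, comments, or script blocks would need a real parser such as html.parser.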