clean_html_text
- pyhelpers.text.clean_html_text(input_text)
Clean and normalise text extracted from HTML content.
Performs several cleaning operations on the HTML text, including:
Decoding HTML entities (including double-encoded entities)
Converting non-breaking spaces to regular spaces
Removing all HTML tags
Normalising whitespace and trimming the result
- Parameters:
input_text (str) – Raw text containing HTML markup and entities.
- Returns:
Cleaned text with all HTML artifacts removed and whitespace normalised.
- Return type:
str
Examples:
>>> from pyhelpers.text import clean_html_text
>>> clean_html_text('<p>Hello world!</p>')
'Hello world!'
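The cleaning operations described above can be approximated with the standard library alone. The following is a minimal sketch, not the pyhelpers implementation: the function name `clean_html_sketch`, the cap of three decoding passes, and the choice to replace each tag with a space are all assumptions made for illustration, and the real function may behave differently on edge cases.

```python
import html
import re


def clean_html_sketch(input_text: str) -> str:
    """Hypothetical stand-in illustrating the documented cleaning steps."""
    text = input_text

    # Decode HTML entities. Repeating the pass resolves double-encoded
    # entities such as "&amp;amp;"; the cap of 3 passes is an assumption.
    for _ in range(3):
        decoded = html.unescape(text)
        if decoded == text:
            break
        text = decoded

    # Convert non-breaking spaces (U+00A0) to regular spaces.
    text = text.replace("\xa0", " ")

    # Remove anything that looks like an HTML tag. Replacing with a space
    # (an assumption) keeps words separated by tags from being joined.
    text = re.sub(r"<[^>]+>", " ", text)

    # Collapse runs of whitespace and trim both ends.
    return re.sub(r"\s+", " ", text).strip()


print(clean_html_sketch("<p>Tom &amp;amp; Jerry&nbsp;</p>"))  # Tom & Jerry
```

A regular expression is enough for simple fragments like these, but it is not an HTML parser; markup containing `>` inside attribute values or script blocks would need a proper parser such as `html.parser.HTMLParser`.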