get_introduction
- pyrcs.parser.get_introduction(url, delimiter='\n', update=False, verbose=True, raise_error=False)
Gets the introduction section of a specified web page.
This function scrapes the introduction text from the given URL, typically used to summarise data clusters.
- Parameters:
url (str) – The URL of the web page (usually the main page of a data cluster).
delimiter (str) – The delimiter used to separate paragraphs in the returned content; defaults to '\n' (newline).
update (bool) – Whether to check for updates to the package data; defaults to False.
verbose (bool | int) – Whether to print relevant information to the console; defaults to True.
raise_error (bool) – Whether to raise the provided exception; if raise_error=False (default), the error will be suppressed.
- Returns:
The introductory text from the web page, formatted with the specified delimiter.
- Return type:
str
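The role of delimiter can be illustrated with a minimal, offline sketch. It is not the pyrcs implementation: it assumes the introduction consists of the page's <p> elements, and the names ParagraphCollector and join_paragraphs are hypothetical helpers built on the standard-library html.parser module.

```python
from html.parser import HTMLParser


class ParagraphCollector(HTMLParser):
    """Collect the text of each <p> element (hypothetical helper, not part of pyrcs)."""

    def __init__(self):
        super().__init__()
        self.paragraphs = []
        self._buffer = None  # None while the parser is outside a <p> element

    def handle_starttag(self, tag, attrs):
        if tag == 'p':
            self._buffer = []

    def handle_data(self, data):
        # Keep text only while inside a paragraph.
        if self._buffer is not None:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == 'p' and self._buffer is not None:
            # Collapse runs of whitespace and skip empty paragraphs.
            text = ' '.join(''.join(self._buffer).split())
            if text:
                self.paragraphs.append(text)
            self._buffer = None


def join_paragraphs(html, delimiter='\n'):
    """Join the paragraph texts found in `html` with `delimiter` (assumed behaviour)."""
    collector = ParagraphCollector()
    collector.feed(html)
    collector.close()
    return delimiter.join(collector.paragraphs)


sample_html = '<h1>Bridges</h1><p>First paragraph.</p><p>Second   paragraph.</p>'
print(join_paragraphs(sample_html))
print(join_paragraphs(sample_html, delimiter=' | '))
```

With the default delimiter each paragraph lands on its own line; passing a different string such as ' | ' keeps the whole text on one line.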
Examples:
>>> from pyrcs.parser import get_introduction
>>> bridges_url = 'http://www.railwaycodes.org.uk/bridges/bridges0.shtm'
>>> intro_text = get_introduction(url=bridges_url)
>>> intro_text
"There are thousands of bridges over and under the railway system. These pages attempt to...
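
The raise_error and verbose parameters describe a common suppress-or-raise pattern. The sketch below shows one way such a policy could work; it is an assumption, not the library's code. The wrapper call_with_error_policy and the stand-in failing_fetch are hypothetical, and returning None when an error is suppressed is a guess, since the documentation above does not say what is returned in that case.

```python
def call_with_error_policy(func, *args, raise_error=False, verbose=True, **kwargs):
    """Call `func`; re-raise or suppress any exception (hypothetical wrapper)."""
    try:
        return func(*args, **kwargs)
    except Exception as exc:
        if raise_error:
            raise  # propagate the original exception unchanged
        if verbose:
            print(f"Failed. {exc}")
        return None  # assumed fallback value when the error is suppressed


def failing_fetch(url):
    """Stand-in for a network request that always fails (hypothetical)."""
    raise ConnectionError(f"cannot reach {url}")


# Suppressed: no exception escapes, and the assumed fallback is returned.
result = call_with_error_policy(failing_fetch, 'http://example.invalid', verbose=False)
print(result)

# Raised: the original exception propagates to the caller.
try:
    call_with_error_policy(failing_fetch, 'http://example.invalid', raise_error=True)
except ConnectionError as exc:
    print(f"raised: {exc}")
```

Suppressing by default suits interactive use, where a failed request should not stop a session; setting raise_error=True is the safer choice in scripts that must not continue silently with missing data.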