get_introduction

pyrcs.parser.get_introduction(url, delimiter='\n', update=False, verbose=True, raise_error=False)

Gets the introduction section of a specified web page.

This function scrapes the introduction text from the given URL. It is typically used to obtain a summary of a data cluster from that cluster's main page.

Parameters:
  • url (str) – The URL of the web page (usually the main page of a data cluster).

  • delimiter (str) – The delimiter used to separate paragraphs in the returned content; defaults to '\n' (newline).

  • update (bool) – Whether to check for updates to the package data; defaults to False.

  • verbose (bool | int) – Whether to print relevant information to the console; defaults to True.

  • raise_error (bool) – Whether to raise an exception when an error occurs; if raise_error=False (default), the error is suppressed.

Returns:

The introductory text from the web page, formatted with the specified delimiter.

Return type:

str

Examples:

>>> from pyrcs.parser import get_introduction
>>> bridges_url = 'http://www.railwaycodes.org.uk/bridges/bridges0.shtm'
>>> intro_text = get_introduction(url=bridges_url)
>>> intro_text
"There are thousands of bridges over and under the railway system. These pages attempt to...
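
The effect of the delimiter parameter can be illustrated without a network call. The following is only a minimal sketch and not the pyrcs implementation: join_paragraphs is a hypothetical helper, and the paragraphs are made-up stand-ins for scraped text. It assumes the paragraphs are simply joined with the delimiter, as the parameter description suggests.

```python
def join_paragraphs(paragraphs, delimiter='\n'):
    """Hypothetical helper (not part of pyrcs).

    Strips each paragraph, drops empty ones, and joins the rest
    with the given delimiter.
    """
    cleaned = (p.strip() for p in paragraphs)
    return delimiter.join(p for p in cleaned if p)


# Made-up paragraphs standing in for text scraped from a web page.
paragraphs = ['First paragraph. ', '', ' Second paragraph.']

# Default delimiter: one paragraph per line.
default_text = join_paragraphs(paragraphs)

# A space delimiter produces a single line of running text.
spaced_text = join_paragraphs(paragraphs, delimiter=' ')

# Splitting on the same delimiter recovers the individual paragraphs.
recovered = default_text.split('\n')
```

Under the same assumption, a string returned by get_introduction with the default delimiter could be split back into paragraphs with intro_text.split('\n').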