get_catalogue

pyrcs.parser.get_catalogue(url, update=False, json_it=True, verbose=False, raise_error=False)

Gets the catalogue of items from the main page of a data cluster.

This function scrapes a catalogue of entries (typically hyperlinks) from a specified URL. It offers the option to save the catalogue as a JSON file.

Parameters:
  • url (str) – The URL of the main page of a data cluster.

  • update (bool) – Whether to check for updates to the package data; defaults to False.

  • json_it (bool) – Whether to save the catalogue as a JSON file; defaults to True.

  • verbose (bool | int) – Whether to print relevant information to the console; defaults to False.

  • raise_error (bool) – Whether to raise an exception if one occurs; if raise_error=False (default), the error is suppressed.

Returns:

The catalogue in the form of a dictionary, where keys are entry titles and values are URLs, or None if the operation is unsuccessful.

Return type:

dict | None
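
Because the return type is dict | None, calling code may want to guard against None before using the result. The following is a minimal sketch of one such guard. The helper catalogue_titles and the sample dictionary are hypothetical illustrations and are not part of pyrcs; the sample URL is a placeholder, not real catalogue data.

```python
from typing import Optional


def catalogue_titles(catalogue: Optional[dict]) -> list:
    """Return the entry titles, or an empty list if the catalogue is None.

    Hypothetical helper for illustration; not part of pyrcs.
    """
    if catalogue is None:
        # An unsuccessful get_catalogue() call is documented to return None.
        return []
    return list(catalogue.keys())


# Stand-in for a get_catalogue() result (made-up placeholder URL).
sample = {'Introduction': 'https://example.com/intro.shtm'}

titles_ok = catalogue_titles(sample)    # titles from a successful result
titles_none = catalogue_titles(None)    # safe fallback for a failed call
```

With a guard like this, downstream code can iterate over the titles without first checking whether the call succeeded.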

Examples:

>>> from pyrcs.parser import get_catalogue
>>> elr_cat = get_catalogue(url='http://www.railwaycodes.org.uk/elrs/elr0.shtm')
>>> type(elr_cat)
dict
>>> list(elr_cat.keys())[:5]
['Introduction', 'A', 'B', 'C', 'D']
>>> list(elr_cat.keys())[-5:]
['Lines without codes',
 'ELR/LOR converter',
 'LUL system',
 'DLR system',
 'Canals']
>>> location_code_cat = get_catalogue(url='http://www.railwaycodes.org.uk/crs/crs0.shtm')
>>> type(location_code_cat)
dict
>>> list(location_code_cat.keys())[:5]
['Introduction', 'A', 'B', 'C', 'D']
>>> list(location_code_cat.keys())[-5:]
['W', 'X', 'Y', 'Z', 'Other systems']
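
The general idea of building a title-to-URL catalogue from a page's hyperlinks, as described at the top of this section, can be sketched with the Python standard library alone. This is only an illustration under stated assumptions and not the actual pyrcs implementation: the names _LinkCollector and extract_link_catalogue, the sample HTML, and the example.com base URL are all hypothetical. The sketch parses an HTML string rather than fetching a page, so it runs without network access.

```python
import json
from html.parser import HTMLParser
from urllib.parse import urljoin


class _LinkCollector(HTMLParser):
    """Collect (title, href) pairs from <a> tags (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.links = []     # collected (title, href) pairs, in page order
        self._href = None   # href of the <a> tag currently open, if any
        self._text = []     # text fragments seen inside that tag

    def handle_starttag(self, tag, attrs):
        if tag == 'a':
            href = dict(attrs).get('href')
            if href:
                self._href = href
                self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == 'a' and self._href is not None:
            title = ''.join(self._text).strip()
            if title:  # skip links without visible text
                self.links.append((title, self._href))
            self._href = None


def extract_link_catalogue(html, base_url):
    """Map each link title to an absolute URL resolved against base_url."""
    collector = _LinkCollector()
    collector.feed(html)
    collector.close()
    return {title: urljoin(base_url, href) for title, href in collector.links}


# Made-up sample page and base URL, for illustration only.
sample_html = (
    '<ul><li><a href="a.shtm">A</a></li>'
    '<li><a href="/b/b.shtm">B</a></li></ul>'
)
catalogue = extract_link_catalogue(sample_html, 'https://example.com/codes/index.shtm')

# Serialising to JSON mirrors the kind of output the json_it option refers to.
catalogue_json = json.dumps(catalogue, indent=2)
```

Relative links are resolved against the page URL with urljoin, so the resulting dictionary holds absolute URLs. If two links share the same title, the later one overwrites the earlier one in this sketch.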