get_catalogue
- pyrcs.parser.get_catalogue(url, update=False, json_it=True, verbose=False, raise_error=False)
Gets the catalogue of items from the main page of a data cluster.
This function scrapes a catalogue of entries (typically hyperlinks) from a specified URL. It offers the option to save the catalogue as a JSON file.
- Parameters:
url (str) – The URL of the main page of a data cluster.
update (bool) – Whether to check for updates to the package data; defaults to False.
json_it (bool) – Whether to save the catalogue as a JSON file; defaults to True.
verbose (bool | int) – Whether to print relevant information to the console; defaults to False.
raise_error (bool) – Whether to raise the provided exception; if raise_error=False (default), the error will be suppressed.
- Returns:
The catalogue in the form of a dictionary, where keys are entry titles and values are URLs, or None if the operation is unsuccessful.
- Return type:
dict | None
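Because the return value may be None when the operation is unsuccessful, callers may want to guard against that before indexing into the result. The following is a minimal sketch of such a guard, not part of pyrcs: first_titles is a hypothetical helper, and its fetch argument is a stand-in for any callable with the same return contract as get_catalogue (a dict or None). The two fake_fetch_* functions are made up so that the sketch runs without network access.

```python
def first_titles(fetch, url, n=5):
    """Return up to n entry titles, or an empty list if fetch(url) gives None.

    `fetch` is a hypothetical stand-in for a function that, like
    get_catalogue, returns a dict of {title: URL} or None on failure.
    """
    catalogue = fetch(url)
    if catalogue is None:
        # The operation was unsuccessful; fall back to an empty result.
        return []
    # Iterating over a dict yields its keys in insertion order.
    return list(catalogue)[:n]


# Made-up fetchers used only to exercise both branches offline.
def fake_fetch_ok(url):
    return {"Introduction": url, "A": url + "#a"}


def fake_fetch_failed(url):
    return None


titles_ok = first_titles(fake_fetch_ok, "http://example.com/index")
titles_failed = first_titles(fake_fetch_failed, "http://example.com/index")
```

With pyrcs installed, get_catalogue itself could presumably be passed as fetch, since it takes the URL as its first argument.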
Examples:
>>> from pyrcs.parser import get_catalogue
>>> elr_cat = get_catalogue(url='http://www.railwaycodes.org.uk/elrs/elr0.shtm')
>>> type(elr_cat)
dict
>>> list(elr_cat.keys())[:5]
['Introduction', 'A', 'B', 'C', 'D']
>>> list(elr_cat.keys())[-5:]
['Lines without codes', 'ELR/LOR converter', 'LUL system', 'DLR system', 'Canals']
>>> location_code_cat = get_catalogue(url='http://www.railwaycodes.org.uk/crs/crs0.shtm')
>>> type(location_code_cat)
dict
>>> list(location_code_cat.keys())[:5]
['Introduction', 'A', 'B', 'C', 'D']
>>> list(location_code_cat.keys())[-5:]
['W', 'X', 'Y', 'Z', 'Other systems']
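To illustrate the general idea of turning a page of hyperlinks into a title-to-URL dictionary, here is a minimal standard-library sketch. It is not the pyrcs implementation, which is not shown in this page; LinkCollector and parse_catalogue are hypothetical names, the HTML snippet is made up, and the sketch parses a string rather than fetching a URL so that it runs offline. The final json.dumps call only hints at what saving the catalogue as JSON might involve.

```python
import json
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkCollector(HTMLParser):
    """Collect {link text: absolute URL} pairs from <a href="..."> tags."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.catalogue = {}
        self._href = None  # href of the <a> tag currently open, if any
        self._text = []    # text fragments seen inside that tag

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self._href = href
                self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            title = "".join(self._text).strip()
            if title:
                # Resolve relative links against the page's own URL.
                self.catalogue[title] = urljoin(self.base_url, self._href)
            self._href = None


def parse_catalogue(html, base_url):
    """Return a dict of link titles to absolute URLs found in `html`."""
    collector = LinkCollector(base_url)
    collector.feed(html)
    collector.close()
    return collector.catalogue


# Made-up page fragment; the file names are assumptions for illustration.
sample_html = (
    '<ul><li><a href="elr0.shtm">Introduction</a></li>'
    '<li><a href="a.shtm">A</a></li></ul>'
)
sample_cat = parse_catalogue(
    sample_html, "http://www.railwaycodes.org.uk/elrs/elr0.shtm")

# Serialising the dictionary, roughly what saving it as JSON might involve.
sample_json = json.dumps(sample_cat, indent=2)
```

Keeping the parsing step separate from any network access makes this kind of sketch easy to test against a fixed HTML string.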