get_dynamic_url¶
- pyhelpers.ops.get_dynamic_url(page_url, pattern, base_url)[source]¶
Gets a dynamic URL matching a specific regex pattern.
This function is primarily used to retrieve direct download links for datasets (e.g. timestamped files) whose filenames may change. It parses the HTML of the
page_urland looks for the first<a>tag with anhrefattribute that matches the providedpattern.- Parameters:
page_url (str) – The URL of the webpage to scrape for links.
pattern (str) – A regular expression pattern to search for within the
hrefattributes.base_url (str) – The base URL used to resolve relative links found on the page.
- Returns:
The fully qualified URL if a match is found;
Noneotherwise.- Return type:
str | None
Examples:
>>> from pyhelpers.ops import get_dynamic_url >>> page_url = "https://northerngasopendataportal.co.uk/datasets/" >>> pattern = ".*\.pdf" >>> base_url = "https://northerngasopendataportal.co.uk" >>> download_url = get_dynamic_url(page_url, pattern, base_url) >>> print(download_url) https://www.northerngasnetworks.co.uk/wp-content/uploads/2018/05/GDPR-Privacy-Statement...