download_file_from_url

pyhelpers.ops.download_file_from_url(url, path_to_file, if_exists='replace', max_retries=5, requests_session_args=None, requests_headers=None, verbose=False, print_wrap_limit=None, total_records=None, chunk_multiplier=1, pbar_desc=None, pbar_format=None, pbar_color=None, validate=True, stream_download=False, **kwargs)

Downloads a file from a valid URL.

The function uses the requests library and optionally tqdm for progress bar display.

See also [OPS-DFFU-1] and [OPS-DFFU-2].

Parameters:
  • url (str) – Valid URL pointing to a web resource.

  • path_to_file (str | os.PathLike[str]) – Path where the downloaded file will be saved; it can be either a full path with filename or just a filename, in which case it will be saved in the current working directory.

  • if_exists (str) – Action to take if the specified file already exists; options are 'replace' (default: download and replace the existing file) and 'pass' (cancel the download and return None).

  • max_retries (int) – Maximum number of retries in case of download failures; defaults to 5.

  • requests_session_args (dict | None) – [Optional] Additional parameters for initializing the requests session (e.g., proxies, verify); defaults to None.

  • requests_headers (dict | None) – [Optional] Custom headers to be included in the HTTP request. A default ‘User-Agent’ is automatically generated unless overridden.

  • verbose (bool | int) – Whether to print progress and relevant information to the console; defaults to False. If True, a tqdm progress bar is displayed.

  • print_wrap_limit (int | None) – Maximum length of the printed status message before it is split into two lines; defaults to None, which disables splitting. If the message exceeds this value (e.g. 100), it is split at (before) state_prep, presumably the preposition of the status message such as "to" in 'Saving ... to ...', to improve readability when printed.

  • total_records (int | None) – The expected number of records (rows) in the dataset, used for progress tracking when the response’s Content-Length header is unavailable; defaults to None.

  • chunk_multiplier (int | float) – Factor by which the default chunk size (1 MB) is multiplied; adjust it to tune download performance for the expected file size; defaults to 1.

  • pbar_desc (str | None) – Custom description for the progress bar; when pbar_desc=None, it defaults to the filename.

  • pbar_format (str | None) – Custom format for the progress bar.

  • pbar_color (str | None) – Custom color of the progress bar (e.g. ‘green’, ‘yellow’); defaults to None.

  • validate (bool) – Whether to validate if the downloaded file is non-empty after download. Validation checks if the final file size is greater than 0; defaults to True.

  • stream_download (bool) – Whether to stream the download. When stream_download=True, a streaming download is used (memory-efficient, preferred for large files or when verbose=False); when stream_download=False, the entire file content is loaded into memory first (simpler and faster for small files); defaults to False.

  • kwargs – [Optional] Additional parameters passed to the method tqdm.tqdm().
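
To make the if_exists and chunk_multiplier descriptions above concrete, the following is a minimal sketch of how such parameters could be interpreted. The helper names effective_chunk_size and should_download are hypothetical and not part of pyhelpers, and "1 MB" is assumed here to mean 1024 * 1024 bytes; the library's actual internals may differ.

```python
import os
import tempfile

BASE_CHUNK_SIZE = 1024 * 1024  # assumption: "1 MB" means 1024 * 1024 bytes


def effective_chunk_size(chunk_multiplier=1):
    """Hypothetical helper: scale the base chunk size by the multiplier."""
    # Never return less than one byte, even for very small multipliers
    return max(1, int(BASE_CHUNK_SIZE * chunk_multiplier))


def should_download(path_to_file, if_exists='replace'):
    """Hypothetical helper: decide whether a download should proceed."""
    # 'pass' skips the download only when the target file is already present
    return not (if_exists == 'pass' and os.path.isfile(path_to_file))


with tempfile.TemporaryDirectory() as tmp_dir:
    existing = os.path.join(tmp_dir, "demo.bin")
    with open(existing, "wb") as f:
        f.write(b"data")
    missing = os.path.join(tmp_dir, "missing.bin")

    print(should_download(existing, if_exists='replace'))  # True
    print(should_download(existing, if_exists='pass'))     # False
    print(should_download(missing, if_exists='pass'))      # True

print(effective_chunk_size(0.5))  # 524288
```

Under these assumptions, chunk_multiplier=0.5 would read the response in 512 KB pieces, while chunk_multiplier=4 would use 4 MB pieces.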

Returns:

None upon successful completion or if the download is skipped (i.e., when if_exists='pass' and the file already exists).

Return type:

None

Raises:

ValueError – If validate=True and the downloaded file size is zero.
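
The validation described above (a file size greater than 0) can be sketched as follows. The helper name validate_non_empty and the error message are hypothetical; this illustrates the kind of check, not the library's own code.

```python
import os
import tempfile


def validate_non_empty(path_to_file):
    """Hypothetical helper: raise ValueError if the file has zero size."""
    size = os.path.getsize(path_to_file)
    if size == 0:
        raise ValueError(f"Downloaded file is empty: {path_to_file!r}")
    return size


with tempfile.TemporaryDirectory() as tmp_dir:
    empty_path = os.path.join(tmp_dir, "empty.bin")
    open(empty_path, "wb").close()  # create a zero-byte file

    try:
        validate_non_empty(empty_path)
    except ValueError as err:
        print(f"Validation failed: {err}")
```

A caller that passes validate=True may want a similar try/except around download_file_from_url to handle an empty download.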

Examples:

>>> from pyhelpers.ops import download_file_from_url
>>> from pyhelpers.dirs import cd
>>> from PIL import Image
>>> import os
>>> url = 'https://www.python.org/static/community_logos/python-logo-master-v3-TM.png'
>>> path_to_img = cd("tests", "images", "ops-download_file_from_url-demo.png")
>>> # Check if "ops-download_file_from_url-demo.png" exists at the specified path
>>> os.path.exists(path_to_img)
False
>>> # Download the .png file
>>> download_file_from_url(url, path_to_img, verbose=True, pbar_color='green')
Downloading "ops-download_file_from_url-demo.png" 100%|██████████| 83.6k/83.6k | ...
    Saving "ops-download_file_from_url-demo.png" to "./tests/images/" ... Done.
>>> # If download is successful, check again:
>>> os.path.exists(path_to_img)
True
>>> img = Image.open(path_to_img)
>>> img.show()  # as illustrated below
../_images/ops-download_file_from_url-demo.svg

Figure 8 The Python Logo.


Note

  • When verbose=True, the function requires tqdm.

  • The function handles HTTP retries internally using a requests session adapter.
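
For readers unfamiliar with session adapters, the sketch below shows one common way to attach retry behaviour to a requests session. The helper name build_retrying_session, the backoff factor and the list of status codes are assumptions chosen for illustration; the settings pyhelpers uses internally are not documented here and may differ.

```python
import requests
from requests.adapters import HTTPAdapter
from urllib3.util.retry import Retry


def build_retrying_session(max_retries=5, headers=None):
    """Hypothetical helper: a requests session that retries failed requests."""
    retry = Retry(
        total=max_retries,                      # overall retry budget
        backoff_factor=0.5,                     # assumed delay factor between attempts
        status_forcelist=(500, 502, 503, 504),  # assumed retryable status codes
    )
    adapter = HTTPAdapter(max_retries=retry)

    session = requests.Session()
    # Route both schemes through the retrying adapter
    session.mount("http://", adapter)
    session.mount("https://", adapter)
    if headers:
        session.headers.update(headers)
    return session


session = build_retrying_session(max_retries=5, headers={"User-Agent": "demo-agent"})
print(session.get_adapter("https://www.python.org").max_retries.total)  # 5
```

No request is sent in this sketch; it only configures the session, so it can be run offline.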