download_file_from_url
- pyhelpers.ops.download_file_from_url(url, path_to_file, if_exists='replace', max_retries=5, requests_session_args=None, requests_headers=None, verbose=False, print_wrap_limit=None, total_records=None, chunk_multiplier=1, pbar_desc=None, pbar_format=None, pbar_color=None, validate=True, stream_download=False, **kwargs)
Downloads a file from a valid URL.
The function uses the requests library and optionally tqdm for progress bar display.
See also [OPS-DFFU-1] and [OPS-DFFU-2].
- Parameters:
url (str) – Valid URL pointing to a web resource.
path_to_file (str | os.PathLike[str]) – Path where the downloaded file will be saved; it can be either a full path with filename or just a filename, in which case it will be saved in the current working directory.
if_exists (str) – Action to take if the specified file already exists; options include 'replace' (default - download and replace the existing file) and 'pass' (cancel the download and return None).
max_retries (int) – Maximum number of retries in case of download failures; defaults to 5.
requests_session_args (dict | None) – [Optional] Additional parameters for initializing the requests session (e.g., proxies, verify); defaults to None.
requests_headers (dict | None) – [Optional] Custom headers to be included in the HTTP request. A default 'User-Agent' is automatically generated unless overridden.
verbose (bool | int) – Whether to print progress and relevant information to the console; defaults to False. If True, a tqdm progress bar is displayed.
print_wrap_limit (int | None) – Maximum length of the string before splitting into two lines; defaults to None, which disables splitting. If the string exceeds this value, e.g. 100, it will be split at (before) state_prep to improve readability when printed.
total_records (int | None) – The expected number of records (rows) in the dataset, used for progress tracking when the response's Content-Length header is unavailable; defaults to None.
chunk_multiplier (int | float) – A factor by which the default chunk size (1MB) is multiplied; this can be adjusted to optimize download performance based on file size; defaults to 1.
pbar_desc (str | None) – Custom description for the progress bar; when desc=None, it defaults to the filename.
pbar_format (str | None) – Custom format for the progress bar.
pbar_color (str | None) – Custom color of the progress bar (e.g. 'green', 'yellow'); defaults to None.
validate (bool) – Whether to validate if the downloaded file is non-empty after download. Validation checks if the final file size is greater than 0; defaults to True.
stream_download (bool) – When stream_download=True, use streaming download (memory-efficient, preferred for large files or when verbose=False); when stream_download=False, the entire file content is loaded into memory first (simpler/faster for small files); defaults to False.
kwargs – [Optional] Additional parameters passed to the method tqdm.tqdm().
- Returns:
None upon successful completion or if the download is skipped (i.e., when if_exists='pass' and the file already exists).
- Return type:
None
- Raises:
ValueError – If validate=True and the downloaded file size is zero.
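The interplay of if_exists, chunk_multiplier and validate described above can be sketched as follows. This is a minimal illustration and not the library's implementation: the helper names chunk_size_for and save_chunks are hypothetical, the chunks are plain byte strings standing in for a streamed HTTP response, and truncating the scaled chunk size to an integer is an assumption.

```python
import os
from typing import Iterable

BASE_CHUNK_SIZE = 1024 * 1024  # 1 MB, the default chunk size stated in the parameter list


def chunk_size_for(chunk_multiplier: float = 1) -> int:
    """Scale the base chunk size by the multiplier (truncation is an assumption)."""
    return max(1, int(BASE_CHUNK_SIZE * chunk_multiplier))


def save_chunks(chunks: Iterable[bytes], path_to_file: str,
                if_exists: str = 'replace', validate: bool = True) -> None:
    """Hypothetical helper: write chunks to disk, honouring if_exists and validate."""
    if os.path.exists(path_to_file) and if_exists == 'pass':
        return None  # skip: the existing file is left untouched

    # Make sure the target directory exists before writing.
    parent = os.path.dirname(os.path.abspath(path_to_file))
    os.makedirs(parent, exist_ok=True)

    # Write chunk by chunk, so the whole payload never has to sit in memory.
    with open(path_to_file, 'wb') as f:
        for chunk in chunks:
            if chunk:  # ignore empty keep-alive chunks
                f.write(chunk)

    # Mirror the documented check: a zero-byte result raises ValueError.
    if validate and os.path.getsize(path_to_file) == 0:
        raise ValueError(f'Downloaded file is empty: "{path_to_file}"')
    return None
```

With a real response, the chunks would typically come from something like response.iter_content(chunk_size=chunk_size_for(chunk_multiplier)) in requests; whether the function does exactly this is not stated in this documentation.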
Examples:
>>> from pyhelpers.ops import download_file_from_url
>>> from pyhelpers.dirs import cd
>>> from PIL import Image
>>> import os
>>> url = 'https://www.python.org/static/community_logos/python-logo-master-v3-TM.png'
>>> path_to_img = cd("tests", "images", "ops-download_file_from_url-demo.png")
>>> # Check if the .png file already exists at the specified path
>>> os.path.exists(path_to_img)
False
>>> # Download the .png file
>>> download_file_from_url(url, path_to_img, verbose=True, pbar_color='green')
Downloading "ops-download_file_from_url-demo.png" 100%|██████████| 83.6k/83.6k | ...
Saving "ops-download_file_from_url-demo.png" to "./tests/images/" ... Done.
>>> # If download is successful, check again:
>>> os.path.exists(path_to_img)
True
>>> img = Image.open(path_to_img)
>>> img.show()  # as illustrated below
Figure 8 The Python Logo.
Note
When verbose=True, the function requires tqdm.
The function handles HTTP retries internally using a requests session adapter.
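For readers unfamiliar with retry handling through a session adapter, the general pattern in requests looks roughly like the sketch below. It is an illustration of the technique only: the helper name make_retrying_session, the backoff factor and the list of retried status codes are assumptions, not values taken from pyhelpers.

```python
import requests
from requests.adapters import HTTPAdapter
from urllib3.util.retry import Retry


def make_retrying_session(max_retries: int = 5,
                          backoff_factor: float = 0.3) -> requests.Session:
    """Hypothetical helper: build a session whose adapter retries failed requests."""
    # Retry policy; the status codes and backoff here are illustrative assumptions.
    retry = Retry(
        total=max_retries,
        backoff_factor=backoff_factor,
        status_forcelist=(500, 502, 503, 504),
    )
    adapter = HTTPAdapter(max_retries=retry)

    session = requests.Session()
    # Mount the adapter so that both HTTP and HTTPS requests use the retry policy.
    session.mount('http://', adapter)
    session.mount('https://', adapter)
    return session
```

A session built this way can then be used as usual, e.g. session.get(url, stream=True), and extra settings such as proxies or verify (cf. requests_session_args) can be set as attributes on the session.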