GitHubFileDownloader

class pyhelpers.ops.GitHubFileDownloader(repo_url, flatten_files=False, output_dir=None)

Downloads files from GitHub repositories.

This class downloads either a single file or an entire directory, including its subdirectories, from a given GitHub repository URL via GitHub's REST API.

Parameters:
  • repo_url (str) – URL of the GitHub repository content to download; it can point to a specific file (blob) or directory (tree).

  • flatten_files (bool) – Whether to flatten the directory structure by pulling all files into the root folder; defaults to False.

  • output_dir (str | None) – Output directory where downloaded files will be saved; defaults to None, meaning files will be saved in the current directory.
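The examples below show how flatten_files and output_dir decide where each file of a downloaded directory ends up: without flattening, the file's path inside the repository is recreated under output_dir; with flattening, only the file name is kept. The following is a minimal sketch of that mapping, using a hypothetical helper local_path() that takes the file's path within the repository (e.g. tests/data/dat.csv). It illustrates the behaviour under these assumptions and is not the pyhelpers implementation.

```python
import os


def local_path(repo_path, flatten_files=False, output_dir=None):
    """Hypothetical helper: map a file's repository path to a local path.

    Without flattening, the repository-relative path is preserved under
    output_dir; with flattening, only the file name is kept.
    """
    # None means "save in the current directory"
    base = output_dir if output_dir is not None else os.curdir
    # GitHub paths always use "/", regardless of the local OS
    parts = repo_path.split("/")
    if flatten_files:
        parts = parts[-1:]  # keep only the file name
    return os.path.join(base, *parts)


print(local_path("tests/data/dat.csv", output_dir="tests/temp"))
print(local_path("tests/data/dat.csv", flatten_files=True, output_dir="tests/temp"))
```

The separators in the printed paths depend on the operating system, which is why the examples show Windows-style backslashes.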

Variables:
  • repo_url (str) – URL of the GitHub repository.

  • flatten_files (bool) – Whether to flatten the directory structure (i.e. pull the contents of all subdirectories into the root folder); defaults to False.

  • output_dir (str | None) – Output directory path; defaults to None.

  • api_url (str) – URL of the GitHub repository compatible with GitHub’s REST API.

  • download_path (str) – Pathname for downloading files.

  • total_files (int) – Total number of files under the given directory.
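The api_url variable holds a REST API counterpart of repo_url. A minimal sketch of such a conversion, assuming GitHub's "Get repository content" endpoint (https://api.github.com/repos/<owner>/<repo>/contents/<path>?ref=<branch>) and web URLs of the form https://github.com/<owner>/<repo>/(blob|tree)/<branch>/<path>, is shown below. The helper name to_contents_api_url() is hypothetical; it also performs a simple scheme check in the spirit of check_url(), but it is not the pyhelpers code and it does not handle branch names that contain "/".

```python
import re
from urllib.parse import urlparse

# Assumed layout: /<owner>/<repo>/(blob|tree)/<branch>[/<path>]
_GITHUB_PATH = re.compile(
    r"^/(?P<owner>[^/]+)/(?P<repo>[^/]+)/(?:blob|tree)/(?P<ref>[^/]+)/?(?P<path>.*)$"
)


def to_contents_api_url(repo_url):
    """Hypothetical helper: convert a GitHub web URL to a contents API URL."""
    parsed = urlparse(repo_url)
    # Only accept http(s) URLs pointing at github.com
    if parsed.scheme not in ("http", "https") or parsed.netloc not in (
            "github.com", "www.github.com"):
        raise ValueError(f"Not a valid GitHub URL: {repo_url!r}")

    match = _GITHUB_PATH.match(parsed.path)
    if match is None:
        raise ValueError(f"Unsupported GitHub URL layout: {repo_url!r}")

    owner, repo, ref, path = match.group("owner", "repo", "ref", "path")
    path = path.strip("/")
    # An empty path refers to the root of the repository
    suffix = f"/{path}" if path else ""
    return f"https://api.github.com/repos/{owner}/{repo}/contents{suffix}?ref={ref}"


print(to_contents_api_url(
    "https://github.com/mikeqfu/pyhelpers/blob/master/tests/data/dat.csv"))
```

Under these assumptions, the single-file URL from the first example maps to https://api.github.com/repos/mikeqfu/pyhelpers/contents/tests/data/dat.csv?ref=master.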

Examples:

>>> from pyhelpers.ops import GitHubFileDownloader
>>> from pyhelpers.dirs import delete_dir
>>> output_dir = "tests/temp"
>>> # Download a single file
>>> repo_url = 'https://github.com/mikeqfu/pyhelpers/blob/master/tests/data/dat.csv'
>>> downloader = GitHubFileDownloader(repo_url, output_dir=output_dir)
>>> downloader.download()
Downloaded to: ".\tests\temp\dat.csv"
1
>>> # Download a directory
>>> repo_url = 'https://github.com/mikeqfu/pyhelpers/blob/master/tests/data'
>>> downloader = GitHubFileDownloader(repo_url, output_dir=output_dir)
>>> downloader.download()
Downloaded to: ".\tests\temp\tests\data\csr_mat.npz"
Downloaded to: ".\tests\temp\tests\data\dat.csv"
Downloaded to: ".\tests\temp\tests\data\dat.feather"
Downloaded to: ".\tests\temp\tests\data\dat.joblib"
Downloaded to: ".\tests\temp\tests\data\dat.json"
Downloaded to: ".\tests\temp\tests\data\dat.ods"
Downloaded to: ".\tests\temp\tests\data\dat.pickle"
Downloaded to: ".\tests\temp\tests\data\dat.pickle.bz2"
Downloaded to: ".\tests\temp\tests\data\dat.pickle.gz"
Downloaded to: ".\tests\temp\tests\data\dat.pickle.xz"
Downloaded to: ".\tests\temp\tests\data\dat.txt"
Downloaded to: ".\tests\temp\tests\data\dat.xlsx"
Downloaded to: ".\tests\temp\tests\data\zipped.7z"
Downloaded to: ".\tests\temp\tests\data\zipped.txt"
Downloaded to: ".\tests\temp\tests\data\zipped.zip"
15
>>> downloader = GitHubFileDownloader(
...     repo_url, flatten_files=True, output_dir=output_dir)
>>> downloader.download()
Downloaded to: ".\tests\temp\csr_mat.npz"
Downloaded to: ".\tests\temp\dat.csv"
Downloaded to: ".\tests\temp\dat.feather"
Downloaded to: ".\tests\temp\dat.joblib"
Downloaded to: ".\tests\temp\dat.json"
Downloaded to: ".\tests\temp\dat.ods"
Downloaded to: ".\tests\temp\dat.pickle"
Downloaded to: ".\tests\temp\dat.pickle.bz2"
Downloaded to: ".\tests\temp\dat.pickle.gz"
Downloaded to: ".\tests\temp\dat.pickle.xz"
Downloaded to: ".\tests\temp\dat.txt"
Downloaded to: ".\tests\temp\dat.xlsx"
Downloaded to: ".\tests\temp\zipped.7z"
Downloaded to: ".\tests\temp\zipped.txt"
Downloaded to: ".\tests\temp\zipped.zip"
15
>>> delete_dir(output_dir)
To delete the directory ".\tests\temp\" (Not empty)
? [No]|Yes: yes

Methods

check_url(url)

Checks if the scheme of the provided url is valid.

create_url(url)

Creates a URL compatible with GitHub's REST API from the given URL.

download([api_url])

Downloads files from the specified GitHub api_url.

download_single_file(file_url, dir_out)

Downloads a single file from the specified file_url to the dir_out directory.
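The methods above suggest how download() and download_single_file() might cooperate: list the content at api_url, save each file, recurse into each subdirectory, and return the number of files saved. The sketch below assumes the JSON shape of GitHub's contents API (a dict for a single file, a list of entries with "type", "path", "url" and "download_url" for a directory). The function download_tree() and the callables fetch_json() and save_file() are hypothetical; passing them in keeps the traversal testable without network access. This is an illustration, not the pyhelpers implementation.

```python
import json
import os
from urllib.request import urlopen


def download_tree(api_url, fetch_json, save_file, flatten_files=False, output_dir=None):
    """Hypothetical sketch: recursively download content listed by the contents API.

    fetch_json(url) returns parsed JSON: a dict for a file, a list for a directory.
    save_file(download_url, local_path) writes one file to disk.
    Returns the number of files saved.
    """
    base = output_dir if output_dir is not None else os.curdir
    listing = fetch_json(api_url)
    single_file = isinstance(listing, dict)
    entries = [listing] if single_file else listing

    total_files = 0
    for entry in entries:
        if entry["type"] == "dir":
            # Subdirectories carry their own API URL; recurse into them
            total_files += download_tree(
                entry["url"], fetch_json, save_file, flatten_files, output_dir)
        elif entry["type"] == "file":
            parts = entry["path"].split("/")
            # A lone file, or any file when flattening, goes straight into base
            if single_file or flatten_files:
                parts = parts[-1:]
            save_file(entry["download_url"], os.path.join(base, *parts))
            total_files += 1
        # Other entry types (e.g. "symlink", "submodule") are skipped here
    return total_files


def fetch_json(url):
    """Fetch and decode a JSON response (unauthenticated requests are rate-limited)."""
    with urlopen(url) as response:
        return json.load(response)


def save_file(download_url, local_path):
    """Write a raw file to local_path, creating parent directories as needed."""
    os.makedirs(os.path.dirname(local_path) or os.curdir, exist_ok=True)
    with urlopen(download_url) as response, open(local_path, "wb") as f:
        f.write(response.read())
```

Under these assumptions, download_tree("https://api.github.com/repos/mikeqfu/pyhelpers/contents/tests/data?ref=master", fetch_json, save_file, output_dir="tests/temp") would fetch the same directory as the second example.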