GitHubFileDownloader
- class pyhelpers.ops.GitHubFileDownloader(repo_url, flatten_files=False, output_dir=None)
Downloads files from GitHub repositories.
Given a GitHub repository URL, this class downloads either a single file or all files under a directory.
- Parameters:
repo_url (str) – URL of the GitHub repository to download from; it can point to a specific blob (file) or tree (directory) location.
flatten_files (bool) – Whether to flatten the directory structure by saving all files directly in the root folder; defaults to False.
output_dir (str | None) – Output directory where downloaded files are saved; defaults to None, meaning files are saved in the current directory.
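The way flatten_files and output_dir decide where a file ends up can be sketched as follows. This is a minimal sketch, not the pyhelpers implementation: the helper local_target_path is hypothetical, and it assumes (from the directory examples on this page) that an unflattened download recreates the file's repository path under output_dir, whereas a flattened one keeps only the file name.

```python
from pathlib import Path, PurePosixPath


def local_target_path(repo_file_path, output_dir=None, flatten_files=False):
    """Hypothetical helper: map a file's path inside the repository to a local path.

    repo_file_path is a POSIX-style repository path, e.g. "tests/data/dat.csv".
    """
    # output_dir=None means "save in the current directory"
    base = Path(output_dir) if output_dir is not None else Path.cwd()
    parts = PurePosixPath(repo_file_path).parts
    if flatten_files:
        # Keep only the file name, so every file lands directly in the root folder
        return base / parts[-1]
    # Otherwise recreate the repository's directory structure under the root folder
    return base.joinpath(*parts)
```

For instance, local_target_path("tests/data/dat.csv", "tests/temp") gives tests/temp/tests/data/dat.csv, while passing flatten_files=True gives tests/temp/dat.csv. In this sketch, flattening two files that share a name in different subdirectories would map them to the same target path.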
- Variables:
repo_url (str) – URL of the GitHub repository.
flatten_files (bool) – Whether to flatten the directory structure (i.e. pull the contents of all subdirectories into the root folder); defaults to False.
output_dir (str | None) – Output directory path; defaults to None.
api_url (str) – URL of the GitHub repository compatible with GitHub's REST API.
download_path (str) – Pathname for downloading files.
total_files (int) – Total number of files under the given directory.
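GitHub's REST API serves repository files through the contents endpoint, GET /repos/{owner}/{repo}/contents/{path}, with the branch passed as the ref query parameter. Assuming api_url takes that form (this page does not show the exact URL that create_url builds), a conversion from a blob or tree URL might look like the sketch below. The helper to_contents_api_url is hypothetical; its scheme check only mirrors the stated purpose of check_url, and it assumes branch names contain no "/".

```python
from urllib.parse import urlparse


def to_contents_api_url(repo_url):
    """Hypothetical helper: convert a github.com blob/tree URL to a REST API contents URL."""
    parsed = urlparse(repo_url)
    # Reject URLs whose scheme is not HTTP(S), similar in spirit to check_url
    if parsed.scheme not in ("http", "https"):
        raise ValueError(f"Unsupported URL scheme: {parsed.scheme!r}")
    if parsed.netloc != "github.com":
        raise ValueError(f"Not a github.com URL: {repo_url!r}")

    parts = [p for p in parsed.path.split("/") if p]
    if len(parts) < 2:
        raise ValueError(f"Cannot find owner/repository in {repo_url!r}")
    owner, repo = parts[0], parts[1]
    api_url = f"https://api.github.com/repos/{owner}/{repo}/contents"

    # Expected layout: /<owner>/<repo>/(blob|tree)/<branch>/<path...>
    # Assumption: the branch name contains no "/" (otherwise the split is ambiguous)
    if len(parts) >= 4 and parts[2] in ("blob", "tree"):
        branch, path = parts[3], "/".join(parts[4:])
        if path:
            api_url += "/" + path
        api_url += f"?ref={branch}"
    return api_url
```

Under these assumptions, 'https://github.com/mikeqfu/pyhelpers/blob/master/tests/data' maps to 'https://api.github.com/repos/mikeqfu/pyhelpers/contents/tests/data?ref=master'.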
Examples:
>>> from pyhelpers.ops import GitHubFileDownloader
>>> from pyhelpers.dirs import delete_dir
>>> output_dir = "tests/temp"
>>> # Download a single file
>>> repo_url = 'https://github.com/mikeqfu/pyhelpers/blob/master/tests/data/dat.csv'
>>> downloader = GitHubFileDownloader(repo_url, output_dir=output_dir)
>>> downloader.download()
Downloaded to: ".\tests\temp\dat.csv"
1
>>> # Download a directory
>>> repo_url = 'https://github.com/mikeqfu/pyhelpers/blob/master/tests/data'
>>> downloader = GitHubFileDownloader(repo_url, output_dir=output_dir)
>>> downloader.download()
Downloaded to: ".\tests\temp\tests\data\csr_mat.npz"
Downloaded to: ".\tests\temp\tests\data\dat.csv"
Downloaded to: ".\tests\temp\tests\data\dat.feather"
Downloaded to: ".\tests\temp\tests\data\dat.joblib"
Downloaded to: ".\tests\temp\tests\data\dat.json"
Downloaded to: ".\tests\temp\tests\data\dat.ods"
Downloaded to: ".\tests\temp\tests\data\dat.pickle"
Downloaded to: ".\tests\temp\tests\data\dat.pickle.bz2"
Downloaded to: ".\tests\temp\tests\data\dat.pickle.gz"
Downloaded to: ".\tests\temp\tests\data\dat.pickle.xz"
Downloaded to: ".\tests\temp\tests\data\dat.txt"
Downloaded to: ".\tests\temp\tests\data\dat.xlsx"
Downloaded to: ".\tests\temp\tests\data\zipped.7z"
Downloaded to: ".\tests\temp\tests\data\zipped.txt"
Downloaded to: ".\tests\temp\tests\data\zipped.zip"
15
>>> downloader = GitHubFileDownloader(
...     repo_url, flatten_files=True, output_dir=output_dir)
>>> downloader.download()
Downloaded to: ".\tests\temp\csr_mat.npz"
Downloaded to: ".\tests\temp\dat.csv"
Downloaded to: ".\tests\temp\dat.feather"
Downloaded to: ".\tests\temp\dat.joblib"
Downloaded to: ".\tests\temp\dat.json"
Downloaded to: ".\tests\temp\dat.ods"
Downloaded to: ".\tests\temp\dat.pickle"
Downloaded to: ".\tests\temp\dat.pickle.bz2"
Downloaded to: ".\tests\temp\dat.pickle.gz"
Downloaded to: ".\tests\temp\dat.pickle.xz"
Downloaded to: ".\tests\temp\dat.txt"
Downloaded to: ".\tests\temp\dat.xlsx"
Downloaded to: ".\tests\temp\zipped.7z"
Downloaded to: ".\tests\temp\zipped.txt"
Downloaded to: ".\tests\temp\zipped.zip"
15
>>> delete_dir(output_dir)
To delete the directory ".\tests\temp\" (Not empty) ? [No]|Yes: yes
Methods
check_url(url) – Checks if the scheme of the provided url is valid.
create_url(url) – Creates a URL compatible with GitHub's REST API from the given URL.
download([api_url]) – Downloads files from the specified GitHub api_url.
download_single_file(file_url, dir_out) – Downloads a single file from the specified file_url to the dir_out directory.
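Downloading a directory requires walking it recursively. With GitHub's contents endpoint, a file URL returns a single JSON object and a directory URL returns a list of entries, each with a type ("file" or "dir"), a path, a download_url for files and an API url for listing a subdirectory. The sketch below collects every file beneath an API URL, assuming download works along these lines (this page does not confirm it). The names iter_repo_files and fetch_json are hypothetical, and fetch is injectable so the traversal can be exercised without network access.

```python
import json
from urllib.request import urlopen


def fetch_json(api_url):
    """Hypothetical default fetcher: plain unauthenticated GET returning parsed JSON."""
    with urlopen(api_url) as response:
        return json.load(response)


def iter_repo_files(api_url, fetch=fetch_json):
    """Hypothetical helper: yield (repo_path, download_url) for every file under api_url."""
    listing = fetch(api_url)
    # A file URL returns one JSON object; a directory URL returns a list of entries
    entries = listing if isinstance(listing, list) else [listing]
    for entry in entries:
        if entry["type"] == "file":
            yield entry["path"], entry["download_url"]
        elif entry["type"] == "dir":
            # Recurse into the subdirectory through its own API URL
            yield from iter_repo_files(entry["url"], fetch)
```

Each yielded download_url could then be passed to a per-file routine such as download_single_file, and counting the yielded pairs gives a figure analogous to total_files. Unauthenticated requests to the GitHub API are rate-limited, so a real downloader may need to send an access token.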