GitHubFileDownloader
- class pyhelpers.ops.GitHubFileDownloader(repo_url, flatten_files=False, output_dir=None)
Downloads files from GitHub repositories.
Given the URL of a repository, or of a specific file (blob) or directory (tree) within it, this class downloads the corresponding files to a local output directory.
- Parameters:
repo_url (str) – URL of the GitHub repository to download from; it can be a path to a specific blob or tree location.
flatten_files (bool) – Whether to flatten the directory structure by pulling all files into the root folder; defaults to False.
output_dir (str | os.PathLike | None) – Output directory where downloaded files will be saved; defaults to None, meaning files will be saved in the current directory.
- Variables:
repo_url (str) – URL of the GitHub repository.
flatten_files (bool) – Whether to flatten the directory structure (i.e. pull the contents of all subdirectories into the root folder); defaults to False.
output_dir (str | None) – Output directory path; defaults to None.
api_url (str) – URL of the GitHub repository compatible with GitHub’s REST API.
download_path (str) – Pathname for downloading files.
total_files (int) – Total number of files under the given directory.
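The api_url attribute is described only as a URL compatible with GitHub's REST API. As a minimal sketch of one way such a URL could be derived, the snippet below maps a github.com blob or tree URL onto the REST API's repository-contents endpoint (/repos/{owner}/{repo}/contents/{path} with a ref query parameter). The helper name to_contents_api_url is hypothetical and this is not necessarily how pyhelpers builds api_url; it also assumes the branch name contains no slash.

```python
import re
from urllib.parse import quote

# Hypothetical helper, not part of pyhelpers. Assumes the branch or tag name
# (the segment after /blob/ or /tree/) contains no "/" characters.
_WEB_URL = re.compile(
    r"^https://github\.com/(?P<owner>[^/]+)/(?P<repo>[^/]+)"
    r"/(?:blob|tree)/(?P<ref>[^/]+)(?:/(?P<path>.*))?$"
)


def to_contents_api_url(repo_url):
    """Translate a github.com blob/tree URL into a contents-API URL."""
    match = _WEB_URL.match(repo_url.rstrip("/"))
    if match is None:
        raise ValueError(f"Not a GitHub blob/tree URL: {repo_url!r}")
    # Percent-encode the path but keep "/" as the separator.
    path = quote(match["path"] or "")
    ref = quote(match["ref"], safe="")
    return (
        f"https://api.github.com/repos/{match['owner']}/{match['repo']}"
        f"/contents/{path}?ref={ref}"
    )


print(to_contents_api_url(
    "https://github.com/mikeqfu/pyhelpers/blob/master/tests/data/dat.csv"))
# https://api.github.com/repos/mikeqfu/pyhelpers/contents/tests/data/dat.csv?ref=master
```

A URL that does not match the blob/tree pattern raises ValueError in this sketch, so a malformed input fails before any request is made.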
Examples:
>>> from pyhelpers.ops import GitHubFileDownloader
>>> from pyhelpers.dirs import delete_dir
>>> output_dir = "tests/temp"
>>> # Download a single file
>>> repo_url = 'https://github.com/mikeqfu/pyhelpers/blob/master/tests/data/dat.csv'
>>> downloader = GitHubFileDownloader(repo_url, output_dir=output_dir)
>>> downloader.download()
Downloaded to: "./tests/temp/dat.csv"
1
>>> # Download a directory
>>> repo_url = 'https://github.com/mikeqfu/pyhelpers/blob/master/tests/data'
>>> downloader = GitHubFileDownloader(repo_url, output_dir=output_dir)
>>> downloader.download()
Downloaded to: "./tests/temp/tests/data/csr_mat.npz"
Downloaded to: "./tests/temp/tests/data/dat.csv"
Downloaded to: "./tests/temp/tests/data/dat.feather"
Downloaded to: "./tests/temp/tests/data/dat.joblib"
Downloaded to: "./tests/temp/tests/data/dat.json"
Downloaded to: "./tests/temp/tests/data/dat.ods"
Downloaded to: "./tests/temp/tests/data/dat.pickle"
Downloaded to: "./tests/temp/tests/data/dat.pickle.bz2"
Downloaded to: "./tests/temp/tests/data/dat.pickle.gz"
Downloaded to: "./tests/temp/tests/data/dat.pickle.xz"
Downloaded to: "./tests/temp/tests/data/dat.txt"
Downloaded to: "./tests/temp/tests/data/dat.xlsx"
Downloaded to: "./tests/temp/tests/data/zipped.7z"
Downloaded to: "./tests/temp/tests/data/zipped.txt"
Downloaded to: "./tests/temp/tests/data/zipped.zip"
15
>>> downloader = GitHubFileDownloader(
...     repo_url, flatten_files=True, output_dir=output_dir)
>>> downloader.download()
Downloaded to: "./tests/temp/csr_mat.npz"
Downloaded to: "./tests/temp/dat.csv"
Downloaded to: "./tests/temp/dat.feather"
Downloaded to: "./tests/temp/dat.joblib"
Downloaded to: "./tests/temp/dat.json"
Downloaded to: "./tests/temp/dat.ods"
Downloaded to: "./tests/temp/dat.pickle"
Downloaded to: "./tests/temp/dat.pickle.bz2"
Downloaded to: "./tests/temp/dat.pickle.gz"
Downloaded to: "./tests/temp/dat.pickle.xz"
Downloaded to: "./tests/temp/dat.txt"
Downloaded to: "./tests/temp/dat.xlsx"
Downloaded to: "./tests/temp/zipped.7z"
Downloaded to: "./tests/temp/zipped.txt"
Downloaded to: "./tests/temp/zipped.zip"
15
>>> delete_dir(output_dir)
To delete the directory "./tests/temp/" (Not empty)
? [No]|Yes: yes
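The two directory downloads above differ only in where each file lands: with flatten_files=False the repository path is kept under output_dir, and with flatten_files=True only the file name is kept. The mapping can be sketched as follows. local_path is a hypothetical helper written for illustration, not a pyhelpers function, and it omits the leading "./" shown in the printed messages.

```python
from pathlib import PurePosixPath


def local_path(repo_path, output_dir, flatten_files=False):
    """Hypothetical helper: where a file at `repo_path` would be written."""
    relative = PurePosixPath(repo_path)
    if flatten_files:
        # Drop every parent directory and keep only the file name.
        relative = PurePosixPath(relative.name)
    return str(PurePosixPath(output_dir) / relative)


print(local_path("tests/data/dat.csv", "tests/temp"))
# tests/temp/tests/data/dat.csv
print(local_path("tests/data/dat.csv", "tests/temp", flatten_files=True))
# tests/temp/dat.csv
```

Under this mapping, two files with the same name in different subdirectories resolve to the same flattened target. The sketch does not address that case, and the examples above do not show how the class handles it.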
Methods
check_url(url) – Checks if the scheme of the provided url is valid.
create_url(url) – Creates a URL compatible with GitHub's REST API from the given URL.
download([api_url]) – Downloads files from the specified GitHub api_url.
download_single_file(file_url, dir_out) – Downloads a single file from the specified file_url to the dir_out directory.
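The summary for check_url says only that it checks whether the URL's scheme is valid. As an illustration of what a scheme check can look like with the standard library, the sketch below accepts http and https URLs that also name a host. The function has_valid_scheme and its allowed tuple are assumptions made for this example; the criteria actually applied by check_url are not stated here.

```python
from urllib.parse import urlparse


def has_valid_scheme(url, allowed=("http", "https")):
    """Hypothetical check: True if `url` has an allowed scheme and a host."""
    parsed = urlparse(url)
    # urlparse lowercases the scheme, so "HTTPS://..." is accepted as well.
    return parsed.scheme in allowed and bool(parsed.netloc)


print(has_valid_scheme("https://github.com/mikeqfu/pyhelpers"))  # True
print(has_valid_scheme("github.com/mikeqfu/pyhelpers"))          # False
print(has_valid_scheme("ftp://github.com/mikeqfu/pyhelpers"))    # False
```

Validating the URL before building an API request keeps a typo such as a missing "https://" from surfacing later as a less obvious network error.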