Bulk API
How to use Dewey's helper libraries to access the Amplify API and download data programmatically
Dewey helper libraries
The deweydatapy (Python) and deweydatar (R) libraries make accessing data via the API straightforward. In a few simple steps you can review meta information about the dataset and download the data to your directory.
Create an API Key
On the Dewey platform, select Connections → Add Connection → API Key to create your API key.
Save your API Key
Make sure you save a copy of your API key. This will be the only time you will see the key in the platform. If you lose this key, you will need to repeat this process.
Get the product path
On the data product page, select Get / Subscribe → Connect to API and select the API key generated in the previous step. Then select Subscribe and copy the generated Data Endpoint; this is the product path you will use in the steps below.
Install the Dewey library
Either library can be installed directly from GitHub as a package.

Python:

pip install deweydatapy@git+https://github.com/Dewey-Data/deweydatapy

R:

library(devtools)
install_github("Dewey-Data/deweydatar")
The deweydatapy and deweydatar libraries have the following functions:

get_meta: gets meta information about the dataset, especially the available date range, returned as a dict
get_file_list: gets the list of files as a DataFrame
download_files: downloads files from the file list to a destination folder
read_sample: reads a sample of data for a file download URL
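For example, read_sample can be used to preview a file before downloading anything in bulk. The sketch below is a minimal Python illustration only: the "link" column name and the exact read_sample signature are assumptions, so confirm both against the library's README before relying on them.

import deweydatapy as ddp

apikey = "<API_KEY_STEP_1>"
pp_ = "<PRODUCT_PATH_STEP_2>"

# List the available files, then preview one before downloading everything.
# Assumptions: the file-list DataFrame exposes download URLs in a "link"
# column, and read_sample accepts a single file URL and returns a pandas
# DataFrame; check files_df.columns and the GitHub README to confirm.
files_df = ddp.get_file_list(apikey, pp_, print_info=True)
sample = ddp.read_sample(files_df["link"].iloc[0])
print(sample.head())

The full workflow behind this sketch is walked through step by step below.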
Using the library
- Import the library and store the API key and product path from the previous steps:

Python:

import deweydatapy as ddp

# API Key
apikey = "<API_KEY_STEP_1>"
# Product path
pp_ = "<PRODUCT_PATH_STEP_2>"

R:

library(deweydatar)

# API Key
apikey = "<API_KEY_STEP_1>"
# Product path
pp_ = "<PRODUCT_PATH_STEP_2>"
- Review the meta information of the data product:

Python:

meta = ddp.get_meta(apikey, pp_, print_meta = True)  # returns a dataframe with meta information

R:

meta = get_meta(apikey, pp_, print_meta = TRUE)
Review any partition columns
Any date partition columns will be specified in the meta output. You can use this result, along with the min and max dates available, to partition the data requested in the API calls below.
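As a small sketch of how to use that information (the dates below are placeholders, not real values), note the min and max dates reported by get_meta and use them to bound the file-list request in the next step:

# Replace the placeholders with dates taken from the printed meta output.
# The exact field names in the meta object vary by product, so read the
# date range off the printed output rather than hard-coding column names.
start_date = "YYYY-MM-DD"  # earliest date you want, within the reported range
end_date = "YYYY-MM-DD"    # latest date you want, up to the reported max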
- Collect the list of files to download and store them in a dataframe:

Python:

files_df = ddp.get_file_list(apikey, pp_,
                             start_date = 'YYYY-MM-DD',
                             end_date = 'YYYY-MM-DD',
                             print_info = True)

R:

files_df = get_file_list(apikey, pp_,
                         start_date = 'YYYY-MM-DD',
                         end_date = 'YYYY-MM-DD',
                         print_info = TRUE)
- Download the files listed in the dataframe:

Python:

ddp.download_files(files_df, "<YOUR_DIRECTORY>", skip_exists = True)

R:

download_files(files_df, "<YOUR_DIRECTORY>", skip_exists = TRUE)

If you rerun the download and want to skip files that have already been downloaded, set skip_exists = TRUE (True in Python); the default value is FALSE.
Start working with the data
Now that the data is in the proper directory, you're ready to start your analysis using your tool of choice.
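For example, here is a minimal Python sketch for loading the downloads into pandas, assuming the product delivers gzipped CSV files (adjust the glob pattern and the reader, e.g. pd.read_parquet, to your product's actual file format):

import glob
import pandas as pd

# Assumption: the downloaded files are gzipped CSVs sitting in
# <YOUR_DIRECTORY>; change the pattern and reader to match your product.
paths = sorted(glob.glob("<YOUR_DIRECTORY>/*.csv.gz"))

# Concatenate every downloaded file into a single DataFrame for analysis.
df = pd.concat((pd.read_csv(p) for p in paths), ignore_index=True)
print(df.shape)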
More Information, Pull Requests, Feature Suggestions and Bug Reports
The libraries are open source and available on GitHub, where you can find more information and submit pull requests, feature suggestions, and bug reports to support other researchers.