Bulk API

How to use libraries from Dewey to access the Amplify API to download data programmatically

Dewey helper libraries

The deweydatapy and deweydataR libraries make accessing data through the API straightforward. In a few steps you can review metadata about a dataset and download its files to a local directory.

Create an API Key

On the Dewey platform, select Connections > Add Connection > API Key to create your API key.


Save your API Key

Make sure you save a copy of your API key. This is the only time the platform displays it. If you lose the key, you will need to create a new one.

Get the product path

On the data product page, select Get / Subscribe > Connect to API, then choose the API key you created in the previous step. Select Subscribe and copy the generated Data Endpoint; you will use it as the product path in the code below.

Install the Dewey library

Install the library for your language directly from GitHub.

Python:

pip install deweydatapy@git+https://github.com/Dewey-Data/deweydatapy

R (requires the devtools package):

library(devtools)

install_github("Dewey-Data/deweydatar")

The deweydatapy and deweydataR libraries provide the following functions:

  • get_meta: gets meta information about the dataset, especially its available date range, returned as a dict
  • get_file_list: gets the list of files to download as a DataFrame
  • download_files: downloads the files in the file list to a destination folder
  • read_sample: reads a sample of data from a file download URL

Using the library

  1. Import the library and store the API key and product path from the previous steps:

Python:

import deweydatapy as ddp

# API Key
apikey = "<API_KEY_STEP_1>"

# Product path
pp_ = "<PRODUCT_PATH_STEP_2>"

R:

library(deweydatar)

# API Key
apikey = "<API_KEY_STEP_1>"

# Product path
pp_ = "<PRODUCT_PATH_STEP_2>"
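Pasting the key directly into a script makes it easy to leak through version control. A minimal sketch of an alternative, assuming you export the key and product path as environment variables named DEWEY_API_KEY and DEWEY_PRODUCT_PATH (hypothetical names chosen for this example; the library does not read them on its own):

```python
import os


def load_credentials() -> tuple[str, str]:
    """Read the API key and product path from environment variables.

    DEWEY_API_KEY and DEWEY_PRODUCT_PATH are hypothetical names chosen for
    this example; deweydatapy does not read them automatically.
    """
    apikey = os.environ.get("DEWEY_API_KEY")
    pp_ = os.environ.get("DEWEY_PRODUCT_PATH")
    # Collect every variable that is unset or empty so the error lists them all
    missing = [name for name, value in (("DEWEY_API_KEY", apikey),
                                        ("DEWEY_PRODUCT_PATH", pp_))
               if not value]
    if missing:
        raise RuntimeError("Missing environment variable(s): " + ", ".join(missing))
    return apikey, pp_
```

You would then write apikey, pp_ = load_credentials() in place of the two assignments in the Python snippet.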
  2. Review the meta information of the data product:

Python:

meta = ddp.get_meta(apikey, pp_, print_meta = True)  # returns the product's meta information

R:

meta = get_meta(apikey, pp_, print_meta = TRUE)


Review any partition columns

The meta output identifies any date partition columns, along with the minimum and maximum available dates. Use these values to choose the start_date and end_date you pass to get_file_list in the next step.
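For long date ranges you may prefer to request the file list one month at a time. A minimal sketch, assuming you copy the minimum and maximum dates from the meta output yourself and that get_file_list treats end_date as inclusive (verify this against your results); month_windows is a hypothetical helper, not part of deweydatapy:

```python
from datetime import date, timedelta


def month_windows(start: str, end: str) -> list[tuple[str, str]]:
    """Split an inclusive YYYY-MM-DD date range into calendar-month windows.

    Each (start_date, end_date) pair is formatted as YYYY-MM-DD. This helper is
    hypothetical and not part of deweydatapy.
    """
    first = date.fromisoformat(start)
    last = date.fromisoformat(end)
    if first > last:
        raise ValueError("start must not be after end")
    windows = []
    current = first
    while current <= last:
        # First day of the following month (rolls over the year in December)
        if current.month == 12:
            next_month = date(current.year + 1, 1, 1)
        else:
            next_month = date(current.year, current.month + 1, 1)
        # A window ends at month end, or at the overall end date if sooner
        window_end = min(next_month - timedelta(days=1), last)
        windows.append((current.isoformat(), window_end.isoformat()))
        current = next_month
    return windows


# Example usage (requires deweydatapy plus a valid key and product path):
# for start_date, end_date in month_windows("2023-01-01", "2023-03-31"):
#     files_df = ddp.get_file_list(apikey, pp_, start_date=start_date, end_date=end_date)
```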

  3. Collect the list of files to download into a DataFrame:

Python:

files_df = ddp.get_file_list(apikey, pp_,
                             start_date = 'YYYY-MM-DD',
                             end_date = 'YYYY-MM-DD',
                             print_info = True)

R:

files_df = get_file_list(apikey, pp_,
                         start_date = 'YYYY-MM-DD',
                         end_date = 'YYYY-MM-DD',
                         print_info = TRUE)
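The date placeholders expect ISO dates in YYYY-MM-DD form. A small pre-check can catch typos or out-of-range requests before any API call is made. This is a sketch only; validate_date_range is a hypothetical helper, and the optional min_date and max_date bounds are values you would copy from the meta output yourself:

```python
import re
from datetime import date
from typing import Optional

# Strict YYYY-MM-DD shape; date.fromisoformat then checks the calendar
_ISO_DATE = re.compile(r"\d{4}-\d{2}-\d{2}")


def validate_date_range(start_date: str, end_date: str,
                        min_date: Optional[str] = None,
                        max_date: Optional[str] = None) -> None:
    """Raise ValueError if the dates are malformed, reversed, or out of bounds.

    Hypothetical helper, not part of deweydatapy.
    """
    parsed = {}
    for name, value in (("start_date", start_date), ("end_date", end_date),
                        ("min_date", min_date), ("max_date", max_date)):
        if value is None:
            continue
        if not _ISO_DATE.fullmatch(value):
            raise ValueError(f"{name} must be YYYY-MM-DD, got {value!r}")
        parsed[name] = date.fromisoformat(value)  # rejects e.g. 2023-02-30
    if parsed["start_date"] > parsed["end_date"]:
        raise ValueError("start_date is after end_date")
    if "min_date" in parsed and parsed["start_date"] < parsed["min_date"]:
        raise ValueError("start_date is before the earliest available date")
    if "max_date" in parsed and parsed["end_date"] > parsed["max_date"]:
        raise ValueError("end_date is after the latest available date")
```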
  4. Download the files listed in the DataFrame:

Python:

ddp.download_files(files_df, "<YOUR_DIRECTORY>", skip_exists = True)

R:

download_files(files_df, "<YOUR_DIRECTORY>", skip_exists = TRUE)

If you rerun the download and want to skip files that already exist in the destination folder, set skip_exists = TRUE (True in Python). The default is FALSE.
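The four steps above can be chained into one function. A minimal sketch that uses only the calls and arguments shown in this guide; passing the library in as the client argument is a choice made for this example so that a stand-in object can be used for testing, not something deweydatapy requires:

```python
from typing import Any


def fetch_dataset(client: Any, apikey: str, pp_: str,
                  start_date: str, end_date: str, dest_dir: str) -> Any:
    """Run get_meta, get_file_list and download_files in order.

    client is expected to be the deweydatapy module (import deweydatapy as ddp).
    Returns the file list so you can inspect what was requested.
    """
    # Step 2: print the product's meta information
    client.get_meta(apikey, pp_, print_meta=True)
    # Step 3: collect the files for the requested date range
    files_df = client.get_file_list(apikey, pp_,
                                    start_date=start_date,
                                    end_date=end_date,
                                    print_info=True)
    # Step 4: download, skipping files already present in dest_dir
    client.download_files(files_df, dest_dir, skip_exists=True)
    return files_df


# Example usage:
# import deweydatapy as ddp
# fetch_dataset(ddp, apikey, pp_, "2023-01-01", "2023-01-31", "<YOUR_DIRECTORY>")
```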

Start working with the data

Now that the data is in your directory, you can start your analysis with your tool of choice.
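As one starting point in Python without extra dependencies, the sketch below streams rows from every downloaded file. It assumes the files are gzip-compressed CSVs with a header row (the .csv.gz pattern is an assumption; check the extensions in your download folder and adjust pattern, or use a reader such as pandas if you prefer):

```python
import csv
import gzip
from pathlib import Path
from typing import Iterator


def iter_rows(directory: str, pattern: str = "*.csv.gz") -> Iterator[dict]:
    """Yield each row of every matching gzip-compressed CSV file as a dict.

    Files are read in sorted filename order; the default pattern is an
    assumption about the download format.
    """
    for path in sorted(Path(directory).glob(pattern)):
        with gzip.open(path, mode="rt", newline="", encoding="utf-8") as handle:
            yield from csv.DictReader(handle)


# Example usage:
# row_count = sum(1 for _ in iter_rows("<YOUR_DIRECTORY>"))
```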

More Information, Pull Requests, Feature Suggestions and Bug Reports

Both libraries are open source and available on GitHub. Visit the repositories for more information, or to submit pull requests, feature suggestions, and bug reports that help other researchers.

deweydatapy

deweydataR