Bulk API

💡
Be sure to update the dataset URL and API key for existing workflows
While the new platform's API is cross-compatible with previous workflows (you can keep the same code structure), you must generate a new API key and use the new API URLs as your dataset identifiers.

Dewey helper libraries

The deweydatapy (Python) and deweydataR (R) libraries make accessing data through the API straightforward. In a few steps you can review meta information about a dataset and download its files to a local directory.

Create an API Key

On the Dewey platform, select Connections → Add Connection → API Key to create your API key.

💡
Save your API Key
Make sure you save a copy of your API key. The platform shows the key only once, when you create it. If you lose it, you will need to create a new key.
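Because the key is shown only once, one option is to keep it out of your scripts entirely and read it from an environment variable. The sketch below assumes a hypothetical variable named DEWEY_API_KEY; the name is an illustration, not something the Dewey platform defines.

```python
import os


def load_api_key(var_name: str = "DEWEY_API_KEY") -> str:
    """Return the API key stored in an environment variable.

    DEWEY_API_KEY is a hypothetical name chosen for this sketch;
    use whatever variable name fits your setup.
    """
    key = os.environ.get(var_name, "").strip()
    if not key:
        raise RuntimeError(
            f"Environment variable {var_name} is not set; "
            "export your saved API key before running the script."
        )
    return key
```

With the variable exported in your shell, `apikey = load_api_key()` can stand in for a hard-coded key string.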

Get the product path

On the data product page, select Get / Subscribe → Connect to API, then choose the API key you created in the previous step. Select Subscribe and copy the generated Data Endpoint; this is the product path you will use in your code. Do not use the Metadata Endpoint for this step.
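A pasted endpoint can pick up stray whitespace, and a placeholder can be left in a script by accident. A minimal sanity check, assuming only that the Data Endpoint is an https URL (the helper name is hypothetical, and the check cannot tell the Data Endpoint apart from the Metadata Endpoint):

```python
from urllib.parse import urlparse


def check_product_path(pp: str) -> str:
    """Strip whitespace and confirm the product path looks like an https URL.

    Assumption: the Data Endpoint copied from the platform is an https URL.
    This does not verify that it is the Data Endpoint rather than the
    Metadata Endpoint, or that the subscription is active.
    """
    pp = pp.strip()
    parsed = urlparse(pp)
    if parsed.scheme != "https" or not parsed.netloc:
        raise ValueError(f"Product path does not look like an https URL: {pp!r}")
    return pp
```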

Install the Dewey library

Both libraries can be installed directly from GitHub, deweydatapy with pip and deweydataR with devtools:

pip install deweydatapy@git+https://github.com/Dewey-Data/deweydatapy

library(devtools)

install_github("Dewey-Data/deweydatar", ref = "new-platform")
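To confirm the Python package is importable in the environment you plan to use, a small check with the standard library's importlib is enough (the helper name is illustrative, not part of deweydatapy):

```python
import importlib.util


def is_installed(package: str) -> bool:
    """Return True if `package` can be found on the current Python path."""
    return importlib.util.find_spec(package) is not None


if __name__ == "__main__":
    print("deweydatapy installed:", is_installed("deweydatapy"))
```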

The deweydatapy and deweydataR libraries provide the following functions:

get_meta: gets the dataset's meta information, notably its date range, returned as a dict
get_file_list: gets the list of files as a DataFrame
download_files: downloads files from the file list to a destination folder
read_sample: reads a sample of data from a file download URL

Using the library

Import the library and store the API key and product path from the previous steps:

import deweydatapy as ddp

# API Key
apikey = "<API_KEY_STEP_1>"

# Product path
pp_ = "<PRODUCT_PATH_STEP_2>"

library(deweydatar)

# API Key
apikey <- "<API_KEY_STEP_1>"

# Product path
pp_ <- "<PRODUCT_PATH_STEP_2>"

Review the meta information of the data product:

meta = ddp.get_meta(apikey, pp_, print_meta = True) # returns the dataset's meta information

meta <- get_meta(apikey, pp_, print_meta = TRUE)

💡
Review any partition columns
The meta output specifies any date partition column, along with the minimum and maximum available dates. Use these values to choose the start_date and end_date when requesting the file list in the next step.
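If the available date range is long, one approach is to request the file list in smaller windows. The sketch below splits a min/max date pair into calendar-month windows. It assumes the dates are YYYY-MM-DD strings, and monthly windows are an arbitrary choice for illustration, not the dataset's own partitioning.

```python
from datetime import date, timedelta
from typing import List, Tuple


def month_windows(min_date: str, max_date: str) -> List[Tuple[str, str]]:
    """Split the inclusive range [min_date, max_date] into calendar-month windows.

    Dates are YYYY-MM-DD strings in and out. Monthly windows are an
    arbitrary choice for this sketch.
    """
    start = date.fromisoformat(min_date)
    end = date.fromisoformat(max_date)
    if start > end:
        raise ValueError("min_date must not be after max_date")

    windows = []
    current = start
    while current <= end:
        # First day of the month after `current`.
        if current.month == 12:
            next_month = date(current.year + 1, 1, 1)
        else:
            next_month = date(current.year, current.month + 1, 1)
        # The window ends on the last day of this month, or at max_date if sooner.
        window_end = min(next_month - timedelta(days=1), end)
        windows.append((current.isoformat(), window_end.isoformat()))
        current = next_month
    return windows
```

Each (start, end) pair could then be passed as start_date and end_date to a separate get_file_list call.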

Collect the list of files to download and store it in a DataFrame:

files_df = ddp.get_file_list(apikey, pp_, 
                             start_date = 'YYYY-MM-DD',
                             end_date = 'YYYY-MM-DD',
                             print_info = True)

files_df <- get_file_list(apikey, pp_, 
                         start_date = 'YYYY-MM-DD',
                         end_date = 'YYYY-MM-DD',
                         print_info = TRUE)
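Both calls take start_date and end_date as YYYY-MM-DD strings. A small pre-check can catch a malformed date, a reversed range, or a range outside the minimum and maximum dates reported in the meta output before any request is made. The helper below is hypothetical and not part of deweydatapy.

```python
import re
from datetime import date
from typing import Tuple

# Literal YYYY-MM-DD shape, ASCII digits only.
_ISO_DATE = re.compile(r"[0-9]{4}-[0-9]{2}-[0-9]{2}")


def _parse(value: str) -> date:
    # Check the shape first, then let fromisoformat reject impossible dates.
    if not _ISO_DATE.fullmatch(value):
        raise ValueError(f"Expected YYYY-MM-DD, got {value!r}")
    return date.fromisoformat(value)


def validate_range(start_date: str, end_date: str,
                   min_date: str, max_date: str) -> Tuple[date, date]:
    """Check that [start_date, end_date] is well-formed and inside [min_date, max_date]."""
    start, end = _parse(start_date), _parse(end_date)
    lo, hi = _parse(min_date), _parse(max_date)
    if start > end:
        raise ValueError("start_date is after end_date")
    if start < lo or end > hi:
        raise ValueError(f"Range {start}..{end} is outside the available {lo}..{hi}")
    return start, end
```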

Download the files listed in the DataFrame to a local directory:

ddp.download_files(files_df, "<YOUR_DIRECTORY>", skip_exists = True)

download_files(files_df, "<YOUR_DIRECTORY>", skip_exists = TRUE)

If you run the download again and want to skip files that already exist in the destination folder, set skip_exists = True (skip_exists = TRUE in R). The default is False (FALSE in R).
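Conceptually, skipping existing files means comparing the files to be downloaded against what is already in the destination folder. The sketch below illustrates that idea on a plain list of file names; it is not how download_files is implemented, and it makes no assumption about the column names in files_df.

```python
from pathlib import Path
from typing import Iterable, List


def files_to_download(file_names: Iterable[str], dest_dir: str) -> List[str]:
    """Return the names from `file_names` that are not yet present in `dest_dir`.

    Illustration only: a file counts as present if a path with the same
    name exists, regardless of its size or completeness.
    """
    dest = Path(dest_dir)
    return [name for name in file_names if not (dest / name).exists()]
```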

Start working with the data

Now that the data is in the proper directory, you're ready to start your analysis using your tool of choice.
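As one possible starting point, assuming the downloaded files are gzip-compressed CSV files ending in .csv.gz (check the extensions in your directory, since a product may use another format), the standard library can stream rows without loading everything into memory. The helper name is illustrative.

```python
import csv
import gzip
from pathlib import Path
from typing import Dict, Iterator


def iter_rows(data_dir: str, pattern: str = "*.csv.gz") -> Iterator[Dict[str, str]]:
    """Yield each CSV row, as a dict keyed by header, from matching files in data_dir.

    Assumption: files are gzip-compressed, UTF-8 CSV with a header row.
    Files are read in sorted filename order.
    """
    for path in sorted(Path(data_dir).glob(pattern)):
        with gzip.open(path, mode="rt", newline="", encoding="utf-8") as fh:
            yield from csv.DictReader(fh)
```

If you use pandas instead, pandas.read_csv reads .csv.gz files directly, inferring the compression from the file extension.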

More Information, Pull Requests, Feature Suggestions and Bug Reports

Both libraries are open source and available on GitHub, where you can find more information. You can also submit pull requests, feature suggestions, and bug reports there to help other researchers.

deweydatapy

deweydataR