FAQs - SafeGraph

Does SafeGraph Places contain mobility data?

No. SafeGraph focuses exclusively on POI and transaction data. Advan Research now provides what was formerly SafeGraph mobility data.

Has SafeGraph stopped producing this data or do they make this data available through Advan?

SafeGraph no longer produces Patterns. Advan Research produces their own version of it.

Does Global Places & Geometry data have a longitudinal component?

The Places and Geometry data are not longitudinal datasets. The data goes through a full refresh each month, so the data you’re seeing is a snapshot of POI information as of “today.” Because of the size of the overall dataset, it is delivered as multiple files of the same size, and records are distributed across those files at random, so no individual file corresponds to a particular region or category.

SafeGraph Spend data has some missing entries for `RAW_TOTAL_SPEND` for a few months in 2023 for particular `PLACEKEYS`. Does that mean the value of `RAW_TOTAL_SPEND` is 0 for that month, or that no information is available for that `PLACEKEY` in that month?

You can review this privacy statement from SafeGraph. They don’t report Spend if there was only one observation, so a blank value does not rule out that a transaction occurred but was not reported. For the same reason, they will never report 0 as the spend; the value simply remains blank.
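As a minimal sketch of what this means in practice, the snippet below keeps blank `RAW_TOTAL_SPEND` values as "not reported" (`None`) rather than converting them to 0, and averages only the reported months. The sample rows, the `MONTH` column, and the helper name `parse_spend` are made up for illustration; only `PLACEKEY` and `RAW_TOTAL_SPEND` come from the question above.

```python
import csv
import io

# Hypothetical sample rows; the February value is blank (not reported).
SAMPLE = """PLACEKEY,MONTH,RAW_TOTAL_SPEND
aaa-bbb@ccc,2023-01,120.50
aaa-bbb@ccc,2023-02,
aaa-bbb@ccc,2023-03,80.00
"""


def parse_spend(value):
    """Return the spend as a float, or None when the field is blank."""
    value = value.strip()
    return float(value) if value else None


rows = list(csv.DictReader(io.StringIO(SAMPLE)))
spend = [parse_spend(row["RAW_TOTAL_SPEND"]) for row in rows]

# Average only over reported months; a blank is "unknown", not zero.
reported = [s for s in spend if s is not None]
mean_reported = sum(reported) / len(reported)
n_missing = spend.count(None)

print(f"reported months: {len(reported)}, missing months: {n_missing}")
print(f"mean over reported months: {mean_reported:.2f}")
```

Filling the blank with 0 would pull the average down and imply a fact the data does not support, so it is usually safer to carry the missing months through the analysis explicitly.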

How can I map geographic data on Dewey, like Places, to Census Block Group, Census Tract, CBSA, etc.?

Here is sample Python code for mapping Dewey Data to Census Block Group, Census Tract, CBSA, etc.
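Independently of that sample code, one step can be sketched with the standard library alone. Assuming you already have a 12-digit Census Block Group GEOID for each place (for example from a spatial join, which is not shown here), the tract, county, and state identifiers are prefixes of it, because a block group GEOID is state (2 digits) + county (3) + tract (6) + block group (1). CBSA is not encoded in the GEOID, so it needs a county-to-CBSA crosswalk; the `COUNTY_TO_CBSA` dictionary below is a one-entry illustrative stand-in for the Census delineation file and should be checked against it.

```python
def split_cbg_geoid(cbg):
    """Split a Census Block Group GEOID into its parent geographies."""
    # GEOIDs read as integers lose leading zeros, so pad back to 12 digits.
    cbg = str(cbg).strip().zfill(12)
    if len(cbg) != 12 or not cbg.isdigit():
        raise ValueError(f"not a 12-digit block group GEOID: {cbg!r}")
    return {
        "state_fips": cbg[:2],     # 2-digit state
        "county_fips": cbg[:5],    # state + 3-digit county
        "tract_geoid": cbg[:11],   # county + 6-digit tract
        "cbg_geoid": cbg,          # tract + 1-digit block group
    }


# Illustrative crosswalk only; load the real county-to-CBSA
# delineation file in practice.
COUNTY_TO_CBSA = {"06037": "31080"}


def cbsa_for_cbg(cbg):
    """Return the CBSA code for a block group, or None if not in the crosswalk."""
    return COUNTY_TO_CBSA.get(split_cbg_geoid(cbg)["county_fips"])


parts = split_cbg_geoid(60371234561)  # leading zero was lost on import
print(parts)
print(cbsa_for_cbg("060371234561"))
```

Returning `None` for counties missing from the crosswalk keeps non-metropolitan places in the data instead of raising an error.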

In the SafeGraph Spend Panel, some states show sudden increases in total number, which I can attribute to the addition of data sources. However, in other states, nearly half of the original data is lost. Is there a reason why the data sources disappear in certain states?

This panel is made up of data from a variety of financial institutions. It’s common and expected for institutions to be introduced and dropped from the panel over time. Depending on how widely those institutions are used in each region, these changes may affect different geographies differently over time.

How can I join SafeGraph Brand data to public company unique identifiers like those available on Compustat?

Here is a notebook for joining SafeGraph Brand data to common company identifiers like CIK and TIC. Researchers need WRDS API access to run it.

The `opened_on` and `closed_on` variables have a low fill rate. Why is that?

You can read up on SafeGraph's open/closed logic here. SafeGraph has only been tracking these columns for the past few years, so in theory they should cover closures from that period. Because of how the data is generated at scale, some brands are easier to track than others, and some will be delayed by a few months. SafeGraph may eventually backfill data once they find a better resource.

How should outliers be handled in the Spend dataset?

In the past, SafeGraph has suggested filtering out outliers beyond a certain min/max threshold, depending on the use case. This will help remove locations where coverage is too low to consider reliable.

One thing to consider is the fluctuation in panel size, so it’s also worth normalizing the data against the number of transactions and customers.
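The two suggestions above could be combined roughly as follows. This is a sketch, not SafeGraph's procedure: the thresholds `MIN_TRANSACTIONS` and `MAX_SPEND_PER_TRANSACTION` are arbitrary placeholders to tune per use case, the rows are invented, and the column names `raw_total_spend`, `raw_num_transactions`, and `raw_num_customers` are assumed to match your Spend extract.

```python
# Hypothetical rows, one per POI-month.
records = [
    {"placekey": "a", "raw_total_spend": 5000.0,
     "raw_num_transactions": 200, "raw_num_customers": 150},
    {"placekey": "b", "raw_total_spend": 12.0,
     "raw_num_transactions": 1, "raw_num_customers": 1},
    {"placekey": "c", "raw_total_spend": 900000.0,
     "raw_num_transactions": 300, "raw_num_customers": 250},
]

# Placeholder thresholds; choose values that suit the use case.
MIN_TRANSACTIONS = 10
MAX_SPEND_PER_TRANSACTION = 1000.0


def normalize(record):
    """Add per-transaction and per-customer spend to a copy of the record."""
    out = dict(record)
    out["spend_per_transaction"] = (
        record["raw_total_spend"] / record["raw_num_transactions"]
    )
    out["spend_per_customer"] = (
        record["raw_total_spend"] / record["raw_num_customers"]
    )
    return out


# Drop low-coverage rows first, then drop implausibly large tickets.
covered = [r for r in records if r["raw_num_transactions"] >= MIN_TRANSACTIONS]
kept = [
    n for n in map(normalize, covered)
    if n["spend_per_transaction"] <= MAX_SPEND_PER_TRANSACTION
]

for row in kept:
    print(row["placekey"], round(row["spend_per_transaction"], 2))
```

Filtering on the transaction count before dividing also avoids division by zero for rows with no recorded transactions.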

Here is a notebook that was previously published on sampling bias which might be helpful to review. It's also a helpful resource for investigating geographic bias in the data.

What does the `customer_home_city` refer to in the Spend dataset? If I want to do some spatial analyses on consumption behaviors, where can I get the boundaries of these areas? Or is there anywhere I could find spend pattern data with customers’ home location at the census tract/census block level?

The attribute definition is “The number of customers to the POI based on the customer’s estimated home location. Homes are indicated by unique city and state pairs.”

Each customer in the panel is classified into an income class using a proprietary model based on his or her transactions and spending data.
Similarly, each customer’s home city and state are estimated using a proprietary model based on where the user makes the majority of their transactions.
Note that SafeGraph does not provide any individual-level data in this dataset, and these models are used solely for aggregating demographic information about customers to points of interest. A reminder that both of these columns are subject to differential privacy, implemented specifically to remove the possibility of identifying individuals with this data.

Is the source of the Spend dataset panel stable or are more sources added/removed as time goes on? For example, the number of transactions in a specific POI doubles in 2020 compared to 2019.

The panel will always slightly change over time, which is why they post a Spend Transaction Panel that you can use to normalize the data.

Do the same consumers in the Spend data appear each month, or are consumers added and dropped each month?

No individual-level data is available to identify whether a given consumer is added, dropped, or already present each month. Intuitively, we would strongly suspect that the vast majority of the panel remains the same from month to month. However, given how the Spend data is built, some consumers will certainly be added and dropped each month, so the panel may change over time.

I noticed that SafeGraph says "it may be necessary to adjust Spend data based on the state sampling rate each month, in order to account for the variations in state sampling rate over time for some states.” Do you have any suggestions on how to do that?

Generally, the recommended approach is to use the Panel Overview Data and Census population data to normalize the raw spend totals for each state each month.
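One way to read that recommendation, sketched here under stated assumptions: treat the state sampling rate as panel customers divided by Census population, and scale raw spend by the inverse of that rate. The field names (`panel_customers`, `population`, `raw_total_spend`) and all numbers are hypothetical; map them to the actual Panel Overview and Census columns you have.

```python
# Hypothetical monthly figures for a single state.
state_months = [
    {"month": "2019-06", "raw_total_spend": 2_000_000,
     "panel_customers": 100_000, "population": 1_000_000},
    {"month": "2020-06", "raw_total_spend": 4_000_000,
     "panel_customers": 200_000, "population": 1_000_000},
]


def sampling_rate(row):
    """Share of the state's population present in the panel that month."""
    return row["panel_customers"] / row["population"]


def population_scaled_spend(row):
    """Scale raw spend up to the full population (raw spend / sampling rate)."""
    return row["raw_total_spend"] * row["population"] / row["panel_customers"]


for row in state_months:
    print(
        row["month"],
        f"rate={sampling_rate(row):.2f}",
        f"scaled={population_scaled_spend(row):,.0f}",
    )
```

In this toy example raw spend doubles between the two months, but so does the panel, so the scaled totals are equal: the apparent growth is explained by the change in sampling rate.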

Will tourists show up in the data?

The panel is sourced from US financial institutions. If a non-resident has opened a card at a US institution, they could be included in the panel. However, it is unlikely that a tourist on a short-term trip would be included in the data.

Why is there no information about UberEats in the `transaction_intermediary` variable?

This absence is due to limitations in the available transaction metadata. For a transaction to be linked to a transaction_intermediaries variable and attributed to a specific Point of Interest (POI), the transaction must contain both a merchant and an intermediary. For example, in a transaction like "SQ * Chipotle #257", Square serves as the intermediary and Chipotle is the identifiable POI.

However, UberEats transactions typically lack the necessary detail—specifically, they do not include the name of the underlying merchant from which the order originated. These transactions usually appear as generic UberEats charges (e.g., "Uber Eats debit card purchase") without referencing the restaurant or store involved. As a result, there’s no way to reliably associate them with a POI, and they are excluded from the transaction_intermediaries mapping.
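To make the distinction concrete, here is a toy parser for the two descriptor shapes mentioned above. It is purely illustrative and is not how SafeGraph processes transactions: the `KNOWN_PREFIXES` lookup, the `*` separator rule, and the function name are all assumptions.

```python
# Hypothetical lookup from descriptor prefix to intermediary name.
KNOWN_PREFIXES = {"SQ": "Square"}


def split_descriptor(descriptor):
    """Return (intermediary, merchant) if both are present, else None."""
    prefix, separator, rest = descriptor.partition("*")
    intermediary = KNOWN_PREFIXES.get(prefix.strip().upper())
    merchant = rest.strip()
    if separator and intermediary and merchant:
        return intermediary, merchant
    # No separator, unknown prefix, or no merchant name: nothing to attribute.
    return None


print(split_descriptor("SQ * Chipotle #257"))
print(split_descriptor("Uber Eats debit card purchase"))
```

The first descriptor yields both an intermediary and a merchant, while the generic Uber Eats charge yields `None` because no underlying merchant name is present to link to a POI.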

What are Point POIs?

Some places are small and not well defined by a geometric shape. SafeGraph refers to these places as "Point POIs" and intentionally does not offer a polygon for them. Places like transit stops (naics_code like "485%"), ATMs (naics_code = 522110, 522320, 522130), kiosks (naics_code = 446110, 532282, 443142), and electric vehicle charging stations (naics_code = 447190) are examples of Point POIs found in the data. SafeGraph flags them by setting the geometry_type column = "POINT".
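A short sketch of separating Point POIs from the rest using the flag described above. The rows are invented, and the "POLYGON" value used for non-point places is an assumption to verify against your data. The filter relies on `geometry_type` rather than on NAICS codes, since the codes listed are examples and a NAICS code alone does not say whether a given place has a polygon.

```python
# Hypothetical Places rows.
places = [
    {"placekey": "p1", "naics_code": "485113", "geometry_type": "POINT"},
    {"placekey": "p2", "naics_code": "722513", "geometry_type": "POLYGON"},
    {"placekey": "p3", "naics_code": "447190", "geometry_type": "point"},
]


def is_point_poi(place):
    """True when the row is flagged as a Point POI (case-insensitive)."""
    return str(place.get("geometry_type", "")).strip().upper() == "POINT"


point_pois = [p for p in places if is_point_poi(p)]
other_pois = [p for p in places if not is_point_poi(p)]

print([p["placekey"] for p in point_pois])
print([p["placekey"] for p in other_pois])
```

Separating the two groups up front is useful before any area-based analysis, since Point POIs have no polygon to measure.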

Is there geographic bias in the Spend data?

If you'd like to learn more about the representativeness of the Spend data, you can use this notebook:

https://colab.research.google.com/drive/16eBT1dqnUK76GnBMgPMY5rJO4vBD2gdQ?usp=sharing