FAQs - SafeGraph

Does SafeGraph Places contain mobility data?

No. SafeGraph focuses exclusively on POI and transaction data. Advan Research now provides what was formerly SafeGraph mobility data.

Has SafeGraph stopped producing this data or do they make this data available through Advan?

SafeGraph no longer produces Patterns. Advan Research produces their own version of it.

Does Global Places & Geometry data have a longitudinal component?

The Places and Geometry data are not longitudinal datasets. The data goes through a full refresh each month, so the data you’re seeing is a snapshot of POI information as of “today.” Because of the size of the overall dataset, it is delivered as multiple files of the same size, and records are distributed across those files at random.

SafeGraph Spend data has some missing entries for RAW_TOTAL_SPEND for a few months in 2023 for particular PLACEKEYS. Does that mean the value of RAW_TOTAL_SPEND is 0 for that month or that no information is available for that PLACEKEY in that month?

You can review this privacy statement from SafeGraph. They don’t report on Spend if there was only one observation, so a blank value does not rule out that a transaction occurred but wasn’t reported. For the same reason, they will never report 0 as the spend; the value simply remains blank.
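As a rough illustration of the point above, blank values can be kept as missing rather than converted to 0 when loading the data. This is only a sketch: the `MONTH` column, the sample rows, and the `parse_spend` helper are hypothetical, and only `PLACEKEY` and `RAW_TOTAL_SPEND` come from the question.

```python
import csv
import io


def parse_spend(value):
    """Return a float, or None when the field is blank (not reported)."""
    value = (value or "").strip()
    return float(value) if value else None


# Hypothetical sample rows; MONTH is an assumed column name.
sample = io.StringIO(
    "PLACEKEY,MONTH,RAW_TOTAL_SPEND\n"
    "abc@123,2023-01,150.0\n"
    "abc@123,2023-02,\n"
    "abc@123,2023-03,90.0\n"
)

rows = [
    {**row, "RAW_TOTAL_SPEND": parse_spend(row["RAW_TOTAL_SPEND"])}
    for row in csv.DictReader(sample)
]

# Average only over months that were actually reported;
# a blank month is treated as unknown, not as zero spend.
reported = [r["RAW_TOTAL_SPEND"] for r in rows if r["RAW_TOTAL_SPEND"] is not None]
mean_reported = sum(reported) / len(reported)
```

Treating the blank month as 0 here would pull the average down to 80 instead of 120, which is the kind of distortion the answer warns about.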

How can I map geographic data on Dewey, such as Places, to Census Block Group, Census Tract, CBSA, etc.?

Here is sample Python code for mapping Dewey data to Census Block Group, Census Tract, CBSA, etc.
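The linked sample is not reproduced here. As a separate, minimal sketch of one part of the task: once a POI has a 12-digit Census Block Group GEOID (for example from a spatial join against Census block group boundaries, which is assumed to have been done already), the tract, county, and state codes are prefixes of that GEOID. The function name `split_cbg_geoid`, the example GEOID, and the one-entry `COUNTY_TO_CBSA` crosswalk are illustrative assumptions; a real county-to-CBSA crosswalk would come from a Census delineation file.

```python
def split_cbg_geoid(cbg_geoid):
    """Derive parent Census geographies from a 12-digit block group GEOID."""
    # Restore a leading zero that is lost if the GEOID was read as an integer.
    geoid = str(cbg_geoid).zfill(12)
    if len(geoid) != 12 or not geoid.isdigit():
        raise ValueError(f"not a 12-digit block group GEOID: {cbg_geoid!r}")
    return {
        "state_fips": geoid[:2],      # 2-digit state
        "county_fips": geoid[:5],     # state + 3-digit county
        "census_tract": geoid[:11],   # county + 6-digit tract
        "census_block_group": geoid,  # tract + 1-digit block group
    }


# Illustrative one-entry crosswalk (county FIPS -> CBSA code).
COUNTY_TO_CBSA = {"06037": "31080"}

parts = split_cbg_geoid(60371011101)  # example value with its leading zero dropped
cbsa = COUNTY_TO_CBSA.get(parts["county_fips"])  # None if the county is not in a CBSA
```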

In the SafeGraph Spend Panel, some states show sudden increases in total number, which I can attribute to the addition of data sources. However, in other states, nearly half of the original data is lost. Is there a reason why the data sources disappear in certain states?

This panel is made up of data from a variety of financial institutions. It’s common and expected for institutions to be introduced to and dropped from the panel over time. Because adoption of those institutions varies by region, these changes may affect different geographies differently over time.

The opened_on and closed_on variables have a low fill rate. Why is that?

You can read up on the opened_on/closed_on logic from SafeGraph here. These are columns they’ve only started tracking in the past few years, so in theory they should cover closures from the past few years. Because of how SafeGraph generates its data at scale, some brands are easier to track than others, and some will be delayed by a few months. They may eventually backfill data once they find a better resource.

How should outliers be handled in the Spend dataset?

In the past, SafeGraph has suggested filtering out outliers beyond a certain min/max threshold, depending on the use case. This will help remove locations where coverage is too low to consider reliable.

One thing to consider is the fluctuation in panel size, so it’s also worth normalizing the data against the number of transactions and customers.

Here is a notebook that was previously published on sampling bias which might be helpful to review.
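The two suggestions above (threshold filtering and normalizing by transactions and customers) could be sketched roughly as follows. The column names `raw_total_spend`, `raw_num_transactions`, and `raw_num_customers`, and the threshold values, are assumptions chosen for illustration; appropriate thresholds depend on the use case and are not specified by SafeGraph here.

```python
def filter_and_normalize(rows, min_transactions=5, max_spend=1_000_000.0):
    """Drop low-coverage or extreme rows, then add per-transaction and
    per-customer spend. Thresholds are placeholders, not recommendations."""
    kept = []
    for row in rows:
        spend = row["raw_total_spend"]
        n_tx = row["raw_num_transactions"]
        n_cust = row["raw_num_customers"]
        # Skip rows with too little coverage or implausibly large spend.
        if n_tx <= 0 or n_cust <= 0 or n_tx < min_transactions or spend > max_spend:
            continue
        kept.append({
            **row,
            "spend_per_transaction": spend / n_tx,
            "spend_per_customer": spend / n_cust,
        })
    return kept


# Hypothetical rows for three POIs in one month.
sample_rows = [
    {"placekey": "a", "raw_total_spend": 500.0, "raw_num_transactions": 10, "raw_num_customers": 5},
    {"placekey": "b", "raw_total_spend": 40.0, "raw_num_transactions": 2, "raw_num_customers": 2},
    {"placekey": "c", "raw_total_spend": 5_000_000.0, "raw_num_transactions": 50, "raw_num_customers": 20},
]
cleaned = filter_and_normalize(sample_rows)
```

Only POI "a" survives in this sample: "b" has too few transactions and "c" exceeds the spend ceiling.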

What does the customer_home_city refer to in the Spend dataset? If I want to do some spatial analyses on consumption behaviors, where can I get the boundaries of these areas? Or is there anywhere I could find spend pattern data with customers’ home location at the census tract/census block level?

The attribute definition is “The number of customers to the POI based on the customer’s estimated home location. Homes are indicated by unique city and state pairs.”

  • Each customer in the panel is classified into an income class using a proprietary model based on his or her transactions and spending data.
  • Similarly, each customer’s home city and state are estimated using a proprietary model based on where the user makes the majority of their transactions.
  • Note that SafeGraph does not provide any individual-level data in this dataset, and these models are used solely for aggregating demographic information about customers to points of interest. A reminder that both of these columns are subject to differential privacy, implemented specifically to remove the possibility of identifying individuals with this data.

Is the source of the Spend dataset panel stable or are more sources added/removed as time goes on? For example, the number of transactions in a specific POI doubles in 2020 compared to 2019.

The panel will always change slightly over time, which is why SafeGraph posts a Spend Transaction Panel that you can use to normalize the data.
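One possible way to use panel totals for this, sketched under assumptions: divide a POI’s monthly transaction count by the panel-wide transaction count for the same month, so that growth in the panel itself does not read as growth at the POI. The function name and the input dictionaries below are hypothetical and do not reflect the actual layout of the panel file.

```python
def normalize_by_panel(poi_transactions, panel_transactions):
    """Express each month's POI transactions as a share of the panel total."""
    return {
        month: count / panel_transactions[month]
        for month, count in poi_transactions.items()
        if panel_transactions.get(month)  # skip months with no panel total
    }


# Hypothetical counts: the POI doubles, but so does the panel.
poi = {"2019-06": 100, "2020-06": 200}
panel = {"2019-06": 1_000_000, "2020-06": 2_000_000}
shares = normalize_by_panel(poi, panel)
```

In this made-up case the POI’s share of panel transactions is unchanged between the two months, which suggests the doubling comes from panel growth rather than from the POI.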

Do the same consumers in the Spend data appear each month, or are consumers added and dropped each month?

Individual-level data is not available, so it is not possible to identify whether a given consumer is added, dropped, or retained each month. Intuitively, we would strongly suspect that the vast majority of the panel remains the same each month. Still, given how the Spend data is built, some consumers will certainly be added and dropped each month, so the panel may change over time.

I noticed that SafeGraph says “it may be necessary to adjust Spend data based on the state sampling rate each month, in order to account for the variations in state sampling rate over time for some states.” Do you have any suggestions on how to do that?

Generally, the recommended approach is to use the Panel Overview Data and Census population data to normalize the raw spend totals for each state each month.
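A minimal sketch of that idea, under stated assumptions: treat the state sampling rate for a month as panel customers divided by Census population, then scale raw spend by the inverse of that rate. The function name, argument names, and figures are hypothetical, and this is one plausible reading of the recommendation rather than an official SafeGraph formula.

```python
def scale_state_spend(raw_spend, panel_customers, census_population):
    """Scale raw state spend by the inverse of the state's sampling rate.

    sampling_rate = panel_customers / census_population
    scaled_spend  = raw_spend / sampling_rate
    """
    if panel_customers <= 0:
        raise ValueError("panel_customers must be positive")
    # Algebraically the same as raw_spend / sampling_rate.
    return raw_spend * census_population / panel_customers


# Hypothetical state-month: 10,000 panel customers out of 1,000,000 residents.
scaled = scale_state_spend(
    raw_spend=1_000.0, panel_customers=10_000, census_population=1_000_000
)
```

Repeating this for each state and month puts months with different sampling rates on a comparable footing before comparing them over time.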