Methodology Differences from SafeGraph Patterns
Outline of the differences between Advan Research's mobility datasets and legacy SafeGraph Patterns datasets
In decreasing order of contribution to the differences:
No Modeling of Visits/Visitors in Shared Geometries
Advan computes the visits/visitors and other metrics inside a POI using the POI’s geometry.
If a POI has shared geometry, SafeGraph assigns a subset of the above visits to each POI in the Shared Polygon. Additional historical visitation techniques are listed here.
Example: a Pizza Hut and a Taco Bell may have a shared geometry (they both operate at the same location, e.g., inside a mall). If this polygon had 1,000 visits, SafeGraph will “assign” each of those 1,000 visits to one of either Pizza Hut or Taco Bell. Advan will report 1,000 visits to each of the two.
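The arithmetic of the example above can be sketched as follows. This is an illustration only: the 60/40 split stands in for whatever SafeGraph's assignment model would produce and is a made-up figure, not an actual model output.

```python
polygon_visits = 1_000
pois = ["Pizza Hut", "Taco Bell"]

# Advan-style reporting: every POI in the shared polygon shows the full polygon count.
advan = {poi: polygon_visits for poi in pois}

# SafeGraph-style reporting: each visit is assigned to exactly one POI.
# The shares below are hypothetical and chosen only to make the sums visible.
hypothetical_shares = {"Pizza Hut": 0.6, "Taco Bell": 0.4}
safegraph = {poi: round(polygon_visits * hypothetical_shares[poi]) for poi in pois}

advan_total = sum(advan.values())          # counts the polygon once per POI
safegraph_total = sum(safegraph.values())  # counts the polygon once overall

print(advan_total)      # 2000
print(safegraph_total)  # 1000
```

Summing Advan's per-POI counts across a shared polygon therefore counts the same visits once per POI, which is why totals across all POIs rise so sharply.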
Effect: Most POIs in Shared Polygons will show substantially more traffic, so total visits and visitors summed across all POIs will be much higher, on the order of 10x (the median visits, however, will be only 25% higher, and for a different reason; see below). The important point is that historical trends (year-over-year changes, etc.) will be more consistent than before, because no scaling step adds a layer of uncertainty or fluctuation. Past analysis conducted at SafeGraph found that changes in visitation/spending at one POI are highly correlated with changes in visitation/spending at neighboring POIs. For this reason, the most accurate way to study POI visitation over time is to attribute visits at the polygon level rather than trying to predict and assign visits.
Recommended actions: if you are measuring year-over-year changes, please filter out Shared Polygons from your computations. Advan provides a list of Placekeys pertaining to Shared Polygons that need to be filtered out.
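A minimal sketch of that filtering step, assuming the data has been loaded as a list of dictionaries with a `placekey` field and that the Shared Polygon list is available as a plain collection of Placekey strings. The field name, the sample Placekeys, and the helper `drop_shared_polygons` are all hypothetical.

```python
def drop_shared_polygons(rows, shared_placekeys):
    """Return only the rows whose placekey is not on the Shared Polygon list."""
    shared = set(shared_placekeys)  # set membership keeps the filter fast
    return [row for row in rows if row["placekey"] not in shared]


# Made-up sample rows; real Placekeys look different.
rows = [
    {"placekey": "pk-standalone-1", "raw_visit_counts": 420},
    {"placekey": "pk-shared-1", "raw_visit_counts": 1000},
    {"placekey": "pk-shared-2", "raw_visit_counts": 1000},
    {"placekey": "pk-standalone-2", "raw_visit_counts": 75},
]
shared_placekeys = ["pk-shared-1", "pk-shared-2"]

kept = drop_shared_polygons(rows, shared_placekeys)
print([row["placekey"] for row in kept])  # ['pk-standalone-1', 'pk-standalone-2']
```

Applying the filter before aggregating keeps a single shared polygon from being counted once per POI in year-over-year totals.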
Further context: analysis by Advan indicates that counting traffic in Shared Polygons overstates traffic changes for several NAICS codes, in particular causing a drop on the order of 25% in estimated traffic at restaurants, bars, and clothing, shoe, jewelry, and cosmetics stores in November and December 2022. This appears to be caused by many of these locations sharing polygons with Parks, which experienced a traffic drop that is in turn magnified for each shared polygon. Advan is actively working on a long-term fix and is running further analyses to understand mitigation techniques and further causes. Please do not hesitate to reach out with any further questions.
No estimation of Visits/Stops
Advan computes visits by measuring the GPS pings inside a POI’s polygon. It does not apply any dwell time or any concept of “stops”; it relies on the polygon for accuracy. Advan has tested its own data on 1,500 publicly traded tickers versus (a) top line revenue as reported from the companies and (b) credit card transaction counts on physical locations, and has determined consistently that in the vast majority of cases filtering for dwell time reduces the signal and makes the correlation/forecasting worse.
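As a rough illustration of polygon-based counting with no dwell filter, the sketch below counts the distinct devices that have at least one ping inside a polygon. It is not Advan's implementation: the ray-casting test, the planar treatment of longitude/latitude (reasonable only for small polygons), the ping tuple layout, and the function names are all assumptions made for the example.

```python
def point_in_polygon(lon, lat, polygon):
    """Ray-casting test. `polygon` is a list of (lon, lat) vertices."""
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Only edges that straddle the horizontal line through the point matter.
        if (y1 > lat) != (y2 > lat):
            # Longitude at which the edge crosses that horizontal line.
            x_cross = x1 + (lat - y1) * (x2 - x1) / (y2 - y1)
            if lon < x_cross:
                inside = not inside
    return inside


def devices_in_polygon(pings, polygon):
    """Distinct device ids with at least one ping inside the polygon."""
    return {dev for dev, lon, lat in pings if point_in_polygon(lon, lat, polygon)}


# Hypothetical unit-square polygon and pings given as (device_id, lon, lat).
square = [(0.0, 0.0), (1.0, 0.0), (1.0, 1.0), (0.0, 1.0)]
pings = [
    ("dev1", 0.5, 0.5),
    ("dev1", 0.6, 0.4),  # second ping from the same device
    ("dev2", 0.2, 0.9),  # a single ping still counts: no dwell requirement
    ("dev3", 2.0, 0.5),  # outside the polygon
]

visitors = devices_in_polygon(pings, square)
print(sorted(visitors))  # ['dev1', 'dev2']
```

Note that `dev2` is counted on the strength of one ping, which is the practical meaning of not applying a dwell-time or "stop" filter.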
SafeGraph first computes “stops”; then it compares the stops to the POIs within a 90 meter radius; then it assigns a device to one of those POIs using an algorithm that takes into account hour of day, day of week, distance from the POI centroid, etc. This methodology carries many inherent assumptions that can introduce additional noise, even if it was validated against ground truth data, because:
- It is easy to overfit a methodology to an existing ground truth set
- What works for some POIs in some area at some time period for some mobility data will not necessarily work for all POIs at all times with all mobility data
- When you add one POI in an area it may change the visits for an unrelated POI in the vicinity, so as POIs change they could be adding noise
Effect: Advan’s visitation counts are a median of 25% higher (i.e., the typical location has 25% more devices observed in it). Additionally, as long as a POI’s polygon remains consistent, visit counts over time will be significantly more stable and there is less risk of visit cannibalization from neighboring POIs.
Panel consistency
Advan did not experience the panel changes or mobility data provider disruption that SafeGraph did in May.
Effect: Advan’s visitation counts did not have large swings in 2022 and will be more consistent on a year over year basis.
Panel utilized
Advan will utilize devices:
- Observed “in the background” i.e., where the user has given permission to an application to collect the data even when they are not using the application / have it open
- From applications whose device counts do not show “Steps”, i.e., huge and sudden drops or increases when generating visits.
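One way to picture the “Steps” criterion is a simple check on an application's daily device counts. The sketch below is hypothetical: the ratio threshold of 2.0, the day-over-day comparison, and the function name `has_step` are illustrative choices, not Advan's actual screening rule.

```python
def has_step(daily_devices, max_ratio=2.0):
    """True if any two consecutive daily counts differ by more than max_ratio."""
    for prev, curr in zip(daily_devices, daily_devices[1:]):
        lo, hi = sorted((prev, curr))
        if hi == 0:
            continue  # both days are zero: no change to flag
        if lo == 0 or hi / lo > max_ratio:
            return True  # sudden jump or drop between adjacent days
    return False


steady_app = [100, 105, 98, 102, 101]
stepped_app = [100, 105, 400, 410, 405]  # device count roughly quadruples overnight

print(has_step(steady_app))   # False
print(has_step(stepped_app))  # True
```

An application flagged this way would be left out of the panel, so its sudden jump does not appear as a spurious change in visits.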
SafeGraph adds some foreground devices to the above panel. This results in a larger panel, but without any material difference in the number of observations. It also has the potential to add noise, as the foreground devices are not observed as consistently.
Effect: Advan’s visitation counts will be more consistent on a year over year basis. Advan is also less likely to encounter bugs like the normalized_visits_by_state_scaling bug that was reported by SafeGraph in August 2022, which occurred from a lower quality panel contributing a significant increase in home visits but no meaningful increase in POI visits.
Number of CBGs and Census Tracts in Trade Areas
Advan cuts the number of CBGs in a trade area to the top 1,000 and number of tracts in a trade area to the top 400. SafeGraph does not.
This is a temporary measure to limit the size of the data and make it easier to ingest. Advan generates home/work trade areas as 4 fields - geohash 6 (i.e., g6), g5, g4, and g3, so the more distant areas have lower granularity. Advan reserves the right to modify the schema in the future to similarly reduce the overall data size without losing granularity at the local level.
Effect: the visitor columns (visitor_home_cbgs, visitor_home_aggregation, etc.) will contain a much smaller number of CBGs / Census Tracts. Additionally, CBGs / Tracts that are distant from the POI (and less likely to have significant visitation to the POI) will be missing.
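The truncation can be pictured as keeping only the highest-count entries of a trade-area mapping. The sketch below assumes a column such as visitor_home_cbgs has been parsed into a dictionary from area code to visitor count; the helper `top_n` and the sample codes are hypothetical, and the limits of 1,000 CBGs and 400 tracts are the ones stated above.

```python
MAX_CBGS = 1_000   # limit on CBGs per trade area, as stated above
MAX_TRACTS = 400   # limit on Census Tracts per trade area, as stated above


def top_n(counts, n):
    """Keep the n entries with the largest visitor counts."""
    ranked = sorted(counts.items(), key=lambda item: item[1], reverse=True)
    return dict(ranked[:n])


# Made-up CBG codes and counts; a small n is used so the effect is visible.
visitor_home_cbgs = {
    "000000000001": 57,
    "000000000002": 4,
    "000000000003": 23,
    "000000000004": 9,
}

truncated = top_n(visitor_home_cbgs, 2)
print(truncated)  # {'000000000001': 57, '000000000003': 23}
```

Low-count areas, which tend to be the distant ones, are the entries that drop out, so sums over a truncated column will be slightly lower than sums over the full trade area.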
Normalization
Advan calculates the column normalized_visits_by_state_scaling by dividing the US adult population by the sum of unique visitors seen daily, multiplying that scale by the daily raw_visitor_count per POI, and then summing this value over the respective time period. This data has been tested against “ground truth” and has proven to be robust in capturing true visit trends. Advan will be computing the remaining normalization columns using the methodology listed in SafeGraph’s documentation.
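A minimal sketch of the Advan calculation as described above, assuming “unique visitors seen daily” means the number of unique panel devices observed on each day. The field name `panel_unique_visitors`, the function name, and all numbers (including the population figure) are illustrative placeholders, not real values.

```python
def advan_normalized_visits(daily_rows, adult_population):
    """Sum over days of (adult population / daily panel size) * daily POI visitors."""
    total = 0.0
    for row in daily_rows:
        # Daily scale factor: how many adults each observed device represents.
        scale = adult_population / row["panel_unique_visitors"]
        total += scale * row["raw_visitor_count"]
    return total


# Two illustrative days for one POI; the population is a round placeholder.
daily_rows = [
    {"panel_unique_visitors": 10_000, "raw_visitor_count": 5},   # scale 100
    {"panel_unique_visitors": 20_000, "raw_visitor_count": 10},  # scale 50
]

normalized = advan_normalized_visits(daily_rows, adult_population=1_000_000)
print(normalized)  # 1000.0
```

In this example the panel doubles in size on the second day and the raw count doubles with it, yet each day contributes the same normalized value, which is the correction for panel size the column is meant to provide.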
SafeGraph calculates the column normalized_visits_by_state_scaling by dividing the regional population by the unique visitors seen in the region (US State or CA Province) in the respective time period and multiplying that scale by the raw_visit_counts per POI.
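The SafeGraph calculation described above reduces to a single scale factor per region and time period. The sketch below is an illustration under that reading; the function and parameter names are hypothetical and the numbers are placeholders.

```python
def safegraph_normalized_visits(raw_visit_counts, region_population, region_panel_visitors):
    """Scale a POI's raw visits by regional population per observed panel visitor."""
    scale = region_population / region_panel_visitors
    return raw_visit_counts * scale


# Placeholder figures for one POI in one region over one time period.
normalized_sg = safegraph_normalized_visits(
    raw_visit_counts=40,
    region_population=5_000_000,
    region_panel_visitors=100_000,
)
print(normalized_sg)  # 2000.0
```

The scale here is computed once per region and period, whereas the Advan method described earlier computes a nationwide scale for each day and sums the daily results.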
Effect: Both Advan’s and SafeGraph’s methods correct for differences in panel size and can be used to understand visits longitudinally.
Dwell computations
When measuring the median dwell time in a location, Advan filters out any devices that have no dwell time. This is very similar to what SafeGraph has been doing (filtering for devices that “stopped”, i.e., with at least 1 minute dwell), as the majority of devices have either a dwell time of more than 1 minute or no dwell time at all; very few fall in the (0,1) bucket of dwell times.
Effect: Advan’s and SafeGraph’s methods for dwell metrics are substantially the same, and the data will not change.
However, for purposes of computing devices in each dwell bucket, Advan uses all the observations (whether the device dwelled in a location or not); this results in Sum(visitors on each dwell bucket) = all visitors. However, because the median dwell time is computed using only dwelled devices, Sum(visitors on each dwell bucket * median dwell time for the bucket) < all visitors * median dwell. This is by design, as the dwell buckets are measuring different things than the median dwell time.
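The difference between the two populations can be seen in a small worked example. The bucket edges, labels, and dwell values below are hypothetical; the only behavior taken from the text is that bucket counts include every observation while the median uses only devices with a non-zero dwell time.

```python
from statistics import median

# One dwell time (in minutes) per visitor; 0 means the device did not dwell.
dwell_minutes = [0, 0, 0, 3, 8, 12, 45, 90]


def bucket_label(minutes):
    """Assign a dwell time to an illustrative bucket."""
    if minutes < 5:
        return "<5"
    if minutes <= 20:
        return "5-20"
    if minutes <= 60:
        return "21-60"
    return ">60"


# Bucket counts use all observations, including the zero-dwell devices.
bucket_counts = {}
for minutes in dwell_minutes:
    label = bucket_label(minutes)
    bucket_counts[label] = bucket_counts.get(label, 0) + 1

# The median uses only devices that actually dwelled.
median_dwell = median(m for m in dwell_minutes if m > 0)

print(bucket_counts)                # {'<5': 4, '5-20': 2, '21-60': 1, '>60': 1}
print(sum(bucket_counts.values()))  # 8, i.e. all visitors
print(median_dwell)                 # 12
```

Here the lowest bucket holds four visitors, three of whom never dwelled, while the median of 12 minutes reflects only the five devices that did. The two metrics describe different populations and are not expected to reconcile.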
Home computations
Advan computes a device’s home/work (night/day) location by computing the time a device spent in each building in the country; then taking the most frequented building.
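The “most frequented building” step can be sketched as an aggregation of time per building followed by a maximum. This is a simplified illustration: the input layout of (building id, minutes) pairs, the prior split of observations into night and day, and the function name are assumptions, not a description of Advan's pipeline.

```python
from collections import defaultdict


def most_frequented_building(stays):
    """Return the building id with the most total minutes, or None if no stays."""
    totals = defaultdict(float)
    for building_id, minutes in stays:
        totals[building_id] += minutes
    if not totals:
        return None
    return max(totals, key=totals.get)


# Hypothetical observations for one device, already split into night and day.
night_stays = [("bldg_A", 400), ("bldg_B", 120), ("bldg_A", 380)]
day_stays = [("bldg_C", 300), ("bldg_A", 60), ("bldg_C", 280)]

home = most_frequented_building(night_stays)
work = most_frequented_building(day_stays)
print(home, work)  # bldg_A bldg_C
```

Tying the result to a specific building, rather than a coarser area, is what allows trade areas to be derived at building-level precision.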
SafeGraph computes a home location in a way similar to how Advan did before it began tying each device to a specific building, per this documentation.
Effect: Home locations and trade areas will be more accurate.