Monthly Patterns - Foot Traffic CBG column reliability

I have questions about two columns: visitor_home_cbgs and visitor_daytime_cbgs. I would like to clarify the definitions of the "home city" and "home CBG" fields, and to understand how these fields are censored.

I've noticed that in many cases the counts for individual CBGs are either set to "4" or are repeated across CBGs, which raises questions about how the visit counts are censored and whether the values are reliable. I would appreciate any additional details on the censoring or noise mechanism used for these fields.


Here are the specific issues we’ve observed:

Repeated Visitor Counts:

We have noticed instances of repeated counts in visitor home CBGs. For example, in one case (t=92, San Ho Won), there are four different CBGs, each with exactly 37 visits, and no other CBGs with counts higher than 1. It seems unlikely that such evenly distributed counts are accurate. This raises concerns about potential double-counting or uncertainty in assigning visits to specific CBGs. Could you clarify whether such patterns reflect a data processing issue or are an expected result of the methodology?
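For reference, here is a minimal sketch of how we flag these repeated values. It assumes visitor_home_cbgs is stored as a JSON string mapping each CBG ID to a count, which is our reading of the column. The function name and the CBG IDs in the example are made up for illustration and are not taken from the actual row.

```python
import json
from collections import Counter


def repeated_counts(visitor_home_cbgs, min_repeats=2):
    """Return {count_value: number_of_CBGs_sharing_it} for every count
    value that appears on at least `min_repeats` different CBGs."""
    # Assumption: the column is a JSON object such as '{"<cbg_id>": <count>, ...}'.
    cbg_counts = json.loads(visitor_home_cbgs or "{}")
    frequency = Counter(cbg_counts.values())
    return {value: n for value, n in frequency.items() if n >= min_repeats}


# Hypothetical row: four CBGs share the same count, one does not.
example = (
    '{"060750101001": 37, "060750102002": 37, '
    '"060750103003": 37, "060750104001": 37, "060750105002": 5}'
)
print(repeated_counts(example))  # {37: 4}
```

Running this over every restaurant-month row is how we found the cases described above.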

Sparsity and Truncation:

Many restaurant-month pairs lack visitor CBG data entirely, while others show data for only a handful of CBGs. Coverage also varies by state: locations in New York tend to have more CBGs listed, while locations in California have very few. Additionally, many observations appear to be censored, with counts of 2-4 visits recorded as 4. This sparsity and truncation make the data challenging to interpret and limit its utility for geographic analysis. Could you provide more information on the underlying data panel's coverage, the handling of low-frequency CBGs, and whether CBGs with only 1 visit are excluded?
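To quantify the truncation, we compute the share of CBG entries per row that sit exactly at the apparent floor. This is only a rough sketch: the floor value of 4 is inferred from what we see in the data rather than from documentation, the column is again assumed to be a JSON string of CBG-to-count pairs, and the function name is our own.

```python
import json

ASSUMED_FLOOR = 4  # inferred from the data, not a documented value


def floor_share(visitor_home_cbgs, floor=ASSUMED_FLOOR):
    """Return the fraction of CBG entries whose count equals `floor`,
    or None when the row lists no CBGs at all."""
    cbg_counts = json.loads(visitor_home_cbgs or "{}")
    if not cbg_counts:
        return None  # row has no visitor CBG data
    at_floor = sum(1 for count in cbg_counts.values() if count == floor)
    return at_floor / len(cbg_counts)


# Hypothetical row: three of four CBGs sit at the floor.
print(floor_share('{"a": 4, "b": 4, "c": 10, "d": 4}'))  # 0.75
```

Rows that return None correspond to the restaurant-month pairs with no visitor CBG data, and rows close to 1.0 are the ones where almost every listed CBG looks censored.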