Defining and Handling PII at Dewey

Learn how Dewey identifies, classifies, and manages personally identifiable information (PII) to ensure responsible, privacy-conscious data use while maximizing research value.

At Dewey, we are committed to enabling responsible, privacy-conscious data use. Dewey makes datasets available from a variety of providers to power research.

Because Dewey makes it possible to download and join data from diverse sources, it is essential to define how we identify and manage personally identifiable information (PII). Dewey’s approach is grounded in global best practices, regulatory frameworks, and ethical data management principles.

Our goal is to maximize the value and usability of data for research purposes while ensuring compliance and minimizing privacy risks.

Understanding PII

What is PII?

Personally Identifiable Information (PII) is any information that can be used to identify an individual, either directly or indirectly, alone or when combined with other data.

Dewey’s working definition aligns with established standards including:

  • NIST SP 800-122 – “any information that can be used to distinguish or trace an individual’s identity.”
  • GDPR Article 4(1) – “any information relating to an identified or identifiable natural person.”
  • Academic privacy research that categorizes PII into direct and quasi-identifiers.

Types of PII

Category | Description | Examples
Direct Identifiers | Explicit identifiers that uniquely identify an individual. | Name, email, phone number, government ID, LinkedIn URL
Indirect / Quasi-Identifiers | Attributes that can identify a person when combined with other data. | ZIP code, date of birth, gender, employer, geolocation
Public Record Data | Information lawfully and publicly available, often tied to property or business records. | Property addresses, company registrations, assessor parcel data
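
To illustrate why quasi-identifiers matter, the following is a minimal sketch that measures how many records become unique once several attributes are combined. The records, field names, and the `unique_fraction` helper are all hypothetical and are not part of any Dewey dataset or tool.

```python
from collections import Counter

# Hypothetical records; every value here is made up for illustration.
records = [
    {"zip": "94107", "birth_year": 1985, "gender": "F"},
    {"zip": "94107", "birth_year": 1985, "gender": "M"},
    {"zip": "94107", "birth_year": 1990, "gender": "F"},
    {"zip": "10001", "birth_year": 1985, "gender": "F"},
    {"zip": "10001", "birth_year": 1985, "gender": "F"},
]


def unique_fraction(rows, fields):
    """Return the share of rows whose combination of `fields` appears only once."""
    if not rows:
        return 0.0
    # Count how often each combination of values occurs.
    counts = Counter(tuple(row[f] for f in fields) for row in rows)
    unique_rows = sum(
        1 for row in rows if counts[tuple(row[f] for f in fields)] == 1
    )
    return unique_rows / len(rows)


# No record is unique by ZIP code alone in this sample.
print(unique_fraction(records, ["zip"]))
# Combining ZIP code, birth year, and gender singles out most records.
print(unique_fraction(records, ["zip", "birth_year", "gender"]))
```

In this toy sample, no single ZIP code isolates a record, but the three attributes together make three of the five records unique. That is the sense in which attributes can identify a person "when combined with other data."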

Our Approach to PII

Balancing Privacy and Data Utility

Dewey’s datasets often include information from public and/or commercial sources. Our philosophy is to retain, and make available, meaningful data that supports legitimate analytical and research uses while removing or obfuscating elements that could identify private individuals.

We view privacy protection not as a constraint, but as a foundation for ethical data use and research.

PII Classification Framework

Dewey applies a tiered classification model to guide decisions on what data may be included, transformed, or removed. The tiers correspond to the types of PII described above: Tier 1 aligns with direct identifiers, Tier 2 with quasi-identifiers, and Tier 3 with public record or non-PII data.

Tier 1: Direct PII

Always removed or hidden prior to dataset publication. Examples:

  • Full name
  • Personal email or phone number
  • Social media profile links (e.g., LinkedIn, Twitter)
  • Date of birth

Tier 2: Sensitive or Contextual PII

Conditionally retained when:

  • The data is publicly available (e.g., business contact data, professional directories).
  • The inclusion of the attribute serves a research or compliance purpose.

Examples:

  • Business contact details
  • Professional identifiers (e.g., broker license number)
  • Geolocation data at various levels of granularity

Tier 3: Non-PII / Public Record Data

Considered non-sensitive and retained for analytical use. Examples:

  • Real estate property addresses from public assessor records
  • Census block group or demographic aggregates
  • Business registration information
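
The tiered model above can be sketched as a simple lookup that sorts a dataset's columns into keep, remove, and review lists. This is only an illustration under stated assumptions: the column names, the `FIELD_TIERS` mapping, the `plan_fields` helper, and the idea of an "approved" set for Tier 2 fields are hypothetical and do not describe Dewey's actual tooling.

```python
from enum import Enum


class Tier(Enum):
    DIRECT = 1      # Tier 1: always removed or hidden
    CONTEXTUAL = 2  # Tier 2: conditionally retained
    PUBLIC = 3      # Tier 3: retained for analytical use


# Hypothetical mapping from column names to tiers, based on the examples above.
FIELD_TIERS = {
    "full_name": Tier.DIRECT,
    "personal_email": Tier.DIRECT,
    "date_of_birth": Tier.DIRECT,
    "broker_license_number": Tier.CONTEXTUAL,
    "business_phone": Tier.CONTEXTUAL,
    "property_address": Tier.PUBLIC,
    "census_block_group": Tier.PUBLIC,
}


def plan_fields(columns, approved_tier2=frozenset()):
    """Split columns into 'keep', 'remove', and 'review' lists."""
    plan = {"keep": [], "remove": [], "review": []}
    for col in columns:
        tier = FIELD_TIERS.get(col)
        if tier is Tier.DIRECT:
            plan["remove"].append(col)
        elif tier is Tier.PUBLIC:
            plan["keep"].append(col)
        elif tier is Tier.CONTEXTUAL and col in approved_tier2:
            plan["keep"].append(col)
        else:
            # Unapproved Tier 2 fields and unclassified fields need a human decision.
            plan["review"].append(col)
    return plan


plan = plan_fields(
    ["full_name", "property_address", "broker_license_number", "notes"],
    approved_tier2={"broker_license_number"},
)
print(plan)
```

Sending unclassified columns to review rather than keeping them by default is a deliberately conservative choice in this sketch.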

Data Processing

Detection and Removal

Before a dataset is published, Dewey manually reviews it to confirm that all fields classified as Tier 1 have been removed or hidden.
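
A manual review can be supported by a simple pre-check that flags columns whose values look like direct identifiers. The sketch below is hypothetical and is not a description of Dewey's process: the `PATTERNS` table, the `flag_columns` helper, and the sample row are assumptions, and the patterns are deliberately loose, so a reviewer would still need to confirm every hit and check columns that were not flagged.

```python
import re

# Loose, illustrative patterns; they will miss cases and produce false positives.
PATTERNS = {
    "email": re.compile(r"[^@\s]+@[^@\s]+\.[^@\s]+"),
    "linkedin_url": re.compile(r"linkedin\.com/in/", re.IGNORECASE),
}


def flag_columns(rows, sample_size=100):
    """Map each column name to the sorted pattern names found in its sampled values."""
    flagged = {}
    for row in rows[:sample_size]:
        for column, value in row.items():
            text = str(value)
            for name, pattern in PATTERNS.items():
                if pattern.search(text):
                    flagged.setdefault(column, set()).add(name)
    return {column: sorted(names) for column, names in flagged.items()}


# Hypothetical sample row for illustration.
sample = [
    {
        "contact": "jane@example.com",
        "profile": "https://www.linkedin.com/in/jane",
        "parcel_id": "123-456",
    }
]
print(flag_columns(sample))
```

A pre-check like this only narrows where a reviewer looks first; it does not replace the review itself.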

Provider Collaboration

We encourage all upstream data providers to follow Dewey’s privacy principles and disclose any data fields that may contain PII.