FAQs - People Data Labs

What is People Data Labs and what does it offer?

People Data Labs (PDL) is a B2B data provider that supplies rich, structured datasets about individuals and companies. Initially launched as a recruiting platform, it has grown into a powerful data infrastructure provider with coverage of over 3 billion person profiles and 60 million companies. The platform supports use cases across HR, marketing, investment research, and academic research.

What types of data does PDL provide?

PDL offers three primary datasets:

  • People Data: Identity graph with 800M+ resumes, updated monthly. Includes employment and education history, job titles, standardized companies and schools, roles/levels, and some inferred skills and compensation.
  • Company Data: Core firmographic data for 60M+ global companies, including industry, funding, size, and headcount.
  • Company Insights: Person-level resume data aggregated to the company level. Includes historical headcount, job flows, role and location breakdowns, and tenure statistics. Available in Dewey with a standard subscription.

What makes PDL data suitable for academic research?

  • Time Series: Company Insights is available monthly from 2010 onward.
  • Granular: Covers detailed workforce transitions, job-to-job flows, tenure, and role-level breakdowns.
  • Scalable and Linkable: Cleaned and standardized entity fields make it easy to link with other datasets like Crunchbase, PitchBook, or Orbis.
  • Accessible: Unlike matched employer-employee data from governments, PDL is readily usable with standard licenses.
  • Extensible: Records can be joined on social media and GitHub handles, enabling new productivity studies.

How is the data sourced?

PDL uses two main sources:

  • Data Union: A data co-op of customers who provide data under contracts that legally warrant it was acquired compliantly.
  • Public Web: Crawled open web content is used for data enhancement and standardization.

PDL validates sources for quality and legality and tracks data lineage through the pipeline.

What is the geographic and demographic coverage?

  • Strongest in white-collar, English-speaking markets (e.g., North America, Western Europe).
  • Weaker in blue-collar sectors and non-Western markets due to lower online presence.
  • Research users are advised to combine PDL with census-based population data to adjust for bias.
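
One common way to make this adjustment is post-stratification: each record is weighted by the ratio of its group's population share to its group's share in the sample. The sketch below is illustrative only. The group labels and the population shares are hypothetical values, not PDL fields or real census figures, and PDL does not prescribe this particular method.

```python
from collections import Counter


def poststratification_weights(sample_cells, population_shares):
    """Return one weight per record so that weighted group shares
    match the supplied population shares."""
    counts = Counter(sample_cells)
    n = len(sample_cells)
    weights = []
    for cell in sample_cells:
        sample_share = counts[cell] / n
        # Over-represented groups get weights below 1, under-represented above 1.
        weights.append(population_shares[cell] / sample_share)
    return weights


# Hypothetical sample: 80% white-collar, 20% blue-collar records.
sample_cells = ["white_collar"] * 8 + ["blue_collar"] * 2
# Hypothetical population shares (would come from census-based data).
population_shares = {"white_collar": 0.6, "blue_collar": 0.4}

weights = poststratification_weights(sample_cells, population_shares)
blue_weighted_share = sum(
    w for w, c in zip(weights, sample_cells) if c == "blue_collar"
) / sum(weights)
```

With these made-up inputs, white-collar records are down-weighted and blue-collar records are up-weighted until the weighted blue-collar share equals the assumed population share. In practice the groups would usually be finer (for example, occupation by region), and groups with very few records would need to be merged to avoid extreme weights.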

How frequently is the data updated?

  • People Data: Monthly updates to resumes and identities.
  • Company Insights: Recalculated monthly based on new resume entries and changes.

Researchers should review the ADUP before designing projects, especially those involving reidentification or sensitive analysis.

What limitations or caveats should researchers keep in mind?

  • Historical Jobs: O*NET codes and some standardizations apply only to current jobs.
  • Inferred Salaries: Currently too coarse for high-stakes analysis; better suited for directional insights.
  • Biases: Data reflects online visibility, which may underrepresent certain workforce segments.
  • Job Flows: Aggregated flows in Company Insights are lifetime, not time-bound.

How does PDL compare to other providers like Revelio or L2?

  • Revelio Labs: Similar in scope, but uses language-based clustering for job titles versus PDL's deterministic method. Coverage and freshness vary.
  • L2: Focuses on voter and consumer data. Not a labor market data provider.

Can the data be joined with external sources?

Yes. PDL offers rich join keys:

  • Cleaned and raw company names
  • Domains and URLs (including Crunchbase IDs)
  • Tickers and industry codes

Both probabilistic and deterministic matching are supported. The Enrichment API is available for complex cases.
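
As a rough illustration of how the two matching styles can be combined, the sketch below first tries an exact match on a normalized domain and only then falls back to fuzzy matching on company names. It uses only the Python standard library. The record layout (`name`, `domain`, `id`), the helper names, and the 0.9 threshold are assumptions made for this example; they are not PDL's schema or its matching logic.

```python
from difflib import SequenceMatcher


def normalize_domain(url):
    """Lowercase a URL or domain and strip scheme, 'www.', and any path."""
    d = url.strip().lower()
    for prefix in ("https://", "http://"):
        if d.startswith(prefix):
            d = d[len(prefix):]
    if d.startswith("www."):
        d = d[4:]
    return d.split("/")[0]


def match_company(record, external, threshold=0.9):
    """Return (matched external record or None, match type)."""
    # Step 1: deterministic match on normalized domain.
    if record.get("domain"):
        domain = normalize_domain(record["domain"])
        for ext in external:
            if ext.get("domain") and normalize_domain(ext["domain"]) == domain:
                return ext, "deterministic"
    # Step 2: probabilistic match on name similarity (hypothetical threshold).
    best, best_score = None, 0.0
    for ext in external:
        score = SequenceMatcher(
            None, record["name"].lower(), ext["name"].lower()
        ).ratio()
        if score > best_score:
            best, best_score = ext, score
    if best is not None and best_score >= threshold:
        return best, "probabilistic"
    return None, "unmatched"


# Hypothetical external dataset.
external = [
    {"id": "ext-1", "name": "acme corp", "domain": "https://www.acme.com/about"},
    {"id": "ext-2", "name": "globex corporation", "domain": None},
]

by_domain = match_company({"name": "Acme", "domain": "acme.com"}, external)
by_name = match_company({"name": "Globex Corporation.", "domain": None}, external)
no_match = match_company({"name": "Initech", "domain": None}, external)
```

Trying the deterministic key first keeps false positives low; the fuzzy step only handles records that lack a usable domain. For large datasets, a linear scan like this would be replaced by a hash join on the domain and a blocking step before name comparison.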

What’s included in the Dewey subscription?

  • Company Insights data is included with a Research Team subscription.
  • People Data (resumes, flows, skills, etc.) requires an institutional subscription.

Are there tools, schema, or community resources available?