Company Insights

Overview

A worldwide company database covering over 30 million businesses. It includes basic company information such as name, location, category, and industry, along with aggregated trends such as employee count and average tenure.

Data Information	Value
Refresh Cadence	`Monthly`
Historical Coverage	`2010` - `Present`
Geographic Coverage	`Global`

Schema

Company Insights is a multi-table dataset, which lets you access just the data you need. PDL.COMPANY.COMPANY is the primary table, and additional tables join to it on the COMPANY_ID identifier. The schema below describes the PDL.COMPANY.COMPANY table. For more information on accessing multi-table datasets via the API, review our docs page.

Name	Description
LATEST_FUNDING_STAGE	The stage of the company’s most recent funding event
MIC_EXCHANGE	The MIC code for the company's ticker exchange
ULTIMATE_PARENT	The ID of the ultimate owning company
DATASET_VERSION	Version identifier of the dataset
SIZE	Number of employees at the company (range)
LINKEDIN_ID	Main LinkedIn profile ID for the company
FOUNDED	The founding year of the company
TICKER	The company ticker for public companies
INDUSTRY	The self-reported industry of the company
COUNT_LINKEDIN_FOLLOWER	Count of LinkedIn followers
LINKEDIN_SLUG	LinkedIn slug for the company
AVERAGE_EMPLOYEE_TENURE	Average years of employee tenure
COUNT_FUNDING_ROUNDS	Number of funding rounds announced
GICS_SECTOR	GICS sector classification for public companies
COMPANY_ID	Unique identifier for the company
ULTIMATE_PARENT_TICKER	Ultimate parent's stock symbol (if public)
LINKEDIN_URL	Main LinkedIn profile URL for the company
NAME	Company's main common name
DISPLAY_NAME	Company's displayed name
TOTAL_FUNDING_RAISED	Total amount of funding raised in USD
EMPLOYEE_COUNT	Current number of employees
LAST_FUNDING_DATE	Date of the most recent funding event
TYPE	Company type
FACEBOOK_URL	Main Facebook profile URL for the company
WEBSITE	Primary company website
IMMEDIATE_PARENT	Direct owner of the company
ULTIMATE_PARENT_MIC_EXCHANGE	MIC exchange of the ultimate parent (if public)
TWITTER_URL	Main Twitter profile URL for the company
SUMMARY	Company description
INFERRED_REVENUE	Estimated annual revenue in USD
HEADLINE	Company's headline summary

Additional fields, such as the new ones below, are available in tables of the same name:

Name	Description
`top_next_employers`	The top ten companies employees moved to, and how many employees moved there, across all time periods
`top_previous_employers`	The top ten companies that current employees previously worked for, and how many current employees came from each, across all time periods
`top_next_employers_12_month`	The top ten next employers, counting only employee changes within the last 12 months
`top_previous_employers_12_month`	The top ten previous employers, counting only employee changes within the last 12 months
`employee_count_by_sub_role`	The number of current employees broken down by Job Title Sub Role
`employee_growth_rate_12_month_by_sub_role`	The twelve-month rate of change in employee count by Job Title Sub Role
`employee_count_by_class`	The number of current employees broken down by Job Title Class
`employee_growth_rate_12_month_by_class`	The twelve-month rate of change in employee count by Job Title Class
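
As a rough illustration of joining an additional table to the primary table on COMPANY_ID, the sketch below uses Python's built-in sqlite3 module as a stand-in for the actual data platform. The rows are made-up toy data, and the columns of the additional table other than COMPANY_ID (here JOB_CLASS and EMPLOYEE_COUNT) are assumptions for the example, not the documented schema.

```python
import sqlite3

# In-memory database standing in for the real platform; all rows are toy data.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE COMPANY (COMPANY_ID TEXT PRIMARY KEY, NAME TEXT, EMPLOYEE_COUNT INTEGER);
    -- Columns other than COMPANY_ID are assumptions made for this sketch.
    CREATE TABLE EMPLOYEE_COUNT_BY_CLASS (COMPANY_ID TEXT, JOB_CLASS TEXT, EMPLOYEE_COUNT INTEGER);
    INSERT INTO COMPANY VALUES ('c1', 'example co', 30), ('c2', 'sample inc', 5);
    INSERT INTO EMPLOYEE_COUNT_BY_CLASS VALUES
        ('c1', 'engineering', 20), ('c1', 'sales', 10);
""")

# LEFT JOIN keeps companies that have no rows in the additional table.
rows = conn.execute("""
    SELECT c.COMPANY_ID, c.NAME, e.JOB_CLASS, e.EMPLOYEE_COUNT
    FROM COMPANY AS c
    LEFT JOIN EMPLOYEE_COUNT_BY_CLASS AS e ON e.COMPANY_ID = c.COMPANY_ID
    ORDER BY c.COMPANY_ID, e.JOB_CLASS
""").fetchall()

for row in rows:
    print(row)
```

A left join is used here so that companies without a breakdown still appear in the result, with empty values for the joined columns. An inner join would drop them instead.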

Key Concepts

Reading in PySpark

People Data Labs company data has a slightly tricky encoding: fields can span multiple lines and can contain non-escaped quotes inside quoted values. This trips up PySpark's default CSV reader (pandas handles it fine).

You can read the data as follows:

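# path_to_company_data is a placeholder for the location of the company CSV files.
df = (
    spark.read.option("header", "true")
    .option("multiLine", "true")  # a single record may span several lines
    .option("escape", '"')  # a doubled quote inside a quoted field is a literal quote
    .option("quote", '"')
    .csv(path_to_company_data)
)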
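
To see what this encoding looks like in practice, the following minimal sketch parses a single toy record with Python's standard csv module. It assumes, based on the escape setting above, that quotes inside a quoted field are written as doubled quotes. The record itself is made up for illustration.

```python
import csv
import io

# Toy record (made-up values): the SUMMARY field contains a line break
# and doubled quotes inside a quoted value.
raw = (
    "COMPANY_ID,NAME,SUMMARY\r\n"
    'c1,example co,"Makes ""smart"" widgets.\nFounded in a garage."\r\n'
)

# newline="" leaves line endings untouched so the csv module can handle
# line breaks that occur inside quoted fields.
reader = csv.DictReader(io.StringIO(raw, newline=""))
records = list(reader)

print(len(records))
print(records[0]["SUMMARY"])
```

The two physical lines of the record are parsed as one row, and the doubled quotes come back as single quote characters. A reader that splits on every line break or expects backslash escapes would instead produce broken rows, which is what the multiLine and escape options are meant to avoid in PySpark.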