Company Insights

Overview

The Company Insights dataset is a comprehensive, worldwide company database containing over 30 million businesses. The dataset includes basic company information such as name, location, category, and industry, along with aggregated trends including employee count and average tenure. Company Insights serves as the root company record to which other Company Insights reference tables join via the company_id field.

Data Description

The Company Insights dataset functions as a central company record with numerous associated reference tables that capture different dimensions of company information. Each reference table links back to the main company record through the company_id foreign key.
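
As a minimal sketch of that join, assume the main records and one reference table are loaded as lists of dictionaries. The sample rows, the tags column, and the helper name join_on_company_id are hypothetical; only the company_id key comes from the description above.

```python
# Hypothetical sample rows; only the company_id join key is taken from the dataset description.
companies = [
    {"company_id": "c1", "name": "Example Co"},
    {"company_id": "c2", "name": "Sample Inc"},
]
reference_rows = [
    {"company_id": "c1", "tags": ["data", "analytics"]},
]


def join_on_company_id(main_rows, ref_rows):
    """Left-join reference rows onto main company records by company_id."""
    ref_by_id = {row["company_id"]: row for row in ref_rows}
    joined = []
    for company in main_rows:
        ref = ref_by_id.get(company["company_id"], {})
        # Main-record fields win if a key appears in both rows.
        joined.append({**ref, **company})
    return joined


joined = join_on_company_id(companies, reference_rows)
```

A left join keeps every company record even when a reference table has no matching row, which suits a root record that not every reference table covers.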

The dataset encompasses multiple categories of company information, including:

Employee Movement and Retention Metrics:

  • Top ten previous employers (12-month window), with company names as keys and integer counts of employee profiles as values
  • Top ten next employers (12-month window), showing where former employees moved
  • Previous and next employer breakdowns by job role, using Canonical Job Roles
  • Employee churn rates over 3-, 6-, 12-, and 24-month windows, rounded to four decimal places

Geographic Distribution:

  • Top ten U.S. metropolitan areas where employees are located, with current headcount and 12-month growth metrics
  • Employee counts broken down by country, using canonical country names as keys

Executive Personnel Tracking:

  • Profiles of all CXOs, owners, and VPs joining in the last three months, including new title and start date
  • Executive departures in the last three months, including prior job title, role, and new company

Temporal Employee Data:

  • Monthly employee counts in YYYY-MM format
  • Monthly employee counts broken down by job title level (cxo, vp, director, manager, senior, entry)
  • Monthly employee counts broken down by canonical Job Role
  • Monthly gross additions (employees who joined)
  • Monthly gross departures (employees who left)

Employee Composition:

  • Current employees broken down by canonical Job Role
  • Growth rates over 3-, 6-, 12-, and 24-month periods, expressed as percentage changes
  • 12-month growth rates by job role

Tenure Analysis:

  • Average years employees have spent at the company by job title level
  • Average years employees have spent at the company by canonical Job Role

Company Classification:

  • Disclosed funding stages (e.g., seed, series_a, series_b, ipo)
  • Company tags normalized to lowercase strings
  • North American Industry Classification System (NAICS) assignments with code, sector, sub_sector, industry_group, and naics_industry descriptors
  • Standard Industrial Classification (SIC) assignments with sic_code, industry_group, major_group, and industry_sector descriptors

Corporate Structure:

  • Affiliated company IDs (both parents and subsidiaries)
  • All subsidiaries including both direct and indirect relationships
  • Direct subsidiaries with one-to-one relationships

Company Identifiers:

  • Alternative names including DBAs, prior legal names, and common short forms
  • Alternative domains beyond the primary website
  • LinkedIn profile URL, along with linkedin_id and linkedin_slug
  • Headquarters location details including postal code, street address, locality, region, country, continent, and geo coordinates

Coverage

The dataset covers over 30 million businesses worldwide.

For temporal employee data, the month range begins at the start date of the first associated employee or January 1, 2010, whichever is later. The final month is the last full month before the most recent monthly Data Build.
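
The month-range rule can be sketched as follows. This is an illustration only: the function name month_keys and its parameters are hypothetical, and it assumes the month containing the Data Build date is incomplete, so the final key is the preceding calendar month.

```python
from datetime import date

EARLIEST_MONTH = date(2010, 1, 1)  # floor stated in the coverage rule


def month_keys(first_employee_start, data_build_date):
    """Return YYYY-MM keys from the later of the first start date or
    January 2010 through the last full month before the build date."""
    start = max(first_employee_start, EARLIEST_MONTH)
    # Assumption: the build month itself is partial, so step back one month.
    end_year, end_month = data_build_date.year, data_build_date.month - 1
    if end_month == 0:
        end_year, end_month = end_year - 1, 12
    keys = []
    year, month = start.year, start.month
    while (year, month) <= (end_year, end_month):
        keys.append(f"{year:04d}-{month:02d}")
        month += 1
        if month == 13:
            year, month = year + 1, 1
    return keys
```

For example, a first start date in 2005 with a build in mid-April 2010 yields the keys 2010-01 through 2010-03.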

Top previous and next employer data is restricted to the last 12 months.

Recent executive hire and departure data covers the last three months.

Methodology

Employee counts and related metrics are derived from PDL profile coverage.

For previous and next employer data by role, companies are listed using their PDL Company ID, and roles are based on the employee's role at the queried company using Canonical Job Roles. If no start date is given or no role exists, the experience is not counted. If there are fewer than ten next/previous employers for a role, the table returns as many as exist.
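
The counting rules for previous employers by role can be sketched as below. The input record shape (role, start_date, previous_company_id) and the function name are hypothetical, and the sketch assumes the records have already been limited to the 12-month window.

```python
from collections import Counter, defaultdict


def top_previous_employers_by_role(experiences, limit=10):
    """Count previous-employer company IDs per role, skipping incomplete records."""
    counts = defaultdict(Counter)
    for exp in experiences:
        # Experiences without a start date or a role are not counted.
        if not exp.get("start_date") or not exp.get("role"):
            continue
        counts[exp["role"]][exp["previous_company_id"]] += 1
    # most_common(limit) returns fewer entries when fewer employers exist.
    return {role: dict(c.most_common(limit)) for role, c in counts.items()}
```

The same approach would apply to next employers by swapping in the ID of the company the employee moved to.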

For monthly gross additions and departures, entries with no start or end date—or only a year—are not counted. This differs from employee_count_by_month, which assumes January when no month is given.
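
The difference in date handling can be illustrated with two small helpers. This is a sketch under assumptions: dates are taken to be strings of the form YYYY, YYYY-MM, or YYYY-MM-DD, both function names are hypothetical, and the treatment of a fully missing date in the headcount case is a guess, since the text only describes the missing-month case.

```python
def addition_month(start_date):
    """Month key for gross additions, or None when the date lacks a month."""
    if not start_date or len(start_date) < 7:
        return None  # missing or year-only dates are not counted
    return start_date[:7]


def headcount_start_month(start_date):
    """Month key for employee_count_by_month; year-only dates default to January."""
    if not start_date:
        return None  # assumption: the source does not describe this case
    if len(start_date) == 4:
        return f"{start_date}-01"
    return start_date[:7]
```

A start date of "2021" is therefore dropped from gross additions but counted from 2021-01 in the monthly headcount, which is one reason the two series may not reconcile.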

Company tags may appear to overlap (for example, "data", "analytics", and "data and analytics"). This is intentional: it makes it easier to find companies matching any given tag.

Additional Notes

Data Quality Considerations:
Numbers for gross additions and departures may diverge from actual headcount due to false positives, false negatives, missing or duplicate individuals, and missing start/end dates.

Legacy Fields:
The affiliated_profiles field is a legacy field superseded by the newer affiliated_entities field, which expresses the same relationships as structured objects (affiliated_id, display_name, relationship, employee_count) and is easier to display, ingest, and analyze. The affiliated_entities field is recommended for new work.

Similarly, the all_subsidiaries and direct_subsidiaries fields are legacy fields redundant with the newer affiliated_entities field, which captures the same parent/subsidiary relationships in a structured format.
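
As a rough illustration of working with the structured format, the sketch below groups affiliated_entities objects by their relationship value. The function name and the relationship values used in the usage note are hypothetical; only the four object keys come from the description above.

```python
from collections import defaultdict


def group_by_relationship(affiliated_entities):
    """Map each relationship value to the affiliated_id values that carry it."""
    grouped = defaultdict(list)
    for entity in affiliated_entities:
        grouped[entity["relationship"]].append(entity["affiliated_id"])
    return dict(grouped)
```

Given entities with hypothetical relationship values such as "parent" and "subsidiary", the result is one list of IDs per relationship, covering what the legacy fields split across several columns.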

Subsidiary Relationships:
A company can have multiple layers of subsidiaries. For example, Rimeto—acquired by Slack Technologies, itself acquired by Salesforce—appears in Salesforce's subsidiary list even though it is not a direct subsidiary.
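
The relationship between direct and all subsidiaries can be sketched as a graph traversal. The mapping below uses placeholder strings rather than real company IDs, and the function name is hypothetical; it shows how an indirect subsidiary surfaces in the full list, not how the dataset is actually built.

```python
def all_subsidiaries(direct, parent):
    """Collect direct and indirect subsidiaries of parent via depth-first traversal."""
    found = []
    seen = set()
    stack = list(direct.get(parent, []))
    while stack:
        child = stack.pop()
        if child in seen:
            continue  # guards against cycles or duplicate listings
        seen.add(child)
        found.append(child)
        stack.extend(direct.get(child, []))
    return found


# Placeholder IDs mirroring the example above: parent -> direct subsidiaries.
direct = {"salesforce": ["slack"], "slack": ["rimeto"]}
```

Here all_subsidiaries(direct, "salesforce") includes both "slack" and "rimeto", while the direct list for "salesforce" contains only "slack".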

Industry Classifications:
A company can (and frequently does) have multiple SIC codes.

Funding Stage:
The latest funding stage is also surfaced separately as latest_funding_stage.