Products

Overview

Veridion provides a high-fidelity, representative sample of a universe of 134 million companies that covers the full spectrum of existing public and private entities. The Products dataset provides granular, product-level intelligence extracted from the web presence of companies in that universe. It supports research on product-level market structure, supply chain composition, and the intersection of product attributes with firm characteristics.

Data Description

The Products sample covers the largest companies included in the Firmographics dataset. Each record carries rich structured attributes, including UNSPSC classification, materials, ingredients, certifications, pricing, size, and energy efficiency. Each product record is traced to its source URL and includes both the raw product title and a product name normalized by Veridion LLMs to strip marketing noise. The UNSPSC classification (segment, family, class) follows a standardized taxonomy, which makes products comparable across companies and industries.

Observation Level: One row per product, identified by Veridion ID (unique ID per company) + Product Name. Each product record is linked to its parent company via Veridion ID and website domain. One company may have multiple product rows.
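
To make the observation level concrete, the minimal sketch below checks that the composite key is unique and groups product rows by company. The field names (`veridion_id`, `website_domain`, `product_name`) and the sample rows are hypothetical, chosen for illustration only; the column names in an actual delivery may differ.

```python
from collections import Counter, defaultdict

# Hypothetical rows; the field names are assumptions, not the documented schema.
rows = [
    {"veridion_id": "c-001", "website_domain": "example.com", "product_name": "Steel Hex Bolt"},
    {"veridion_id": "c-001", "website_domain": "example.com", "product_name": "Nylon Washer"},
    {"veridion_id": "c-002", "website_domain": "example.org", "product_name": "Steel Hex Bolt"},
]


def composite_key(row):
    """Return the (company ID, product name) pair that identifies one observation."""
    return (row["veridion_id"], row["product_name"])


def duplicate_keys(rows):
    """Return the composite keys that occur more than once."""
    counts = Counter(composite_key(r) for r in rows)
    return [key for key, n in counts.items() if n > 1]


def products_by_company(rows):
    """Map each company ID to the list of its product names, in input order."""
    grouped = defaultdict(list)
    for r in rows:
        grouped[r["veridion_id"]].append(r["product_name"])
    return dict(grouped)


dupes = duplicate_keys(rows)
grouped = products_by_company(rows)
```

The same product name can appear under different companies (as with the two bolt rows above), which is why the company ID has to be part of the key.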

Historical Coverage: Each release is a cross-sectional snapshot rather than a longitudinal panel, because the collection methodology prioritizes the accuracy of the current state over historical preservation. Researchers should treat each release as an independent cross-section of the global firm population.

Coverage

The Products dataset covers the largest companies included in the Firmographics dataset. Coverage is anchored to companies with active web domains that contain identifiable product pages and catalogs.

Methodology

Product records are extracted from company websites by Veridion's proprietary crawling and extraction pipeline. The system identifies product pages, catalogs, and listings across the indexed web, then applies specialized large language models to extract structured attributes including product names, descriptions, materials, specifications, pricing, certifications, and UNSPSC classifications. Product names are normalized by Veridion LLMs to remove marketing language, brand names, and technical noise while preserving the core product identity. The UNSPSC classification is generated at three levels of granularity (segment, family, class) to enable standardized cross-company comparison. Each record retains a link to its source URL for provenance verification.
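
The three classification levels nest inside one another. In the public UNSPSC standard a code has eight digits: the first two identify the segment, the first four the family, and the first six the class. The last two identify the commodity, a finer level that the description above does not list. The sketch below derives the three levels from a full code so that records can be grouped at any of them. It assumes a record carries an eight-digit code as a string; whether the dataset delivers codes in that form or as separate columns is an assumption, and `unspsc_levels` is a hypothetical helper, not part of any Veridion tooling.

```python
def unspsc_levels(code: str) -> dict:
    """Split an eight-digit UNSPSC code into zero-padded segment, family, and class codes."""
    if len(code) != 8 or not (code.isascii() and code.isdigit()):
        raise ValueError(f"expected an 8-digit UNSPSC code, got {code!r}")
    return {
        "segment": code[:2] + "000000",  # first two digits, padded to eight
        "family": code[:4] + "0000",     # first four digits, padded to eight
        "class": code[:6] + "00",        # first six digits, padded to eight
    }


levels = unspsc_levels("43211503")
```

Two products from different companies are comparable at the class level when their `class` values match, and at coarser levels when only `family` or `segment` match.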