PDI Technologies
Overview
PDI Technologies, through its acquisition of Skupos, provides SKU-level point-of-sale (POS) data from tens of thousands of independent convenience stores across the United States, covering roughly 80% of the independent convenience-store market. PDI maintains the largest panel of POS data from independent convenience stores in the United States.
The PDI Convenience Store Transaction Dataset is composed of nine relational data files that share unique identifiers across stores, transactions, payments, items, discounts, and shoppers, enabling researchers to assemble a full picture of in-store consumer purchases at SKU granularity.
Standard join keys include: STORE_ID, TRANSACTION_SET_ID, TRANSACTION_ITEM_ID, PAYMENT_ID, and GTIN.
Data Description
The dataset consists of nine relational tables forming a star-with-relations schema:
Stores Information
One row per convenience store in the PDI panel with descriptive metadata: store name, chain affiliation, address (street, city, state, ZIP), latitude/longitude, chain size, and the start date for the store joining PDI's panel. Also tracks the start week beyond which the store's transaction data is continuous (no delivery gaps). This is the canonical "stores" dimension table that every transaction-level table joins to via STORE_ID.
Observation Level: One row per store in the PDI panel, keyed by STORE_ID.
Stores Status
Stores dimension extended with operational status flags. Adds STORE_FLAG and ACTIVE_STATUS columns indicating whether a store is currently active, on hold, or otherwise excluded from delivery. Otherwise mirrors the Stores Information schema. Use this table to filter out inactive stores when computing time-series aggregates.
Observation Level: One row per store, keyed by STORE_ID. Same grain as Stores Information.
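The filtering step described above can be sketched as follows. This is a minimal illustration with made-up rows: STORE_ID and ACTIVE_STATUS are named in the table description, but the status value "ACTIVE", the DATE and REVENUE field names, and all sample values are assumptions, so check the delivered encoding before reusing it.

```python
# Hypothetical sample rows. The "ACTIVE" / "ON_HOLD" encodings are assumed,
# not taken from the PDI documentation.
stores_status = [
    {"STORE_ID": 101, "ACTIVE_STATUS": "ACTIVE"},
    {"STORE_ID": 102, "ACTIVE_STATUS": "ON_HOLD"},
    {"STORE_ID": 103, "ACTIVE_STATUS": "ACTIVE"},
]

# Hypothetical daily rows; DATE and REVENUE are illustrative field names.
daily_rows = [
    {"STORE_ID": 101, "DATE": "2024-03-01", "REVENUE": 12.50},
    {"STORE_ID": 102, "DATE": "2024-03-01", "REVENUE": 7.00},
    {"STORE_ID": 103, "DATE": "2024-03-02", "REVENUE": 3.25},
]

# Keep only stores flagged active, then aggregate revenue by date.
active_ids = {s["STORE_ID"] for s in stores_status if s["ACTIVE_STATUS"] == "ACTIVE"}

revenue_by_date = {}
for row in daily_rows:
    if row["STORE_ID"] in active_ids:
        revenue_by_date[row["DATE"]] = revenue_by_date.get(row["DATE"], 0.0) + row["REVENUE"]

print(revenue_by_date)
```

Store 102 is on hold in this sample, so its revenue is excluded from the 2024-03-01 total.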
Master GTIN
The canonical SKU dimension. One row per 14-digit GTIN with PDI-curated product attributes: category, subcategory, manufacturer parent, manufacturer, brand, product type, sub-product type, flavor, unit size, pack size, package, and a descriptive product name. The reference table for joining transaction-level item lines back to product-level metadata.
Observation Level: One row per unique GTIN (14-digit Global Trade Item Number).
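When matching external product codes against this table, shorter codes (for example 12-digit UPC-A) typically need left-padding to 14 digits. The sketch below shows that padding plus the standard GS1 check-digit test. Whether every GTIN in the PDI table carries a valid check digit is not stated in this description, so treat the validation as an optional sanity check; the function names are illustrative.

```python
def to_gtin14(code: str) -> str:
    """Left-pad a numeric product code (UPC-A, EAN-13, ...) to 14 digits."""
    digits = code.strip()
    if not (digits.isascii() and digits.isdigit()) or len(digits) > 14:
        raise ValueError(f"not a numeric code of at most 14 digits: {code!r}")
    return digits.zfill(14)


def has_valid_check_digit(gtin14: str) -> bool:
    """Apply the GS1 mod-10 rule: weights 3, 1, 3, ... from the right of the body."""
    body, check = gtin14[:-1], int(gtin14[-1])
    total = sum(
        int(d) * (3 if i % 2 == 0 else 1)
        for i, d in enumerate(reversed(body))
    )
    return (10 - total % 10) % 10 == check


padded = to_gtin14("036000291452")  # a commonly cited UPC-A example code
print(padded, has_valid_check_digit(padded))
```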
Transaction Sets
Transaction header / basket-level table. One row per transaction at a store, capturing transaction timestamp, POS vendor type (Verifone, Gilbarco, Clover, NCR), payment type, subtotal, taxable amount, tax amount, tax rate, grand total, tender received, and tender given (change). Parent table for line-item, discount, and payment children which join via TRANSACTION_SET_ID.
Observation Level: One row per completed transaction (basket), keyed by TRANSACTION_SET_ID.
Transaction Items
Line-item-level transaction table. One row per item line within a transaction with GTIN, POS-assigned description, unit price, unit quantity, discount amount applied at the line, taxable amount, tax rate, grand total, and scan vs. non-scan capture method. Non-scan items carry NACS category / subcategory / detail labels. Joins to Master GTIN via GTIN for product metadata and to its parent Transaction Sets row via TRANSACTION_SET_ID for basket context.
Observation Level: One row per line item within a transaction, keyed by TRANSACTION_ITEM_ID.
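The two joins described above can be sketched with an in-memory SQLite database. The join keys (GTIN, TRANSACTION_SET_ID, STORE_ID, TRANSACTION_ITEM_ID) come from this description; the lowercase table names, the BRAND column, and the assumption that non-scan items have a NULL GTIN are illustrative only.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE master_gtin (GTIN TEXT PRIMARY KEY, BRAND TEXT);
CREATE TABLE transaction_sets (TRANSACTION_SET_ID INTEGER PRIMARY KEY, STORE_ID INTEGER);
CREATE TABLE transaction_items (
    TRANSACTION_ITEM_ID INTEGER PRIMARY KEY,
    TRANSACTION_SET_ID  INTEGER,
    GTIN                TEXT      -- assumed NULL for non-scan items
);
INSERT INTO master_gtin VALUES ('00036000291452', 'BrandA');
INSERT INTO transaction_sets VALUES (1, 101);
INSERT INTO transaction_items VALUES (10, 1, '00036000291452'), (11, 1, NULL);
""")

# Inner join to the basket header; LEFT JOIN to the product dimension so that
# item lines without a GTIN match are kept rather than silently dropped.
rows = con.execute("""
SELECT ti.TRANSACTION_ITEM_ID, ts.STORE_ID, mg.BRAND
FROM transaction_items AS ti
JOIN transaction_sets AS ts ON ts.TRANSACTION_SET_ID = ti.TRANSACTION_SET_ID
LEFT JOIN master_gtin AS mg ON mg.GTIN = ti.GTIN
ORDER BY ti.TRANSACTION_ITEM_ID
""").fetchall()

print(rows)
```

Using a LEFT JOIN for the product lookup keeps the line-item grain intact, which matters when non-scan lines contribute to basket totals.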
Transaction Items Daily Aggregation
Pre-aggregated daily summary of transaction items. One row per (store, GTIN or non-scan category, calendar date) with quantity sold, transaction count, total revenue amount, quantity sold under a discount, and transaction count with a discount. Includes date dimensions (day of week, week, calendar month, calendar year) for easier time-series analysis without joining against the line-item table. Designed for analyses that don't need individual line-item resolution.
Observation Level: One row per unique combination of STORE_ID + GTIN (or non-scan category) + DATE, keyed by ID.
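To make the grain of this table concrete, the sketch below rebuilds the same five measures from a handful of line items. It is not PDI's aggregation job: the tuple layout, the rule that a line counts as discounted when its discount amount is positive, and the output field names are all assumptions for illustration.

```python
from collections import defaultdict

# Hypothetical line items:
# (STORE_ID, GTIN, date, TRANSACTION_SET_ID, quantity, line_total, discount_amount)
items = [
    (101, "00036000291452", "2024-03-01", 1, 2, 5.00, 0.50),
    (101, "00036000291452", "2024-03-01", 2, 1, 2.75, 0.00),
    (101, "00036000291452", "2024-03-02", 3, 1, 2.75, 0.00),
]


def new_bucket():
    return {"qty": 0, "revenue": 0.0, "txns": set(), "disc_qty": 0, "disc_txns": set()}


# One bucket per (store, GTIN, calendar date), matching the table's grain.
daily = defaultdict(new_bucket)
for store_id, gtin, day, txn_id, qty, total, discount in items:
    bucket = daily[(store_id, gtin, day)]
    bucket["qty"] += qty
    bucket["revenue"] += total
    bucket["txns"].add(txn_id)
    if discount > 0:  # assumed definition of "sold under a discount"
        bucket["disc_qty"] += qty
        bucket["disc_txns"].add(txn_id)

summary = {
    key: {
        "quantity_sold": b["qty"],
        "transaction_count": len(b["txns"]),
        "revenue": b["revenue"],
        "discounted_quantity": b["disc_qty"],
        "discounted_transaction_count": len(b["disc_txns"]),
    }
    for key, b in daily.items()
}

for key, measures in sorted(summary.items()):
    print(key, measures)
```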
Payments
Payment-method records associated with transactions. One row per payment instrument applied to a transaction, capturing payment type (cash, debit, credit, etc.), payment entry mode (swipe, etc.), card type, merchant ID, terminal ID, payment locale, currency, tender amount, and acquirer reference / card auth codes when applicable. A single transaction may have multiple payment rows (split tender). Joins to Transaction Sets via TRANSACTION_SET_ID.
Observation Level: One row per payment instrument used on a transaction, keyed by PAYMENT_ID.
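Because a transaction can carry several payment rows, payment-level amounts should be rolled up to TRANSACTION_SET_ID before they are compared with basket totals. A minimal sketch, assuming hypothetical PAYMENT_TYPE and TENDER_AMOUNT field names and made-up values:

```python
from collections import defaultdict
from decimal import Decimal

# Hypothetical payment rows; Decimal avoids float rounding on currency amounts.
payments = [
    {"PAYMENT_ID": 1, "TRANSACTION_SET_ID": 1, "PAYMENT_TYPE": "cash", "TENDER_AMOUNT": Decimal("5.00")},
    {"PAYMENT_ID": 2, "TRANSACTION_SET_ID": 1, "PAYMENT_TYPE": "credit", "TENDER_AMOUNT": Decimal("7.49")},
    {"PAYMENT_ID": 3, "TRANSACTION_SET_ID": 2, "PAYMENT_TYPE": "debit", "TENDER_AMOUNT": Decimal("3.10")},
]

tender_by_txn = defaultdict(Decimal)  # Decimal() starts each total at 0
payment_count = defaultdict(int)
for p in payments:
    tender_by_txn[p["TRANSACTION_SET_ID"]] += p["TENDER_AMOUNT"]
    payment_count[p["TRANSACTION_SET_ID"]] += 1

# Transactions paid with more than one instrument (split tender).
split_tender_txns = sorted(t for t, n in payment_count.items() if n > 1)

print(dict(tender_by_txn), split_tender_txns)
```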
Discounts
Discount-line records applied to transaction items. One row per discount applied. Includes discount type code, discount name (parsed from t-log), discount number, quantity, adjustment amount, discount amount, and grand total. Each transaction item can have many discounts (a 1:N relationship from Transaction Items to Discounts), so joining Discounts to Transaction Items or Transaction Sets can fan out rows. If the goal is the total recoverable discount per line, aggregate discounts to the item-line grain before joining.
Observation Level: One row per discount applied to a transaction line item, keyed by DISCOUNT_ID.
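The fan-out risk noted above is easiest to see side by side. The sketch below uses an in-memory SQLite database with made-up rows; TRANSACTION_ITEM_ID and DISCOUNT_ID are named in this description, while the GRAND_TOTAL and DISCOUNT_AMOUNT column names and the lowercase table names are assumptions.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE transaction_items (TRANSACTION_ITEM_ID INTEGER PRIMARY KEY, GRAND_TOTAL REAL);
CREATE TABLE discounts (
    DISCOUNT_ID         INTEGER PRIMARY KEY,
    TRANSACTION_ITEM_ID INTEGER,
    DISCOUNT_AMOUNT     REAL
);
INSERT INTO transaction_items VALUES (10, 4.0), (11, 2.0);
INSERT INTO discounts VALUES (1, 10, 0.5), (2, 10, 0.25);  -- two discounts on item 10
""")

# Naive join: item 10 appears twice, so its GRAND_TOTAL is counted twice.
naive_total = con.execute("""
SELECT SUM(ti.GRAND_TOTAL)
FROM transaction_items AS ti
LEFT JOIN discounts AS d ON d.TRANSACTION_ITEM_ID = ti.TRANSACTION_ITEM_ID
""").fetchone()[0]

# Safer: collapse discounts to one row per item line, then join.
safe_rows = con.execute("""
SELECT ti.TRANSACTION_ITEM_ID, ti.GRAND_TOTAL, COALESCE(d.TOTAL_DISCOUNT, 0.0)
FROM transaction_items AS ti
LEFT JOIN (
    SELECT TRANSACTION_ITEM_ID, SUM(DISCOUNT_AMOUNT) AS TOTAL_DISCOUNT
    FROM discounts
    GROUP BY TRANSACTION_ITEM_ID
) AS d ON d.TRANSACTION_ITEM_ID = ti.TRANSACTION_ITEM_ID
ORDER BY ti.TRANSACTION_ITEM_ID
""").fetchall()
safe_total = sum(row[1] for row in safe_rows)

print(naive_total, safe_total, safe_rows)
```

In this sample the naive join overstates revenue because the doubly discounted line is repeated, while the pre-aggregated join keeps one row per item line.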
Shopper ID
A privacy-preserving identifier table that attempts to link transactions to unique shoppers. One row per payment with a resolved shopper, containing a SHOPPER_ID string assigned by PDI's identity-resolution methodology plus the parent PAYMENT_ID and TRANSACTION_SET_ID. Enables longitudinal shopper-level analysis (repeat-visit rates, basket-size distributions per shopper, brand loyalty) without exposing personally identifiable information.
Observation Level: One row per payment with a resolved shopper, keyed by PAYMENT_ID. Not all payments have a resolved shopper; coverage depends on the underlying identity signal (e.g. card-on-file).
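Since coverage is partial, it helps to report the resolved share alongside any shopper-level metric. The sketch below computes coverage and a simple repeat-visit rate from made-up rows; the SHOPPER_ID values and the definition of a repeat shopper (two or more distinct transactions) are assumptions for illustration.

```python
from collections import defaultdict

# Hypothetical PAYMENT_ID values present in the Payments table.
payment_ids = [1, 2, 3, 4, 5]

# Hypothetical Shopper ID rows; payment 4 has no resolved shopper.
shopper_rows = [
    {"PAYMENT_ID": 1, "TRANSACTION_SET_ID": 11, "SHOPPER_ID": "s-a"},
    {"PAYMENT_ID": 2, "TRANSACTION_SET_ID": 12, "SHOPPER_ID": "s-a"},
    {"PAYMENT_ID": 3, "TRANSACTION_SET_ID": 13, "SHOPPER_ID": "s-b"},
    {"PAYMENT_ID": 5, "TRANSACTION_SET_ID": 15, "SHOPPER_ID": "s-c"},
]

# Share of payments that link to a shopper.
resolved = {r["PAYMENT_ID"] for r in shopper_rows}
coverage = len(resolved & set(payment_ids)) / len(payment_ids)

# Distinct transactions per shopper; sets guard against split-tender duplicates.
visits = defaultdict(set)
for r in shopper_rows:
    visits[r["SHOPPER_ID"]].add(r["TRANSACTION_SET_ID"])

repeat_shoppers = sorted(s for s, txns in visits.items() if len(txns) >= 2)
repeat_rate = len(repeat_shoppers) / len(visits)

print(coverage, repeat_shoppers, repeat_rate)
```

Counting distinct TRANSACTION_SET_ID values rather than payment rows avoids treating one split-tender basket as two visits.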
Coverage
Geography: United States; tens of thousands of independent convenience stores covering roughly 80% of the independent convenience-store market.
Time: Continuous SKU-level POS coverage from 2023 onward, with most-recent-load timestamps in the 2025-2026 window. New transactions ingested incrementally as POS t-logs are delivered.
Methodology
All PDI datasets are sourced from PDI Technologies' integrated POS data feed across its panel of tens of thousands of independent U.S. convenience stores, parsed from in-store POS transaction log (t-log) files across four supported vendors (Verifone, Gilbarco, Clover, NCR). Dimension tables (Stores Information, Stores Status, Master GTIN) are curated by PDI through panel management and product-hierarchy sources with NACS category mappings; transaction-level tables (Transaction Sets, Transaction Items, Payments, Discounts, Shopper ID) capture every line, payment, and discount as recorded by the POS. Transaction Items Daily Aggregation is derived nightly from the line-item table for faster time-series analysis without scanning individual transactions.
Additional Notes
Relational Structure: The nine tables form a star-with-relations schema. Standard join paths include:
- Stores Information / Stores Status ← STORE_ID → Transaction Sets, Transaction Items, Payments, Discounts, Transaction Items Daily Aggregation
- Transaction Sets ← TRANSACTION_SET_ID → Transaction Items, Payments, Discounts, Shopper ID
- Transaction Items ← TRANSACTION_ITEM_ID → Discounts (1:N)
- Payments ← PAYMENT_ID → Shopper ID (1:1 where a shopper is resolved; payments without a resolved shopper have no Shopper ID row)
- Master GTIN ← GTIN → Transaction Items, Transaction Items Daily Aggregation
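Before relying on the join paths listed above, it can be useful to check for child rows whose key has no parent. The helper below is a hypothetical sketch over lists of dictionaries with made-up rows; `orphan_keys` is not part of any PDI tooling.

```python
def orphan_keys(child_rows, parent_rows, key):
    """Return sorted child key values that have no matching parent row."""
    parent_keys = {row[key] for row in parent_rows}
    child_keys = {row[key] for row in child_rows if row.get(key) is not None}
    return sorted(child_keys - parent_keys)


# Hypothetical sample tables, reduced to their key columns.
stores = [{"STORE_ID": 101}, {"STORE_ID": 102}]
transaction_sets = [
    {"TRANSACTION_SET_ID": 1, "STORE_ID": 101},
    {"TRANSACTION_SET_ID": 2, "STORE_ID": 999},  # store missing from the dimension
]
transaction_items = [
    {"TRANSACTION_ITEM_ID": 10, "TRANSACTION_SET_ID": 1, "GTIN": "00036000291452"},
    {"TRANSACTION_ITEM_ID": 11, "TRANSACTION_SET_ID": 3, "GTIN": None},  # missing header
]
master_gtin = [{"GTIN": "00036000291452"}]

report = {
    "sets_without_store": orphan_keys(transaction_sets, stores, "STORE_ID"),
    "items_without_set": orphan_keys(transaction_items, transaction_sets, "TRANSACTION_SET_ID"),
    "items_without_gtin_match": orphan_keys(transaction_items, master_gtin, "GTIN"),
}

print(report)
```

Null keys are skipped on purpose, since item lines without a GTIN are not expected to match Master GTIN.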