Job Records
Overview
The LinkUp job listing dataset is sourced through a proprietary process directly from employer websites for accuracy and timeliness. It includes enhanced analysis built on dozens of data points per listing and has the following components: full historical and ongoing scraped job record data, full-text job descriptions, structured fields, company-level reference data, and core analytics for clear and confident insights.
Data Information | Value |
---|---|
Refresh Cadence | Monthly |
Historical Coverage | 2007 - Present |
Geographic Coverage | Global |
Key Concepts
Job Records
- Job records contain most of the information scraped from a career portal.
- Location data is presented as found in the job listing.
- The `job_hash` is an MD5 hash of the URL and serves as the unique identifier for the table. Each job posting at a particular location has its own unique job hash, while the `base_hash` is used to manage job postings in multiple locations. All job hashes for the same job across different locations are mapped to the same `base_hash`.
- The `company_id` is the unique identifier for a company scrape, used to join reference files with job records.
- The `created` field indicates when the job was first observed and scraped, while `delete_date` reflects the most recent time the site was scraped and the job posting was not found.
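
The relationship between these fields can be sketched in Python. This is a minimal illustration, not LinkUp's implementation. The exact URL string that is hashed and the way `base_hash` is derived are not documented here, so the URLs and `base_hash` values below are made up. The sketch also assumes that `created` and `delete_date` arrive as ISO `YYYY-MM-DD` strings and that `delete_date` is empty while a job is still listed.

```python
import hashlib
from collections import defaultdict
from datetime import date


def md5_of_url(url: str) -> str:
    """Return the hex MD5 digest of a URL string (UTF-8 encoded)."""
    return hashlib.md5(url.encode("utf-8")).hexdigest()


def group_by_base_hash(rows):
    """Map each base_hash to the list of job_hash values that share it."""
    groups = defaultdict(list)
    for row in rows:
        groups[row["base_hash"]].append(row["job_hash"])
    return dict(groups)


def days_listed(created, delete_date):
    """Days between first observation and removal, or None if still listed.

    Assumes ISO date strings; the real column format may differ.
    """
    if not delete_date:
        return None
    return (date.fromisoformat(delete_date) - date.fromisoformat(created)).days


# Hypothetical rows: one job posted in two locations, plus an unrelated job.
records = [
    {"job_hash": md5_of_url("https://example.com/jobs/123?loc=nyc"), "base_hash": "b1"},
    {"job_hash": md5_of_url("https://example.com/jobs/123?loc=sf"), "base_hash": "b1"},
    {"job_hash": md5_of_url("https://example.com/jobs/456"), "base_hash": "b2"},
]

# Number of location-level postings behind each underlying job.
locations_per_job = {
    base: len(hashes) for base, hashes in group_by_base_hash(records).items()
}
```

Counting distinct `base_hash` values rather than `job_hash` values avoids counting the same job once per location.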
Full Time / Part Time
- The full-time/part-time table is a point-in-time table in the LinkUp dataset that identifies whether a job posting contains language indicating that the job is full-time, part-time, or both.
- The full-time/part-time logic applies keyword analysis to a job’s title, description, and structured fields, for jobs written in English, to determine how to categorize the job.
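
As a rough illustration of keyword-based categorization, the sketch below scans a job's title, description, and structured fields for full-time and part-time phrasing. The patterns, the function name `classify_employment_type`, and the returned labels are assumptions made for this example. The actual keyword list LinkUp uses is not published here and is likely more extensive.

```python
import re

# Illustrative patterns only; they match "full-time", "full time", "fulltime", etc.
FULL_TIME = re.compile(r"\bfull[\s-]?time\b", re.IGNORECASE)
PART_TIME = re.compile(r"\bpart[\s-]?time\b", re.IGNORECASE)


def classify_employment_type(title, description, structured_fields=None):
    """Return 'full-time', 'part-time', 'both', or None if no keyword is found."""
    field_values = [str(v) for v in (structured_fields or {}).values()]
    text = " ".join([title, description, *field_values])

    has_full = bool(FULL_TIME.search(text))
    has_part = bool(PART_TIME.search(text))

    if has_full and has_part:
        return "both"
    if has_full:
        return "full-time"
    if has_part:
        return "part-time"
    return None
```

For example, `classify_employment_type("Barista", "Full-time and part-time shifts available")` returns `"both"`, while a posting with no matching language returns `None`.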
Job Descriptions
- Descriptions are the full-text job descriptions we scrape from company career sites. They can be joined to job records using the `job_hash`. This is a great resource for parsing out the key skills or technologies being sought.
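
The join and a simple skill scan might look like the following sketch. Only `job_hash` comes from the dataset description above. The `title` and `description` column names, the sample rows, and the `SKILLS` list are hypothetical and chosen for illustration.

```python
import re

# Hypothetical sample rows; real tables carry many more columns.
job_records = [
    {"job_hash": "a1", "title": "Data Engineer"},
    {"job_hash": "b2", "title": "Analyst"},
]
descriptions = [
    {"job_hash": "a1", "description": "Build pipelines in Python and SQL on AWS."},
]

# Illustrative skill list, not part of the dataset.
SKILLS = ["python", "sql", "aws", "excel"]


def join_descriptions(records, descs):
    """Left-join descriptions onto job records by job_hash."""
    by_hash = {d["job_hash"]: d["description"] for d in descs}
    return [{**r, "description": by_hash.get(r["job_hash"])} for r in records]


def find_skills(text, skills=SKILLS):
    """Return the skills that appear as whole words in the text (case-insensitive)."""
    if not text:
        return []
    return [
        s for s in skills
        if re.search(rf"\b{re.escape(s)}\b", text, re.IGNORECASE)
    ]


joined = join_descriptions(job_records, descriptions)
skills_by_job = {row["job_hash"]: find_skills(row["description"]) for row in joined}
```

A left join keeps job records that have no matching description, which shows up here as an empty skill list for `b2`.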
ONET Taxonomy
- ONET is the primary source for standardized occupation information in the US, covering over 1,000 occupations across the entire US economy. The ONET 2019 Taxonomy file offers the latest ONET code for each job record. The code is assigned by an NLP solution that uses job records and descriptions to generate normalized titles.
Remote Tag
- The remote tag is a point-in-time table used to find all remote and non-remote work in the LinkUp dataset. Hybrid roles are considered remote work.
- The remote tag is determined using keyword analysis applied to job records written in English.
- LinkUp is working on providing an additional data element to distinguish between remote and hybrid positions.
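
The sketch below shows one way a keyword-based remote flag could treat hybrid roles as remote, as described above. The keyword list and the function name `is_remote` are assumptions for illustration and do not reproduce LinkUp's actual logic.

```python
import re

# Illustrative keywords only. "hybrid" is included because hybrid roles count as remote.
REMOTE_PATTERN = re.compile(
    r"\b(remote|hybrid|work from home|telecommute)\b", re.IGNORECASE
)


def is_remote(title, description=""):
    """Return True if the title or description contains a remote-work keyword."""
    return bool(REMOTE_PATTERN.search(f"{title} {description}"))
```

With these patterns, `is_remote("Hybrid Account Manager")` is `True` and `is_remote("Warehouse Associate", "On-site role")` is `False`.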
Structured Fields
- A structured field is a data point found on a job portal or job record and is defined by the employer consistently across their job listings. The consistency of structured fields allows our web scrapes to capture additional data points not found in the job title or description.
- The structured fields are an expansion of job listing data that provides unchanged structured field data. The data is not normalized except for `posted_date` and `closed_date`.
- It's important to note that not all structured fields will be available on all employer portals because the structured fields provided are at the discretion of the employer.
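
Because availability varies by employer, code that reads structured fields should tolerate missing keys. The sketch below assumes each row is a plain dictionary and that the normalized `posted_date` and `closed_date` values are ISO `YYYY-MM-DD` strings. Both are assumptions, and `department` is a hypothetical employer-defined field used only as an example.

```python
from datetime import date


def read_structured_fields(row):
    """Read structured fields from a row, tolerating fields an employer omits."""

    def parse_date(value):
        # Assumes ISO "YYYY-MM-DD"; missing or empty values become None.
        return date.fromisoformat(value) if value else None

    return {
        "posted_date": parse_date(row.get("posted_date")),
        "closed_date": parse_date(row.get("closed_date")),
        # Non-normalized fields are passed through exactly as scraped.
        "department": row.get("department"),
    }


# One employer exposes several fields, another exposes only a posted date.
full_row = read_structured_fields(
    {"posted_date": "2024-03-01", "closed_date": "2024-04-15", "department": "R&D"}
)
sparse_row = read_structured_fields({"posted_date": "2024-03-01"})
```

Using `dict.get` returns `None` for absent fields instead of raising `KeyError`, which suits data whose shape differs from one employer portal to the next.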