Broken File in Veraset Visit Dataset
22 hours ago by Albert Cao
There seems to be a broken Parquet partition file when downloading the Veraset Visits data.
import duckdb
# set_api_key and get_dataset_files are assumed to be imported from the Dewey Python client

api_key = "akv1_xVXxxxxxxxxxxxxxL"
data_id = "prj_6rg3whup__fldr_8zme9bwbekydvezq"  # Veraset Visits dataset
# data_id = "fldr_d7cqgtcj3nyi4usp"  # Veraset home visits dataset
# data_id = "fldr_gfv4qahxiwsd4dwy"  # Veraset work visits dataset
start_date = "2024-01-01"
end_date = "2025-04-30"
output_path = "/data_jbod/personal/albert/dewey/NYTX2025/visits.parquet"

set_api_key(api_key)
# List the download URLs of every partition file in the date range
urls = get_dataset_files(
    data_id,
    partition_key_after=start_date,
    partition_key_before=end_date,
    to_list=True
)

con = duckdb.connect()
con.execute("INSTALL httpfs; LOAD httpfs;")
# Read all partition files over HTTP, keep the Dallas and New York City rows, write one Parquet file
con.execute("""
    COPY (
        SELECT *
        FROM read_parquet($url)
        WHERE lower(state) IN ('new york', 'texas')
          AND lower(city) IN ('dallas', 'new york city')
    )
    TO $output_path
    (FORMAT PARQUET, COMPRESSION ZSTD);
""", {"url": urls, "output_path": output_path})
print(f"✅ Downloaded filtered dataset to {output_path}")
---------------------------------------------------------------------------
HTTPException Traceback (most recent call last)
Cell In[2], line 19
17 con = duckdb.connect()
18 con.execute("INSTALL httpfs; LOAD httpfs;")
---> 19 con.execute("""
20 COPY (
21 SELECT *
22 FROM read_parquet($url)
23 WHERE lower(state) IN ('new york', 'texas')
24 AND lower(city) IN ('dallas', 'new york city')
25 )
26 TO $output_path
27 (FORMAT PARQUET, COMPRESSION ZSTD);
28 """, {"url": urls, "output_path": output_path})
29 print(f"✅ Downloaded filtered dataset to {output_path}")
HTTPException: HTTP Error: HTTP GET error on 'https://api.deweydata.io/api/v2/downloads/019b5981-c2b5-7782-9a23-88b188b59122.snappy.parquet?secret=Cesl5k52zEgytTR' (HTTP 500)
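Because the query reads every URL in one call, a single failing file aborts the whole COPY. One way to narrow down which partition files are affected is to probe each URL separately before handing the list to DuckDB. The following is only a minimal sketch, not part of the original script: `probe_url` and `find_failing_urls` are hypothetical helper names, and it assumes the download endpoint accepts a plain GET with a `Range` header, which has not been verified against the Dewey API.

```python
import urllib.error
import urllib.request


def probe_url(url, timeout=30):
    """Return None if the start of the file can be fetched, else an error string."""
    # Ask for the first 4 bytes only. Assumption: the server honors or ignores
    # the Range header; either way only 4 bytes are read from the response.
    request = urllib.request.Request(url, headers={"Range": "bytes=0-3"})
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            magic = response.read(4)
    except urllib.error.HTTPError as exc:
        # Server-side failures such as the HTTP 500 in the traceback land here.
        return f"HTTP {exc.code}"
    except OSError as exc:
        # URLError, timeouts, and other connection problems are OSError subclasses.
        return f"connection error: {exc}"
    # Unencrypted Parquet files start with the 4-byte magic b"PAR1".
    if magic != b"PAR1":
        return f"unexpected header {magic!r}"
    return None


def find_failing_urls(urls, probe=probe_url):
    """Probe every URL and return (url, error) pairs for the ones that fail."""
    failures = []
    for url in urls:
        error = probe(url)
        if error is not None:
            failures.append((url, error))
    return failures
```

Calling `find_failing_urls(urls)` on the list returned by `get_dataset_files` would give the failing URLs along with their status codes, which can be included in a report. The remaining URLs could then be passed to `read_parquet` so the rest of the date range still downloads.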