Veraset Data Format - Parquet Support?

Hi Jake,

Following up on our previous discussion about filtering Veraset data, I have a question about the file format.

I noticed that DuckDB works most efficiently with Parquet files, but Veraset data is currently only available in CSV format on Dewey Data. Would it be possible for Veraset data to be offered in Parquet format instead of (or in addition to) CSV?

Parquet would provide several advantages:

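  • Much smaller files, since Parquet stores data by column and compresses each column
  • Efficient filtering in DuckDB, which reads only the columns a query needs and can skip row groups using Parquet's min/max statistics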
  • Faster query performance for large datasets
  • Better integration with the DuckDB workflow you recommended

Given that the full Visits dataset (national coverage, entire time period) is around 12 TB in CSV, converting it to Parquet could significantly reduce storage requirements and make filtering practical for many more users.

Is this something that could be implemented on the Dewey Data platform?

Thanks for considering this!

Best,
David