smif prepare-convert will ignore combined datasets #389

Open
tomalrussell opened this issue Jul 31, 2019 · 0 comments

@tomalrussell
Member

An unintended feature of our method for reading data arrays from CSVs is that multiple data variables can be stored in extra columns in a single file.

E.g. population and GVA might share a region dimension and be defined over the same timesteps, so a CSV with timestep,region,pop,gva as a header could be read to load a pop data array or a gva data array.
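
A minimal sketch of the behaviour described above, using only the standard library rather than smif's actual reader. The CSV contents and the `read_variable` helper are hypothetical, chosen to match the `timestep,region,pop,gva` header in the example:

```python
import csv
import io

# Hypothetical CSV with two data variables sharing the same dimensions
CSV_TEXT = """timestep,region,pop,gva
2015,north,100,1.5
2015,south,200,2.5
2020,north,110,1.8
2020,south,210,2.9
"""


def read_variable(csv_text, name, dims=("timestep", "region")):
    """Read one data column keyed by its dimension values, ignoring other columns."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return {tuple(row[d] for d in dims): float(row[name]) for row in reader}


# The same file yields either data array, depending on the column requested
pop = read_variable(CSV_TEXT, "pop")
gva = read_variable(CSV_TEXT, "gva")
```

Because the reader only picks out the dimension columns and the one named data column, any extra columns are silently tolerated, which is what makes the combined file work.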

The smif prepare-convert command reads all data arrays associated with a model run and writes them to parquet, one by one. When a CSV file contains more than one data array, the corresponding parquet file is written once per data array, each write replacing the previous one, so it ends up containing only the last data array to be read and re-written.
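
The overwrite can be illustrated with a small sketch. This is not smif's implementation: a plain dict stands in for the parquet files on disk, and `convert_one_by_one` and the `socio.csv` path are hypothetical names used only for illustration:

```python
import os


def convert_one_by_one(arrays, store):
    """Write each data array to the target derived from its source CSV path.

    arrays: list of (source_path, variable_name, data) tuples.
    store: dict standing in for the parquet files on disk (an assumption).
    """
    for source_path, name, data in arrays:
        target = os.path.splitext(source_path)[0] + ".parquet"
        # Each write replaces the whole target, as rewriting a file would
        store[target] = {name: data}
    return store


# Two data arrays that were both read from the same combined CSV
arrays = [
    ("socio.csv", "pop", {("2015", "north"): 100.0}),
    ("socio.csv", "gva", {("2015", "north"): 1.5}),
]
store = convert_one_by_one(arrays, {})
print(sorted(store["socio.parquet"]))  # ['gva']
```

Only `gva` survives because both arrays map to the same target path and the second write discards the first.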

Approaches:

  • maintain the unintended feature and allow it in parquet too - convert would need to be aware of all files with multiple data arrays, and to do some recombination before writing
  • avoid the unintended feature - would need to clean all data in any smif user's projects to separate out datasets
  • smif csv2parquet is a simpler and less flexible workaround (see f951de5) that sets up a usable binary data store from CSV. Sticking with this for now.
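
For comparison, the first approach (recombining before writing) could look roughly like the sketch below. It is an illustration under assumptions, not a proposed patch: a dict again stands in for the parquet store, and `convert_grouped` is a hypothetical helper that groups data arrays by source file so each target is written once:

```python
import os
from collections import defaultdict


def convert_grouped(arrays, store):
    """Group data arrays by source CSV, then write each target exactly once.

    arrays: list of (source_path, variable_name, data) tuples.
    store: dict standing in for the parquet files on disk (an assumption).
    """
    grouped = defaultdict(dict)
    for source_path, name, data in arrays:
        grouped[source_path][name] = data

    for source_path, variables in grouped.items():
        target = os.path.splitext(source_path)[0] + ".parquet"
        # One write per file, holding every variable that shared the CSV
        store[target] = dict(variables)
    return store


arrays = [
    ("socio.csv", "pop", {("2015", "north"): 100.0}),
    ("socio.csv", "gva", {("2015", "north"): 1.5}),
    ("energy.csv", "demand", {("2015", "north"): 42.0}),
]
store = convert_grouped(arrays, {})
```

The cost of this approach is that the conversion step has to see every data array for a file before writing anything, rather than streaming them one by one.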