Generate open dataset #69

Open: wants to merge 2 commits into base: dev

Conversation

crispy-wonton
Collaborator

@crispy-wonton crispy-wonton commented Sep 25, 2024

Fixes #67


Description

Add new script to generate open dataset with summary data per LSOA.

Prepare dataset for public sharing which includes summary data for each feature per LSOA.
Features to add:

  • median garden size estimate
  • property density
  • rural/urban status
  • proportion in building conservation area
  • proportion listed buildings
  • proportion with EPC C+
  • proportion of flats
  • proportion off gas
Also required: scores per tech type, n_properties, and use_weights/scores_weighted flag.

Instructions for Reviewer

In order to test the code in this PR you need to run the following command:
python -i asf_heat_pump_suitability/pipeline/run_scripts/run_generate_open_dataset.py --property_suitability s3://asf-heat-pump-suitability/outputs/2023Q4/20240905_2023_Q4_heat_pump_suitability_per_property.parquet --lsoa_suitability s3://asf-heat-pump-suitability/outputs/2023Q4/20240905_2023_Q4_heat_pump_suitability_per_lsoa.parquet -y 2023 -q 4

Please pay special attention to:

  • The weighting of the proportional features and property density: please check that it has been applied correctly. The code should calculate weighted proportions for proportional features only when weights have been used in the scoring. Otherwise, it should calculate unweighted proportions (i.e. each property gets a weight of 1). Note that for the median garden size estimate, I calculated the ordinary (unweighted) median rather than a weighted median, for simplicity.
  • The output columns: check that all key columns we want to share are included.
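
For reviewers who want to sanity-check the weighting rule by hand, here is a minimal sketch of the intended behaviour in plain Python. It is not the code in this PR: the function names and the row keys (`lsoa`, `weight`, `is_flat`, `garden_size_m2`) are hypothetical, chosen only for illustration. It shows a proportion that uses the property weights when `use_weights` is true and a weight of 1 per property otherwise, alongside an ordinary unweighted median.

```python
from collections import defaultdict
from statistics import median


def proportion_by_lsoa(rows, feature, use_weights):
    """Share of properties per LSOA where `feature` is true.

    With use_weights=False every property counts with a weight of 1.
    All names here are hypothetical, not taken from the PR's code.
    """
    hits = defaultdict(float)
    totals = defaultdict(float)
    for row in rows:
        weight = row["weight"] if use_weights else 1.0
        totals[row["lsoa"]] += weight
        if row[feature]:
            hits[row["lsoa"]] += weight
    # An LSOA whose weights sum to zero has no defined proportion.
    return {
        lsoa: (hits[lsoa] / total if total else None)
        for lsoa, total in totals.items()
    }


def median_by_lsoa(rows, column):
    """Ordinary (unweighted) median of `column` per LSOA."""
    values = defaultdict(list)
    for row in rows:
        values[row["lsoa"]].append(row[column])
    return {lsoa: median(vals) for lsoa, vals in values.items()}


# Made-up example rows; the LSOA codes and values are illustrative only.
rows = [
    {"lsoa": "E01000001", "weight": 3.0, "is_flat": True, "garden_size_m2": 10.0},
    {"lsoa": "E01000001", "weight": 1.0, "is_flat": False, "garden_size_m2": 50.0},
    {"lsoa": "E01000002", "weight": 2.0, "is_flat": False, "garden_size_m2": 30.0},
]

weighted = proportion_by_lsoa(rows, "is_flat", use_weights=True)
unweighted = proportion_by_lsoa(rows, "is_flat", use_weights=False)
gardens = median_by_lsoa(rows, "garden_size_m2")
```

In this toy data the first LSOA has a weighted flat proportion of 3/4 and an unweighted one of 1/2, which is the kind of difference to look for when comparing outputs produced with and without weights.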

Checklist:

  • I have refactored my code out from notebooks/
  • I have checked the code runs
  • I have tested the code
  • I have run pre-commit and addressed any issues not automatically fixed
  • I have merged any new changes from dev
  • I have documented the code
    • Major functions have docstrings
    • Appropriate information has been added to READMEs
  • I have explained this PR above
  • I have requested a code review

@crispy-wonton crispy-wonton marked this pull request as ready for review September 25, 2024 14:04
Collaborator

@lizgzil lizgzil left a comment

thanks @crispy-wonton this looks great to me! ✨
The script ran and the output looks as it should.

Development

Successfully merging this pull request may close these issues.

Generate underlying data per LSOA for public sharing
2 participants