[python-package] Add tests for passing Arrow arrays with empty chunks #6210

Merged
merged 6 commits into from
Dec 10, 2023

borchero
Collaborator

Motivation

Follow-up from #6166. Thanks a lot for proposing this test @jameslamb, it actually caught an error in the C++ code! I'm now skipping empty chunks, since keeping them around would make life hard: .begin() could no longer be O(1), which violates assumptions about iterators.

Turns out we just got lucky in #6166 as the chunked array being initialized is contiguous in memory...
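
To illustrate the point about .begin(), here is a minimal Python sketch of the idea. It is not the LightGBM C++ implementation: the class ChunkedView and its methods begin, end, advance and deref are hypothetical names made up for this example. The assumption is that a position is a (chunk index, offset) pair; once empty chunks are dropped at construction time, the first position is always (0, 0) and stepping forward never has to scan past empty chunks.

```python
from typing import Iterator, List, Sequence, Tuple

Position = Tuple[int, int]  # (chunk index, offset within that chunk)


class ChunkedView:
    """Hypothetical view over a list of chunks (illustration only, not LightGBM's API)."""

    def __init__(self, chunks: Sequence[Sequence[float]]) -> None:
        # Drop empty chunks once, up front, so that every stored chunk
        # is guaranteed to contain at least one element.
        self._chunks: List[Sequence[float]] = [c for c in chunks if len(c) > 0]

    def begin(self) -> Position:
        # O(1): with no empty chunks, the first element (if any) is at (0, 0).
        return (0, 0)

    def end(self) -> Position:
        # One past the last chunk; equals begin() when there is no data.
        return (len(self._chunks), 0)

    def advance(self, pos: Position) -> Position:
        # O(1): the next chunk, if there is one, is never empty.
        chunk, offset = pos
        offset += 1
        if offset == len(self._chunks[chunk]):
            return (chunk + 1, 0)
        return (chunk, offset)

    def deref(self, pos: Position) -> float:
        chunk, offset = pos
        return self._chunks[chunk][offset]

    def __iter__(self) -> Iterator[float]:
        pos = self.begin()
        while pos != self.end():
            yield self.deref(pos)
            pos = self.advance(pos)

    def __len__(self) -> int:
        return sum(len(c) for c in self._chunks)
```

If the empty chunks were kept instead, begin() and advance() would each need a loop to skip over them, so neither could be constant time in the worst case.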

Collaborator

@jameslamb jameslamb left a comment

Hey great, thanks for looking into this!

I agree with skipping them. I can't think of a reason LightGBM should need to care about preserving the empty chunks on its side... it just wants to get from raw input data (chunked array) to the format for training (LightGBM Dataset) correctly and efficiently.

@jameslamb merged commit 522f0f0 into microsoft:master on Dec 10, 2023
41 checks passed
@borchero deleted the empty-chunk-tests branch on December 11, 2023 02:30