feature: Add serialization of reference dataset #5427

Merged: 10 commits, Feb 14, 2023

svotaw
Contributor

@svotaw svotaw commented Aug 16, 2022

Summary

This PR addresses feature request #5426.

This PR adds APIs that serialize a Dataset, without its data, to a byte array and deserialize it back, effectively creating a "schema" or "reference" that can be used to create other Datasets.

Implementation

The existing code for serializing Datasets to file was refactored to write to any generic BinaryWriter, whether backed by memory or by a file. The verbose serialization code is shared as much as possible by splitting methods into Header and Data components.
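As an illustration of the refactoring described above, here is a minimal sketch of a writer abstraction with a header/data split. Every name in it (`MemoryWriter`, `FileWriter`, `ToyDataset`, `SerializeHeader`, `SerializeData`, `SerializeReference`, `SerializeFull`) is hypothetical and is not taken from this PR's diff; the byte layout is invented for the example.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdio>
#include <string>
#include <vector>

// Hypothetical interface: any sink that accepts raw bytes.
class BinaryWriter {
 public:
  virtual ~BinaryWriter() = default;
  // Writes `bytes` bytes from `data`; returns the number of bytes written.
  virtual std::size_t Write(const void* data, std::size_t bytes) = 0;
};

// Memory-backed writer: appends to a growable byte vector.
class MemoryWriter : public BinaryWriter {
 public:
  std::size_t Write(const void* data, std::size_t bytes) override {
    const char* p = static_cast<const char*>(data);
    buffer_.insert(buffer_.end(), p, p + bytes);
    return bytes;
  }
  const std::vector<char>& buffer() const { return buffer_; }

 private:
  std::vector<char> buffer_;
};

// File-backed writer: same interface, different destination.
class FileWriter : public BinaryWriter {
 public:
  explicit FileWriter(const std::string& path)
      : file_(std::fopen(path.c_str(), "wb")) {}
  ~FileWriter() override {
    if (file_ != nullptr) std::fclose(file_);
  }
  FileWriter(const FileWriter&) = delete;
  FileWriter& operator=(const FileWriter&) = delete;
  std::size_t Write(const void* data, std::size_t bytes) override {
    return file_ != nullptr ? std::fwrite(data, 1, bytes, file_) : 0;
  }

 private:
  std::FILE* file_;
};

// Toy stand-in for a dataset: one schema field plus the raw values.
struct ToyDataset {
  std::int32_t num_features;
  std::vector<double> values;
};

// Header component: schema only, no row data.
void SerializeHeader(const ToyDataset& ds, BinaryWriter* writer) {
  assert(writer != nullptr);
  writer->Write(&ds.num_features, sizeof(ds.num_features));
}

// Data component: value count followed by the values themselves.
void SerializeData(const ToyDataset& ds, BinaryWriter* writer) {
  assert(writer != nullptr);
  const std::uint64_t count = ds.values.size();
  writer->Write(&count, sizeof(count));
  writer->Write(ds.values.data(), ds.values.size() * sizeof(double));
}

// A "reference" is the header alone; a full save is header plus data.
void SerializeReference(const ToyDataset& ds, BinaryWriter* writer) {
  SerializeHeader(ds, writer);
}

void SerializeFull(const ToyDataset& ds, BinaryWriter* writer) {
  SerializeHeader(ds, writer);
  SerializeData(ds, writer);
}
```

Because both entry points take a `BinaryWriter*`, the same header code serves the in-memory reference and the on-disk binary file, which is the sharing the paragraph above refers to.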

Also, a generic ByteBuffer was created so that higher-level languages (e.g. Java) do not have to manage the byte memory of the serialized buffer.
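One way such a buffer could be exposed to a binding is through an opaque handle with per-byte access and an explicit free call. The sketch below assumes that shape; the names `ToySerializeReference`, `ToyByteBufferGetAt`, `ToyByteBufferFree`, and `CopyOut` are hypothetical and are not the functions this PR adds to `c_api.h`.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical native buffer that owns the serialized bytes.
class ByteBuffer {
 public:
  void Append(const void* data, std::size_t bytes) {
    const std::uint8_t* p = static_cast<const std::uint8_t*>(data);
    bytes_.insert(bytes_.end(), p, p + bytes);
  }
  std::size_t size() const { return bytes_.size(); }
  std::uint8_t at(std::size_t index) const { return bytes_[index]; }

 private:
  std::vector<std::uint8_t> bytes_;
};

// Opaque handle, as a C-style API would hand it to a binding.
typedef void* ByteBufferHandle;

// Serializes a toy "schema" (just a feature count) into a native buffer.
// Returns 0 on success and -1 on failure.
int ToySerializeReference(std::int32_t num_features,
                          ByteBufferHandle* out_buffer,
                          std::int32_t* out_len) {
  if (out_buffer == nullptr || out_len == nullptr) return -1;
  ByteBuffer* buffer = new ByteBuffer();
  buffer->Append(&num_features, sizeof(num_features));
  *out_buffer = buffer;
  *out_len = static_cast<std::int32_t>(buffer->size());
  return 0;
}

// Reads one byte; the caller never touches the native pointer directly.
int ToyByteBufferGetAt(ByteBufferHandle handle, std::int32_t index,
                       std::uint8_t* out_val) {
  if (handle == nullptr || out_val == nullptr) return -1;
  const ByteBuffer* buffer = static_cast<const ByteBuffer*>(handle);
  if (index < 0 || static_cast<std::size_t>(index) >= buffer->size()) return -1;
  *out_val = buffer->at(static_cast<std::size_t>(index));
  return 0;
}

// Releases the native memory once the caller has copied what it needs.
int ToyByteBufferFree(ByteBufferHandle handle) {
  delete static_cast<ByteBuffer*>(handle);
  return 0;
}

// What a binding might do: copy the bytes into an array it manages itself.
std::vector<std::uint8_t> CopyOut(ByteBufferHandle handle, std::int32_t len) {
  std::vector<std::uint8_t> managed(len > 0 ? static_cast<std::size_t>(len) : 0);
  for (std::int32_t i = 0; i < len; ++i) {
    const int rc = ToyByteBufferGetAt(handle, i, &managed[static_cast<std::size_t>(i)]);
    assert(rc == 0);
    (void)rc;
  }
  return managed;
}
```

With this shape the caller only holds a handle and a length: it copies the bytes into its own array and then calls the free function, so allocation and deallocation both stay on the native side.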

Test

New C++ tests cover both serialization/deserialization and the new ByteBuffer functionality.

@jameslamb
Collaborator

jameslamb commented Aug 20, 2022

/gha run r-valgrind

Workflow R valgrind tests has been triggered! 🚀
https://github.com/microsoft/LightGBM/actions/runs/2893166788

Status: success ✔️.

Collaborator

@StrikerRUS StrikerRUS left a comment

Just some typos and one suggestion for the version constant.

include/LightGBM/c_api.h (outdated, resolved)
include/LightGBM/c_api.h (outdated, resolved)
include/LightGBM/c_api.h (outdated, resolved)
include/LightGBM/dataset.h (outdated, resolved)
src/io/dataset.cpp (outdated, resolved)
tests/cpp_tests/test_serialize.cpp (outdated, resolved)
tests/cpp_tests/test_serialize.cpp (outdated, resolved)
tests/cpp_tests/test_serialize.cpp (outdated, resolved)
tests/cpp_tests/test_serialize.cpp (outdated, resolved)
tests/cpp_tests/test_serialize.cpp (resolved)
Collaborator

@StrikerRUS StrikerRUS left a comment

Thanks! I don't have any additional comments for this PR.

✔️

@guolinke
Collaborator

@shiyu1994 can you help to review?

@svotaw
Contributor Author

svotaw commented Oct 22, 2022

@shiyu1994 can you take a look? ty

@svotaw
Contributor Author

svotaw commented Nov 8, 2022

@shiyu1994 Just checking in

@svotaw svotaw requested review from StrikerRUS and removed request for guolinke and shiyu1994 November 29, 2022 21:29
@jameslamb
Collaborator

jameslamb commented Jan 3, 2023

/gha run r-valgrind

Workflow R valgrind tests has been triggered! 🚀
https://github.com/microsoft/LightGBM/actions/runs/3826303140

Status: success ✔️.

@jameslamb
Collaborator

@shiyu1994 @guolinke can you help with a review on this?

@shiyu1994
Collaborator

Sorry for the late response. Will review it within the next two days.

Collaborator

@shiyu1994 shiyu1994 left a comment

Thanks for the contribution. I have started reviewing this, but it will take some time. Hopefully I can finish by tomorrow.

include/LightGBM/dataset.h (outdated, resolved)
include/LightGBM/dataset.h (outdated, resolved)
include/LightGBM/feature_group.h (outdated, resolved)
Collaborator

@shiyu1994 shiyu1994 left a comment

@svotaw Just left a few suggestions, could you please check? Thanks. Overall the PR LGTM.

src/io/dataset.cpp (outdated, resolved)
@svotaw svotaw requested review from shiyu1994 and removed request for StrikerRUS, guolinke and jameslamb February 14, 2023 01:12
@svotaw
Contributor Author

svotaw commented Feb 14, 2023

@shiyu1994 I made the requested changes. Can you look it over and try rerunning the failures? They don't seem related to this PR.

@jameslamb
Collaborator

jameslamb commented Feb 14, 2023

/gha run r-valgrind

Workflow R valgrind tests has been triggered! 🚀
https://github.com/microsoft/LightGBM/actions/runs/4169689163

Status: success ✔️.

@shiyu1994 shiyu1994 merged commit 0f7983b into microsoft:master Feb 14, 2023
@github-actions

This pull request has been automatically locked since there has not been any recent activity since it was closed.
To start a new related discussion, open a new issue at https://github.com/microsoft/LightGBM/issues
including a reference to this.

@github-actions github-actions bot locked as resolved and limited conversation to collaborators Aug 15, 2023
5 participants