-
Notifications
You must be signed in to change notification settings - Fork 3.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feature: Add serialization of reference dataset #5427
Conversation
/gha run r-valgrind Workflow R valgrind tests has been triggered! 🚀 Status: success ✔️. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just some typos and one suggestion for the version constant.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks! I don't have any additional comments for this PR.
✔️
@shiyu1994 can you help to reivew? |
@shiyu1994 can you take a look? ty |
@shiyu1994 Just checking in |
/gha run r-valgrind Workflow R valgrind tests has been triggered! 🚀 Status: success ✔️. |
@shiyu1994 @guolinke can you help with a review on this? |
Sorry for the late response. Will review it within the next two days. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the contribution. I start to review this but it will take some time. Hopefully I can finish this by tomorrow.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@svotaw Just left a few suggestions, could you please check? Thanks. Overall the PR LGTM.
@shiyu1994 I made the requested changes. Can you look it over and try rerunning the failures? they don't seem related to this PR |
/gha run r-valgrind Workflow R valgrind tests has been triggered! 🚀 Status: success ✔️. |
This pull request has been automatically locked since there has not been any recent activity since it was closed. |
Summary
This is in reference to feature request: #5426
This PR adds APIs for serializing/deserializing Datasets without their data to a byte array, effectively creating a "schema" or "reference" that can be used to create other Datasets.
Implementation
The existing code for serializing Datasets to file was refactored to be able to go to any generic
BinaryWriter
, whether memory or file. The verbose serialization code was shared as much as possible, splitting methods into Header vs Data components.Also, a generic
ByteBuffer
was created so that higher languages (e.g. Java) are removed from managing the byte memory of the serialized buffer.Test
New C++ tests were created to test both the serialization/deserialization and the new
ByteBuffer
functionality.