Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[R-package] Rename slice() to lgb.slice.Dataset() #6293

Merged
merged 3 commits into from
Feb 3, 2024

Conversation

david-cortes
Copy link
Contributor

From similar PR in xgboost: dmlc/xgboost#10017

LightGBM registers a generic S3 slice method, and defines it for its class lgb.Dataset.

This is problematic in R, since a widely used library, {dplyr}, also defines a generic slice method but with a different signature, with implementations for data frames and variants, and extensions like {dbplyr} and similar define it for more object classes based on dplyr's signature.

This PR renames it to lgb.slice.Dataset, in order to avoid conflicts with other commonly used R frameworks.

@jameslamb jameslamb changed the title [R-package] Rename slice method to lgb.slice.Dataset [R-package] Rename slice() to lgb.slice.Dataset() Feb 2, 2024
Copy link
Collaborator

@jameslamb jameslamb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for this!
I'm ok with making this breaking change, given that:

  • it's already been merged over in XGBoost (that PR you linked)
  • {dplyr} is probably much more widely used than {lightgbm}
  • the use of :: in some R isn't as common as I wish it was 😞
  • as far as I can tell from a quick code search, {lightgbm}'s reverse dependencies on CRAN don't use slice()

For others wondering what specifically "problematic" in this PR description refers to, it's this:

library(lightgbm)
library(dplyr)

# Attaching package: ‘dplyr’
# The following object is masked from ‘package:lightgbm’:
#
#    slice

dtrain <- lgb.Dataset(
    data = matrix(rnorm(1e5), nrow = 1000)
)
dtrain$construct()

slice(dtrain, 1:5)
# Error in UseMethod("slice") :
#  no applicable method for 'slice' applied to an object of class "c('lgb.Dataset', 'R6')"

If slice() is called on a Dataset in a session where {dplyr} was attached after {lightgbm}, the {lightgbm} method isn't found and a runtime error occurs. This can be avoided by such code specifying lightgbm::slice().

@jameslamb
Copy link
Collaborator

This error in CI is unrelated to these changes:

Run r-lib/actions/setup-pandoc@v2
  with:
    pandoc-version: [2](https://github.com/microsoft/LightGBM/actions/runs/7745434225/job/21139413173?pr=6293#step:7:2).19.2
  env:
    _R_CHECK_SYSTEM_CLOCK_: 0
    _R_CHECK_CRAN_INCOMING_REMOTE_: 0
    _R_CHECK_PKG_SIZES_THRESHOLD_: 100
/__e/node20/bin/node: /lib/x86_64-linux-gnu/libc.so.6: version `GLIBC_2.28' not found (required by /__e/node20/bin/node)

From the combination of actions/checkout#1474 and https://github.com/r-lib/actions/releases, I suspect the root cause is r-lib/actions/setup-pandoc updating to a newer version of node. I'll work around that on a separate PR.

@jameslamb jameslamb merged commit 7435cd8 into microsoft:master Feb 3, 2024
43 checks passed
This was referenced May 1, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants