Build integrated OpenCL Linux wheels #5252

Merged 19 commits on Dec 2, 2022

jgiannuzzi
Contributor

This PR builds on @tpboudreau's excellent work in #3144 to add support for building integrated OpenCL wheels on Linux too. It also changes the CI to build this wheel instead of the regular CPU wheel, offering a better out-of-the-box experience for Linux users who want to use LightGBM with GPU support.

Fixes #4684

Collaborator

@jameslamb jameslamb left a comment

Thank you very much for working on this!

I personally am not very familiar with OpenCL and not qualified to review this. Hopefully @shiyu1994 @guolinke and @StrikerRUS will be able to provide you some feedback. I'm also pinging @huanzhang12 to possibly help.

@StrikerRUS
Collaborator

@jgiannuzzi
Thank you so much for working on this and for the amazing PR!

> I personally am not very familiar with OpenCL and not qualified to review this.

Unfortunately, me neither. But fortunately, I have easy access to an Ubuntu machine with an NVIDIA GPU, so I'll be able to independently test the generated artifacts.

.vsts-ci.yml Outdated
Comment on lines 48 to 49
# on Ubuntu 14.04, test_dual.py fails with newer version of Python
PYTHON_VERSION: '3.7'
Collaborator

Could you please share the exact error message?

Contributor Author

This is the same error that we get when trying to do gpu-source with Python 3.8 on Ubuntu 14.04

@StrikerRUS
Collaborator

@tpboudreau @itamarst We'd really appreciate your input on this PR!

@StrikerRUS
Collaborator

@jgiannuzzi Please update docs: https://github.com/microsoft/LightGBM/pull/3660/files#diff-e14b183376b1177323f6e7245d8ad64f2cd26f9638c129769d6b3c0ba4698dd5R26.

@jgiannuzzi
Contributor Author

@StrikerRUS I have addressed your remaining comments.

@tpboudreau Could you please take a look? For context, #5282 was merged before this one, switching the OpenCL CPU implementation to PoCL on Linux.

@StrikerRUS
Collaborator

We've recently moved PoCL installation from the CI runtime (.ci/setup.sh) to the Docker creation phase with the aim to reduce overall CI time: guolinke/lightgbm-ci-docker#26 and #5286. That process introduced some merge conflicts in this PR. Sorry about that! Could you please resolve those conflicts?

@StrikerRUS
Collaborator

Good news!

I just checked the wheel file generated as a CI artifact on my Ubuntu machine with an NVIDIA GPU. I simply installed it with pip install *.whl, and after that the device='gpu' parameter allowed me to utilize my GPU.

# Python 3.7.12 | packaged by conda-forge | (default, Oct 26 2021, 06:08:53)
# [GCC 9.4.0] on linux

import numpy as np
import lightgbm as lgb

X = np.random.random((10_000, 200))
y = np.random.random(10_000)

# CPU training (default device)
est = lgb.LGBMRegressor(n_estimators=5000).fit(X, y)
# [LightGBM] [Warning] Auto-choosing col-wise multi-threading, the overhead of testing was 0.009667 seconds.
# You can set `force_col_wise=true` to remove the overhead.
# [LightGBM] [Info] Total Bins 51000
# [LightGBM] [Info] Number of data points in the train set: 10000, number of used features: 200
# [LightGBM] [Info] Start training from score 0.502931

# GPU training with the same wheel
est = lgb.LGBMRegressor(n_estimators=5000, device='gpu').fit(X, y)
# [LightGBM] [Info] This is the GPU trainer!!
# [LightGBM] [Info] Total Bins 51000
# [LightGBM] [Info] Number of data points in the train set: 10000, number of used features: 200
# [LightGBM] [Info] Using GPU Device: Tesla V100-SXM2-32GB, Vendor: NVIDIA Corporation
# [LightGBM] [Info] Compiling OpenCL Kernel with 256 bins...
# [LightGBM] [Info] GPU programs have been built
# [LightGBM] [Info] Size of histogram bin entry: 8
# [LightGBM] [Info] 200 dense feature groups (1.91 MB) transferred to GPU in 0.004207 secs. 0 sparse feature groups
# [LightGBM] [Info] Start training from score 0.502931

@StrikerRUS
Collaborator

@jgiannuzzi Hey! Maybe we can merge this PR without the support for aarch64 and add it later in a separate PR?

@jgiannuzzi
Contributor Author

Hey @StrikerRUS, I'm very sorry for not having updated this PR yet. I have had a fix for aarch64 for a while but never had the time to update the PR. I will try to do it this week so we can finally have those integrated OpenCL Linux wheels!

@StrikerRUS
Collaborator

@jgiannuzzi No problem. Thanks a lot for all your hard work!

@jameslamb
Collaborator

I tried building this in CI last night (just pushing this branch from a fork to a LightGBM branch so it'd produce artifacts I could download and test with), and unfortunately I saw one failure on the QEMU_multiarch bdist build.

The test_cpu_and_gpu_work() test failed with the following.

lightgbm.basic.LightGBMError: No OpenCL device found

(build link)
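
As an aside on the error above: code that requests the GPU could handle a failure like `No OpenCL device found` by trying devices in order. The following is only a minimal sketch under stated assumptions, not how LightGBM or its test suite behaves. The helper name `first_working_device` and its `train` callback are hypothetical, introduced purely for illustration.

```python
def first_working_device(train, devices=("gpu", "cpu"), errors=(Exception,)):
    """Return (device, result) for the first device on which train(device) succeeds.

    `train` is a caller-supplied callable (hypothetical, for illustration) that
    takes a device name and either returns a fitted model or raises.
    """
    failures = {}
    for device in devices:
        try:
            return device, train(device)
        except errors as exc:
            # Remember why this device failed, then try the next one.
            failures[device] = exc
    raise RuntimeError(f"training failed on every device: {failures}")


# Hypothetical usage with LightGBM (not executed here):
#   import lightgbm as lgb
#   device, model = first_working_device(
#       lambda d: lgb.LGBMRegressor(device=d).fit(X, y),
#       errors=(lgb.basic.LightGBMError,),
#   )
```

Passing the exception types explicitly keeps the helper from swallowing unrelated errors once the caller narrows `errors` to the one exception they expect.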

I just pushed a9f02d7 updating this to latest master. Let's see if that happens here.

I'm so sorry @jgiannuzzi , I'm trying to test this but LightGBM's CI is not in a good state right now. I'm trying to fix it as fast as I can.

@jameslamb
Collaborator

I realized that in the new manylinux_2_28_x86_64 image used to build x86_64 linux wheels, a too-old OpenCL was being found, and as a result pocl didn't register itself in /etc/OpenCL/vendors.

Fixed that in guolinke/lightgbm-ci-docker#29, which looks like it fixed #5252 (comment).
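
For anyone debugging a similar situation: on Linux, the OpenCL ICD loader discovers implementations through `.icd` files in `/etc/OpenCL/vendors`, each containing the name or path of a vendor library. A minimal sketch for checking what is registered there might look like the following. The function name `list_opencl_icds` is made up for this example, and nothing here is taken from LightGBM's CI scripts.

```python
from pathlib import Path


def list_opencl_icds(vendors_dir="/etc/OpenCL/vendors"):
    """Return {icd_file_name: library_name_or_path} for each registered OpenCL ICD."""
    directory = Path(vendors_dir)
    if not directory.is_dir():
        # No vendors directory at all means nothing is registered.
        return {}
    registered = {}
    for icd_file in sorted(directory.glob("*.icd")):
        # Each .icd file holds the name or path of the vendor's OpenCL library.
        registered[icd_file.name] = icd_file.read_text().strip()
    return registered


if __name__ == "__main__":
    icds = list_opencl_icds()
    if not icds:
        print("No OpenCL ICDs registered")
    for name, library in icds.items():
        print(f"{name}: {library}")
```

An empty result inside the build image would be consistent with the symptom described above, where PoCL had not registered itself.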

(ci build link)

I'll try fixing the other two CI failures tonight, they're almost certainly my fault. Sorry about that @jgiannuzzi , I'm doing my best to get this merged.

@guolinke
Collaborator

I am not sure why the "guolinke/lightgbm-ci-docker#29" will auto close this

@jameslamb
Collaborator

> I am not sure why the "guolinke/lightgbm-ci-docker#29" will auto close this

Oh, strange! It was probably the way I phrased "fix" in that PR's description. Sorry about that, and thanks for re-opening it.

@jameslamb jameslamb self-requested a review December 1, 2022 01:23
Collaborator

@jameslamb jameslamb left a comment

Alright I've delayed this long enough, going to approve based on the passing CI and all I've learned about PoCL and this PR's other additions while working through #5580.

I'd still like to try these wheels again on different GPUs from a cloud provider, but that can be done later.

@guolinke can you look one more time? If you approve, please merge this (I'll be traveling for the next few days).

@jgiannuzzi thank you SO MUCH for this awesome contribution, and the other excellent contributions you've made to LightGBM along the way. I've learned a lot from you and it's been great working with you. I hope you'll consider contributing more to LightGBM in the future 😁

Collaborator

@guolinke guolinke left a comment

Thank you!

@jgiannuzzi
Contributor Author

Thank you @jameslamb and @guolinke, I'm looking forward to having daily builds with GPU support!

I'm sorry I wasn't available to help earlier with the many CI woes, and I'm certainly looking forward to contributing again to LightGBM in the future!

@github-actions

This pull request has been automatically locked since there has not been any recent activity since it was closed.
To start a new related discussion, open a new issue at https://github.com/microsoft/LightGBM/issues
including a reference to this.

@github-actions github-actions bot locked as resolved and limited conversation to collaborators Aug 15, 2023
@jgiannuzzi jgiannuzzi deleted the linux-gpu-wheel branch August 16, 2023 08:57
Successfully merging this pull request may close these issues.

Build Python wheels that support both GPU and CPU versions out of the box for non-Windows
6 participants