Add non-persistent fp8 triton_rowwise kernel #2484

Closed
karthik-man wants to merge 1 commit


karthik-man
Contributor

Summary:
X-link: pytorch/FBGEMM#3212

X-link: https://github.com/facebookresearch/FBGEMM/pull/308

Add a non-persistent variant of the fp8 triton_rowwise kernel. The persistent triton_rowwise kernel performs poorly on MI300 compared to the non-persistent kernel when both are run with exhaustive AMD-specific tuning.

Reviewed By: htyu

Differential Revision: D63741099
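
For readers unfamiliar with the distinction, the sketch below shows in plain Python how the two launch strategies usually map output tiles to programs. It is not the FBGEMM kernel: the function names, the tile sizes, and the strided tile assignment are assumptions chosen only for illustration. A non-persistent launch starts one program per output tile, while a persistent launch starts a fixed number of programs that each loop over several tiles.

```python
import math


def count_tiles(m, n, block_m, block_n):
    """Number of output tiles for an (m x n) result with the given block sizes."""
    return math.ceil(m / block_m) * math.ceil(n / block_n)


def non_persistent_schedule(m, n, block_m, block_n):
    """One program per tile: the launch grid equals the tile count."""
    num_tiles = count_tiles(m, n, block_m, block_n)
    return [[tile_id] for tile_id in range(num_tiles)]


def persistent_schedule(m, n, block_m, block_n, num_programs):
    """Fixed-size grid: each program walks the tiles with a stride of the grid size.

    num_programs is a hypothetical stand-in for the number of compute units
    the launch is sized to; the real value is hardware- and tuning-dependent.
    """
    num_tiles = count_tiles(m, n, block_m, block_n)
    grid = min(num_programs, num_tiles)
    return [list(range(pid, num_tiles, grid)) for pid in range(grid)]


# A 512 x 512 output with 128 x 128 blocks has 16 tiles.
flat = non_persistent_schedule(512, 512, 128, 128)
looped = persistent_schedule(512, 512, 128, 128, num_programs=4)
```

Here `flat` holds 16 single-tile programs and `looped` holds 4 programs with 4 tiles each. Which strategy is faster depends on the hardware and the tuning configuration, which is the comparison this PR reports for MI300.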

@facebook-github-bot
Contributor

This pull request was exported from Phabricator. Differential Revision: D63741099

karthik-man added a commit to karthik-man/FBGEMM that referenced this pull request Oct 2, 2024
Summary:
X-link: pytorch/benchmark#2484

Pull Request resolved: pytorch#3212

X-link: facebookresearch/FBGEMM#308

 triton_rowwise persistent kernel performs poorly on MI300 compared to the non-persistent kernel, when both are run with exhaustive AMD-specific tuning.

Reviewed By: htyu

Differential Revision: D63741099
karthik-man added a commit to karthik-man/FBGEMM that referenced this pull request Oct 3, 2024
Summary:
X-link: pytorch/benchmark#2484


X-link: facebookresearch/FBGEMM#308

 triton_rowwise persistent kernel performs poorly on MI300 compared to the non-persistent kernel, when both are run with exhaustive AMD-specific tuning.

Reviewed By: htyu

Differential Revision: D63741099
facebook-github-bot pushed a commit to pytorch/FBGEMM that referenced this pull request Oct 3, 2024
Summary:
X-link: pytorch/benchmark#2484

Pull Request resolved: #3212

X-link: facebookresearch/FBGEMM#308

 triton_rowwise persistent kernel performs poorly on MI300 compared to the non-persistent kernel, when both are run with exhaustive AMD-specific tuning.

Reviewed By: htyu

Differential Revision: D63741099

fbshipit-source-id: c276415ddf8f5d24ffeba70b8ee6493011b393e1
@facebook-github-bot
Contributor

This pull request has been merged in 6b4f339.
