Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improve performance of tt.load and tt.store for FP8 when converting block ptr to regular ptrs #2374

Open
etiotto opened this issue Sep 27, 2024 · 0 comments

Comments

@etiotto
Copy link
Contributor

etiotto commented Sep 27, 2024

We would like to remove the RewriteTensorPointer pass which rewrites block pointers into regular pointers (except when it determines load/store operations on block ptrs can be converted to 2D block reads/writes). The idea is to avoid loosing semantic information too early and instead deal with block ptr that cannot be used to generate 2D block reads/stores while lowering that operation).

For this scheme to work, we first need to improve the lowering code for tt.load and tt.store operations that use a block ptr with an element type that is not (currently) supported by the 2D read instructions available on the target GPU (e.g. the element is FP8).

See #2359 (comment) for more context.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

2 participants