Improve out-of-box performance for GEMM kernel variants #2379

etiotto · 2024-09-27T18:05:41Z

We have achieved good performance (relative to the XeTLA library) for a GEMM kernel (see http://benchmarks.glados.intel.com/d/1pXX4hUSz/microbenchmarks?orgId=1). Now it is time to focus on improving the performance of several variants of the GEMM workload:

Work Items


etiotto · 2024-09-27T18:09:08Z

etiotto · 2024-10-02T17:01:42Z

For GEMM + matrix add (postOp), PR #2400 improves performance from ~66 TFLOPS to ~215 TFLOPS for an 8Kx8Kx8K shape (other shapes also improve).
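
To put these numbers in perspective, here is a rough sketch of the arithmetic relating GEMM throughput to kernel time. It assumes the conventional 2*M*N*K FLOP count for a GEMM, ignores the M*N additions contributed by the post-op, and reads "8K" as 8192; none of these are confirmed by the benchmark setup, and the helper names are hypothetical.

```python
def gemm_flops(m: int, n: int, k: int) -> int:
    """FLOP count for one GEMM, assuming one multiply and one add per MAC."""
    return 2 * m * n * k


def gemm_tflops(m: int, n: int, k: int, seconds: float) -> float:
    """Throughput in TFLOPS for a GEMM that took `seconds` to run."""
    return gemm_flops(m, n, k) / seconds / 1e12


def seconds_at(m: int, n: int, k: int, tflops: float) -> float:
    """Kernel time implied by a given throughput (inverse of gemm_tflops)."""
    return gemm_flops(m, n, k) / (tflops * 1e12)


# Assumption: "8K" means 8192 in each dimension.
M = N = K = 8192

before = seconds_at(M, N, K, 66.0)   # throughput reported before the PR
after = seconds_at(M, N, K, 215.0)   # throughput reported after the PR

print(f"before: {before * 1e3:.2f} ms, after: {after * 1e3:.2f} ms")
print(f"speedup: {before / after:.2f}x")
```

Under these assumptions the change corresponds to a bit more than a 3x reduction in kernel time for this shape.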

etiotto added the umbrella label Sep 27, 2024

vlad-penkin added this to the 4.0 [Performance] Core milestone Sep 30, 2024

vlad-penkin added the performance label Sep 30, 2024

vlad-penkin assigned etiotto Sep 30, 2024

vlad-penkin added the codegen: gemm and enhancement labels Sep 30, 2024

etiotto closed this as completed Oct 2, 2024

etiotto reopened this Oct 2, 2024
