
Assertion error on gemm_splitk_benchmark.py #2377

Open
etiotto opened this issue Sep 27, 2024 · 3 comments
Assignees
Labels: bug (Something isn't working), tests: ut

Comments

@etiotto
Contributor

etiotto commented Sep 27, 2024

USE_IPEX=0 python gemm_splitk_benchmark.py

/home/jovyan/.conda/envs/triton-3.10/lib/python3.10/site-packages/triton_kernels_benchmark-0.0.0-py3.10.egg/triton_kernels_benchmark/benchmark_testing.py:25: UserWarning: std(): degrees of freedom is <= 0. Correction should be strictly less than the reduction factor (input numel divided by output numel). (Triggered internally at /runner/_work/intel-xpu-backend-for-triton/intel-xpu-backend-for-triton/pytorch/aten/src/ATen/native/ReduceOps.cpp:1823.)
 std = torch.std(times)
/home/jovyan/.conda/envs/triton-3.10/lib/python3.10/site-packages/triton_kernels_benchmark-0.0.0-py3.10.egg/triton_kernels_benchmark/benchmark_testing.py:25: UserWarning: std(): degrees of freedom is <= 0. Correction should be strictly less than the reduction factor (input numel divided by output numel). (Triggered internally at /runner/_work/intel-xpu-backend-for-triton/intel-xpu-backend-for-triton/pytorch/aten/src/ATen/native/ReduceOps.cpp:1823.)
 std = torch.std(times)
Traceback (most recent call last):
 File "/home/jovyan/intel-xpu-backend-for-triton/benchmarks/triton_kernels_benchmark/gemm_splitk_benchmark.py", line 172, in <module>
   benchmark.run(show_plots=False, print_data=True)
 File "/home/jovyan/.conda/envs/triton-3.10/lib/python3.10/site-packages/triton_kernels_benchmark-0.0.0-py3.10.egg/triton_kernels_benchmark/benchmark_testing.py", line 373, in run
   result_dfs.append(self._run(bench, save_path, show_plots, print_data, **kwargs))
 File "/home/jovyan/.conda/envs/triton-3.10/lib/python3.10/site-packages/triton_kernels_benchmark-0.0.0-py3.10.egg/triton_kernels_benchmark/benchmark_testing.py", line 307, in _run
   ret = self.fn(**x_args, **{bench.line_arg: y}, **bench.args, **kwrags)
 File "/home/jovyan/intel-xpu-backend-for-triton/benchmarks/triton_kernels_benchmark/gemm_splitk_benchmark.py", line 159, in benchmark
   benchmark_suit.assert_close(triton_fn(), torch_fn(), atol=1e-4, rtol=rtol, err_msg='triton to torch')
 File "/home/jovyan/.conda/envs/triton-3.10/lib/python3.10/site-packages/triton_kernels_benchmark-0.0.0-py3.10.egg/triton_kernels_benchmark/benchmark_testing.py", line 190, in assert_close
   np.testing.assert_allclose(x, y, atol=atol, rtol=rtol, equal_nan=True)
 File "/home/jovyan/.conda/envs/triton-3.10/lib/python3.10/site-packages/numpy/testing/_private/utils.py", line 1504, in assert_allclose
   assert_array_compare(compare, actual, desired, err_msg=str(err_msg),
 File "/home/jovyan/.conda/envs/triton-3.10/lib/python3.10/contextlib.py", line 79, in inner
   return func(*args, **kwds)
 File "/home/jovyan/.conda/envs/triton-3.10/lib/python3.10/site-packages/numpy/testing/_private/utils.py", line 797, in assert_array_compare
   raise AssertionError(msg)
AssertionError: 
Not equal to tolerance rtol=0.01, atol=0.0001

Mismatched elements: 10485760 / 16777216 (62.5%)
Max absolute difference: 14077.262
Max relative difference: 14.093543
x: array([[14117.621  , 14472.084  , 14199.322  , ..., 14278.562  ,
       14391.052  , 14581.361  ],
      [14417.741  , 14243.687  , 13900.123  , ..., 14021.29   ,...
y: array([[ 992., 1020., 1004., ..., 1008., 1016., 1016.],
      [ 984., 1000.,  980., ...,  988.,  996., 1004.],
      [ 992., 1016.,  984., ..., 1008., 1016., 1012.],...
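For context on the failure mode: `np.testing.assert_allclose` passes element-wise only when `|actual - desired| <= atol + rtol * |desired|`. A hypothetical sketch (values chosen to be on the order of those reported above, not taken from the actual tensors) shows how far outside the `atol=1e-4` / `rtol=0.01` tolerance the mismatch is:

```python
import numpy as np

# Illustrative values roughly matching the magnitudes in the report:
# triton result ~14000 vs torch reference ~1000.
x = np.array([14117.621, 14472.084])
y = np.array([992.0, 1020.0])

try:
    np.testing.assert_allclose(x, y, atol=1e-4, rtol=1e-2, equal_nan=True)
except AssertionError as e:
    # The difference (~13000) dwarfs atol + rtol*|y| (~10), so this raises.
    print("mismatch detected:", type(e).__name__)
```

The reported max relative difference of ~14 is far beyond floating-point rounding noise, which suggests a correctness bug (e.g. in accumulation across the split-K partial sums) rather than a tolerance-tuning issue.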
@etiotto
Contributor Author

etiotto commented Oct 1, 2024

Took #2378 from @LiyangLingIntel because it is related to #2374 (which I currently own). Giving @LiyangLingIntel this one, as it is related to the streamk implementation he worked on.

@etiotto etiotto removed their assignment Oct 2, 2024
@etiotto
Contributor Author

etiotto commented Oct 2, 2024

I believe this fails only for the 4Kx4Kx4K shape, so I am reducing the priority and deferring it to make room for more important work items.

@LiyangLingIntel
Contributor

> I believe this fails only for 4Kx4Kx4K shapes, reducing priority and deferring it to "make room" for other more important work items.

In my local test, it works with USE_IPEX=1 python gemm_splitk_benchmark.py. We can investigate further the difference between the IPEX and upstream PyTorch GEMM implementations for XPU.
I agree we can reduce the priority of this issue and come back to it once the other important work items are done.
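As an aside, the UserWarning at the top of the log (std(): degrees of freedom is <= 0) typically means torch.std() was called on a single timing sample: with the default Bessel correction (correction=1), one sample leaves zero degrees of freedom, so the result is nan. A minimal sketch of the warning's cause (not the benchmark's actual code) and one way to silence it:

```python
import torch

# A single timing sample, as benchmark_testing.py apparently collected.
times = torch.tensor([1.23])

# Default correction=1 (Bessel) needs at least 2 samples; with one
# sample the degrees of freedom are <= 0, so this warns and yields nan.
std = torch.std(times)

# The population std (correction=0) is well-defined for n >= 1.
std_pop = torch.std(times, correction=0)  # tensor(0.)
```

Whether correction=0 is the right fix depends on how the benchmark aggregates timings; this only illustrates why the warning fires, and it is unrelated to the assertion failure itself.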
