
Why is the implementation of Mamba so slow? #21

Open
baiSongL opened this issue Feb 29, 2024 · 5 comments

Comments

@baiSongL

I haven't run the official version of Mamba, but I've run your implementation, and it seems that the training speed of this model is much slower than that of the Transformer.

@aryanmangal769

I am also facing a similar issue. Were you able to find the reason for it?

@lingxitong

That is mainly because something in the implementation makes loss.backward() very slow.
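
If you want to check that claim, a rough wall-clock comparison of the two phases looks something like this (a sketch only; `model` and `input_ids` below are stand-ins, so swap in this repo's Mamba model and a real token batch):

```python
import time
import torch

def timed(fn):
    """Wall-clock a callable, syncing CUDA so async kernels are counted."""
    if torch.cuda.is_available():
        torch.cuda.synchronize()
    t0 = time.perf_counter()
    out = fn()
    if torch.cuda.is_available():
        torch.cuda.synchronize()
    return out, time.perf_counter() - t0

# Stand-ins: replace with the actual Mamba model and batch you are training.
model = torch.nn.Sequential(torch.nn.Embedding(50280, 768),
                            torch.nn.Linear(768, 50280))
input_ids = torch.randint(0, 50280, (1, 256))

logits, fwd_s = timed(lambda: model(input_ids))
loss = logits.float().mean()  # stand-in loss, only for profiling
_, bwd_s = timed(lambda: loss.backward())
print(f"forward: {fwd_s:.3f}s  backward: {bwd_s:.3f}s")
```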

@johnma2006
Owner

It's not meant to be fast! This repo is mostly only for educational purposes. I would suggest using the official repo to do any training: https://github.com/state-spaces/mamba
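
For anyone switching over, usage of the official package looks roughly like this (a sketch adapted from the state-spaces/mamba README; assumes `pip install mamba-ssm` and a CUDA device):

```python
import torch
from mamba_ssm import Mamba

batch, length, dim = 2, 64, 16
x = torch.randn(batch, length, dim).to("cuda")
model = Mamba(
    d_model=dim,  # model dimension
    d_state=16,   # SSM state expansion factor
    d_conv=4,     # local convolution width
    expand=2,     # block expansion factor
).to("cuda")
y = model(x)
assert y.shape == x.shape
```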

@D-Walter

Probably because the official mamba-ssm implements a GPU-optimized scan operation in C/CUDA, while a pure-PyTorch version has to scan sequentially.
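
To make the contrast concrete, here is a minimal sketch of the kind of sequential scan a pure-PyTorch implementation runs (illustrative only, not this repo's exact code). The Python-level loop over the sequence launches many small kernels, one timestep at a time, which is exactly what the fused hardware-aware scan in mamba-ssm avoids:

```python
import torch

def sequential_scan(deltaA, deltaBx):
    """Naive SSM scan: h_t = deltaA_t * h_{t-1} + deltaBx_t.

    deltaA, deltaBx: (batch, seq_len, d_inner, d_state).
    The loop over seq_len runs in Python, one step at a time,
    so nothing is fused or parallelized across the time dimension.
    """
    b, l, d_inner, d_state = deltaA.shape
    h = torch.zeros(b, d_inner, d_state, device=deltaA.device)
    ys = []
    for t in range(l):                    # O(seq_len) Python loop
        h = deltaA[:, t] * h + deltaBx[:, t]
        ys.append(h.sum(dim=-1))          # placeholder readout (the real model applies C_t)
    return torch.stack(ys, dim=1)         # (batch, seq_len, d_inner)
```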

@mengdeyu539

I think the problem is the embedding layer. The vocabulary dimension is 50280, lol.
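
For scale: 50280 is the padded GPT-NeoX vocabulary used by the pretrained checkpoints, so even at a small model width the embedding alone is tens of millions of parameters. A quick check, assuming d_model=768 as in the 130M config:

```python
import torch.nn as nn

vocab_size, d_model = 50280, 768  # 768 = d_model of the 130M config
emb = nn.Embedding(vocab_size, d_model)
print(sum(p.numel() for p in emb.parameters()))  # 38,615,040 parameters
```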
