
Why is the implementation of Mamba so slow? #21

Open
baiSongL opened this issue Feb 29, 2024 · 5 comments

Comments

@baiSongL

I haven't run the official version of Mamba, but I've run your implementation, and it seems that the training speed of this model is much slower than that of the Transformer.

@aryanmangal769

I am also facing a similar issue. Were you able to find the reason for it?

@lingxitong

That is mainly because something in the implementation makes loss.backward() very slow.
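
If you want to check that claim, a rough wall-clock comparison of the two phases looks something like this (a sketch only; `model` and `input_ids` below are stand-ins, so swap in this repo's Mamba model and a real token batch):

```python
import time
import torch

def timed(fn):
    """Wall-clock a callable, syncing CUDA so async kernels are counted."""
    if torch.cuda.is_available():
        torch.cuda.synchronize()
    t0 = time.perf_counter()
    out = fn()
    if torch.cuda.is_available():
        torch.cuda.synchronize()
    return out, time.perf_counter() - t0

# Stand-ins: replace with the actual Mamba model and batch you are training.
model = torch.nn.Sequential(torch.nn.Embedding(50280, 768),
                            torch.nn.Linear(768, 50280))
input_ids = torch.randint(0, 50280, (1, 256))

logits, fwd_s = timed(lambda: model(input_ids))
loss = logits.float().mean()  # stand-in loss, only for profiling
_, bwd_s = timed(lambda: loss.backward())
print(f"forward: {fwd_s:.3f}s  backward: {bwd_s:.3f}s")
```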

@johnma2006
Owner

It's not meant to be fast! This repo is mostly only for educational purposes. I would suggest using the official repo to do any training: https://github.com/state-spaces/mamba
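
For anyone switching over, usage of the official package looks roughly like this (a sketch adapted from the state-spaces/mamba README; assumes `pip install mamba-ssm` and a CUDA device):

```python
import torch
from mamba_ssm import Mamba

batch, length, dim = 2, 64, 16
x = torch.randn(batch, length, dim).to("cuda")
model = Mamba(
    d_model=dim,  # model dimension
    d_state=16,   # SSM state expansion factor
    d_conv=4,     # local convolution width
    expand=2,     # block expansion factor
).to("cuda")
y = model(x)
assert y.shape == x.shape
```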

@D-Walter

Probably because the official mamba-ssm implements a GPU-optimized scan operation in C/CUDA, while a pure-PyTorch version has to scan sequentially.
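
To make the contrast concrete, here is a minimal sketch of the kind of sequential scan a pure-PyTorch implementation runs (illustrative only, not this repo's exact code). The Python-level loop over the sequence launches many small kernels, one timestep at a time, which is exactly what the fused hardware-aware scan in mamba-ssm avoids:

```python
import torch

def sequential_scan(deltaA, deltaBx):
    """Naive SSM scan: h_t = deltaA_t * h_{t-1} + deltaBx_t.

    deltaA, deltaBx: (batch, seq_len, d_inner, d_state).
    The loop over seq_len runs in Python, one step at a time,
    so nothing is fused or parallelized across the time dimension.
    """
    b, l, d_inner, d_state = deltaA.shape
    h = torch.zeros(b, d_inner, d_state, device=deltaA.device)
    ys = []
    for t in range(l):                    # O(seq_len) Python loop
        h = deltaA[:, t] * h + deltaBx[:, t]
        ys.append(h.sum(dim=-1))          # placeholder readout (the real model applies C_t)
    return torch.stack(ys, dim=1)         # (batch, seq_len, d_inner)
```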

@mengdeyu539

I think the problem is the embedding layer. The vocabulary dimension is 50280, lol.
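
For scale: 50280 is the padded GPT-NeoX vocabulary used by the pretrained checkpoints, so even at a small model width the embedding alone is tens of millions of parameters. A quick check, assuming d_model=768 as in the 130M config:

```python
import torch.nn as nn

vocab_size, d_model = 50280, 768  # 768 = d_model of the 130M config
emb = nn.Embedding(vocab_size, d_model)
print(sum(p.numel() for p in emb.parameters()))  # 38,615,040 parameters
```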
