
feature: support bm42 embeddings #338

Open
cdxker wants to merge 1 commit into main

Conversation

@cdxker commented Jul 10, 2024

What does this PR do?

This PR adds support for Qdrant/all_miniLM_L6_v2_with_attentions. I followed https://github.com/Anush008/bm42-rs as a reference and added a new bm42 pooling method.
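For context, a minimal sketch of the bm42 pooling idea (illustrative names, not the exact code in this PR): token importance is the attention the [CLS] token pays to each token in the last layer, averaged across heads, paired with a sparse index per token (the real implementation hashes the detokenized word with murmur3):

```rust
/// Minimal sketch of bm42 pooling; illustrative only.
/// attentions[head][query_pos][key_pos]: last-layer attention weights.
fn bm42_pool(attentions: &[Vec<Vec<f32>>], token_ids: &[u32]) -> Vec<(u32, f32)> {
    let num_heads = attentions.len() as f32;
    token_ids
        .iter()
        .enumerate()
        .skip(1) // position 0 is [CLS] itself
        .map(|(pos, &token_id)| {
            // attention from [CLS] (query position 0) to this token, averaged over heads
            let weight: f32 =
                attentions.iter().map(|head| head[0][pos]).sum::<f32>() / num_heads;
            // stand-in sparse index; the real code hashes the word with murmur3
            (token_id, weight)
        })
        .collect()
}
```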

Tested by running the following command:

HF_TOKEN=hf_********************************** cargo run --no-default-features -F http -F ort -- --model-id Qdrant/all_miniLM_L6_v2_with_attentions --pooling bm42 --port 5888

Fixes #337

Before submitting

Who can review?

Anyone in the community is free to review the PR once the tests have passed. Feel free to tag
members/contributors who may be interested in your PR.

@OlivierDehaene (Member)

BM42 is super controversial.
See https://x.com/Nils_Reimers/status/1809334134088622217
I'm on the side of waiting a bit to see how this evolves.

@OlivierDehaene (Member)

To me, Qdrant came across as a pretty shady actor here, so I would take everything they publish with a big grain of salt.

@cdxker (Author) commented Jul 11, 2024

To me, Qdrant came across as a pretty shady actor here, so I would take everything they publish with a big grain of salt.

Fair. We really just want support for it so we can test it on our own datasets and verify how good or bad it actually is. I agree a lot of that launch was shady, especially the Quickwit benchmark portion.

Any feedback on the actual implementation? Because the router function async fn embed_sparse expects a dense vector and only then sparsifies it, serving many sparse embedding models is super expensive. In the case of this model, it uses a murmur3 hash, which has a very large key space.

This means that Infer has to create and send a very large dense vector across a channel. https://github.com/devflowinc/text-embeddings-inference/blob/dc62b26f04394c44aa3469c10ba8880d3076e087/backends/ort/src/bm42.rs#L293-L296

https://github.com/devflowinc/text-embeddings-inference/blob/dc62b26f04394c44aa3469c10ba8880d3076e087/core/src/infer.rs#L193-L204

This will impact adding support for other sparse models, not just this one. It would be a lot better if Infer::embed_sparse either:

  1. didn't call the embed function, and each Backend implemented an embed_sparse function itself, or
  2. sparsified the vector before it gets sent back across the channel.

I don't have enough context on which approach is best, but either would open the opportunity to support other sparse embedding models with larger keyspaces. A rough sketch of option 1 is below.
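Here is that sketch of option 1, using hypothetical, simplified types (the real Batch, error handling, and Backend trait in text-embeddings-inference differ):

```rust
/// Hypothetical, simplified input type; the real crate's Batch is richer.
pub struct Batch {
    pub input_ids: Vec<u32>,
}

/// (hash index, weight) pairs; only non-zero entries are represented.
pub type SparseEmbedding = Vec<(u32, f32)>;

pub trait Backend {
    /// Existing dense path (simplified): one fixed-size vector per input.
    fn embed(&self, batch: &Batch) -> Vec<Vec<f32>>;

    /// Proposed sparse path: natively sparse backends return (index, value)
    /// pairs directly, so a keyspace-sized dense buffer is never allocated
    /// and never has to cross the channel.
    fn embed_sparse(&self, batch: &Batch) -> Vec<SparseEmbedding>;
}
```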

@OlivierDehaene (Member)

Which means that Infer has to create and send a very large dense vector across a channel.

Why is that a problem? Moving a Vec shouldn't be a concern; it's not expensive as far as I know.
But yes, having a dedicated embed_sparse function in the backend trait could be interesting.
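For scale, a toy demonstration of why the move itself is cheap (illustrative only):

```rust
// Moving a Vec transfers only its (pointer, length, capacity) header; the
// heap buffer is not copied. On a 64-bit target the header is 24 bytes no
// matter how many elements the Vec holds.
fn main() {
    assert_eq!(std::mem::size_of::<Vec<f32>>(), 24); // 3 usize-sized words

    let big = vec![0.0_f32; 10_000_000];
    let moved = big; // O(1): no element is copied or touched
    println!("still {} elements after the move", moved.len());
}
```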

@cdxker (Author) commented Jul 16, 2024

Creating a vector with up to 9223372036854776000 f32s is pretty expensive in this case, even if they are all zero-filled.
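For rough scale (that figure looks like i64::MAX rendered through a 64-bit float, i.e. about 2^63), a back-of-the-envelope sketch assuming an even much smaller 32-bit hash keyspace:

```rust
// Cost of materializing a dense vector over a 32-bit hash keyspace; a 2^63
// keyspace would be roughly 2^31 times larger still.
fn main() {
    let keyspace: u64 = 1 << 32;                  // 2^32 possible hash values
    let bytes = keyspace * 4;                     // 4 bytes per f32
    let gib = bytes as f64 / (1u64 << 30) as f64;
    println!("{gib} GiB per dense embedding");    // 16 GiB, even when zero-filled
}
```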

@OlivierDehaene (Member)

Creating a vector with up to 9223372036854776000 f32s is pretty expensive in this case, even if they are all zero-filled.

OK, so when you say "Infer has to create and send a very large dense vector across a channel", you are only referring to the creation part? As I said, the moving part ("send across a channel") is not an issue.
