TEI fails for Finetuned JinaAI Embeddings models #384

Open · 2 of 4 tasks

StefanRaab opened this issue Aug 15, 2024 · 1 comment

System Info

TEI inference Docker image 1.4, CUDA 12.2, NVIDIA T4

sudo docker run --gpus all -p 8080:80 -v ./volume2:/data --restart always -d ghcr.io/huggingface/text-embeddings-inference:turing-1.4 --model-id aari1995/German_Semantic_V3 --pooling mean --dtype float16 --max-client-batch-size 256 --max-batch-tokens 16384

Information

  • [x] Docker
  • [ ] The CLI directly

Tasks

  • [x] An officially supported command
  • [ ] My own modifications

Reproduction

Start a Docker container with the following model: https://huggingface.co/aari1995/German_Semantic_V3
I also experimented with the architectures entry and with trust_remote_code on the Sentence-BERT side, but TEI keeps routing the model to the Bert backend.

{"timestamp":"2024-08-14T08:04:17.082662Z","level":"INFO","message":"Args { model_id: "/rep****ory", revision: None, tokenization_workers: None, dtype: None, pooling: None, max_concurrent_requests: 512, max_batch_tokens: 16384, max_batch_requests: None, max_client_batch_size: 32, auto_truncate: false, default_prompt_name: None, default_prompt: None, hf_api_token: None, hostname: "r-stefanraab-german-semantic-v3-znq-ffyjb6zd-101c8-dneb1", port: 80, uds_path: "/tmp/text-embeddings-inference-server", huggingface_hub_cache: Some("/repository/cache"), payload_limit: 2000000, api_key: None, json_output: true, otlp_endpoint: None, otlp_service_name: "text-embeddings-inference.server", cors_allow_origin: None }","target":"text_embeddings_router","filename":"router/src/main.rs","line_number":175}
{"timestamp":"2024-08-14T08:04:17.095519Z","level":"INFO","message":"Maximum number of tokens per request: 8192","target":"text_embeddings_router","filename":"router/src/lib.rs","line_number":199}
{"timestamp":"2024-08-14T08:04:17.095687Z","level":"INFO","message":"Starting 2 tokenization workers","target":"text_embeddings_core::tokenization","filename":"core/src/tokenization.rs","line_number":26}
{"timestamp":"2024-08-14T08:04:17.109235Z","level":"INFO","message":"Starting model backend","target":"text_embeddings_router","filename":"router/src/lib.rs","line_number":250}
{"timestamp":"2024-08-14T08:04:17.296077Z","level":"INFO","message":"Starting Bert model on Cuda(CudaDevice(DeviceId(1)))","target":"text_embeddings_backend_candle","filename":"backends/candle/src/lib.rs","line_number":268}
Error: Could not create backend
Caused by:
Could not start backend: Bert only supports absolute position embeddings

Expected behavior

I would expect that, like the base Jina model, it would be routed to the JinaBert backend, which supports alibi position embeddings. Instead, it gets routed to a classical Bert model.
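
For comparison, the base Jina v2 embedding models ship a config.json along these lines (abridged and quoted from memory, so treat the exact values as an assumption):

```json
{
  "_name_or_path": "jinaai/jina-bert-implementation",
  "model_type": "bert",
  "position_embedding_type": "alibi"
}
```

The fine-tuned model's config.json apparently no longer carries that marker, so TEI falls through to the plain Bert backend.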

kozistr (Contributor) commented Aug 18, 2024

@StefanRaab TEI identifies the backend type using `_name_or_path` in config.json to differentiate between Bert and JinaBert (see the comment here).

According to the source code, changing your model's `_name_or_path` to `jinaai/jina-bert-implementation` should work for now, I guess.
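
A minimal sketch of that workaround, assuming the model files are downloaded locally first (only the `_name_or_path` change matters; the other fields are illustrative placeholders, not copied from the actual config):

```json
{
  "_name_or_path": "jinaai/jina-bert-implementation",
  "architectures": ["BertModel"],
  "model_type": "bert",
  "position_embedding_type": "alibi"
}
```

After editing config.json, point the container at the local copy (e.g. mount the directory into /data and pass that path as --model-id) so TEI reads the patched config instead of re-downloading from the Hub.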
