TEI fails for Finetuned JinaAI Embeddings models #384

StefanRaab · 2024-08-15T05:56:16Z

System Info

TEI Inference Docker 1.4 , Cuda 12.2 , Nvidia T4

sudo docker run --gpus all -p 8080:80 -v ./volume2:/data --restart always -d ghcr.io/huggingface/text-embeddings-inference:turing-1.4 --model-id aari1995/German_Semantic_V3 --pooling mean --dtype float16 --max-client-batch-size 256 --max-batch-tokens 16384

Information

Docker
The CLI directly

Tasks

An officially supported command
My own modifications

Reproduction

Start a docker container with the following model https://huggingface.co/aari1995/German_Semantic_V3
I also tried to experiment with the architectures and trust remote code on sentence bert but it keeps routing to the bert model

{"timestamp":"2024-08-14T08:04:17.082662Z","level":"INFO","message":"Args { model_id: "/rep****ory", revision: None, tokenization_workers: None, dtype: None, pooling: None, max_concurrent_requests: 512, max_batch_tokens: 16384, max_batch_requests: None, max_client_batch_size: 32, auto_truncate: false, default_prompt_name: None, default_prompt: None, hf_api_token: None, hostname: "r-stefanraab-german-semantic-v3-znq-ffyjb6zd-101c8-dneb1", port: 80, uds_path: "/tmp/text-embeddings-inference-server", huggingface_hub_cache: Some("/repository/cache"), payload_limit: 2000000, api_key: None, json_output: true, otlp_endpoint: None, otlp_service_name: "text-embeddings-inference.server", cors_allow_origin: None }","target":"text_embeddings_router","filename":"router/src/main.rs","line_number":175}
{"timestamp":"2024-08-14T08:04:17.095519Z","level":"INFO","message":"Maximum number of tokens per request: 8192","target":"text_embeddings_router","filename":"router/src/lib.rs","line_number":199}
{"timestamp":"2024-08-14T08:04:17.095687Z","level":"INFO","message":"Starting 2 tokenization workers","target":"text_embeddings_core::tokenization","filename":"core/src/tokenization.rs","line_number":26}
{"timestamp":"2024-08-14T08:04:17.109235Z","level":"INFO","message":"Starting model backend","target":"text_embeddings_router","filename":"router/src/lib.rs","line_number":250}
{"timestamp":"2024-08-14T08:04:17.296077Z","level":"INFO","message":"Starting Bert model on Cuda(CudaDevice(DeviceId(1)))","target":"text_embeddings_backend_candle","filename":"backends/candle/src/lib.rs","line_number":268}
Error: Could not create backend
Caused by:
Could not start backend: Bert only supports absolute position embeddings

Expected behavior

I would expect that, like the Base Jina Model, it would be routed to the Jinabert model, which would allow alibi as a type. Instead, it gets routed to a classical Bert Model.

kozistr · 2024-08-18T04:42:36Z

@StefanRaab TEI identifies the backend type using _name_or_path in config.json to differentiate between bert or jinabert (here the comment).

according to the source code, changing your model's _name_or_path to jinaai/jina-bert-implementation should work for now I guess.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

TEI fails for Finetuned JinaAI Embeddings models #384

TEI fails for Finetuned JinaAI Embeddings models #384

StefanRaab commented Aug 15, 2024

kozistr commented Aug 18, 2024

TEI fails for Finetuned JinaAI Embeddings models #384

TEI fails for Finetuned JinaAI Embeddings models #384

Comments

StefanRaab commented Aug 15, 2024

System Info

Information

Tasks

Reproduction

Expected behavior

kozistr commented Aug 18, 2024