sudo docker run --gpus all -p 8080:80 -v ./volume2:/data --restart always -d ghcr.io/huggingface/text-embeddings-inference:turing-1.4 --model-id aari1995/German_Semantic_V3 --pooling mean --dtype float16 --max-client-batch-size 256 --max-batch-tokens 16384
Information
- Docker
- The CLI directly

Tasks
- An officially supported command
- My own modifications

Reproduction
Start a Docker container with the following model: https://huggingface.co/aari1995/German_Semantic_V3
I also tried experimenting with the architectures and trust_remote_code on Sentence-BERT, but it keeps routing to the BERT model.
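Concretely, the architecture experiment amounted to something like the sketch below: editing `model_type` in the downloaded `config.json` in the hope that the router would pick a different backend. The config values and the `jina_bert` identifier are assumptions on my part, not verified against TEI's routing code, and the edit did not change the routing for me.

```python
import json
import tempfile
from pathlib import Path

# Hypothetical excerpt of the model's config.json (field values assumed).
cfg = {"model_type": "bert", "position_embedding_type": "alibi"}

# Write it to a scratch file standing in for the downloaded config.json,
# then flip model_type the way my experiment did.
path = Path(tempfile.mkdtemp()) / "config.json"
path.write_text(json.dumps(cfg))

edited = json.loads(path.read_text())
edited["model_type"] = "jina_bert"  # hypothetical value; did not change routing for me
path.write_text(json.dumps(edited, indent=2))

print(json.loads(path.read_text())["model_type"])  # jina_bert
```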
{"timestamp":"2024-08-14T08:04:17.082662Z","level":"INFO","message":"Args { model_id: "/rep****ory", revision: None, tokenization_workers: None, dtype: None, pooling: None, max_concurrent_requests: 512, max_batch_tokens: 16384, max_batch_requests: None, max_client_batch_size: 32, auto_truncate: false, default_prompt_name: None, default_prompt: None, hf_api_token: None, hostname: "r-stefanraab-german-semantic-v3-znq-ffyjb6zd-101c8-dneb1", port: 80, uds_path: "/tmp/text-embeddings-inference-server", huggingface_hub_cache: Some("/repository/cache"), payload_limit: 2000000, api_key: None, json_output: true, otlp_endpoint: None, otlp_service_name: "text-embeddings-inference.server", cors_allow_origin: None }","target":"text_embeddings_router","filename":"router/src/main.rs","line_number":175}
{"timestamp":"2024-08-14T08:04:17.095519Z","level":"INFO","message":"Maximum number of tokens per request: 8192","target":"text_embeddings_router","filename":"router/src/lib.rs","line_number":199}
{"timestamp":"2024-08-14T08:04:17.095687Z","level":"INFO","message":"Starting 2 tokenization workers","target":"text_embeddings_core::tokenization","filename":"core/src/tokenization.rs","line_number":26}
{"timestamp":"2024-08-14T08:04:17.109235Z","level":"INFO","message":"Starting model backend","target":"text_embeddings_router","filename":"router/src/lib.rs","line_number":250}
{"timestamp":"2024-08-14T08:04:17.296077Z","level":"INFO","message":"Starting Bert model on Cuda(CudaDevice(DeviceId(1)))","target":"text_embeddings_backend_candle","filename":"backends/candle/src/lib.rs","line_number":268}
Error: Could not create backend
Caused by:
Could not start backend: Bert only supports absolute position embeddings
Expected behavior
I would expect that, like the base Jina model, it would be routed to the JinaBert model, which supports alibi as a position-embedding type. Instead, it is routed to a classical Bert model.
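To illustrate what I think is happening: judging from the log above, the backend seems to dispatch on `model_type` and then reject non-absolute position embeddings inside the Bert path. The sketch below is my reading of that behavior, not TEI's actual code; the function and config fields are illustrative.

```python
# Sketch of the routing decision I assume TEI's candle backend makes,
# inferred from the "Starting Bert model" log line and the error message.
def route(cfg: dict) -> str:
    if cfg["model_type"] == "bert":
        # The Bert backend appears to accept only absolute position embeddings.
        if cfg.get("position_embedding_type", "absolute") != "absolute":
            raise ValueError("Bert only supports absolute position embeddings")
        return "Bert"
    # Other model_type values presumably select other backends (e.g. JinaBert).
    return cfg["model_type"]

# Hypothetical config for this model: bert model_type, but alibi embeddings.
config = {"model_type": "bert", "position_embedding_type": "alibi"}

try:
    route(config)
except ValueError as e:
    print(e)  # Bert only supports absolute position embeddings
```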
System Info
TEI Inference Docker 1.4 , Cuda 12.2 , Nvidia T4