Inconsistency in how different URL paths are handled (in inference endpoints) #398

Open

MoritzLaurer opened this issue Sep 4, 2024 · 0 comments
System Info

Inference endpoints
TEI version 1.5

Information

  • Docker
  • The CLI directly

Tasks

  • An officially supported command
  • My own modifications

Reproduction

Create an inference endpoint with a TEI container for reranking with a reranking model:

import huggingface_hub
from huggingface_hub import create_inference_endpoint


repository = "BAAI/bge-reranker-base"  # "BAAI/bge-reranker-large-base"
endpoint_name = "bge-reranker-large-base-05"
namespace = "MoritzLaurer"  # your user or organization name


# check if an endpoint with this name already exists from previous tests
available_endpoints_names = [endpoint.name for endpoint in huggingface_hub.list_inference_endpoints()]
endpoint_exists = endpoint_name in available_endpoints_names
print("Does the endpoint already exist?", endpoint_exists)


# create new endpoint
if not endpoint_exists:
    endpoint = create_inference_endpoint(
        endpoint_name,
        repository=repository,
        namespace=namespace,
        framework="pytorch",
        task="sentence-ranking",
        # see the available hardware options here: https://huggingface.co/docs/inference-endpoints/pricing#pricing
        accelerator="gpu",
        vendor="aws",
        region="us-east-1",
        instance_size="x1",
        instance_type="nvidia-a10g",
        min_replica=0,
        max_replica=1,
        type="protected",
        custom_image={
            "health_route":"/health",
            "env": {
                "MAX_BATCH_TOKENS":"16384",
                "MAX_CONCURRENT_REQUESTS":"512",
                "MAX_BATCH_REQUESTS": "160",
                "MODEL_ID":"/repository"
            },
            "url":"ghcr.io/huggingface/text-embeddings-inference:latest"
        },
    )
    print("Waiting for endpoint to be created")
    endpoint.wait()
    print("Endpoint ready")

# if endpoint with this name already exists, get existing endpoint
else:
    endpoint = huggingface_hub.get_inference_endpoint(name=endpoint_name, namespace=namespace)
    if endpoint.status in ["paused", "scaledToZero"]:
        print("Resuming endpoint")
        endpoint.resume()
    print("Waiting for endpoint to start")
    endpoint.wait()
    print("Endpoint ready")

Send a request either with the /rerank path appended to endpoint.url or without it:

import huggingface_hub
import requests

HEADERS = {"Authorization": f"Bearer {huggingface_hub.get_token()}"}
API_URL = endpoint.url + "/rerank"

# function for standard http requests
def query(payload=None, api_url=None):
    response = requests.post(api_url, headers=HEADERS, json=payload)
    return response.json()

output = query(
    payload={
        "query": "What is Deep Learning?",
        "texts": ["Deep Learning is not...", "Deep learning is...", "testtest"],
    },
    api_url=API_URL,
)

print(output)

In both cases I get the same, correct reranking output.
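
For completeness, a minimal sketch of that comparison, reusing the query helper above (the only assumption is that the reranker endpoint from the previous step is still running):

# query the reranker endpoint both with and without the /rerank path
payload = {
    "query": "What is Deep Learning?",
    "texts": ["Deep Learning is not...", "Deep learning is...", "testtest"],
}

output_with_path = query(payload=payload, api_url=endpoint.url + "/rerank")
output_without_path = query(payload=payload, api_url=endpoint.url)

# both requests return the same ranked results
print(output_with_path)
print(output_without_path)
print("Identical responses:", output_with_path == output_without_path)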

On the other hand, when I create an endpoint for sentence-similarity with an embedding model:

import huggingface_hub
from huggingface_hub import create_inference_endpoint


repository = "thenlper/gte-large"  # "BAAI/bge-reranker-large-base"
endpoint_name = "gte-large-001"
namespace = "MoritzLaurer"  # your user or organization name


# check if an endpoint with this name already exists from previous tests
available_endpoints_names = [endpoint.name for endpoint in huggingface_hub.list_inference_endpoints()]
endpoint_exists = endpoint_name in available_endpoints_names
print("Does the endpoint already exist?", endpoint_exists)


# create new endpoint
if not endpoint_exists:
    endpoint = create_inference_endpoint(
        endpoint_name,
        repository=repository,
        namespace=namespace,
        framework="pytorch",
        task="sentence-similarity",
        # see the available hardware options here: https://huggingface.co/docs/inference-endpoints/pricing#pricing
        accelerator="gpu",
        vendor="aws",
        region="us-east-1",
        instance_size="x1",
        instance_type="nvidia-a10g",
        min_replica=2,
        max_replica=4,
        type="protected",
        custom_image={
            "health_route":"/health",
            "env": {
                "MAX_BATCH_TOKENS":"16384",
                "MAX_CONCURRENT_REQUESTS":"512",
                "MAX_BATCH_REQUESTS": "124",
                "MODEL_ID": "/repository"},
            "url":"ghcr.io/huggingface/text-embeddings-inference:latest"
        }
    )
    print("Waiting for endpoint to be created")
    endpoint.wait()
    print("Endpoint ready")

# if endpoint with this name already exists, get existing endpoint
else:
    endpoint = huggingface_hub.get_inference_endpoint(name=endpoint_name, namespace=namespace)
    if endpoint.status in ["paused", "scaledToZero"]:
        print("Resuming endpoint")
        endpoint.resume()
    print("Waiting for endpoint to start")
    endpoint.wait()
    print("Endpoint ready")

Then I need to append the /similarity route to the end of the URL to get correct outputs.

import huggingface_hub
import requests

API_URL = endpoint.url + "/similarity"  # "https://c5hhcabur7dqwyj7.us-east-1.aws.endpoints.huggingface.cloud" + "/similarity"
headers = {
    "Accept": "application/json",
    "Authorization": f"Bearer {huggingface_hub.get_token()}",
    "Content-Type": "application/json",
}

def query(payload):
    response = requests.post(API_URL, headers=headers, json=payload)
    return response.json()

output = query({
    "inputs": {
        "sentences": [
            "That is a happy dog",
            "That is a very happy person",
            "Today is a sunny day",
        ],
        "source_sentence": "That is a happy person",
        "parameters": {},
    }
})

output
# [0.91960955, 0.98106885, 0.8241128]

If I don't manually append /similarity to the URL, I get the following error:

---------------------------------------------------------------------------
JSONDecodeError                           Traceback (most recent call last)
File ~/miniconda/lib/python3.9/site-packages/requests/models.py:974, in Response.json(self, **kwargs)
    973 try:
--> 974     return complexjson.loads(self.text, **kwargs)
    975 except JSONDecodeError as e:
    976     # Catch JSON-related errors and raise as requests.JSONDecodeError
    977     # This aliases json.JSONDecodeError and simplejson.JSONDecodeError

File ~/miniconda/lib/python3.9/json/__init__.py:346, in loads(s, cls, object_hook, parse_float, parse_int, parse_constant, object_pairs_hook, **kw)
    343 if (cls is None and object_hook is None and
    344         parse_int is None and parse_float is None and
    345         parse_constant is None and object_pairs_hook is None and not kw):
--> 346     return _default_decoder.decode(s)
    347 if cls is None:

File ~/miniconda/lib/python3.9/json/decoder.py:337, in JSONDecoder.decode(self, s, _w)
    333 """Return the Python representation of ``s`` (a ``str`` instance
    334 containing a JSON document).
    335 
    336 """
--> 337 obj, end = self.raw_decode(s, idx=_w(s, 0).end())
    338 end = _w(s, end).end()

File ~/miniconda/lib/python3.9/json/decoder.py:355, in JSONDecoder.raw_decode(self, s, idx)
    354 except StopIteration as err:
--> 355     raise JSONDecodeError("Expecting value", s, err.value) from None
    356 return obj, end

JSONDecodeError: Expecting value: line 1 column 1 (char 0)

During handling of the above exception, another exception occurred:

JSONDecodeError                           Traceback (most recent call last)
Cell In[17], line 14
     11 	response = requests.post(API_URL, headers=headers, json=payload)
     12 	return response.json()
---> 14 output = query({
     15 	"inputs": {"sentences": [
     16 		"That is a happy dog",
     17 		"That is a very happy person",
     18 		"Today is a sunny day"
     19 	],
     20 	"source_sentence": "That is a happy person",
     21 	"parameters": {}}
     22 })
     24 output

Cell In[17], line 12, in query(payload)
     10 def query(payload):
     11 	response = requests.post(API_URL, headers=headers, json=payload)
---> 12 	return response.json()

File ~/miniconda/lib/python3.9/site-packages/requests/models.py:978, in Response.json(self, **kwargs)
    974     return complexjson.loads(self.text, **kwargs)
    975 except JSONDecodeError as e:
    976     # Catch JSON-related errors and raise as requests.JSONDecodeError
    977     # This aliases json.JSONDecodeError and simplejson.JSONDecodeError
--> 978     raise RequestsJSONDecodeError(e.msg, e.doc, e.pos)

JSONDecodeError: Expecting value: line 1 column 1 (char 0)
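
For debugging, inspecting the raw response instead of calling .json() directly shows what the endpoint actually returns when no path is appended. A minimal sketch, assuming the gte-large endpoint above is still running:

import huggingface_hub
import requests

# post the same payload to the bare endpoint URL and inspect the raw response
response = requests.post(
    endpoint.url,  # no /similarity path appended
    headers={
        "Authorization": f"Bearer {huggingface_hub.get_token()}",
        "Content-Type": "application/json",
    },
    json={
        "inputs": {
            "sentences": ["That is a happy dog"],
            "source_sentence": "That is a happy person",
            "parameters": {},
        }
    },
)

# a non-2xx status or a non-JSON body explains the JSONDecodeError above
print(response.status_code)
print(response.headers.get("content-type"))
print(response.text[:500])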

Expected behavior

Either consistently require appending the task-specific path (e.g. /rerank, /similarity), or never require it.

See this internal thread for context
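
In the meantime, a small client-side helper can normalize the URL before sending requests. This is only a hypothetical workaround sketch; the task-to-route mapping below is an assumption based on the two cases above, not documented TEI behavior:

# hypothetical helper: append the task-specific route if it is not already present
TASK_ROUTES = {
    "sentence-ranking": "/rerank",         # assumption: reranker endpoints expose /rerank
    "sentence-similarity": "/similarity",  # assumption: embedding endpoints expose /similarity
}

def task_url(endpoint_url: str, task: str) -> str:
    """Return the endpoint URL with the task-specific path appended (if known)."""
    route = TASK_ROUTES.get(task, "")
    if route and not endpoint_url.rstrip("/").endswith(route):
        return endpoint_url.rstrip("/") + route
    return endpoint_url

# usage (hypothetical): API_URL = task_url(endpoint.url, "sentence-similarity")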
