phi3.5 genai converted model output garbage results with input length around 3000 and 8000. #954

Open
yufang67 opened this issue Oct 3, 2024 · 5 comments

yufang67 commented Oct 3, 2024

Describe the bug
I converted phi_3_5_mini_instruct (fp16, CUDA) with onnxgenai==0.4.0 and run inference with onnxgenai on an A100 80G.
For some inputs of length around 3000 (and around 8000), generation runs all the way to the fixed max_length and the output is full of "\n".

For example, with max_length fixed at 12K, an input of length 3424 produces an output of length 8576 (3424 + 8576 = 12000), and the output is filled with the following:

n0.\n.\n0.\n.\n.\n0.\n\n\n\n\n\n2.\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n.\n.\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n.\n2.\n\n\n\n\n2.\n2.\n\n\n.\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n.\n\n\n\n\n\n\n.\n\n\n\n\n\n\n\n\n\n\n.\n\n\n\n\n\n.\n0.\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n2.\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n0.
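Output like the sample above can be flagged automatically when sweeping over many prompts, which helps narrow down the affected input lengths without inspecting each result by hand. The following is a minimal sketch only: the helper names and the 0.5 threshold are hypothetical choices made for illustration, not part of onnxgenai, and it assumes the decoded output is available as a Python string.

```python
def newline_fraction(text: str) -> float:
    """Fraction of characters in the decoded output that are newlines."""
    if not text:
        return 0.0
    return text.count("\n") / len(text)


def looks_degenerate(text: str, threshold: float = 0.5) -> bool:
    """Flag output dominated by newlines.

    The threshold is an assumed value; tune it against known-good outputs.
    """
    return newline_fraction(text) >= threshold
```

Running this over the outputs for each prompt length gives a list of failing lengths that can be shared even when the prompts themselves cannot.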

I compared against the transformers API and did not get this kind of result with the same model.

Any clue about this issue? (I have seen that a similar issue exists for vLLM/transformers: https://huggingface.co/microsoft/Phi-3-mini-128k-instruct/discussions/85, vllm-project/vllm#8254)
Thanks

natke (Contributor) commented Oct 3, 2024

Thanks for reporting this. Does it matter which prompt you use, or does any long prompt produce this output?

yufang67 (Author) commented Oct 4, 2024

It seems to be related only to the length of the prompt. Several prompts of length around 3000 show this issue, for example 3824, 3613, and 3918, while some samples of length 4000 and 5000 produce correct output.
I am not sure whether it is related to the issue linked above, which occurs when the input length is < 4096 but input + output is > 4096. Since in onnx we only set max_length, we have no direct control over the output length.
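To test that hypothesis, the condition can be written down and max_length capped so that a short prompt never generates past the boundary. This is a rough sketch under stated assumptions: the 4096 boundary is taken from the linked issues, max_length is assumed to count prompt plus generated tokens (consistent with 3424 + 8576 = 12000 above), and the function names are hypothetical rather than part of onnxgenai.

```python
# Assumed boundary, taken from the linked vLLM/transformers discussions.
BOUNDARY = 4096


def crosses_boundary(prompt_len: int, max_length: int, boundary: int = BOUNDARY) -> bool:
    """True when the prompt is below the boundary but generation may run past it."""
    return prompt_len < boundary < max_length


def capped_max_length(prompt_len: int, max_length: int, boundary: int = BOUNDARY) -> int:
    """Return a max_length that keeps a short prompt's total length within the boundary."""
    if prompt_len < boundary:
        return min(max_length, boundary)
    return max_length


def output_budget(prompt_len: int, max_length: int) -> int:
    """Tokens left for generation when max_length counts prompt plus output."""
    return max(max_length - prompt_len, 0)
```

If the garbage disappears for the affected prompts when the capped value is used, that would point toward the same cause as the linked issues. It would not explain the failures reported around length 8000, which are already past the boundary.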

natke (Contributor) commented Oct 4, 2024

Thank you. Can you share the prompts that produce garbage? The 3000 length and the 8000 length, so that we can repro.

yufang67 (Author) commented Oct 4, 2024

Sorry, I can't provide the prompt because it's customer data.

natke (Contributor) commented Oct 7, 2024

No problem. I did reproduce garbage output for a prompt length of 3402. We are investigating.
