Issues: vllm-project/vllm
[Usage]: How to save log to local path and control log rate? (usage) #9146 · opened Oct 8, 2024 by lzcchl
[Bug]: vllm much slower on long context inputs when using --enable-lora even when lora is not used (bug) #9143 · opened Oct 8, 2024 by badrjd
[Bug]: free_seq is invoked multiple times unnecessarily when one request is finished (bug) #9142 · opened Oct 8, 2024 by tongping
[Bug]: out-of-bound in attention.cu (bug) #9136 · opened Oct 7, 2024 by molly-ting
[Usage]: Issue running inference on a model across multiple nodes (usage) #9134 · opened Oct 7, 2024 by kogans1107
[Bug]: Unable to use --enable-lora on latest vllm docker container (v0.6.2) (bug) #9133 · opened Oct 7, 2024 by noelo
[Bug]: assert len(indices) == len(inputs) with Qwen/Qwen2-VL-2B-Instruct (bug) #9128 · opened Oct 7, 2024 by sayakpaul
[Bug]: Error Encountered in vLLM Benchmarking with Input Length greater than 8192 in Llama 3.1 405B Model (bug) #9127 · opened Oct 7, 2024 by Bihan
[Usage]: Not getting the inference metrics in the API response (usage) #9126 · opened Oct 7, 2024 by vverma01232
[Bug]: BlockSpaceManagerV1.get_common_computed_block_ids returns empty string, causing msgspec decode failure (bug) #9122 · opened Oct 7, 2024 by amberOoO
[Bug]: Unsupported base layer: QKVParallelLinear when loading lora to a quantized model (bug) #9120 · opened Oct 7, 2024 by fahadh4ilyas
[Bug]: Installation from latest commit (version wrong) (bug) #9119 · opened Oct 7, 2024 by johnnynunez
[Bug]: Issue running vLLM OpenAI server as a non-root user in K8s (bug) #9118 · opened Oct 7, 2024 by luhurfth
[Bug]: In v0.6.2, when tp=1, TPOT becomes very slow for batch sizes of around 10 (did not happen in v0.5.5) (bug) #9113 · opened Oct 7, 2024 by ashgold
[Feature]: LLMEngine and ModelConfig explicitly require path or HF model id, but no InferenceClient class for locally running VLLM server (feature request) #9110 · opened Oct 6, 2024 by DanielViglione
[RFC]: hide continuous batching complexity through forward context (RFC) #9098 · opened Oct 5, 2024 by youkaichao
[Bug]: vllm serve Exception in ASGI application (bug) #9096 · opened Oct 5, 2024 by SpaceHunterInf
[Bug]: VLLM Model Fails on Kubernetes with "CUDA error: operation not permitted when stream is capturing" (bug) #9094 · opened Oct 5, 2024 by CREESTL
[Installation]: cannot install vllm with openvino backend (installation) #9092 · opened Oct 5, 2024 by guanxiang
[Bug]: vLLM MQLLMEngine Timeout - Json Schema (bug) #9082 · opened Oct 4, 2024 by wrisigo
[Doc]: Clear documentation about function / tool calling with examples (documentation) #9074 · opened Oct 4, 2024 by greg2705
[Misc]: Need to understand support for torch.compile in Q4 roadmap (misc) #9072 · opened Oct 4, 2024 by amd-abhikulk
[Usage]: Benchmarking Issues: Low Success Rate and Tensor Parallel Size Constraints on 8x AMD MI300x GPUs (rocm, usage) #9070 · opened Oct 4, 2024 by Bihan