vllm-mlx

by waybarrios

OpenAI and Anthropic compatible server for Apple Silicon. Run LLMs and vision-language models (Llama, Qwen-VL, LLaVA) with continuous batching, MCP tool calling, and multimodal support. Native MLX backend, 400+ tok/s. Works with Claude Code.
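Because the server advertises OpenAI API compatibility, a standard chat-completions request should work against it. A minimal sketch using only the Python standard library; the local port (8000), the `/v1/chat/completions` route, and the model identifier are assumptions, not confirmed defaults of vllm-mlx:

```python
import json
from urllib import request

# Hypothetical local endpoint; vllm-mlx advertises OpenAI API compatibility,
# so the conventional /v1/chat/completions route is assumed here.
BASE_URL = "http://localhost:8000/v1"


def build_chat_request(model, messages, max_tokens=128):
    """Build an OpenAI-style chat-completions payload (plain dict)."""
    return {
        "model": model,
        "messages": messages,
        "max_tokens": max_tokens,
    }


def send_chat_request(payload, base_url=BASE_URL):
    """POST the payload to the (assumed) chat-completions endpoint."""
    req = request.Request(
        f"{base_url}/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with request.urlopen(req) as resp:
        return json.load(resp)


if __name__ == "__main__":
    payload = build_chat_request(
        model="mlx-community/Llama-3.2-3B-Instruct-4bit",  # hypothetical model id
        messages=[{"role": "user", "content": "Hello!"}],
    )
    print(json.dumps(payload, indent=2))
    # send_chat_request(payload) would dispatch it once the server is running.
```

The same request shape is what clients like Claude Code or the official `openai` SDK would send under the hood, which is why drop-in compatibility works.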

Stars: 978
Forks: 146
Watchers: 978
Language: Python
License: Apache-2.0
SrcLog Score: 100
Cost to Build (estimate): $402.2K
Market Value (estimate): $2.20M

Growth over time: 1 data point (2026-04-25).


How to clone vllm-mlx

Clone via HTTPS

git clone https://github.com/waybarrios/vllm-mlx.git

Clone via SSH

git clone git@github.com:waybarrios/vllm-mlx.git

Download ZIP

Download master.zip

Found an issue?

Report bugs or request features on the vllm-mlx issue tracker:

Open GitHub Issues