vLLM – Home Lab – Online notes

vLLM Service

Published February 24, 2026 (updated April 11, 2026) by sandeep

The objective is to run a vLLM service serving Qwen3-VL-8B-FP8. It will initially be used for AI-assisted purchase order processing.

Rationale for model selection: the server has a single NVIDIA L4 GPU (24 GB), so the model has to fit on one card.

Download the model:

    hf auth login
    hf download Qwen/Qwen3-VL-8B-Instruct-FP8 --local-dir /opt/models/Qwen3-VL-8B-FP8

Create a systemd unit file to start the service (/etc/systemd/system/vllm.service):

    [Unit]
    Description=vLLM Qwen …
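A rough back-of-the-envelope check of why one L4 is enough for this model. It assumes the nominal 8B parameters are all stored in FP8 (1 byte each), the L4's 24 GB of memory, and vLLM's default `--gpu-memory-utilization` of 0.9. Some tensors in an FP8 checkpoint may stay in higher precision, so treat the weight figure as a lower bound rather than an exact size.

```latex
\begin{aligned}
M_{\text{weights}} &\approx 8 \times 10^{9}\ \text{params} \times 1\ \text{byte/param} \approx 8\ \text{GB} \\
M_{\text{budget}} &\approx 0.9 \times 24\ \text{GB} = 21.6\ \text{GB} \\
M_{\text{KV cache + activations}} &\approx M_{\text{budget}} - M_{\text{weights}} \approx 21.6 - 8 = 13.6\ \text{GB}
\end{aligned}
```

By comparison, the same model in BF16 (2 bytes per parameter) would need roughly 16 GB for weights alone, leaving only about 5.6 GB of the same budget for the KV cache.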
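Before pointing the service at the downloaded model, it can help to confirm the directory actually contains a usable checkpoint. The helper below is a hypothetical sketch, not part of the original setup; `check_model_dir` is an illustrative name, and it assumes the checkpoint ships a top-level `config.json` plus one or more `*.safetensors` weight shards.

```shell
#!/bin/sh
# Hypothetical helper (not from the original post): sanity-check that a
# downloaded Hugging Face model directory looks complete before vLLM uses it.
# Assumption: the checkpoint has a top-level config.json and *.safetensors shards.
check_model_dir() {
  dir="$1"

  # The model configuration must sit at the top level of the directory.
  if [ ! -f "$dir/config.json" ]; then
    echo "missing: $dir/config.json"
    return 1
  fi

  # At least one safetensors shard must be present. If the glob matches
  # nothing, the loop sees the unexpanded pattern, which fails the -e test.
  found=0
  for f in "$dir"/*.safetensors; do
    if [ -e "$f" ]; then
      found=1
      break
    fi
  done
  if [ "$found" -eq 0 ]; then
    echo "missing: $dir/*.safetensors"
    return 1
  fi

  echo "ok: $dir"
}

# Example, using the /opt/models path from the download step:
# check_model_dir /opt/models/Qwen3-VL-8B-FP8
```

The check is deliberately shallow: it catches an interrupted or mistargeted download, not a corrupted shard.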