Thread starter: godspeed66

[GPU] RTX PRO 6000 NVIDIA SLI functionality

OP | Posted 2025-8-14 16:12
Quoting zhuifeng88 (2025-8-14 13:18):
"vllm                              0.10.1.dev166+g04e38500e.d20250729.cu129 /home/vllm02/vllm"

```
    raise CalledProcessError(retcode, cmd)
      subprocess.CalledProcessError: Command '['cmake', '/home/vllm06/vllm', '-G', 'Ninja', '-DCMAKE_BUILD_TYPE=RelWithDebInfo', '-DVLLM_TARGET_DEVICE=cuda', '-DVLLM_PYTHON_EXECUTABLE=/root/anaconda3/envs/vllm06/bin/python3.12', '-DVLLM_PYTHON_PATH=/root/anaconda3/envs/vllm06/lib/python312.zip:/root/anaconda3/envs/vllm06/lib/python3.12:/root/anaconda3/envs/vllm06/lib/python3.12/lib-dynload:/root/anaconda3/envs/vllm06/lib/python3.12/site-packages:/home/vllm06/transformers/src:/root/anaconda3/envs/vllm06/lib/python3.12/site-packages/setuptools/_vendor', '-DFETCHCONTENT_BASE_DIR=/home/vllm06/vllm/.deps', '-DNVCC_THREADS=1', '-DCMAKE_JOB_POOL_COMPILE:STRING=compile', '-DCMAKE_JOB_POOLS:STRING=compile=32', '-DCMAKE_CUDA_COMPILER=/usr/local/cuda-13.0/bin/nvcc']' returned non-zero exit status 1.
      [end of output]

  note: This error originates from a subprocess, and is likely not a problem with pip.
  ERROR: Failed building editable for vllm
Failed to build vllm
ERROR: Failed to build installable wheels for some pyproject.toml based projects (vllm)
```



I'm giving up. I'll wait for the software ecosystem to mature.

I never hit problems like this with the 4090 or the RTX 6000 Ada.
Posted 2025-8-14 16:36
Last edited by zhuifeng88 on 2025-8-14 17:00
Quoting godspeed66 (2025-8-14 16:12):
raise CalledProcessError(retcode, cmd)
      subprocess.CalledProcessError: Command '['cmake', ...


Don't use CUDA 13 for the toolkit; use 12.8.1.
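A minimal sketch of switching the build to a 12.8 toolkit (the install path is an assumption; your failing log shows nvcc being picked up from /usr/local/cuda-13.0):
```
# Sketch only: assumes CUDA 12.8 is installed under /usr/local/cuda-12.8
export CUDA_HOME=/usr/local/cuda-12.8
export PATH="$CUDA_HOME/bin:$PATH"
nvcc --version              # should now report release 12.8, not 13.0
# then retry the editable install that failed above
cd /home/vllm06/vllm && pip install -e .
```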

======
If you really can't get it working, I'd suggest just doing the build in Docker:
```
https://github.com/vllm-project/vllm/blob/main/docker/Dockerfile
```
Just change every 12.0 in it to 12.0a.
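Roughly, as a hedged sketch (the build target and image tag are assumptions based on the upstream Dockerfile, not something confirmed in this thread):
```
# Sketch: fetch vLLM, apply the 12.0 -> 12.0a substitution described above
# (12.0a is the sm_120a arch variant), then build the image.
git clone https://github.com/vllm-project/vllm.git
cd vllm
sed -i 's/12\.0/12.0a/g' docker/Dockerfile
# --target and -t names here are assumptions
DOCKER_BUILDKIT=1 docker build -f docker/Dockerfile --target vllm-openai -t vllm-sm120a .
```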
Posted 2025-8-14 17:32
Quoting godspeed66 (2025-8-14 16:12):
raise CalledProcessError(retcode, cmd)
      subprocess.CalledProcessError: Command '['cmake', ...

Don't use WSL. Install the driver directly under Linux and run a properly configured Docker image; performance and throughput should be a good deal better.
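For example, a rough sketch of running a prebuilt image on bare-metal Linux (image tag, port, and paths are illustrative; it assumes the NVIDIA driver and nvidia-container-toolkit are already installed):
```
# Sketch: run the official prebuilt OpenAI-compatible vLLM server image.
# Model path and mount point are placeholders, not from this thread.
docker run --gpus all --ipc=host -p 8000:8000 \
  -v /home/models:/models \
  vllm/vllm-openai:latest \
  --model /models/your-model \
  --tensor-parallel-size 2
```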
OP | Posted 2025-8-14 20:08
Quoting zhuifeng88 (2025-8-14 16:36):
Don't use CUDA 13 for the toolkit; use 12.8.1.

======

Thanks, I'll give it a try.
OP | Posted 2025-8-14 21:04
Quoting zhuifeng88 (2025-8-14 16:36):
Don't use CUDA 13 for the toolkit; use 12.8.1.

======

Changing the CUDA version worked.

FP8 works now.


But FP4 errors out:
```
vllm serve /home/Qwen3-235B-A22B-Thinking-2507-FP4 --served-model-name Qwen3-235B-A22B-Thinking-2507-FP4 --max-model-len 201000 --tensor-parallel-size 2 --gpu-memory-utilization 0.9
```

```
(APIServer pid=15764) INFO 08-14 20:58:03 [api_server.py:1805] vLLM API server version 0.10.1.dev628+g00e3f9da4.d20250814

(APIServer pid=15764)   Value error, Unknown quantization method: . Must be one of ['aqlm', 'awq', 'deepspeedfp', 'tpu_int8', 'fp8', 'ptpc_fp8', 'fbgemm_fp8', 'modelopt', 'modelopt_fp4', 'marlin', 'bitblas', 'gguf', 'gptq_marlin_24', 'gptq_marlin', 'gptq_bitblas', 'awq_marlin', 'gptq', 'compressed-tensors', 'bitsandbytes', 'qqq', 'hqq', 'experts_int8', 'neuron_quant', 'ipex', 'quark', 'moe_wna16', 'torchao', 'auto-round', 'rtn', 'inc', 'mxfp4']. [type=value_error, input_value=ArgsKwargs((), {'model': ...attention_dtype': None}), input_type=ArgsKwargs]
(APIServer pid=15764)     For further information visit https://errors.pydantic.dev/2.11/v/value_error
```


Posted 2025-8-14 23:19
Last edited by zhuifeng88 on 2025-8-14 23:32
Quoting godspeed66 (2025-8-14 21:04):
Changing the CUDA version worked.

FP8 works now.


On sm120, nvfp4 currently only has the dense path implemented, not MoE. If you need MoE urgently, you can copy the sm100 version and adapt it yourself; just to get it running, the change is probably around 100 lines, basically just adjusting the template parameters in the cutlass part.

Also, the nvfp4 model format you're using is wrong. What vLLM supports transparently is the compressed-tensors format and checkpoints produced by fairly recent versions of optimum-nvidia; older ones you'll have to convert yourself.
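As a hedged sketch of where to look: check what quant_method the checkpoint's config.json actually declares (the empty method in the error above suggests vLLM isn't recognizing any), and try passing the method explicitly; modelopt_fp4 is one of the valid values the error lists, though whether it loads still depends on the checkpoint format:
```
# Inspect the declared quantization method (the jq path is an assumption
# about the checkpoint's config layout, not confirmed in this thread)
jq '.quantization_config.quant_method' /home/Qwen3-235B-A22B-Thinking-2507-FP4/config.json

# Try forcing the method instead of relying on auto-detection
vllm serve /home/Qwen3-235B-A22B-Thinking-2507-FP4 \
  --served-model-name Qwen3-235B-A22B-Thinking-2507-FP4 \
  --quantization modelopt_fp4 \
  --max-model-len 201000 \
  --tensor-parallel-size 2 \
  --gpu-memory-utilization 0.9
```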
OP | Posted 2025-8-15 10:43
Quoting zhuifeng88 (2025-8-14 23:19):
On sm120, nvfp4 currently only has the dense path implemented, not MoE. If you need MoE urgently, you can copy the sm100 version and ...

Thanks a million.