Original poster | Posted on 2025-8-14 10:12
Last edited by godspeed66 on 2025-8-14 10:15
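The only model that runs is Qwen3-235B-A22B-Thinking-2507-AWQ.
Before enabling SLI I only got 25~35 t/s; with SLI enabled it reaches 45~60 t/s.
The RTX PRO 6000 doesn't support the current FP8 path: block scaled fp8 gemm is not implemented for Blackwell.
Running Qwen3-235B in FP8 needs 240 GB of VRAM just to start, and about 300 GB before there is room for a usable context length. It won't run on 192 GB.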
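The 240 GB startup figure lines up with simple weight arithmetic: every expert of the MoE has to be resident, so all ~235B parameters count, not just the ~22B active ones. Below is a rough, weight-only sketch. It assumes a uniform width per parameter (2 bytes BF16, 1 byte FP8, 0.5 bytes for 4-bit AWQ) and treats 192 GB as two 96 GB cards. Real checkpoints keep some layers at higher precision and add quantization scales, and KV cache and activations come on top, so actual usage is higher.

```python
# Rough weight-only memory estimate for Qwen3-235B-A22B at different precisions.
# Assumption: all ~235e9 parameters are stored at one uniform width. Real checkpoints
# keep some layers (embeddings, norms, router) wider and add per-block scales,
# and KV cache / activations are not counted here at all.

TOTAL_PARAMS = 235e9  # total (not active) parameters: every expert must be loaded

BYTES_PER_PARAM = {
    "bf16": 2.0,
    "fp8": 1.0,
    "awq_int4": 0.5,
}


def weight_gb(params: float, bytes_per_param: float) -> float:
    """Return weight size in decimal gigabytes (1 GB = 1e9 bytes)."""
    return params * bytes_per_param / 1e9


def fits(budget_gb: float, precision: str) -> bool:
    """True if the weights alone fit under budget_gb (KV cache not included)."""
    return weight_gb(TOTAL_PARAMS, BYTES_PER_PARAM[precision]) < budget_gb


if __name__ == "__main__":
    budget = 192.0  # assumed: two 96 GB RTX PRO 6000 cards
    for name, bpp in BYTES_PER_PARAM.items():
        gb = weight_gb(TOTAL_PARAMS, bpp)
        print(f"{name:>9}: ~{gb:.1f} GB of weights, fits in {budget:.0f} GB: {fits(budget, name)}")
```

FP8 weights alone come to roughly 235 GB, already more than 192 GB, while 4-bit AWQ is around 118 GB and leaves headroom for the KV cache. That is consistent with AWQ being the only variant that starts here.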
(APIServer pid=897) INFO 08-14 10:12:26 [async_llm.py:273] Added request chatcmpl-c6c39093434d4044875d5eb0f507e969.
(APIServer pid=897) INFO 08-14 10:12:34 [loggers.py:123] Engine 000: Avg prompt throughput: 225.5 tokens/s, Avg generation throughput: 34.4 tokens/s, Running: 1 reqs, Waiting: 0 reqs, GPU KV cache usage: 1.1%, Prefix cache hit rate: 10.3%
(APIServer pid=897) INFO 08-14 10:12:44 [loggers.py:123] Engine 000: Avg prompt throughput: 0.0 tokens/s, Avg generation throughput: 47.6 tokens/s, Running: 1 reqs, Waiting: 0 reqs, GPU KV cache usage: 1.3%, Prefix cache hit rate: 10.3%
(APIServer pid=897) INFO 08-14 10:12:54 [loggers.py:123] Engine 000: Avg prompt throughput: 0.0 tokens/s, Avg generation throughput: 58.2 tokens/s, Running: 1 reqs, Waiting: 0 reqs, GPU KV cache usage: 1.5%, Prefix cache hit rate: 10.3%
(APIServer pid=897) INFO 08-14 10:13:04 [loggers.py:123] Engine 000: Avg prompt throughput: 0.0 tokens/s, Avg generation throughput: 54.8 tokens/s, Running: 1 reqs, Waiting: 0 reqs, GPU KV cache usage: 1.7%, Prefix cache hit rate: 10.3%
(APIServer pid=897) INFO 08-14 10:13:14 [loggers.py:123] Engine 000: Avg prompt throughput: 0.0 tokens/s, Avg generation throughput: 46.3 tokens/s, Running: 1 reqs, Waiting: 0 reqs, GPU KV cache usage: 1.9%, Prefix cache hit rate: 10.3%
(APIServer pid=897) INFO 08-14 10:13:24 [loggers.py:123] Engine 000: Avg prompt throughput: 0.0 tokens/s, Avg generation throughput: 53.5 tokens/s, Running: 1 reqs, Waiting: 0 reqs, GPU KV cache usage: 2.1%, Prefix cache hit rate: 10.3%
(APIServer pid=897) INFO 08-14 10:13:34 [loggers.py:123] Engine 000: Avg prompt throughput: 0.0 tokens/s, Avg generation throughput: 54.1 tokens/s, Running: 1 reqs, Waiting: 0 reqs, GPU KV cache usage: 2.3%, Prefix cache hit rate: 10.3%
(APIServer pid=897) INFO 08-14 10:13:44 [loggers.py:123] Engine 000: Avg prompt throughput: 0.0 tokens/s, Avg generation throughput: 53.6 tokens/s, Running: 1 reqs, Waiting: 0 reqs, GPU KV cache usage: 2.5%, Prefix cache hit rate: 10.3%
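To turn the log above into a single number, a small parser can pull out the "Avg generation throughput" field from each 10-second stats line. This is a sketch that relies only on the text format visible in this log, not on any vLLM API. The sample lines are trimmed copies of the ones above. The first interval is skipped on the assumption that it is warm-up, since it still includes the prefill (prompt throughput 225.5 tokens/s).

```python
import re
from statistics import mean

# Matches the generation-throughput field in vLLM's periodic stats lines,
# e.g. "Avg generation throughput: 47.6 tokens/s".
GEN_RE = re.compile(r"Avg generation throughput: ([\d.]+) tokens/s")

# Trimmed copies of the stats lines from the log above (10:12:34 .. 10:13:44).
LOG = """\
Engine 000: Avg prompt throughput: 225.5 tokens/s, Avg generation throughput: 34.4 tokens/s
Engine 000: Avg prompt throughput: 0.0 tokens/s, Avg generation throughput: 47.6 tokens/s
Engine 000: Avg prompt throughput: 0.0 tokens/s, Avg generation throughput: 58.2 tokens/s
Engine 000: Avg prompt throughput: 0.0 tokens/s, Avg generation throughput: 54.8 tokens/s
Engine 000: Avg prompt throughput: 0.0 tokens/s, Avg generation throughput: 46.3 tokens/s
Engine 000: Avg prompt throughput: 0.0 tokens/s, Avg generation throughput: 53.5 tokens/s
Engine 000: Avg prompt throughput: 0.0 tokens/s, Avg generation throughput: 54.1 tokens/s
Engine 000: Avg prompt throughput: 0.0 tokens/s, Avg generation throughput: 53.6 tokens/s
"""


def generation_throughputs(lines):
    """Extract per-interval generation throughput (tokens/s) from log lines."""
    values = []
    for line in lines:
        m = GEN_RE.search(line)
        if m:
            values.append(float(m.group(1)))
    return values


def summarize(values, skip_first=1):
    """Return (min, mean, max), skipping the first interval(s) as warm-up."""
    steady = values[skip_first:]
    return min(steady), mean(steady), max(steady)


if __name__ == "__main__":
    vals = generation_throughputs(LOG.splitlines())
    lo, avg, hi = summarize(vals)
    print(f"{len(vals)} intervals, steady state: min {lo} / mean {avg:.1f} / max {hi} tokens/s")
```

On this log the steady-state intervals range from 46.3 to 58.2 tokens/s with a mean of about 52.6, inside the 45~60 t/s range quoted for the SLI-enabled run.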