Original poster | Posted on 2025-8-14 10:12
 
 
 
 Last edited by godspeed66 on 2025-8-14 10:15
 
 
The only model I can run is Qwen3-235B-A22B-Thinking-2507-AWQ.
Before enabling SLI, generation speed was only 25~35 t/s; with SLI enabled it is 45~60 t/s.
 
 
 
The RTX PRO 6000 does not support the existing FP8 path. The error is: block scaled fp8 gemm is not implemented for Blackwell
 
Running Qwen3-235B in FP8 needs about 240 GB of VRAM just to start, and about 300 GB to leave room for a usable context length. It will not run on 192 GB.
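
A back-of-the-envelope check of these figures, as a sketch rather than a measurement. It assumes 235 billion parameters (the nominal count in the model name), 1 byte per weight for FP8, and roughly 4 bits per weight for the AWQ build (an assumption; the post does not state the bit width). It ignores quantization scales, any layers kept in higher precision, activations, and KV cache, so real usage will be higher than these numbers.

```python
def weight_gb(params_billion: float, bits_per_param: float) -> float:
    """Approximate weight footprint in GB (1 GB = 1e9 bytes), weights only."""
    return params_billion * bits_per_param / 8


PARAMS_B = 235   # assumption: nominal total parameter count from the model name
VRAM_GB = 192    # total VRAM mentioned in the post

fp8_gb = weight_gb(PARAMS_B, 8)   # FP8: 8 bits per weight
awq_gb = weight_gb(PARAMS_B, 4)   # assumption: AWQ at about 4 bits per weight

for name, size in (("FP8", fp8_gb), ("AWQ 4-bit", awq_gb)):
    headroom = VRAM_GB - size
    verdict = "fits" if headroom > 0 else "does not fit"
    print(f"{name}: ~{size:.1f} GB of weights, {verdict} in {VRAM_GB} GB "
          f"(headroom {headroom:+.1f} GB before KV cache)")
```

Under these assumptions the FP8 weights alone exceed 192 GB, which is consistent with needing about 240 GB to start, while the 4-bit weights leave room for KV cache.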
 
(APIServer pid=897) INFO 08-14 10:12:26 [async_llm.py:273] Added request chatcmpl-c6c39093434d4044875d5eb0f507e969. 
(APIServer pid=897) INFO 08-14 10:12:34 [loggers.py:123] Engine 000: Avg prompt throughput: 225.5 tokens/s, Avg generation throughput: 34.4 tokens/s, Running: 1 reqs, Waiting: 0 reqs, GPU KV cache usage: 1.1%, Prefix cache hit rate: 10.3% 
(APIServer pid=897) INFO 08-14 10:12:44 [loggers.py:123] Engine 000: Avg prompt throughput: 0.0 tokens/s, Avg generation throughput: 47.6 tokens/s, Running: 1 reqs, Waiting: 0 reqs, GPU KV cache usage: 1.3%, Prefix cache hit rate: 10.3% 
(APIServer pid=897) INFO 08-14 10:12:54 [loggers.py:123] Engine 000: Avg prompt throughput: 0.0 tokens/s, Avg generation throughput: 58.2 tokens/s, Running: 1 reqs, Waiting: 0 reqs, GPU KV cache usage: 1.5%, Prefix cache hit rate: 10.3% 
(APIServer pid=897) INFO 08-14 10:13:04 [loggers.py:123] Engine 000: Avg prompt throughput: 0.0 tokens/s, Avg generation throughput: 54.8 tokens/s, Running: 1 reqs, Waiting: 0 reqs, GPU KV cache usage: 1.7%, Prefix cache hit rate: 10.3% 
(APIServer pid=897) INFO 08-14 10:13:14 [loggers.py:123] Engine 000: Avg prompt throughput: 0.0 tokens/s, Avg generation throughput: 46.3 tokens/s, Running: 1 reqs, Waiting: 0 reqs, GPU KV cache usage: 1.9%, Prefix cache hit rate: 10.3% 
(APIServer pid=897) INFO 08-14 10:13:24 [loggers.py:123] Engine 000: Avg prompt throughput: 0.0 tokens/s, Avg generation throughput: 53.5 tokens/s, Running: 1 reqs, Waiting: 0 reqs, GPU KV cache usage: 2.1%, Prefix cache hit rate: 10.3% 
(APIServer pid=897) INFO 08-14 10:13:34 [loggers.py:123] Engine 000: Avg prompt throughput: 0.0 tokens/s, Avg generation throughput: 54.1 tokens/s, Running: 1 reqs, Waiting: 0 reqs, GPU KV cache usage: 2.3%, Prefix cache hit rate: 10.3% 
(APIServer pid=897) INFO 08-14 10:13:44 [loggers.py:123] Engine 000: Avg prompt throughput: 0.0 tokens/s, Avg generation throughput: 53.6 tokens/s, Running: 1 reqs, Waiting: 0 reqs, GPU KV cache usage: 2.5%, Prefix cache hit rate: 10.3%
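
To summarize a log like this without reading it line by line, the generation-throughput field can be pulled out with a regular expression. This is a minimal sketch: the helper name `generation_throughputs` and the `SAMPLE_LOG` variable are made up for illustration, and the pattern only assumes the field wording seen in the lines above.

```python
import re
from statistics import mean

# Matches the generation-throughput field in the periodic stats lines.
GEN_RE = re.compile(r"Avg generation throughput: ([\d.]+) tokens/s")


def generation_throughputs(log_text: str) -> list[float]:
    """Return every generation-throughput reading in the text, in order."""
    return [float(value) for value in GEN_RE.findall(log_text)]


# Three stats lines from the log above, shortened to the fields used here.
SAMPLE_LOG = """\
Engine 000: Avg prompt throughput: 225.5 tokens/s, Avg generation throughput: 34.4 tokens/s, Running: 1 reqs
Engine 000: Avg prompt throughput: 0.0 tokens/s, Avg generation throughput: 47.6 tokens/s, Running: 1 reqs
Engine 000: Avg prompt throughput: 0.0 tokens/s, Avg generation throughput: 58.2 tokens/s, Running: 1 reqs
"""

readings = generation_throughputs(SAMPLE_LOG)
print(f"samples={len(readings)} min={min(readings)} "
      f"max={max(readings)} mean={mean(readings):.1f} tokens/s")
```

The first reading comes from an interval that also reports prompt throughput, so it is probably worth dropping when estimating steady-state decode speed.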
 
 
 