An open-source tool for benchmarking compute throughput and VRAM bandwidth
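Last edited by privater on 2025-6-8 17:36
https://github.com/ProjectPhysX/OpenCL-Benchmark
Direct download page: https://github.com/ProjectPhysX/OpenCL-Benchmark/releases/tag/v1.8
Just open it and run. Within a few seconds you get a quick read on VRAM bandwidth, although the PCIe bandwidth test further down the output does not seem very accurate.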
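Pro 6000 [tears]. I'll try it when I have time.
This thread is really a show-off post [snicker]
It's not an exe, so how do I use it?
gaoyi124 posted on 2025-6-8 17:20
It's not an exe, so how do I use it?
https://github.com/ProjectPhysX/OpenCL-Benchmark/releases/tag/v1.8
If the VRAM has a hidden defect, can this detect it too?
Thanks for sharing!
archxm posted on 2025-6-8 17:28
If the VRAM has a hidden defect, can this detect it too?
No. For testing VRAM I'd suggest this open-source tool, which is also very simple:
https://github.com/GpuZelenograd/memtest_vulkan
Thanks for sharing, I'll try it later.
.-----------------------------------------------------------------------------.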
|----------------.------------------------------------------------------------|
| Device ID    0 | AMD Radeon RX 9070 XT                                      |
| Device ID    1 | AMD Radeon(TM) Graphics                                    |
|----------------'------------------------------------------------------------|
|----------------.------------------------------------------------------------|
| Device ID      | 0                                                          |
| Device Name    | AMD Radeon RX 9070 XT                                      |
| Device Vendor  | Advanced Micro Devices, Inc.                               |
| Device Driver  | 3652.0 (PAL,LC) (Windows)                                  |
| OpenCL Version | OpenCL C 2.0                                               |
| Compute Units  | 32 at 2400 MHz (2048 cores, 9.830 TFLOPs/s)                |
| Memory, Cache  | 16304 MB VRAM, 16 KB global / 64 KB local                  |
| Buffer Limits  | 16304 MB global, 16695296 KB constant                      |
|----------------'------------------------------------------------------------|
| Info: OpenCL C code successfully compiled.                                  |
| FP64  compute                                          0.821 TFLOPs/s (1/12) |
| FP32  compute                                         24.636 TFLOPs/s ( 2x ) |
| FP16  compute                                         47.295 TFLOPs/s ( 4x ) |
| INT64 compute                                          3.121  TIOPs/s (1/3 ) |
| INT32 compute                                          5.206  TIOPs/s (1/2 ) |
| INT16 compute                                         22.734  TIOPs/s ( 2x ) |
| INT8  compute                                         10.408  TIOPs/s ( 1x ) |
| Memory Bandwidth ( coalesced read      )                        590.97 GB/s |
| Memory Bandwidth ( coalesced      write)                        610.22 GB/s |
| Memory Bandwidth (misaligned read      )                        588.92 GB/s |
| Memory Bandwidth (misaligned      write)                        610.60 GB/s |
| PCIe   Bandwidth (send                 )                         28.00 GB/s |
| PCIe   Bandwidth (   receive           )                         28.61 GB/s |
| PCIe   Bandwidth (        bidirectional)            (Gen4 x16)   28.46 GB/s |
|-----------------------------------------------------------------------------|
|----------------.------------------------------------------------------------|
| Device ID      | 1                                                          |
| Device Name    | AMD Radeon(TM) Graphics                                    |
| Device Vendor  | Advanced Micro Devices, Inc.                               |
| Device Driver  | 3652.0 (PAL,LC) (Windows)                                  |
| OpenCL Version | OpenCL C 2.0                                               |
| Compute Units  | 1 at 2200 MHz (128 cores, 0.563 TFLOPs/s)                  |
| Memory, Cache  | 25721 MB RAM, 16 KB global / 64 KB local                   |
| Buffer Limits  | 23673 MB global, 24241584 KB constant                      |
|----------------'------------------------------------------------------------|
| Info: OpenCL C code successfully compiled.                                  |
| FP64  compute                                          0.035 TFLOPs/s (1/16) |
| FP32  compute                                          0.557 TFLOPs/s ( 1x ) |
| Benchmarking ...                                                            |
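A side note on the "Compute Units" row: the TFLOPs/s figure there looks like a nominal peak, not a measurement. Below is a minimal sketch, assuming it is simply cores x clock x 2 (one fused multiply-add per core per cycle). That assumption reproduces the 9.830 shown for the RX 9070 XT, but it is inferred from the numbers in this thread, not taken from the tool's source. The function name is made up for illustration.

```python
def nominal_fp32_tflops(cores: int, clock_mhz: float, flops_per_cycle: int = 2) -> float:
    """Nominal FP32 peak in TFLOPs/s, assuming every core retires
    `flops_per_cycle` FLOPs per clock (2 = one fused multiply-add)."""
    return cores * clock_mhz * 1e6 * flops_per_cycle / 1e12


# Values copied from the RX 9070 XT output in this thread.
nominal = nominal_fp32_tflops(cores=2048, clock_mhz=2400)
measured = 24.636  # the "FP32 compute" row

print(f"nominal  {nominal:.3f} TFLOPs/s")
print(f"measured {measured:.3f} TFLOPs/s ({measured / nominal:.2f}x nominal)")
```

If the assumption holds, the ratio printed in parentheses on each compute row is roughly the measured value divided by this nominal figure, which would explain why a card can show "( 2x )" for FP32.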
.-----------------------------------------------------------------------------.
|----------------.------------------------------------------------------------|
| Device ID    0 | Intel(R) UHD Graphics 770                                  |
| Device ID    1 | NVIDIA GeForce RTX 3070 Laptop GPU                         |
|----------------'------------------------------------------------------------|
|----------------.------------------------------------------------------------|
| Device ID      | 0                                                          |
| Device Name    | Intel(R) UHD Graphics 770                                  |
| Device Vendor  | Intel(R) Corporation                                       |
| Device Driver  | 32.0.101.6325 (Windows)                                    |
| OpenCL Version | OpenCL C 3.0                                               |
| Compute Units  | 32 at 1550 MHz (256 cores, 0.794 TFLOPs/s)                 |
| Memory, Cache  | 60999 MB RAM, 1920 KB global / 64 KB local                 |
| Buffer Limits  | 4095 MB global, 4194296 KB constant                        |
|----------------'------------------------------------------------------------|
| Info: OpenCL C code successfully compiled.                                  |
| FP64  compute                                                 not supported |
| FP32  compute                                          0.664 TFLOPs/s ( 1x ) |
| FP16  compute                                          1.229 TFLOPs/s ( 2x ) |
| INT64 compute                                          0.058  TIOPs/s (1/12) |
| INT32 compute                                          0.239  TIOPs/s (1/3 ) |
| INT16 compute                                          2.776  TIOPs/s ( 4x ) |
| INT8  compute                                          2.809  TIOPs/s ( 4x ) |
| Memory Bandwidth ( coalesced read      )                         39.26 GB/s |
| Memory Bandwidth ( coalesced      write)                         36.18 GB/s |
| Memory Bandwidth (misaligned read      )                         32.13 GB/s |
| Memory Bandwidth (misaligned      write)                         17.96 GB/s |
|-----------------------------------------------------------------------------|
|----------------.------------------------------------------------------------|
| Device ID      | 1                                                          |
| Device Name    | NVIDIA GeForce RTX 3070 Laptop GPU                         |
| Device Vendor  | NVIDIA Corporation                                         |
| Device Driver  | 528.24 (Windows)                                           |
| OpenCL Version | OpenCL C 3.0                                               |
| Compute Units  | 40 at 1560 MHz (5120 cores, 15.974 TFLOPs/s)               |
| Memory, Cache  | 16383 MB VRAM, 1120 KB global / 48 KB local                |
| Buffer Limits  | 4095 MB global, 64 KB constant                             |
|----------------'------------------------------------------------------------|
| Info: OpenCL C code successfully compiled.                                  |
| FP64  compute                                          0.305 TFLOPs/s (1/64) |
| FP32  compute                                         18.509 TFLOPs/s ( 1x ) |
| FP16  compute                                         19.447 TFLOPs/s ( 1x ) |
| INT64 compute                                          2.742  TIOPs/s (1/8 ) |
| INT32 compute                                          9.826  TIOPs/s (2/3 ) |
| INT16 compute                                          8.364  TIOPs/s (1/2 ) |
| INT8  compute                                         35.291  TIOPs/s ( 2x ) |
| Memory Bandwidth ( coalesced read      )                        424.47 GB/s |
| Memory Bandwidth ( coalesced      write)                        430.51 GB/s |
| Memory Bandwidth (misaligned read      )                        424.67 GB/s |
| Memory Bandwidth (misaligned      write)                        103.33 GB/s |
| PCIe   Bandwidth (send                 )                          9.73 GB/s |
| PCIe   Bandwidth (   receive           )                          7.15 GB/s |
| PCIe   Bandwidth (        bidirectional)            (Gen4 x16)    8.15 GB/s |
|-----------------------------------------------------------------------------|
|-----------------------------------------------------------------------------|
Not that big a difference, then. One is a 3070 and the other a 5060, so the 5060 looks fine again [snark]
.-----------------------------------------------------------------------------.
|----------------.------------------------------------------------------------|
| Device ID    0 | NVIDIA GeForce RTX 5070 Ti                                 |
| Device ID    1 | NVIDIA GeForce RTX 5070 Ti                                 |
| Device ID    2 | Microsoft Basic Render Driver                              |
|----------------'------------------------------------------------------------|
|----------------.------------------------------------------------------------|
| Device ID      | 0                                                          |
| Device Name    | NVIDIA GeForce RTX 5070 Ti                                 |
| Device Vendor  | NVIDIA Corporation                                         |
| Device Driver  | 576.40 (Windows)                                           |
| OpenCL Version | OpenCL C 3.0                                               |
| Compute Units  | 70 at 2467 MHz (8960 cores, 44.209 TFLOPs/s)               |
| Memory, Cache  | 16302 MB VRAM, 2240 KB global / 48 KB local                |
| Buffer Limits  | 4075 MB global, 64 KB constant                             |
|----------------'------------------------------------------------------------|
| Info: OpenCL C code successfully compiled.                                  |
| FP64  compute                                          0.857 TFLOPs/s (1/64) |
| FP32  compute                                         52.434 TFLOPs/s ( 1x ) |
| FP16  compute                                         54.609 TFLOPs/s ( 1x ) |
| INT64 compute                                          4.201  TIOPs/s (1/12) |
| INT32 compute                                         27.350  TIOPs/s (2/3 ) |
| INT16 compute                                         24.283  TIOPs/s (1/2 ) |
| INT8  compute                                         99.825  TIOPs/s ( 2x ) |
| Memory Bandwidth ( coalesced read      )                        819.18 GB/s |
| Memory Bandwidth ( coalesced      write)                        851.16 GB/s |
| Memory Bandwidth (misaligned read      )                        890.07 GB/s |
| Memory Bandwidth (misaligned      write)                        217.68 GB/s |
| PCIe   Bandwidth (send                 )                         14.85 GB/s |
| PCIe   Bandwidth (   receive           )                         14.47 GB/s |
| PCIe   Bandwidth (        bidirectional)            (Gen4 x16)   14.65 GB/s |
|-----------------------------------------------------------------------------|
|----------------.------------------------------------------------------------|
| Device ID      | 1                                                          |
| Device Name    | NVIDIA GeForce RTX 5070 Ti                                 |
| Device Vendor  | Microsoft                                                  |
| Device Driver  | 1.1.0 (Windows)                                            |
| OpenCL Version | OpenCL C 3.0                                               |
| Compute Units  | 1 at 12 MHz (128 cores, 0.003 TFLOPs/s)                    |
| Memory, Cache  | 15907 MB VRAM, 0 KB global / 32 KB local                   |
| Buffer Limits  | 1024 MB global, 64 KB constant                             |
|----------------'------------------------------------------------------------|
The PCIe result really isn't very accurate. The other one is NVIDIA's tool.
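For a sense of how far off the PCIe numbers are, they can be set against the theoretical link rate. This is a rough sketch using the standard per-lane transfer rates and line encodings for PCIe 1.0 through 5.0. It ignores packet and protocol overhead, so real transfers always land somewhat below the figure it prints. The helper name is hypothetical, and the measured values are the bidirectional rows posted in this thread.

```python
# generation -> (transfer rate per lane in GT/s, line-encoding efficiency)
PCIE_GENERATIONS = {
    1: (2.5, 8 / 10),
    2: (5.0, 8 / 10),
    3: (8.0, 128 / 130),
    4: (16.0, 128 / 130),
    5: (32.0, 128 / 130),
}


def pcie_peak_gb_per_s(gen: int, lanes: int) -> float:
    """Theoretical one-direction bandwidth in GB/s, before protocol overhead."""
    rate_gt_s, efficiency = PCIE_GENERATIONS[gen]
    return rate_gt_s * efficiency * lanes / 8  # 8 bits per byte


peak = pcie_peak_gb_per_s(gen=4, lanes=16)
for name, measured in [
    ("RX 9070 XT", 28.46),
    ("RTX 5070 Ti", 14.65),
    ("RTX 3070 Laptop GPU", 8.15),
]:
    share = measured / peak
    print(f"{name:<20} {measured:6.2f} GB/s = {share:4.0%} of Gen4 x16 peak ({peak:.1f} GB/s)")
```

A result near half the Gen4 x16 peak is also about what a Gen3 x16 or Gen4 x8 link would deliver, so a low number might reflect the link the card actually negotiated and not only a flaw in the test. That part is speculation.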
musl, RIP.
It runs on a Mac too, though the scores are a bit lower.
.-----------------------------------------------------------------------------.
|----------------.------------------------------------------------------------|
| Device ID    0 | Apple M2 Pro                                               |
|----------------'------------------------------------------------------------|
|----------------.------------------------------------------------------------|
| Device ID      | 0                                                          |
| Device Name    | Apple M2 Pro                                               |
| Device Vendor  | Apple                                                      |
| Device Driver  | 1.2 1.0 (macOS)                                            |
| OpenCL Version | OpenCL C 1.2                                               |
| Compute Units  | 16 at 1000 MHz (2048 cores, 4.096 TFLOPs/s)                |
| Memory, Cache  | 10922 MB RAM, 0 KB global / 32 KB local                    |
| Buffer Limits  | 2048 MB global, 1048576 KB constant                        |
|----------------'------------------------------------------------------------|
| Info: OpenCL C code successfully compiled.                                  |
| FP64  compute                                                 not supported |
| FP32  compute                                          1.342 TFLOPs/s (1/3 ) |
| FP16  compute                                                 not supported |
| INT64 compute                                          1.031  TIOPs/s (1/4 ) |
| INT32 compute                                          1.314  TIOPs/s (1/3 ) |
| INT16 compute                                          1.406  TIOPs/s (1/3 ) |
| INT8  compute                                          0.316  TIOPs/s (1/12) |
| Memory Bandwidth ( coalesced read      )                        185.87 GB/s |
| Memory Bandwidth ( coalesced      write)                        194.18 GB/s |
| Memory Bandwidth (misaligned read      )                        189.04 GB/s |
| Memory Bandwidth (misaligned      write)                        198.22 GB/s |
|-----------------------------------------------------------------------------|
'-----------------------------------------------------------------------------'
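For anyone who wants to compare the pasted results side by side, the rows are regular enough to pull apart with a regular expression. This is a minimal sketch and not part of OpenCL-Benchmark itself: the function names and the way row labels are normalized are my own choices, and the pattern only assumes the row shapes visible in this thread. It tolerates missing spaces, since forum pastes tend to mangle the alignment.

```python
import re

# Matches compute rows ("FP32 compute ...") and bandwidth rows
# ("Memory Bandwidth ( coalesced read ) ..."), then the first decimal
# number that is followed by a known unit.
METRIC = re.compile(
    r"\|\s*(?P<name>(?:FP|INT)\d+\s*compute|(?:Memory|PCIe)\s+Bandwidth\s*\([^)]*\))"
    r".*?(?P<value>\d+\.\d+)\s*(?P<unit>TFLOPs/s|TIOPs/s|GB/s)"
)


def normalize(name: str) -> str:
    name = re.sub(r"(\d)\s*compute", r"\1 compute", name)  # "INT8compute" -> "INT8 compute"
    name = re.sub(r"\(\s*(.*?)\s*\)", r"(\1)", name)        # trim padding inside parentheses
    return " ".join(name.split())                            # collapse runs of spaces


def parse_results(text: str) -> dict[str, tuple[float, str]]:
    """Map each row label to (value, unit); rows without a number are skipped."""
    results = {}
    for line in text.splitlines():
        match = METRIC.search(line)
        if match:
            results[normalize(match["name"])] = (float(match["value"]), match["unit"])
    return results


# A few rows from the Apple M2 Pro output, one of them deliberately mangled.
SAMPLE = """\
| FP32  compute   1.342 TFLOPs/s (1/3 ) |
| FP16  compute   not supported |
| INT8compute 0.316TIOPs/s (1/12) |
| Memory Bandwidth ( coalesced read      )   185.87 GB/s |
"""

for label, (value, unit) in parse_results(SAMPLE).items():
    print(f"{label}: {value} {unit}")
```

Rows reported as "not supported" carry no number, so they simply do not appear in the result.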
Verdict: a show-off post. "I've got a PRO 6000 and a 5090! I don't mean you in particular; I mean everyone here is trash."
Haha, I'm only joking.