AMD's "future-proofing" is real: at least a 20% AI speedup

Leciel posted on 2025-4-25 11:45

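As everyone knows, AMD's main problem in AI is still that it can't get across NVIDIA's CUDA moat.
New algorithms and new features launch on CUDA first, while ROCm support on AMD is nowhere in sight.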

Still, better late than never.

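When Flash Attention 2 came out a year ago, NVIDIA users got to use it right away. AMD users had no idea how much of a speedup it would bring; just getting the application to run at all was good enough.
Looking back a year later: eight months ago an enthusiast released a minimal implementation for Stable Diffusion: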
https://github.com/Repeerc/flash-attention-v2-RDNA3-minimal
along with a ComfyUI adaptation:
https://github.com/Repeerc/ComfyUI-flash-attention-rdna3-win-zluda

In those eight months PyTorch, ComfyUI, and ROCm have moved fast, and Repeerc's packages have fallen out of date.

Over the past three days I rebuilt and repackaged that repo against the latest software packages. It only supports the 7900 XTX:
https://github.com/jiangfeng79/ComfyUI-flash-attention-rdna3-win-zluda

The test results show the future-proofing is worth watching. SDXL at 1024 x 1024 finally got past 4 it/s:
got prompt
Select optimized attention: sub-quad sub-quad
100%|██████████████████████████████████████████████████████████████████████████████████| 20/20
Prompt executed in 6.59 seconds

got prompt
Select optimized attention: Flash-Attention-v2 Flash-Attention-v2
100%|██████████████████████████████████████████████████████████████████████████████████| 20/20
Prompt executed in 5.64 seconds
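
For anyone who wants to compare their own runs the same way, here is a minimal Python sketch that pulls the "Prompt executed in ... seconds" figure out of the two logs above and turns it into a relative speedup. The helper names (`executed_seconds`, `compare`) are made up for illustration and are not part of ComfyUI or either repo. It also assumes the logged time covers the whole prompt rather than only the sampling loop.

```python
import re

# Timing lines copied from the two runs above (progress bars omitted).
LOGS = {
    "sub-quad": "Prompt executed in 6.59 seconds",
    "Flash-Attention-v2": "Prompt executed in 5.64 seconds",
}
STEPS = 20  # both runs show 20/20 sampling steps


def executed_seconds(line: str) -> float:
    """Extract the wall-clock time from a 'Prompt executed in N seconds' line."""
    match = re.search(r"Prompt executed in ([\d.]+) seconds", line)
    if match is None:
        raise ValueError(f"no timing found in: {line!r}")
    return float(match.group(1))


def compare(baseline: float, optimized: float) -> dict:
    """Return throughput gain and time saved, both as fractions of the baseline."""
    return {
        "throughput_gain": baseline / optimized - 1.0,
        "time_saved": (baseline - optimized) / baseline,
    }


baseline = executed_seconds(LOGS["sub-quad"])
optimized = executed_seconds(LOGS["Flash-Attention-v2"])
result = compare(baseline, optimized)

print(f"throughput gain: {result['throughput_gain']:.1%}")
print(f"time saved:      {result['time_saved']:.1%}")
# End-to-end rate: total prompt time divided over the sampling steps.
print(f"end-to-end it/s: {STEPS / baseline:.2f} -> {STEPS / optimized:.2f}")
```

If the logged time does include work outside the sampler (text encoding, VAE decode, and so on), the end-to-end ratio will understate the change in sampler it/s, so treat it as a lower bound on the attention speedup rather than the it/s figure itself.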

Unfortunately Flux doesn't run; for now it only supports SD and SDXL.

z010q3w posted on 2025-4-25 11:57

Really nice


Chiphell - Sharing and Discussing User Experience (Archiver)
