Original poster |
Posted on 2023-12-1 23:29
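I'm not a professional graphics programmer either, but according to an NVIDIA staff member on the NVIDIA developer forums, it works like this: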
Cycles Active
Pipeline utilization based on the number of cycles the pipeline was active. This takes the rates of different instructions executing on the pipeline into account. For an instruction requiring 4 cycles to complete execution, the counter is increased by 1 for 4 cycles.
Inst Executed
Pipeline utilization based on the number of executed instructions. This does not account for any variation in instruction latencies for this pipeline. For an instruction requiring 4 cycles to complete execution, the counter is increased by 1 only.
As you can maybe see from the descriptions, inst_executed only looks at how many instructions are issued, but not at their latencies. If the instruction has non-negligible latency, the metric will never reach 100%. Cycles active on the other hand takes this into account. Seeing both side-by-side is ideal, as it indicates not only how much the pipeline is utilized, but also if it’s utilized by many short and few long instructions.
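To make the difference between the two counters concrete, here is a toy model in Python. It is only a sketch of the quoted description, not how the hardware counters are actually implemented: the function name `pipe_utilization`, the 100-cycle sampling window, and the assumption that instructions run back-to-back on a single pipeline without overlapping are all mine.

```python
def pipe_utilization(latencies, window_cycles):
    """Toy model of the two metrics for one pipeline.

    latencies: cycles each executed instruction keeps the pipeline busy
               (assumption: instructions never overlap).
    window_cycles: length of the sampling window in cycles (assumption).
    Returns (cycles_active_pct, inst_executed_pct).
    """
    busy_cycles = sum(latencies)
    if busy_cycles > window_cycles:
        raise ValueError("instructions do not fit in the sampling window")
    # Cycles Active: +1 for every cycle an instruction occupies the pipeline.
    cycles_active_pct = 100.0 * busy_cycles / window_cycles
    # Inst Executed: +1 per instruction, regardless of its latency.
    inst_executed_pct = 100.0 * len(latencies) / window_cycles
    return cycles_active_pct, inst_executed_pct


# 25 instructions of 4 cycles each keep the pipeline busy for all 100 cycles,
# yet the instruction-based metric only reaches 25%.
long_insts = pipe_utilization([4] * 25, 100)

# 100 single-cycle instructions saturate both metrics.
short_insts = pipe_utilization([1] * 100, 100)

print(long_insts, short_insts)
```

In this model a pipeline full of 4-cycle instructions shows 100% cycles active but only 25% inst executed, which matches the point in the quote that inst_executed cannot reach 100% when latencies are non-negligible.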
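My guess is that the flow is roughly as follows. The game engine's render pipeline adds DLSS as a post-processing pass, and the DLSS implementation lives in nvngx_dlss.dll. At runtime, when execution reaches the post-processing pass, the code in the DLL is sent by the CPU to the GPU front end, decoded there, and the resulting instructions are dispatched to the SM schedulers on the back end (NVIDIA's architecture has warp schedulers, but I don't understand them well). A tensor active value of 4% should then mean that the tensor core pipeline is active in only 4% of clock cycles and sits idle for the rest. At the very least, this value should be very strongly positively correlated with tensor core utilization.
Official NVIDIA link: https://forums.developer.nvidia. ... tilization/214795/3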