 
 OP | Posted on 2023-12-1 23:29
I'm not a professional graphics programmer either. According to an NVIDIA staff member on the NVIDIA developer forum, it works like this:
 
 
 Cycles Active
 Pipeline utilization based on the number of cycles the pipeline was active. This takes the rates of different instructions executing on the pipeline into account. For an instruction requiring 4 cycles to complete execution, the counter is increased by 1 for 4 cycles.
 
 Inst Executed
 Pipeline utilization based on the number of executed instructions. This does not account for any variation in instruction latencies for this pipeline. For an instruction requiring 4 cycles to complete execution, the counter is increased by 1 only.
 
 As you can maybe see from the descriptions, inst_executed only looks at how many instructions are issued, but not at their latencies. If the instruction has non-negligible latency, the metric will never reach 100%. Cycles active on the other hand takes this into account. Seeing both side-by-side is ideal, as it indicates not only how much the pipeline is utilized, but also if it’s utilized by many short and few long instructions.
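
 To make the difference between the two counters concrete, here is a toy model. It is only a sketch, not how the hardware or Nsight actually computes these metrics. The function name `pipeline_utilization` is made up for illustration, and it assumes that instructions run back-to-back on a single pipeline without overlapping and that the peak rate is one instruction per cycle.

```python
def pipeline_utilization(latencies, elapsed_cycles):
    """Toy model of the two counters (hypothetical helper, not an NVIDIA API).

    latencies: cycles each instruction occupies the pipeline (assumed non-overlapping).
    elapsed_cycles: length of the measurement window in cycles.
    """
    # "Cycles Active": +1 for every cycle an instruction keeps the pipeline busy.
    cycles_active = sum(latencies)
    # "Inst Executed": +1 per instruction, regardless of its latency.
    inst_executed = len(latencies)
    return {
        "cycles_active_pct": 100.0 * cycles_active / elapsed_cycles,
        "inst_executed_pct": 100.0 * inst_executed / elapsed_cycles,
    }


# 25 instructions of 4 cycles each fill a 100-cycle window completely.
busy = pipeline_utilization([4] * 25, 100)
print(busy)
```

 In this model the pipeline is busy in every cycle, so the cycles-active value is 100%, while the inst-executed value is only 25%, which matches the quoted remark that inst_executed never reaches 100% when instructions have non-negligible latency.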
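 My guess is that the flow goes roughly like this: the game engine adds DLSS as a post-processing pass in its rendering pipeline, and the DLSS logic lives in nvngx_dlss.dll. At run time, when execution reaches the post-processing step, the code in the DLL is sent by the CPU to the GPU, decoded by the GPU front end, and the instructions are then dispatched to the SM schedulers on the back end (NVIDIA's architecture has warp schedulers, but I don't really understand them). A tensor active value of 4% here should mean that the tensor core pipeline is active in only 4% of clock cycles and sits idle the rest of the time. At the very least, this value should correlate very strongly with tensor core utilization.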
 
 
 Official NVIDIA link: https://forums.developer.nvidia. ... tilization/214795/3