OP | Posted on 2025-2-7 00:06
Whoa, I actually do have a counter other than L0_TO_RECOVERY_COUNTER that isn't zero.
NAKS_SENT: 14. So I went and asked chat**: "The NAKS_SENT field in the nvidia-smi output represents the number of "Negative Acknowledgment Packets" sent by the GPU. NAKs are part of the PCIe protocol and are used to signal errors or issues with data transmission. When the GPU encounters a problem with the data it receives, it might send a NAK to indicate that the data was not properly received, or there was some kind of error in the transmission.
The value of NAKS_SENT: 14 indicates that the GPU has encountered some issues with the communication but hasn’t reported any fatal errors or received non-correctable errors. It's not necessarily something to worry about unless the count increases or affects performance.
If you're seeing frequent or increasing NAK values, it could be indicative of issues like:
PCIe signal quality problems.
Potential hardware issues (cabling, slot, or GPU).
Driver or firmware issues.
However, based on your output, it seems to be relatively low, so it doesn't appear to be a serious problem at the moment. If you start seeing more errors or if it affects performance, it could be worth investigating further."
"
GPU 0: NVIDIA GeForce RTX 5090 D (UUID: GPU-6c329833-c9dc-f353-6657-8493fd25b9f4)
REPLAY_COUNTER: 0
REPLAY_ROLLOVER_COUNTER: 0
L0_TO_RECOVERY_COUNTER: 1213
CORRECTABLE_ERRORS: 0
NAKS_RECEIVED: 0
RECEIVER_ERROR: 0
BAD_TLP: 0
NAKS_SENT: 14
BAD_DLLP: 0
NON_FATAL_ERROR: 0
FATAL_ERROR: 0
UNSUPPORTED_REQ: 0
LCRC_ERROR: 0
LANE_ERROR:
lane 0: 0
lane 1: 0
lane 2: 0
lane 3: 0
lane 4: 0
lane 5: 0
lane 6: 0
lane 7: 0
lane 8: 0
lane 9: 0
lane 10: 0
lane 11: 0
lane 12: 0
"
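For anyone who wants to watch whether these counters actually climb, here is a rough Python sketch that parses a pasted dump like the one above and reports what is non-zero, or what grew between two snapshots. It only looks for `NAME: <integer>` lines in whatever text you give it; the function names (`parse_counters`, `nonzero`, `increased`) and the shortened `SAMPLE` text are made up for illustration and are not part of nvidia-smi.

```python
import re

# Shortened copy of the dump above, used only as sample input.
SAMPLE = """\
GPU 0: NVIDIA GeForce RTX 5090 D (UUID: GPU-6c329833-c9dc-f353-6657-8493fd25b9f4)
REPLAY_COUNTER: 0
L0_TO_RECOVERY_COUNTER: 1213
NAKS_SENT: 14
LANE_ERROR:
lane 0: 0
"""

# A counter line is "NAME: <integer>" with nothing after the number.
COUNTER_LINE = re.compile(r"^\s*([A-Za-z0-9_ ]+?):\s*(\d+)\s*$")


def parse_counters(text):
    """Return {name: value} for every 'NAME: <integer>' line in text."""
    counters = {}
    for line in text.splitlines():
        match = COUNTER_LINE.match(line)
        if match:
            counters[match.group(1)] = int(match.group(2))
    return counters


def nonzero(counters):
    """Keep only the counters whose value is not zero."""
    return {name: value for name, value in counters.items() if value != 0}


def increased(before, after):
    """Return {name: delta} for counters that grew between two snapshots."""
    return {
        name: after[name] - before[name]
        for name in after
        if name in before and after[name] > before[name]
    }


if __name__ == "__main__":
    print(nonzero(parse_counters(SAMPLE)))
    # {'L0_TO_RECOVERY_COUNTER': 1213, 'NAKS_SENT': 14}
```

Save one dump now and another after a gaming or benchmark session, then pass both parsed results to `increased` to see whether NAKS_SENT or anything else moved.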