Views: 4423 | Replies: 9

[Storage] NUC9: NVMe passed through from PVE is unusable inside the VM

Posted on 2024-6-17 05:00
Last edited by onederfa on 2024-6-17 15:31

  • Environment: Intel NUC9, BIOS updated to the latest version
          PVE 8.2 + i7-9850H + three 4TB NVMe SSDs on the MAP1602 controller (ZhiTai 7100 ×1 + Acer GM7 ×2)
          Kernel 6.8.4

  • Problem: The PVE host recognizes and can operate all three drives, and each can be successfully passed through to a TrueNAS SCALE VM, where it is also recognized. But when adding a storage pool, wiping the drive in one particular M.2 slot fails with IO error 5 (all three drives fail in turn when rotated into that slot), while the other two slots work fine.

  • Hypotheses:

  • The M.2 slot is faulty. But the PVE host recognizes and runs all three drives normally, and they also work when the host OS is swapped for Windows. Ruled out.
  • The TrueNAS VM is at fault. But when the drive is passed through to a Windows VM instead, the drive in that same slot is still the only one that is unusable. Ruled out.
  • A MAP1602 controller quirk. Swapping in a 1TB SN580 didn't help, and neither did the MAXIO MAP1602-patched PVE kernel from another forum member's post.


So it is most likely a PVE or NVMe-passthrough issue. But why do the other two slots work while this one passes through cleanly yet is unusable inside the VM? Very strange. I've gone back and forth with elimination tests and have no idea where the problem is; would appreciate any insight from someone who knows 🙏🙏🙏

The log contains:
  Jun 17 15:20:13 pve kernel: vfio-pci 0000:70:00.0: Unable to change power state from D3cold to D0, device inaccessible
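The D3cold message means the device was put into the deepest PCI power state and could not be woken back up. A workaround worth trying (a sketch, not from the thread; `d3cold_allowed` and `power/control` are standard Linux PCI sysfs attributes, and the address is taken from the log line above) is to forbid D3cold and runtime power management for that device on the PVE host before starting the VM:

```shell
#!/bin/sh
# Forbid D3cold and runtime PM for the passed-through NVMe controller.
# 0000:70:00.0 is the address from the vfio-pci error above; adjust as needed.
DEV=0000:70:00.0
SYSDIR=/sys/bus/pci/devices/$DEV
if [ -e "$SYSDIR/d3cold_allowed" ]; then
    echo 0  > "$SYSDIR/d3cold_allowed"   # never let this device enter D3cold
    echo on > "$SYSDIR/power/control"    # keep it powered (disable runtime PM)
else
    echo "device $DEV not present; nothing to do"
fi
```

This does not persist across reboots; if it helps, a PVE hookscript or a small systemd unit could apply it automatically before the VM starts.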

Tried disabling ASPM in the BIOS and adding pcie_aspm=off nvme_core.default_ps_max_latency_us=0 to grub; neither helped:
  GRUB_CMDLINE_LINUX_DEFAULT="quiet intel_iommu=on iommu=pt pcie_aspm=off nvme_core.default_ps_max_latency_us=0"
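For anyone retracing these steps: a sketch of how such a grub change is applied and verified on a Debian-based PVE host (`update-grub` is the standard Debian command; the parameter name comes from the line above):

```shell
#!/bin/sh
# After editing GRUB_CMDLINE_LINUX_DEFAULT in /etc/default/grub, apply it:
#   update-grub && reboot        # run these on the real host
# After the reboot, confirm the parameter actually reached the kernel:
if grep -q 'pcie_aspm=off' /proc/cmdline 2>/dev/null; then
    status="pcie_aspm=off is active"
else
    status="pcie_aspm=off is NOT active"
fi
echo "$status"
```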


An fio test of the failing drive inside the TrueNAS VM shows I/O errors:
  read: (g=0): rw=read, bs=(R) 4096B-4096B, (W) 4096B-4096B, (T) 4096B-4096B, ioengine=libaio, iodepth=1
  write: (g=0): rw=write, bs=(R) 4096B-4096B, (W) 4096B-4096B, (T) 4096B-4096B, ioengine=libaio, iodepth=1
  randread: (g=0): rw=randread, bs=(R) 4096B-4096B, (W) 4096B-4096B, (T) 4096B-4096B, ioengine=libaio, iodepth=1
  randwrite: (g=0): rw=randwrite, bs=(R) 4096B-4096B, (W) 4096B-4096B, (T) 4096B-4096B, ioengine=libaio, iodepth=1
  fio-3.33
  Starting 4 processes
  fio: io_u error on file /dev/nvme2n1: Input/output error: write offset=700571648, buflen=4096
  fio: pid=98259, err=5/file:io_u.c:1876, func=io_u error, error=Input/output error
  fio: io_u error on file /dev/nvme2n1: Input/output error: write offset=0, buflen=4096
  fio: pid=98257, err=5/file:io_u.c:1876, func=io_u error, error=Input/output error
  Jobs: 2 (f=2): [R(1),X(1),r(1),X(1)][100.0%][r=216MiB/s][r=55.3k IOPS][eta 00m:00s]
  read: (groupid=0, jobs=4): err= 5 (file:io_u.c:1876, func=io_u error, error=Input/output error): pid=98256: Sun Jun 16 23:44:54 2024
    read: IOPS=30.7k, BW=120MiB/s (126MB/s)(7202MiB/60010msec)
      slat (nsec): min=1065, max=146684, avg=2187.12, stdev=1327.62
      clat (nsec): min=452, max=72715k, avg=62330.69, stdev=1329730.42
       lat (usec): min=11, max=72717, avg=64.52, stdev=1329.75
      clat percentiles (usec):
       |  1.00th=[   13],  5.00th=[   14], 10.00th=[   14], 20.00th=[   15],
       | 30.00th=[   15], 40.00th=[   16], 50.00th=[   16], 60.00th=[   17],
       | 70.00th=[   18], 80.00th=[   19], 90.00th=[   27], 95.00th=[   30],
       | 99.00th=[   86], 99.50th=[   98], 99.90th=[30278], 99.95th=[43779],
       | 99.99th=[43779]
     bw (  KiB/s): min=40644, max=229472, per=99.70%, avg=122532.77, stdev=55151.28, samples=238
     iops        : min=10161, max=57368, avg=30633.02, stdev=13787.84, samples=238
    lat (nsec)   : 500=0.01%, 750=0.01%, 1000=0.01%
    lat (usec)   : 2=0.01%, 4=0.01%, 10=0.01%, 20=82.67%, 50=14.97%
    lat (usec)   : 100=1.90%, 250=0.32%, 500=0.01%, 750=0.01%, 1000=0.01%
    lat (msec)   : 2=0.01%, 4=0.01%, 10=0.01%, 20=0.01%, 50=0.11%
    lat (msec)   : 100=0.01%
    cpu          : usr=2.84%, sys=6.81%, ctx=1843437, majf=6, minf=69
    IO depths    : 1=100.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0%
       submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
       complete  : 0=0.1%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
       issued rwts: total=1843826,2,0,0 short=0,0,0,0 dropped=0,0,0,0
       latency   : target=0, window=0, percentile=100.00%, depth=1

  Run status group 0 (all jobs):
     READ: bw=120MiB/s (126MB/s), 120MiB/s-120MiB/s (126MB/s-126MB/s), io=7202MiB (7552MB), run=60010-60010msec

  Disk stats (read/write):
    nvme2n1: ios=1838134/2, merge=0/0, ticks=109134/525, in_queue=109660, util=100.00%
The same test on the same drive in the same slot runs cleanly on the PVE host itself:
  read: (g=0): rw=read, bs=(R) 4096B-4096B, (W) 4096B-4096B, (T) 4096B-4096B, ioengine=libaio, iodepth=1
  ...
  write: (g=0): rw=write, bs=(R) 4096B-4096B, (W) 4096B-4096B, (T) 4096B-4096B, ioengine=libaio, iodepth=1
  ...
  randread: (g=0): rw=randread, bs=(R) 4096B-4096B, (W) 4096B-4096B, (T) 4096B-4096B, ioengine=libaio, iodepth=1
  ...
  randwrite: (g=0): rw=randwrite, bs=(R) 4096B-4096B, (W) 4096B-4096B, (T) 4096B-4096B, ioengine=libaio, iodepth=1
  ...
  fio-3.33
  Starting 16 processes
  Jobs: 16 (f=16): [R(4),W(4),r(4),w(4)][100.0%][r=275MiB/s,w=1606MiB/s][r=70.5k,w=411k IOPS][eta 00m:00s]
  read: (groupid=0, jobs=16): err= 0: pid=2128: Mon Jun 17 00:14:10 2024
    read: IOPS=71.6k, BW=280MiB/s (293MB/s)(16.4GiB/60001msec)
      slat (nsec): min=1053, max=38148, avg=2475.01, stdev=1204.03
      clat (nsec): min=732, max=17269k, avg=108465.12, stdev=189486.51
       lat (usec): min=13, max=17271, avg=110.94, stdev=189.49
      clat percentiles (usec):
       |  1.00th=[   43],  5.00th=[   45], 10.00th=[   46], 20.00th=[   48],
       | 30.00th=[   53], 40.00th=[   62], 50.00th=[   78], 60.00th=[   92],
       | 70.00th=[  104], 80.00th=[  129], 90.00th=[  167], 95.00th=[  202],
       | 99.00th=[  676], 99.50th=[ 1778], 99.90th=[ 2114], 99.95th=[ 2311],
       | 99.99th=[ 4178]
     bw (  KiB/s): min=223096, max=324488, per=100.00%, avg=286252.77, stdev=1723.79, samples=952
     iops        : min=55774, max=81122, avg=71563.19, stdev=430.95, samples=952
    write: IOPS=418k, BW=1634MiB/s (1713MB/s)(95.7GiB/60001msec); 0 zone resets
      slat (nsec): min=973, max=81383, avg=2096.82, stdev=873.04
      clat (nsec): min=492, max=22555k, avg=16296.29, stdev=70921.50
       lat (usec): min=8, max=22557, avg=18.39, stdev=70.93
      clat percentiles (usec):
       |  1.00th=[   10],  5.00th=[   11], 10.00th=[   11], 20.00th=[   12],
       | 30.00th=[   12], 40.00th=[   13], 50.00th=[   13], 60.00th=[   14],
       | 70.00th=[   14], 80.00th=[   15], 90.00th=[   16], 95.00th=[   17],
       | 99.00th=[   37], 99.50th=[   43], 99.90th=[ 1565], 99.95th=[ 1795],
       | 99.99th=[ 2089]
     bw (  MiB/s): min= 1461, max= 1684, per=100.00%, avg=1635.05, stdev= 6.45, samples=952
     iops        : min=374034, max=431212, avg=418572.69, stdev=1652.47, samples=952
    lat (nsec)   : 500=0.01%, 750=0.01%, 1000=0.01%
    lat (usec)   : 2=0.01%, 4=0.01%, 10=2.41%, 20=81.08%, 50=5.24%
    lat (usec)   : 100=6.29%, 250=4.37%, 500=0.27%, 750=0.08%, 1000=0.02%
    lat (msec)   : 2=0.20%, 4=0.04%, 10=0.01%, 20=0.01%, 50=0.01%
    cpu          : usr=6.82%, sys=12.45%, ctx=29393451, majf=0, minf=216
    IO depths    : 1=100.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0%
       submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
       complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
       issued rwts: total=4293850,25100258,0,0 short=0,0,0,0 dropped=0,0,0,0
       latency   : target=0, window=0, percentile=100.00%, depth=1

  Run status group 0 (all jobs):
     READ: bw=280MiB/s (293MB/s), 280MiB/s-280MiB/s (293MB/s-293MB/s), io=16.4GiB (17.6GB), run=60001-60001msec
    WRITE: bw=1634MiB/s (1713MB/s), 1634MiB/s-1634MiB/s (1713MB/s-1713MB/s), io=95.7GiB (103GB), run=60001-60001msec

  Disk stats (read/write):
    nvme2n1: ios=4285430/25063631, merge=0/0, ticks=449166/338480, in_queue=787646, util=99.91%


TrueNAS error:
Snipaste_2024-06-17_15-24-45.png

Windows 10 VM formatting error (screenshot)

Of the three NVMe slots, the two on the compute element connect to the PCH; the one farther from the CPU is the unusable one.

The failing slot is the one farther from the CPU (screenshot)
Posted on 2024-6-17 08:07
Looks like this thing can't go through warranty service anymore either...
Posted on 2024-6-17 08:14
Back when I was on PVE 7 I couldn't pass through NVMe either, no idea why.
Some people say it may be related to the motherboard not supporting PCIe bifurcation.
Posted on 2024-6-17 10:13 (from mobile)
I'd just give up on NVMe passthrough.
Posted on 2024-6-17 11:16
Did you enable IOMMU?
Posted on 2024-6-17 12:06 (from mobile)
Could the odd one out be the slot wired directly to the CPU? If so, consider whether it's a CPU lane issue. The other two might be fine because they come off the CM246 [PCH]. Pure speculation, for reference only.
OP | Posted on 2024-6-17 13:32
Last edited by onederfa on 2024-6-17 13:45
Quoting 皛羽控 (posted 2024-6-17 12:06):
Could the odd one out be the slot wired directly to the CPU? If so, consider whether it's a CPU lane issue. The other two ...


The NVMe drive tucked under the motherboard is the CPU-attached one and it works fine. Both NVMe slots on the compute element hang off the PCH, and yet one of those two is the one that fails.

Snipaste_2024-06-17_13-44-55.png
Posted on 2024-6-17 14:45
  #!/bin/bash
  for d in /sys/kernel/iommu_groups/*/devices/*; do
    n=${d#*/iommu_groups/*}; n=${n%%/*}
    printf 'IOMMU Group %s ' "$n"
    lspci -nns "${d##*/}"
  done

Check whether the passed-through drive is in its own IOMMU group.
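For anyone puzzling over the parameter expansions in the script above, here is what each one does, traced on a single sample sysfs path (the path itself is illustrative):

```shell
#!/bin/sh
# Walk through the script's expansions with one example device path.
d=/sys/kernel/iommu_groups/16/devices/0000:03:00.0

n=${d#*/iommu_groups/*}   # drop everything through "iommu_groups/" -> 16/devices/0000:03:00.0
n=${n%%/*}                # keep only the group number              -> 16
addr=${d##*/}             # last path component, the PCI address    -> 0000:03:00.0

echo "IOMMU Group $n $addr"   # prints: IOMMU Group 16 0000:03:00.0
```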
OP | Posted on 2024-6-17 14:56
Last edited by onederfa on 2024-6-17 14:58
Quoting Dk2014 (posted 2024-6-17 14:45):
Check whether the passed-through drive is in its own IOMMU group
  IOMMU Group 0 00:02.0 VGA compatible controller [0300]: Intel Corporation CoffeeLake-H GT2 [UHD Graphics 630] [8086:3e9b] (rev 02)
  IOMMU Group 10 00:1c.0 PCI bridge [0604]: Intel Corporation Cannon Lake PCH PCI Express Root Port #1 [8086:a338] (rev f0)
  IOMMU Group 11 00:1c.4 PCI bridge [0604]: Intel Corporation Cannon Lake PCH PCI Express Root Port #5 [8086:a33c] (rev f0)
  IOMMU Group 12 00:1d.0 PCI bridge [0604]: Intel Corporation Cannon Lake PCH PCI Express Root Port #9 [8086:a330] (rev f0)
  IOMMU Group 13 00:1d.5 PCI bridge [0604]: Intel Corporation Cannon Lake PCH PCI Express Root Port #14 [8086:a335] (rev f0)
  IOMMU Group 14 00:1e.0 Communication controller [0780]: Intel Corporation Cannon Lake PCH Serial IO UART Host Controller [8086:a328] (rev 10)
  IOMMU Group 15 00:1f.0 ISA bridge [0601]: Intel Corporation Cannon Lake LPC Controller [8086:a30e] (rev 10)
  IOMMU Group 15 00:1f.3 Audio device [0403]: Intel Corporation Cannon Lake PCH cAVS [8086:a348] (rev 10)
  IOMMU Group 15 00:1f.4 SMBus [0c05]: Intel Corporation Cannon Lake PCH SMBus Controller [8086:a323] (rev 10)
  IOMMU Group 15 00:1f.5 Serial bus controller [0c80]: Intel Corporation Cannon Lake PCH SPI Controller [8086:a324] (rev 10)
  IOMMU Group 15 00:1f.6 Ethernet controller [0200]: Intel Corporation Ethernet Connection (7) I219-LM [8086:15bb] (rev 10)
  IOMMU Group 16 03:00.0 Non-Volatile memory controller [0108]: MAXIO Technology (Hangzhou) Ltd. NVMe SSD Controller MAP1602 [1e4b:1602] (rev 01)
  IOMMU Group 17 04:00.0 Network controller [0280]: Intel Corporation Wi-Fi 6 AX200 [8086:2723] (rev 1a)
  IOMMU Group 18 05:00.0 PCI bridge [0604]: Intel Corporation JHL7540 Thunderbolt 3 Bridge [Titan Ridge 4C 2018] [8086:15ea] (rev 06)
  IOMMU Group 19 06:00.0 PCI bridge [0604]: Intel Corporation JHL7540 Thunderbolt 3 Bridge [Titan Ridge 4C 2018] [8086:15ea] (rev 06)
  IOMMU Group 1 00:00.0 Host bridge [0600]: Intel Corporation 8th Gen Core Processor Host Bridge/DRAM Registers [8086:3ec4] (rev 0d)
  IOMMU Group 20 06:01.0 PCI bridge [0604]: Intel Corporation JHL7540 Thunderbolt 3 Bridge [Titan Ridge 4C 2018] [8086:15ea] (rev 06)
  IOMMU Group 21 06:02.0 PCI bridge [0604]: Intel Corporation JHL7540 Thunderbolt 3 Bridge [Titan Ridge 4C 2018] [8086:15ea] (rev 06)
  IOMMU Group 22 06:04.0 PCI bridge [0604]: Intel Corporation JHL7540 Thunderbolt 3 Bridge [Titan Ridge 4C 2018] [8086:15ea] (rev 06)
  IOMMU Group 23 07:00.0 System peripheral [0880]: Intel Corporation JHL7540 Thunderbolt 3 NHI [Titan Ridge 4C 2018] [8086:15eb] (rev 06)
  IOMMU Group 24 3b:00.0 USB controller [0c03]: Intel Corporation JHL7540 Thunderbolt 3 USB Controller [Titan Ridge 4C 2018] [8086:15ec] (rev 06)
  IOMMU Group 25 70:00.0 Non-Volatile memory controller [0108]: Sandisk Corp WD Blue SN580 NVMe SSD (DRAM-less) [15b7:5041] (rev 01)
  IOMMU Group 26 71:00.0 Ethernet controller [0200]: Intel Corporation I210 Gigabit Network Connection [8086:1533] (rev 03)
  IOMMU Group 2 00:01.0 PCI bridge [0604]: Intel Corporation 6th-10th Gen Core Processor PCIe Controller (x16) [8086:1901] (rev 0d)
  IOMMU Group 2 00:01.2 PCI bridge [0604]: Intel Corporation Xeon E3-1200 v5/E3-1500 v5/6th Gen Core Processor PCIe Controller (x4) [8086:1909] (rev 0d)
  IOMMU Group 2 02:00.0 Non-Volatile memory controller [0108]: MAXIO Technology (Hangzhou) Ltd. NVMe SSD Controller MAP1602 [1e4b:1602] (rev 01)
  IOMMU Group 3 00:08.0 System peripheral [0880]: Intel Corporation Xeon E3-1200 v5/v6 / E3-1500 v5 / 6th/7th/8th Gen Core Processor Gaussian Mixture Model [8086:1911]
  IOMMU Group 4 00:12.0 Signal processing controller [1180]: Intel Corporation Cannon Lake PCH Thermal Controller [8086:a379] (rev 10)
  IOMMU Group 5 00:14.0 USB controller [0c03]: Intel Corporation Cannon Lake PCH USB 3.1 xHCI Host Controller [8086:a36d] (rev 10)
  IOMMU Group 5 00:14.2 RAM memory [0500]: Intel Corporation Cannon Lake PCH Shared SRAM [8086:a36f] (rev 10)
  IOMMU Group 6 00:15.0 Serial bus controller [0c80]: Intel Corporation Cannon Lake PCH Serial IO I2C Controller #0 [8086:a368] (rev 10)
  IOMMU Group 6 00:15.1 Serial bus controller [0c80]: Intel Corporation Cannon Lake PCH Serial IO I2C Controller #1 [8086:a369] (rev 10)
  IOMMU Group 7 00:16.0 Communication controller [0780]: Intel Corporation Cannon Lake PCH HECI Controller [8086:a360] (rev 10)
  IOMMU Group 7 00:16.3 Serial controller [0700]: Intel Corporation Cannon Lake PCH Active Management Technology - SOL [8086:a363] (rev 10)
  IOMMU Group 8 00:17.0 SATA controller [0106]: Intel Corporation Cannon Lake Mobile PCH SATA AHCI Controller [8086:a353] (rev 10)
  IOMMU Group 9 00:1b.0 PCI bridge [0604]: Intel Corporation Cannon Lake PCH PCI Express Root Port #21 [8086:a32c] (rev f0)


Yes, all three NVMe drives are in their own groups; the problem one is IOMMU Group 25:
  03:00.0 Non-Volatile memory controller [0108]: MAXIO Technology (Hangzhou) Ltd. NVMe SSD Controller MAP1602 [1e4b:1602] (rev 01)
  IOMMU Group 16 03:00.0 Non-Volatile memory controller [0108]: MAXIO Technology (Hangzhou) Ltd. NVMe SSD Controller MAP1602 [1e4b:1602] (rev 01)
  70:00.0 Non-Volatile memory controller [0108]: Sandisk Corp WD Blue SN580 NVMe SSD (DRAM-less) [15b7:5041] (rev 01)
  IOMMU Group 25 70:00.0 Non-Volatile memory controller [0108]: Sandisk Corp WD Blue SN580 NVMe SSD (DRAM-less) [15b7:5041] (rev 01)
  02:00.0 Non-Volatile memory controller [0108]: MAXIO Technology (Hangzhou) Ltd. NVMe SSD Controller MAP1602 [1e4b:1602] (rev 01)
  IOMMU Group 2 02:00.0 Non-Volatile memory controller [0108]: MAXIO Technology (Hangzhou) Ltd. NVMe SSD Controller MAP1602 [1e4b:1602] (rev 01)
Posted on 2024-6-17 15:37
Ran into this on a NUC9 i5 as well — couldn't pass through NVMe. I had assumed it was the NVMe controller's fault.

© 2007-2024 Chiphell.com — Powered by Discuz! X3.5