NIC goes dark when Proxmox kernel loads after GPU install (works again if GPU removed)

nemanin@lemmy.world · 5 months ago

eskimofry@lemm.ee · 5 months ago

Check what changes in the lspci output between having the GPU connected and not having it connected.

I suspect that your PCIe bandwidth is getting exhausted once the kernel activates your GPU.

Edit: I could be wrong about this, though, so it makes sense to try passing “nomodeset” to your kernel parameters and see if that changes anything.
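A small sketch of that comparison, assuming you save the lspci output to two files (for example no_gpu.txt and with_gpu.txt; the names are only for illustration). It lists devices whose PCI bus address moved between the two runs. That matters because predictable interface names such as enp2s0 are built from the PCI bus and slot numbers, so a NIC that moves from bus 02 to bus 03 comes up under a different name. Matching devices by their description text is an assumption that breaks if two identical cards are installed.

```python
"""Compare two saved lspci listings (without and with the GPU).

Assumed workflow: `lspci > no_gpu.txt` with the card removed and
`lspci > with_gpu.txt` with it installed, then pass both file names.
"""
import sys


def parse_lspci(text: str) -> dict[str, str]:
    """Map each bus address (e.g. '02:00.0') to its device description."""
    devices = {}
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        address, _, description = line.partition(" ")
        devices[address] = description
    return devices


def moved_devices(before: str, after: str) -> list[tuple[str, str, str]]:
    """Return (description, old_address, new_address) for every device whose
    bus address changed. Assumes each description is unique per listing."""
    old = {desc: addr for addr, desc in parse_lspci(before).items()}
    new = {desc: addr for addr, desc in parse_lspci(after).items()}
    return sorted(
        (desc, old[desc], new[desc])
        for desc in old.keys() & new.keys()
        if old[desc] != new[desc]
    )


if __name__ == "__main__" and len(sys.argv) == 3:
    with open(sys.argv[1]) as f_before, open(sys.argv[2]) as f_after:
        moved = moved_devices(f_before.read(), f_after.read())
    for desc, old_addr, new_addr in moved:
        print(f"{desc}: {old_addr} -> {new_addr}")
```

Usage would be something like python3 lspci_diff.py no_gpu.txt with_gpu.txt (the script name is hypothetical).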

nemanin@lemmy.world · 5 months ago

Ok. I’ll check it out.

Let’s say it is exhausted… what will get me more bandwidth? CPU or mobo…?

The only other PCIe card in at the moment is a 16-line HBA (it seems to be basically 2 cards sandwiched on one board)…

SzethFriendOfNimi@lemmy.world · 5 months ago

Possibly something on your motherboard has PCIe lanes that are dedicated to the GPU when it’s slotted but can otherwise be used by other devices?

For example, here’s a post about M.2 slots that, when used, affect the PCIe slots on a particular board. It may be worth checking your board’s manual to see if there’s something similar.

https://forums.tomshardware.com/threads/questions-about-a-mb-im-looking-at-asrock-z790-pg-riptide.3787003/

The answer not only seemed a HUGE disappointment, but a bit baffling. The pdf manual says if you occupy that 5th m.2 slot, which is the Gen 5 one, the Pci-E 1 slot is automatically downgraded to 8x. This I thought would be unacceptable if running a behemoth like the RTX 4090 I eventually plan to get, as it requires a lot of power and bandwidth.

Possibly linux@lemmy.zip · 5 months ago

Try disconnecting everything including the extra board.

nemanin@lemmy.world · 5 months ago

It’s late. I’ll have to pull the card and rerun tomorrow. But here’s the output with the GPU in:

It’s an i7-14700 and an ASRock Z690 Extreme. I’m actually hoping to put a second GPU in the last PCIe slot so I can let Proxmox use the iGPU, pass the 3060 into a Unix Moonlight gaming VM, and pass an RX590 into a Hackintosh VM.

Sabata@ani.social · 5 months ago

I had an issue with an ASRock Taichi where, if I enabled virtualization, the network would disappear entirely. You may want to check for firmware updates for your board. I had nothing but issues with the shitty BIOS and even had to upgrade my CPU sooner than I wanted to in order to do the update.

Make sure your CPU is still supported by the update.

Possibly linux@lemmy.zip · 5 months ago

Aren’t the PCIe lanes directly connected to the CPU? So the connections would be rerouted in hardware to connect to the GPU?

I am not the poster, but I am curious if you know what may be happening at a hardware level.

mbfalzar@lemmy.dbzer0.com · 5 months ago

There are generally one or two slots connected directly to the CPU, running at x16 (or at x8 each if there are two and both are populated). Four lanes link the CPU to the chipset, and the rest of the slots connect to the chipset and share that same x4 link. If your CPU has 24 lanes (Ryzen CPUs did a few years ago; Intel might now, but didn’t then), the remaining 4 lanes usually go to an NVMe slot.

barsquid@lemmy.world · 5 months ago

I had a stock Debian install actually rename the device for my NIC when I changed GPUs. You should double-check whether your NIC has the same interface name (as shown by ip link) with and without the GPU. After I changed the name in some config files, the NIC worked fine with the GPU in; it could be as easy as that.

barsquid@lemmy.world · 5 months ago

I read through your screenshot. The ip command shows enp3s0, but the config has enp2s0; I think this might be it.
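That kind of mismatch can be checked mechanically. This is a minimal sketch, not a Proxmox tool: it reads the names on bridge-ports lines in /etc/network/interfaces and reports any that do not exist under /sys/class/net, which is where Linux lists its network interfaces. It assumes the usual Proxmox layout where vmbr0’s bridge-ports line names the physical NIC, and it does not follow files pulled in by source lines.

```python
"""Report bridge ports in /etc/network/interfaces that don't exist.

Sketch only: checks names on 'bridge-ports' lines, which on a typical
Proxmox host name the physical NIC behind vmbr0. Files included via
'source' lines are not read.
"""
import os

INTERFACES = "/etc/network/interfaces"
SYSFS_NET = "/sys/class/net"  # every network interface appears here


def bridge_ports(config_text: str) -> set[str]:
    """Collect interface names from every 'bridge-ports' line."""
    ports = set()
    for line in config_text.splitlines():
        fields = line.split()
        if fields and fields[0] == "bridge-ports":
            # 'none' is allowed and means a bridge without an uplink.
            ports.update(name for name in fields[1:] if name != "none")
    return ports


def missing_ports(config_text: str, present: set[str]) -> set[str]:
    """Bridge ports named in the config but absent on this system."""
    return bridge_ports(config_text) - present


if __name__ == "__main__" and os.path.isfile(INTERFACES) and os.path.isdir(SYSFS_NET):
    present = set(os.listdir(SYSFS_NET))
    with open(INTERFACES) as f:
        missing = missing_ports(f.read(), present)
    for name in sorted(missing):
        print(f"{name} is in {INTERFACES} but not present; "
              f"existing interfaces: {', '.join(sorted(present))}")
```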

nemanin@lemmy.world · 5 months ago

Ohhh. In that last line. I wasn’t even looking at that; I assumed the block above it was setting up the primary NIC…

I’ll see if changing that interface name does it…

🐍🩶🐢@lemmy.world · 5 months ago

I changed my settings to name NICs by MAC address instead of by PCI enumeration, as I got sick of the names changing whenever I added or removed PCI devices.

nemanin@lemmy.world · 5 months ago

How do you do this? Idk what to even google, exactly.

🐍🩶🐢@lemmy.world · 5 months ago

I am not at home, but what I did was change the 99-default.link file. I found this on the two pages below. https://wiki.debian.org/NetworkInterfaceNames#CUSTOM_SCHEMES_USING_.LINK_FILES https://wiki.debian.org/NetworkInterfaceNames

Basically, by doing this, your NICs will be forcibly named using their MAC addresses:

# /etc/systemd/network/99-default.link
[Match]
OriginalName=*

[Link]
NamePolicy=mac
MACAddressPolicy=persistent

Afterwards, you will need to reboot and then update your network config file to use the correct names. I don’t ever change the network config with the GUI in Proxmox, as it has wrecked it too many times. I will update this reply later with more information on what to do.
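To know ahead of time what name to look for after the reboot, the MAC address can be converted directly. This sketch assumes systemd’s MAC-based scheme for Ethernet devices: the prefix enx followed by the 12 hex digits of the MAC in lowercase, with no separators (so a placeholder like enxAABBCCDDEEFF would really show up in lowercase). It only looks at hardware-backed interfaces, skipping lo and bridges such as vmbr0; Wi-Fi adapters get a different prefix, which this sketch does not handle.

```python
"""Predict the name NamePolicy=mac will give each wired NIC.

Assumption: for Ethernet devices, systemd's MAC-based name is 'enx' plus
the 12 lowercase hex digits of the MAC address, with no separators.
"""
import os

SYSFS_NET = "/sys/class/net"


def mac_policy_name(mac: str) -> str:
    """Turn 'AA:BB:CC:DD:EE:FF' into 'enxaabbccddeeff'."""
    digits = mac.replace(":", "").lower()
    if len(digits) != 12 or any(c not in "0123456789abcdef" for c in digits):
        raise ValueError(f"not a 48-bit MAC address: {mac!r}")
    return "enx" + digits


if __name__ == "__main__" and os.path.isdir(SYSFS_NET):
    for iface in sorted(os.listdir(SYSFS_NET)):
        # Only hardware-backed interfaces have a 'device' link; this skips
        # lo, bridges such as vmbr0, and other virtual interfaces.
        # Wi-Fi adapters are still listed but really get a 'wlx' name.
        if not os.path.exists(os.path.join(SYSFS_NET, iface, "device")):
            continue
        try:
            with open(os.path.join(SYSFS_NET, iface, "address")) as f:
                print(f"{iface} -> {mac_policy_name(f.read().strip())}")
        except (OSError, ValueError):
            pass  # unreadable or not a 48-bit MAC; leave it out
```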

🐍🩶🐢@lemmy.world · 5 months ago

Sorry, didn’t make it home until today and not sure if you get notifications on edits. You will need a monitor and keyboard hooked up to your server as you will not have ssh access until the network config is “fixed”. I would do the below with the GPU removed, so you know 100% that your networking config is correct before mucking about further.

Step 1 - Create 99-default.link file

Add a /etc/systemd/network/99-default.link with the below contents.

# SPDX-License-Identifier: MIT-0
#
# This config file is installed as part of systemd.
# It may be freely copied and edited (following the MIT No Attribution license).
#
# To make local modifications, one of the following methods may be used:
# 1. add a drop-in file that extends this file by creating the
#    /etc/systemd/network/99-default.link.d/ directory and creating a
#    new .conf file there.
# 2. copy this file into /etc/systemd/network or one of the other paths checked
#    by systemd-udevd and edit it there.
# This file should not be edited in place, because it'll be overwritten on upgrades.

[Match]
OriginalName=*

[Link]
NamePolicy=mac
MACAddressPolicy=persistent

Step 2 - Reboot and find new name of NIC that will be based on MAC

I forget if you have to reboot, but I am going to assume so. At this point, you can get the new name of your NIC and fix your network config.

ip link should list all of your network devices, both real and virtual. Here is what mine looks like for reference, with the MAC obfuscated:

1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN mode DEFAULT group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
2: enxAABBCCDDEEFF: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq master vmbr0 state UP mode DEFAULT group default qlen 1000
    link/ether AA:BB:CC:DD:EE:FF brd ff:ff:ff:ff:ff:ff
3: vmbr0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP mode DEFAULT group default qlen 1000
    link/ether AA:BB:CC:DD:EE:FF brd ff:ff:ff:ff:ff:ff
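If the ip link output is long, pairing each name with its MAC makes the physical NIC easier to spot; in the output above, vmbr0 shows the same MAC as the NIC it is attached to. A sketch that parses the default multi-line ip link format shown above; the regular expressions are assumptions about that format, not an official parser.

```python
"""List (interface name, MAC) pairs from `ip link` output, so the
MAC-based name can be matched to the physical port."""
import re
import shutil
import subprocess

# Header lines look like '2: enxaabbccddeeff: <BROADCAST,...>'; an
# '@parent' suffix (VLANs, veth pairs) is dropped from the name.
HEADER = re.compile(r"^\d+:\s+([^:@\s]+)")
# Detail lines look like '    link/ether aa:bb:cc:dd:ee:ff brd ...'.
LINK = re.compile(r"^\s+link/\S+\s+([0-9A-Fa-f:]{17})\b")


def parse_ip_link(text: str) -> list[tuple[str, str]]:
    """Return (name, lowercase MAC) for every interface with a link/ line."""
    pairs = []
    name = None
    for line in text.splitlines():
        header = HEADER.match(line)
        if header:
            name = header.group(1)
        elif name is not None:
            link = LINK.match(line)
            if link:
                pairs.append((name, link.group(1).lower()))
                name = None  # only the first link/ line belongs to the header
    return pairs


if __name__ == "__main__" and shutil.which("ip"):
    result = subprocess.run(["ip", "link"], capture_output=True, text=True)
    for iface, mac in parse_ip_link(result.stdout):
        print(f"{iface:20} {mac}")
```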

Step 3 - Fix your network config and restart the networking service

You will need to edit your /etc/network/interfaces file so the correct card is used.

Make a copy of /etc/network/interfaces, just in case you mess something up.
Then run sudo vim /etc/network/interfaces (or whatever text editor makes you happy). It will need to look something like the example below. I have to have DHCP turned on for mine; your config likely uses a static address instead. Really, all you need to do is change wherever it says enp yada yada to the enxAABBCCDDEEFF name you identified above.

source /etc/network/interfaces.d/*

auto lo
iface lo inet loopback

iface enxAABBCCDDEEFF inet manual

auto vmbr0
iface vmbr0 inet dhcp
#iface vmbr0 inet static
#address 192.168.5.100/20
#gateway 192.168.0.1
    bridge-ports enxAABBCCDDEEFF
    bridge-stp off
    bridge-fd 0

Restart your networking service (you shouldn’t need to reboot):

sudo systemctl restart networking.service
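For anyone who would rather script Step 3 than edit by hand, here is a hedged sketch that makes the backup copy and then swaps the old name for the new one everywhere in /etc/network/interfaces. The whole-word match is a deliberate choice: renaming enp2s0 leaves a longer name like enp2s0f1 alone, while a VLAN name like enp2s0.100 is renamed along with its parent. The script name and arguments in the usage line are hypothetical; it needs to run as root.

```python
"""Rename an interface throughout /etc/network/interfaces, with a backup.

Hypothetical usage (as root):
    python3 rename_iface.py enp2s0 enxaabbccddeeff
"""
import re
import shutil
import sys

INTERFACES = "/etc/network/interfaces"


def rename_interface(config_text: str, old: str, new: str) -> str:
    """Replace whole-word occurrences of `old` with `new`.

    Word boundaries keep 'enp2s0f1' intact when renaming 'enp2s0', while
    a VLAN such as 'enp2s0.100' is renamed along with its parent.
    """
    pattern = r"\b" + re.escape(old) + r"\b"
    # A function replacement stops backslashes in `new` being interpreted.
    return re.sub(pattern, lambda _match: new, config_text)


if __name__ == "__main__" and len(sys.argv) == 3:
    old_name, new_name = sys.argv[1], sys.argv[2]
    backup = INTERFACES + ".bak"
    shutil.copy2(INTERFACES, backup)  # the "make a copy, just in case" step
    with open(INTERFACES) as f:
        updated = rename_interface(f.read(), old_name, new_name)
    with open(INTERFACES, "w") as f:
        f.write(updated)
    print(f"Rewrote {INTERFACES}; original saved as {backup}")
```

After it runs, restart the networking service as above.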

Step 4 - Profit?

Hopefully at this point you have network access again. Check the below, do some ping tests, and if it doesn’t work, double-check that you edited the interfaces file correctly.

sudo systemctl status networking.service will show you if anything went wrong and hopefully show that everything is working correctly
ip -br addr show should show that the interface is up now.

lo               UNKNOWN        127.0.0.1/8 ::1/128
enxAABBCCDDEEFF  UP
vmbr0            UP             192.168.5.100/20

At this point, if all is well, I would reboot anyway, just to make sure. If you add any GPUs, SATA drives, or other PCI devices, disable/enable Wi-Fi/Bluetooth in the BIOS, or do anything else that changes the PCI numbering, you won’t have to worry about your NIC’s name changing.
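The Step 4 check can also be scripted. This sketch reads the ip -br addr show output and reports whether vmbr0 (the bridge name used in this thread) is UP with an IPv4 address. It is only a quick sanity check and does not replace the ping tests.

```python
"""Scripted version of the Step 4 check, reading `ip -br addr show`."""
import ipaddress
import shutil
import subprocess


def parse_ip_brief(text: str) -> dict[str, tuple[str, list[str]]]:
    """Map interface name -> (state, addresses) from `ip -br addr` lines."""
    result = {}
    for line in text.splitlines():
        fields = line.split()
        if len(fields) >= 2:
            name = fields[0].split("@")[0]  # drop any '@parent' suffix
            result[name] = (fields[1], fields[2:])
    return result


def _is_ipv4(token: str) -> bool:
    try:
        return ipaddress.ip_interface(token).version == 4
    except ValueError:
        return False


def bridge_ready(text: str, bridge: str = "vmbr0") -> bool:
    """True if `bridge` is UP and carries at least one IPv4 address."""
    state, addrs = parse_ip_brief(text).get(bridge, ("MISSING", []))
    return state == "UP" and any(_is_ipv4(a) for a in addrs)


if __name__ == "__main__" and shutil.which("ip"):
    out = subprocess.run(["ip", "-br", "addr", "show"],
                         capture_output=True, text=True).stdout
    print("vmbr0 ready:", bridge_ready(out))
```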

nemanin@lemmy.world · 5 months ago

Will give this a go! Thanks!

Possibly linux@lemmy.zip · 5 months ago

As others have said, you may be running out of PCIe lanes. If that isn’t the problem and this is a software bug, you could try blocklisting the GPU kernel module.

listless@lemmy.cringecollective.io · 5 months ago

Check lsmod before and after to see which kernel modules change.

Also look at dmesg for interesting kernel messages as you attempt to use / not use the offending hardware.
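One way to do the before/after comparison: save the lsmod output to a file in each configuration and diff the module names. A minimal sketch; the file names are hypothetical and nothing here is specific to any particular driver.

```python
"""Compare two saved lsmod listings, e.g. made with
`lsmod > no_gpu.txt` and `lsmod > with_gpu.txt` (file names illustrative).
"""
import sys


def module_names(lsmod_text: str) -> set[str]:
    """Module names from lsmod output, skipping the 'Module Size Used by' header."""
    names = set()
    for line in lsmod_text.splitlines():
        fields = line.split()
        if fields and fields[0] != "Module":
            names.add(fields[0])
    return names


def module_diff(before: str, after: str) -> tuple[set[str], set[str]]:
    """Return (modules loaded only before, modules loaded only after)."""
    old, new = module_names(before), module_names(after)
    return old - new, new - old


if __name__ == "__main__" and len(sys.argv) == 3:
    with open(sys.argv[1]) as f_before, open(sys.argv[2]) as f_after:
        gone, added = module_diff(f_before.read(), f_after.read())
    print("only without GPU:", ", ".join(sorted(gone)) or "(none)")
    print("only with GPU:   ", ", ".join(sorted(added)) or "(none)")
```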

nemanin@lemmy.world · 5 months ago

I have no experience with dmesg and also don’t know how to scroll the history since I’m not on a terminal app (since I can’t get the NIC up).

Anything here helpful?

listless@lemmy.cringecollective.io · 5 months ago

dmesg | less should allow you to scroll the output. In less, type a forward slash followed by a search term for the devices and hit Enter; see if the modules are being loaded or if there are any errors.
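If scrolling through less is awkward at the console, filtering dmesg for a few keywords does the same search non-interactively (dmesg | grep -i is the shell equivalent). The search terms below are placeholders; swap in your NIC’s driver or interface name. dmesg may require root, in which case this prints nothing.

```python
"""Filter dmesg for lines mentioning the NIC or GPU, like '/' in less."""
import re
import shutil
import subprocess

# Placeholder search terms: replace with your NIC driver / interface name.
PATTERNS = ["enp", "eth", "link is", "nouveau", "nvidia", "error", "fail"]


def matching_lines(text: str, patterns: list[str]) -> list[str]:
    """Return the lines containing any of the patterns, ignoring case."""
    regex = re.compile("|".join(map(re.escape, patterns)), re.IGNORECASE)
    return [line for line in text.splitlines() if regex.search(line)]


if __name__ == "__main__" and shutil.which("dmesg"):
    # Without permission, dmesg writes only to stderr and stdout is empty.
    log = subprocess.run(["dmesg"], capture_output=True, text=True,
                         timeout=30).stdout
    for line in matching_lines(log, PATTERNS):
        print(line)
```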

nemanin@lemmy.world · 5 months ago

In the end, the Intel 1G NIC just worked, so I’ve given up for now on getting the 2.5G Broadcom working instead.

I might try link aggregation later with the 2.5G Broadcom and circle back on this… but we’ll see.

I also got the second GPU installed and it shows up, too. But it’s an RX590 and is showing as an RTX2070… so I’ll be making another post shortly!

Thanks for all the input!

hungover_pilot@lemmy.world · 5 months ago

Is the NIC built into the motherboard or an add on pcie card?

You could check the journal to see if the logs tell you anything.

nemanin@lemmy.world · 5 months ago

ASRock Z690 Extreme.

It has 2 built-in NICs: the Intel 1G and the Broadcom 2.5G.

I’m trying to use the Broadcom here, though my Ethernet is only 1G house-wide, so I could try the Intel if that seems like it could help…

subignition@fedia.io · 5 months ago

I don’t think you’ll benefit at all from using the 2.5Gbit port if you only have 1Gbit cables, so there’s no downside to trying the Intel port.

Daughter3546@lemmy.world · 5 months ago

I am pretty sure 2.5GbE will work over normal/standard Cat5 and Cat5e cable. It’s the upstream switch that you’ll need to replace.

Decronym@lemmy.decronym.xyz · 5 months ago

Acronyms, initialisms, abbreviations, contractions, and other phrases which expand to something larger, that I’ve seen in this thread:

Fewer Letters	More Letters
DHCP	Dynamic Host Configuration Protocol, automates assignment of IPs when connecting to a network
NVMe	Non-Volatile Memory Express interface for mass storage
PCIe	Peripheral Component Interconnect Express

3 acronyms in this thread; the most compressed thread commented on today has 4 acronyms.

[Thread #810 for this sub, first seen 16th Jun 2024, 18:05]
