I recently built a new PC to use as a server. For months now, I have been getting unexplained crashes, sometimes after a few minutes, sometimes after a few days, where the PC just reboots without leaving any trace in the logs. The logs show only the usual occasional status messages and then, a few seconds later, a normal boot sequence.

This is slowly driving me crazy because I just can’t pin down the cause. I have tried several different Linux installs, swapped out the SSD and the PSU, and run a RAM test, but the behaviour still persists.

Today something was different. Instead of rebooting, it showed me this blue screen, this time finally with a log. But I still can’t work out what is wrong. Some quick internet searches turned up only very vague answers, blaming everything from software to hardware, and from PSU to CPU.

Can any Linux wizard help me fix my problem? Link to the log

Update: I have now run into an even weirder issue. I booted up, installed cpupower as a comment suggested, installed man to look up its documentation, and then the screen froze, so I was forced to reboot the PC by holding the power button for 3 s. When I booted back up, my bash history had been reset to its state from a few days back (~/.bash_history modification time from 2 days ago), even though I had rebooted several times since then and had never seen persistence problems like this before. man was no longer installed either. Even weirder, cpupower was still installed. So it seems some data was saved while other files were discarded. I will now use a second SSD and try to replicate this. I now suspect some kind of storage issue, even though the two SSDs in question never caused issues in my laptop. This is scary; I have never seen such a weirdly corrupted Linux install.
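For anyone who wants to try replicating the lost-writes symptom, here is a minimal sketch of a persistence check. It assumes a Linux system with Python 3; the marker file name and location are hypothetical, and it only tests whether one small, explicitly fsynced file survives a reboot, so it is not a full storage diagnostic.

```python
import json
import os
import time
from pathlib import Path


def read_boot_id() -> str:
    """Return the kernel's per-boot ID, or "unknown" if it is not readable."""
    try:
        return Path("/proc/sys/kernel/random/boot_id").read_text().strip()
    except OSError:
        return "unknown"


def write_marker(path: Path) -> dict:
    """Write a timestamped marker and ask the OS to flush it to the disk."""
    record = {"written_at": time.time(), "boot_id": read_boot_id()}
    tmp = path.with_suffix(".tmp")
    with open(tmp, "w") as f:
        json.dump(record, f)
        f.flush()
        os.fsync(f.fileno())  # flush file contents
    os.replace(tmp, path)     # atomic rename over the old marker
    dir_fd = os.open(path.parent, os.O_RDONLY)
    try:
        os.fsync(dir_fd)      # flush the directory entry (the rename)
    finally:
        os.close(dir_fd)
    return record


def read_marker(path: Path):
    """Return the previously stored marker, or None if there is none."""
    if not path.exists():
        return None
    return json.loads(path.read_text())


if __name__ == "__main__":
    marker = Path.home() / "persist_marker.json"  # hypothetical location
    previous = read_marker(marker)
    if previous is None:
        print("no previous marker found")
    else:
        age = time.time() - previous["written_at"]
        print(f"previous marker is {age:.0f} s old, boot {previous['boot_id']}")
    write_marker(marker)
```

Run it once per boot, shortly before a reboot or a forced power-off. If the marker reported on the next boot is older than the last one you wrote, a write that was fsynced did not reach the disk, which would point towards the drive, controller, or filesystem rather than the applications.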

      • Gyroplast@pawb.social
        2 days ago

        No, it is not. There is an issue with the installed GPU not being supported by the initializing driver, but this is entirely irrelevant for the reported fault and panic happening more than 1600 seconds later.

        Or would you argue the NIC is 100% the issue, because r8169 0000:04:00.0 enp4s0: Link is Down is literally right in the logs?
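To make the timing argument concrete, here is a rough sketch that measures the gap between two messages in a dmesg-style log. It assumes the default `[ seconds.micros] message` timestamp prefix; the function names and the sample lines in the usage note are made-up placeholders, not taken from OP's log.

```python
import re

# Matches the default dmesg prefix, e.g. "[   12.345678] some message"
_TS = re.compile(r"^\[\s*(\d+\.\d+)\]\s?(.*)$")


def parse_line(line):
    """Return (seconds_since_boot, message), or None for unparsable lines."""
    m = _TS.match(line)
    if not m:
        return None
    return float(m.group(1)), m.group(2)


def seconds_between(lines, first_pattern, second_pattern):
    """Seconds from the first line containing first_pattern to the next
    later line containing second_pattern; None if either is missing."""
    first = None
    for line in lines:
        parsed = parse_line(line)
        if parsed is None:
            continue
        ts, msg = parsed
        if first is None and first_pattern in msg:
            first = ts
        elif first is not None and second_pattern in msg:
            return ts - first
    return None
```

For example, `seconds_between(open("dmesg.txt"), "gpu init warning", "panic")` on a saved log (file name and search strings are placeholders) gives the delay between the two events. A message that appears during boot and a fault more than 1600 seconds later sharing a log is not, on its own, evidence that one caused the other.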

        • chonkyninja@lemmy.world
          1 day ago

          I’ll bet $10 on the Nvidia drivers. OP is running 6.14.4, as am I, and the Nvidia drivers have a whole bunch of issues and require special patches to remove deprecated API calls.

          Also, for this kernel you need the Nvidia Open driver.

          • Gyroplast@pawb.social
            1 day ago

            I’m in. $10 on “this reported kernel panic is not resolved by any change to which nvidia kernel driver is loaded, patched or not, or how anything pertaining to nvidia is configured”.

            nvidia is at fault for many issues, agreed, but not this one.