SOLUTION BELOW

The actual bug (link to the bug report)


I have never been in a more confusing situation regarding Linux.

I have a Dell XPS 15 9560, which had a dual boot Windows 10 / EndeavourOS setup. It was running fine for months. 10 days ago I updated Linux and after restart it couldn’t boot anymore. It got stuck at “A start job is running for /dev/disk/by-uuid/…” (which is the root partition).

First, with the help of a friend of mine who is quite knowledgeable about Linux (he runs vanilla Arch, etc), we spent 5 hours trying to fix it but had no luck.

Then I decided to back up everything and do a fresh install. Aaaand the same error happened again on the first boot. Then I thought, “OK, probably some problem with Arch, let’s try Fedora.” Nope: a similar error about not finding the root partition. (Here I must say that the kernel shipped with the ISO was working fine, but after updating to the latest one, it failed.) Then I thought, “OK, it might be a problem with the latest kernel; let’s install EndeavourOS with the LTS kernel.” Nope, the LTS kernel also didn’t boot. Then I tried Ubuntu and it worked, but that doesn’t solve the problem. Then I decided to put another NVMe drive in the laptop and try there. The same error again.

Now the greatest part: if I put the NVMe drive into an external USB case, EndeavourOS installs, updates, and boots without any problem, with no sign of the error.

So now I don’t know how to proceed… Maybe there is something wrong with the PCIe port in my laptop, but apart from the booting problem, Windows works, and I can also mount and access every partition on the SSD through a live USB. So there are no other signs of a problem with the port whatsoever.

I would be grateful for any advice as I’ve lost several days trying to solve this and I am out of ideas…


Solution: The last working kernels are from 11 August 2023 (both linux and linux-lts): linux-6.4.10.arch1-1 and linux-lts-6.1.45-1. You can download them from here: linux / linux-lts, and install them with

sudo pacman -U the_path_to_the_package

Thank you all for the help!
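For example (the package filenames below are illustrative, not the exact archive names; use the actual files you downloaded), the downgrade plus a pin in /etc/pacman.conf keeps a routine pacman -Syu from pulling the broken kernels back in before the bug is fixed upstream:

```shell
# Install the last-known-good kernel packages (example filenames --
# substitute the actual files you downloaded):
sudo pacman -U ./linux-6.4.10.arch1-1-x86_64.pkg.tar.zst \
               ./linux-lts-6.1.45-1-x86_64.pkg.tar.zst

# Pin them so future updates skip the kernel until the regression is fixed.
# This rewrites the (possibly commented-out) IgnorePkg line in pacman.conf:
sudo sed -i 's/^#\?IgnorePkg *=.*/IgnorePkg = linux linux-lts/' /etc/pacman.conf
```

Remember to remove the IgnorePkg entry once a fixed kernel lands, or the system will stay on the old kernel indefinitely.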

  • abrer@lemmy.one · 1 year ago

    So this occurs after an update. Is it not possible to boot into the prior kernel?

    If possible to boot into the prior kernel, can you inspect logs or the journal to see where your error is cropping up?

    This issue sounds like a regression of sorts with a driver, but logs/debug output would help confirm. This would be one worth reporting upstream if you can rescue some logs (I gather you can, if you can boot the disk from another enclosure).

    If you can boot into the machine, investigate the journal:

    • journalctl --list-boots
    • journalctl -b -1
      • where -1 is the prior boot, -2 the one before that, etc.

    – If you are booting into a live environment or are otherwise mounting the disk:

    • journalctl -D /var/log/journal/ID_GOES_HERE
    • example path: /var/log/journal/2dff8304d5114c44bfb1311357a3cd87
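    A minimal sketch of the live-environment route, assuming the installed root filesystem is on /dev/nvme0n1p2 (a placeholder; substitute your partition):

```shell
# Mount the installed system's root partition from the live environment:
sudo mount /dev/nvme0n1p2 /mnt

# The journal directory is named after the installed system's machine-id:
MACHINE_ID=$(cat /mnt/etc/machine-id)

# Read the previous boot's log, errors only, from that journal:
sudo journalctl -D "/mnt/var/log/journal/${MACHINE_ID}" -b -1 -p err
```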

    – Keep us posted.

    If it truly is a driver regression but you can boot from the prior kernel (if you don’t have it, install it via live CD or similar), definitely report this one and stay on the prior kernel until it’s resolved. Bleeding-edge things.

    • oiram15@lemmy.sdf.org (OP) · 1 year ago

      I have already wiped everything, so no logs… The only way to get it booting is to install EndeavourOS using the offline installer, which uses kernel 6.4.8. There is an option to install the LTS kernel alongside. So the system boots with 6.4.8, but after updating, neither the new 6.4.12 nor the LTS (6.2) boots. I haven’t tried booting with the LTS kernel before updating, to see whether the same kernel works both before and after. I will reinstall using the offline installer and then try to gather some logs after updating.

  • eldain@feddit.nl · 1 year ago

    According to your logs from installing the kernel, your NVMe modules are nowhere to be found, and basic system tools are unavailable. Your core system seems severely borked… Can you run a memtest (broken RAM can corrupt both your storage and your attempts at finding the problem) and check the output of smartctl --all for that SSD?

  • timicin@kbin.social · 1 year ago

    When it gets stuck on something like that, it’s usually because of a hardware-related change. Did you update or modify your BIOS recently, or change hardware drivers?

  • setVeryLoud(true);@lemmy.ca · 1 year ago

    Use a live USB to back up your home directory and a list of your packages, then reinstall. I don’t think it’s worth the trouble.
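    A sketch of that backup from a live USB; /dev/nvme0n1p2 and /path/to/backup are placeholders for your root partition and backup destination:

```shell
# Mount the installed system and copy the home directory:
sudo mount /dev/nvme0n1p2 /mnt
cp -a /mnt/home/. /path/to/backup/home/

# Save the list of explicitly installed packages, reading the installed
# system's pacman database rather than the live ISO's:
pacman -Qqe --dbpath /mnt/var/lib/pacman > /path/to/backup/pkglist.txt

# After reinstalling, the packages can be restored with:
#   sudo pacman -S --needed - < /path/to/backup/pkglist.txt
```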

  • rotopenguin@infosec.pub · 1 year ago

    AHAHAHA, that is a proper insane bug. One PCIe device shouldn’t be able to slap others off of the “bus”: “We’re not on a bus; all you did was mess up your own personal lanes, mate.”

  • Parsnip8904@beehaw.org · 1 year ago

    That’s a weird issue. Do you have encryption on, by any chance? I had a similar error pop up when I didn’t have the correct system hooks for the kernel, so that after a kernel update the system wouldn’t boot.

    • oiram15@lemmy.sdf.org (OP) · 1 year ago

      Could you please take a look at the comments from abrer and Illecors? I have shared a lot of info there.

        • oiram15@lemmy.sdf.org (OP) · 1 year ago

          The problem is already solved (at least I found the bug and a temporary solution). At the beginning of the post there is a link to the bug report.

  • Guidonsia@lemmy.ml · 1 year ago

    I had a similar issue, but with Bluetooth: it worked on Windows but not on Linux. Fixed it by resetting the BIOS with the CMOS battery trick.

  • TunaCowboy@lemmy.world · 1 year ago

    the help of a friend of mine who is quite knowledgeable about Linux (he runs vanilla Arch, etc)

    Contrary to popular belief, using and maintaining Arch is a novice exercise. The “btw” crowd likes to believe their actions are “1337”, “minimal”, “no bloat”, etc., and they’ve sure got the neofetch receipts to prove it!

    The truth of it is that Arch is easy; that’s the whole allure. It is the most convenient distro for experienced users with a specific (yet broad) set of needs.

    The btw crowd are simply misled kooks deserving of pity and mercy, but not trust.

    https://en.m.wikipedia.org/wiki/Four_stages_of_competence

    • db2@sopuli.xyz · 1 year ago

      Dual (or triple or quad) booting isn’t the problem; I did it for a long time, until I had a machine that could handle more than one virtual machine while leaving the base OS also usable.

    • Dandroid@sh.itjust.works · 1 year ago

      I always dual boot and have never had an issue in many years. I wonder what I am doing differently.

      • Montagge@kbin.social · 1 year ago

        The only way I could dual boot without Windows breaking things was to unplug the Windows drive, install Linux on another drive, plug the Windows drive back in, and boot into Windows by selecting it in the BIOS.
        Trying to use GRUB would always lead to issues eventually after booting into Windows.

        • deong@lemmy.world · 1 year ago

          That’s definitely not the norm. It used to be that installing Windows would wreck GRUB, but you just needed to boot a rescue disk and reinstall GRUB once to fix it. Most people dual booted for decades without any issue there.

          • Montagge@kbin.social · 1 year ago

            To be fair, this would have been decades ago: Ubuntu 8.xx and either XP or 7. It was my first dive into Linux, so there’s a good chance I did something wrong.
            I was always able to repair GRUB when Windows would wreck it, thankfully.

        • Dandroid@sh.itjust.works · 1 year ago

          My current setup is using the former, but I did the latter for so many years and the only issue I had was the clock would get fucked up after booting Windows. But I agree, it’s much safer to have your Windows SSD physically removed when installing Linux so grub doesn’t get installed on the same SSD as Windows. I wouldn’t do it with all the horror stories I have read. My good experience was probably dumb luck.

        • vector_zero@lemmy.world · 1 year ago

          Couldn’t you use grub on one drive to point to the bootloader on a separate drive? Windows should leave that configuration alone, at least in theory.

          • Montagge@kbin.social · 1 year ago

            Probably, but I just didn’t want to give it the chance. It ended up being a non-issue in the end because I never booted back into Windows.