Hey everyone. If you want to post links or discuss the Reddit blackout, please keep it in this thread so things stay tidy!

  • randomposter@sh.itjust.works
    arrow-up
    39
    ·
    2 years ago

    As an engineer, this sounds most plausible - they had proactive detection and resolution in place against various attacks and system failures, which got triggered due to the massive drop in public subreddits/users/activity, and made everything worse. Honestly, this isn’t a scenario their engineers could have easily predicted…

    • mizmoose@beehaw.org
      arrow-up
      47
      ·
      edit-2
      2 years ago

      As a former sysadmin and a [still, for the moment] reddit moderator, my bet is that most of the subreddits that switched to private forgot to (or didn’t know to) go into “new reddit” and switch off the thing that allows people to request being added to the now-private subreddit.

      A HUGE influx of people pounding on the “let me in, add me to the sub” button, which sends modmail, may have overloaded the whole modmail system, which in turn sometimes goes kaflooey for no apparent reason (my theory is: it gets bored).

      • setsneedtofeed@beehaw.org
        arrow-up
        23
        ·
        2 years ago

        I see this as a positive aspect of the protest.

        I am also amused that random people are pounding on the door for access, as if they think approved submitters are having a private tea party inside.

        • mizmoose@beehaw.org
          arrow-up
          12
          ·
          2 years ago

          Clearly you’re not someone who would have to go back and clear out 259238 modmail messages and make sure that none of them are legit “I have a problem” notes.

          None of the subreddits I mod are that huge but just the thought of more than 100 at once makes me wanna cry.

          • Gork@beehaw.org
            arrow-up
            9
            ·
            2 years ago

            At this point, they should just leave the 259,238 modmail messages for the admins to deal with. Let them sort through all that since this is all their doing.

          • setsneedtofeed@beehaw.org
            arrow-up
            3
            ·
            2 years ago

            Oh clearly I’m not. I just don’t understand the thinking of people demanding access. It’s like the kind of person who pounds on the door of a closed restaurant because they can see the employees inside.

            • surrendertogravity@beehaw.org
              arrow-up
              4
              ·
              2 years ago

              Oh man, my partner made a somewhat popular weapon calculator spreadsheet for Elden Ring, and the number of random Google Sheets edit requests they received was… quite a lot. (the instructions were right there for people to make a copy of the sheet to edit themselves! that’s how all of these sheets calculators work!) 🤦

            • mizmoose@beehaw.org
              arrow-up
              4
              ·
              2 years ago

              People are selfish. People subconsciously think the rules apply to other people.

              People who demand to come into closed stores and restaurants are not the exception. What’s even crazier is that when you turn one away, anyone who saw the door open suddenly decides that maybe if THEY pound on the door, they’ll magically get access, even though the first person was told no and didn’t get inside!

      • chaorace@lemmy.sdf.org
        arrow-up
        12
        ·
        2 years ago

        Ah, but you see they “improved” modmail recently. It would certainly never go “kaflooey” anymore. It now fails all like “kerpow!” instead… much cooler, you see.

        • AggressivelyPassive@feddit.de
          arrow-up
          3
          ·
          2 years ago

          Well, of course, that’s just good engineering.

          You see, kerpow!s scale much better than kaflooeys due to cache invalidation problems in the ooey inductors, that’s like first semester knowledge.

      • darius@lemmy.world
        arrow-up
        5
        ·
        2 years ago

        I’m just speculating too, of course, but it could be some kind of sharding, e.g. at the DB level. I can imagine the little subreddits draw little traffic, hence fewer shards are allocated to them (like how S3 works).
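To make the sharding guess concrete, here is a minimal sketch in Python. Nothing in it reflects Reddit's real storage layer: `shards_for`, `overloaded`, and the 1000 requests-per-second capacity per shard are all hypothetical names and numbers chosen for illustration. It only shows how a community provisioned for light traffic could be swamped if visitors suddenly concentrate on it.

```python
import math


def shards_for(expected_rps, rps_per_shard=1000):
    """Hypothetical rule: allocate shards in proportion to expected traffic."""
    return max(1, math.ceil(expected_rps / rps_per_shard))


def overloaded(actual_rps, shards, rps_per_shard=1000):
    """True when actual traffic exceeds what the allocated shards can serve."""
    return actual_rps > shards * rps_per_shard


# Assumed numbers: a small sub sized for 50 req/s, a large one for 25,000 req/s.
small_sub_shards = shards_for(50)
large_sub_shards = shards_for(25_000)

# If displaced traffic lands on the small sub, its single shard is swamped.
small_sub_swamped = overloaded(4_000, small_sub_shards)
```

Under these assumptions the small sub gets a single shard, so a few thousand redirected requests per second would already exceed its capacity.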

    • naeap@sopuli.xyz
      arrow-up
      6
      ·
      2 years ago

      I’m not sure if it’s just a load balancing issue. if all of Reddit can only access specific subs, maybe they split their servers that way

      but I’m just guessing, because it doesn’t make much sense to go down, when there is less data to process…

      • randomposter@sh.itjust.works
        arrow-up
        8
        ·
        2 years ago

        in a way it does, when you’re building massive-scale systems. Say you are the mitigation team and want to protect yourself against a malicious hacker/employee who starts shutting down web servers or removes posting permissions from the DB for everyone. You’re going to monitor the frequency of posts, and if it drops too fast, you know something bad is happening. You’re going to take automated measures against it - maybe freeze access to the DB completely, maybe switch to a (much less tested) backup region/system, etc… so you can see how things can snowball from there into strange scenarios…
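The kind of drop detector described above might look roughly like this. It is a sketch under stated assumptions, not anything known about Reddit's tooling: `PostRateMonitor`, the window size, and the 50% drop threshold are all hypothetical.

```python
from collections import deque


class PostRateMonitor:
    """Hypothetical monitor: alarms when a sample falls far below the recent average."""

    def __init__(self, window=3, drop_ratio=0.5):
        self.history = deque(maxlen=window)  # most recent posts-per-minute samples
        self.drop_ratio = drop_ratio         # alarm if sample < baseline * drop_ratio

    def observe(self, posts_per_minute):
        """Record one sample and return True if it looks like an abnormal drop."""
        alarm = False
        if len(self.history) == self.history.maxlen:
            baseline = sum(self.history) / len(self.history)
            alarm = posts_per_minute < baseline * self.drop_ratio
        self.history.append(posts_per_minute)
        return alarm


def run(samples):
    """Feed samples through a fresh monitor and collect the alarm flags."""
    monitor = PostRateMonitor()
    return [monitor.observe(s) for s in samples]


alarms = run([100, 100, 100, 95, 20])
```

The monitor has no way to tell a coordinated blackout from an attack: both show up as the same sudden drop, so whatever automated response is wired to the alarm would fire either way.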

        • naeap@sopuli.xyz
          arrow-up
          2
          ·
          2 years ago

          yeah, well, maybe…

          usually unexpected situations have unexpected errors. so yeah, you could be right

    • Neotecha (She/her)@beehaw.org
      arrow-up
      6
      ·
      2 years ago

      This makes a lot of sense to me (as an Operations Engineer).

      I could imagine the architecture team has low watermark triggers to rescale the architecture, kill and restore hosts, or other changes based on expected user load. When that load just… isn’t there, the automated tooling just loops the same actions causing site instability.
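As a rough illustration of that looping behavior, here is a toy control loop in Python. The function names, the low watermark of 50 requests per second per host, and the floor of 3 hosts are hypothetical, chosen only to show how a remediation that cannot raise the load ends up repeating on every tick.

```python
def reconcile(requests_per_sec, hosts, low_watermark=50, min_hosts=3):
    """Return (action, new_host_count) for one tick of a hypothetical control loop."""
    per_host = requests_per_sec / hosts
    if per_host >= low_watermark:
        return "ok", hosts
    if hosts > min_hosts:
        # Load per host is low, so shrink the fleet by one.
        return "scale_down", hosts - 1
    # At the floor and load is still "too low": assume the hosts are unhealthy.
    return "restart_hosts", hosts


def simulate(requests_per_sec, hosts, ticks):
    """Run the loop for a fixed number of ticks and record each action."""
    actions = []
    for _ in range(ticks):
        action, hosts = reconcile(requests_per_sec, hosts)
        actions.append(action)
    return actions


blackout = simulate(requests_per_sec=60, hosts=5, ticks=6)
normal = simulate(requests_per_sec=300, hosts=5, ticks=2)
```

With the assumed blackout-level traffic, the loop scales down twice, hits the floor, and then restarts hosts on every remaining tick, because restarting does nothing to bring the missing users back.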

      I’ve had similar issues before, so it seems like a feasible explanation