• sunaurus@lemm.ee
    28 upvotes · edited · 1 year ago

    While it’s true that the hosts of popular communities will get more traffic, it’s actually not as bad as it first seems.

    Every Lemmy instance with at least one subscriber in that popular community will act as a mirror. That means that users who are just reading posts and comments will not cause any additional load on the home-instance of the popular community, because they are consuming local copies of the posts and comments.

    This will actually help scaling a lot, and is in fact exactly how many centralized platforms scale (by creating a bunch of read-only copies of content).

    As long as we can distribute the Lemmy userbase between different instances (and avoid creating one or two centralized super-instances), we can take a lot of advantage of this mirroring and the scaling will be quite good!
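
A rough way to see the effect of mirroring on read traffic is a back-of-the-envelope model. This is only a sketch: the instance names, reader counts, and the two load models (pure proxying versus one delivery per new activity per subscribed instance) are illustrative assumptions, not measurements of Lemmy.

```python
def home_load(readers, reads_per_user, new_activities, mirrored, home="home.example"):
    """Estimate requests handled by the community's home instance.

    readers: {instance_name: number_of_readers} (hypothetical figures)
    mirrored: True if remote instances serve reads from their local copies.
    """
    local_reads = readers.get(home, 0) * reads_per_user
    remote = {name: n for name, n in readers.items() if name != home}
    if mirrored:
        # Assumed model: one delivery per new activity per subscribed remote instance.
        return local_reads + new_activities * len(remote)
    # Assumed "proxy" model: every remote read reaches the home instance.
    return local_reads + sum(remote.values()) * reads_per_user


readers = {"home.example": 500, "a.example": 20_000, "b.example": 29_500}
proxied = home_load(readers, reads_per_user=10, new_activities=1_000, mirrored=False)
with_mirrors = home_load(readers, reads_per_user=10, new_activities=1_000, mirrored=True)
print(proxied, with_mirrors)
```

Under these assumptions the home instance's load depends on how many instances subscribe and how much new content is posted, not on how many remote users are reading.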

  • Nymphioxetine@beehaw.org
    5 upvotes · 1 year ago

    Did you post this from Mastodon? I wish I could tell where this came from.

    Basically, if I understand this right, an instance that hosts a very popular community will likely need some massive infrastructure scaling to handle the enormous amount of worldwide traffic?

    • dan@upvote.au
      6 upvotes · 1 year ago

      I wish I could tell where this came from.

      Isn’t that what the colourful icon in Lemmy is for? It appears to link to the original source of the post or comment.

    • veaviticus@lemmy.one
      6 upvotes · 1 year ago

      Yes. If you run the server, then you are the source of truth of that community. All other servers that federate your community query your server to access the community and show it to their users.

      So if you run a server and a community explodes there, you might only have 500 users on your instance, but 50k users might be reading that community and interacting with it from other Lemmy instances, so your server needs to scale to 50k users’ worth of traffic.

      And even more essential, your server is the source of truth of that community. So if your server is hacked or corrupted or deleted, that community is gone. Other instances don’t mirror it (except for temporary caching), so the Lemmy network is essentially a trust network of other people maintaining servers long term (and each inventing a monetary system to pay for it). I still think the network might be better than a centralized system like reddit, but it definitely has a lot of growing to do and policies that need to be sorted out very soon.

      • TheAmorphous@lemmy.world
        4 upvotes · 1 year ago

        So are these other servers just routing requests from their users to your server’s community? Or are they actually copying everything over every so often (caching) and serving up the requests themselves? How real time is it, I guess is what I’m asking?

        • veaviticus@lemmy.one
          3 upvotes · 1 year ago

          Yeah, apparently I was wrong about this (still learning Lemmy and fediverse stuff…). The text content of posts and comments is “synced” to your server and stored in your database there. Then future requests for that content are served from your instance. So it’s not as bad as I thought it was (the network load should be lower since you aren’t acting entirely as a proxy, more like a cache), but database bloat will be a huge problem (it’s already a big problem in other federated things like Mastodon and Matrix, where every server ends up saving everything it wants into its own database).

          I’m not sure what happens when the original server goes down. Do the federated servers discard that data? Or do we each maintain a forever copy until we want to get rid of it ourselves? There are also some notes I’ve seen about how servers only incrementally cache federated content (only posts and comments that are viewed by someone are fetched, and new comments may not be fetched until someone wants to see them)… so not everybody necessarily has a “pure and full” copy of posts.

          But overall I wonder how all the various sysadmins hosting these Lemmy instances will deal with the exponential growth they’re going to see, or if smaller instances will start defederating to save on hardware costs (no reason for my tiny instance that only talks about blue shiny rocks to federate with lemmy.world and store all that content).
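
The sync-then-serve-locally behaviour described in this comment can be sketched as a toy model. Everything here is hypothetical (the class names, the dictionary standing in for the local database, and the fetch-on-first-view fallback); it illustrates the idea and is not Lemmy's actual code.

```python
class RemoteHome:
    """Stand-in for the community's home instance; counts the fetches it serves."""

    def __init__(self, posts):
        self.posts = dict(posts)
        self.fetches = 0

    def fetch(self, post_id):
        self.fetches += 1
        return self.posts[post_id]


class LocalInstance:
    """Keeps federated text content in a local store and serves reads from it."""

    def __init__(self, home):
        self.home = home
        self.db = {}  # hypothetical local table: post_id -> text

    def receive(self, post_id, text):
        # Content pushed by the home instance is saved locally.
        self.db[post_id] = text

    def read(self, post_id):
        # Serve from the local copy; fall back to a single fetch if it is missing.
        if post_id not in self.db:
            self.db[post_id] = self.home.fetch(post_id)
        return self.db[post_id]


home = RemoteHome({1: "first post", 2: "second post"})
local = LocalInstance(home)
local.receive(1, "first post")
for _ in range(1000):
    local.read(1)  # pushed earlier, always served locally
    local.read(2)  # fetched once on first view, then served locally
print(home.fetches, len(local.db))
```

The sketch also shows where the database bloat comes from: every instance that reads the content ends up holding its own copy in `db`.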

        • aard@kyu.de
          2 upvotes · 1 year ago

          Information about post changes on the originating instance comes in pretty much in real time and gets saved in the local database.

          If the local instance is configured with pict-rs support, images are also cached locally.

      • Corfiot@pleroma.elementality.org
        1 upvote · 1 year ago

        Why do these communities need long-term persistence? You could use a separate archive based on plain web server mirrors for anything worth preserving. Maybe it’s good that communities disappear and coalesce elsewhere, maybe it’s evolution. Maybe being forced to pick and choose what to archive and what to let go is a good thing.

        AP is a very chatty protocol, and handling large, world-scale groups requires additions like compressed digest distribution, mirrors and sharding. Threads are already fragmented by design, so in the end it may be unworkable to follow large group threads.

  • Mitchacho74@lemmy.world
    5 upvotes · 1 year ago

    It may be worth filing a bug report on the ActivityPub GitHub, because I agree that this could become a huge problem. But it’s also a lot of work to implement, because most instances would need to update to the newest ActivityPub standard once a new version is approved.

    • Lockely@pawb.social
      5 upvotes · 1 year ago

      Definitely a problem that needs solved sooner rather than later, but I assume with a lot more eyes on ActivityPub now, people much smarter than I will have a solution.

    • sunaurus@lemm.ee
      8 upvotes · 1 year ago

      The network can actually scale quite well thanks to the fact that other instances will act as mirrors of communities!

      • 好かん@feddit.jp
        1 upvote · 1 year ago

        But what happens when the instance hosting the community goes down? Are all external instances still able to participate in that community?

        • Edo78@feddit.it
          7 upvotes · 1 year ago

          No. The “single source of truth” is the instance hosting the community. If it goes down, the community itself goes down with the ship. The only way to prevent that is to have an IT infrastructure that can provide redundancy.

            • Edo78@feddit.it
              1 upvote · 1 year ago

              Having a redundant system is feasible (I’m just a dev, not an architect, so don’t take my word for it), but it has to be designed and put together… and prices are going to skyrocket.

              • jmp242@sopuli.xyz
                2 upvotes · 1 year ago

                Lemmy / the fediverse isn’t designed this way, but it could be. There are certainly systems that share disk space, are multi-master, and keep stuff as long as someone is interested in it (i.e. accessing the data). I’m really starting to think that something like what Freenet used to do for hosting content should be added to the Lemmy / fediverse servers.

        • BlameThePeacock@lemmy.ca
          4 upvotes · 1 year ago

          If it’s just a temporary outage, whatever the mirror received prior to the outage will still be shown to users on that other instance, but only local interactions on that instance will update it. When the host comes back up, things like votes and comments will be synchronized again across all of the instances.

          For permanent outages, the community will just need to be started again on a new instance.
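
The temporary-outage case can be sketched as a mirror that keeps accepting local comments and queues them until the home instance is reachable again. The class names and the retry-on-flush behaviour are assumptions made for illustration; how Lemmy actually schedules redelivery is not described in this thread.

```python
from collections import deque


class HomeInstance:
    """Stand-in for the instance that hosts the community."""

    def __init__(self):
        self.up = True
        self.comments = []

    def deliver(self, comment):
        if not self.up:
            raise ConnectionError("home instance unreachable")
        self.comments.append(comment)


class Mirror:
    """A remote instance that shows its local copy and queues outbound activity."""

    def __init__(self, home):
        self.home = home
        self.local_comments = []
        self.pending = deque()

    def post_comment(self, comment):
        # Local users see the comment immediately, even during an outage.
        self.local_comments.append(comment)
        self.pending.append(comment)
        self.flush()

    def flush(self):
        # Deliver queued comments in order; stop at the first failure and retry later.
        while self.pending:
            try:
                self.home.deliver(self.pending[0])
            except ConnectionError:
                return
            self.pending.popleft()


home = HomeInstance()
mirror = Mirror(home)
mirror.post_comment("before outage")
home.up = False
mirror.post_comment("during outage")
queued = len(mirror.pending)
home.up = True
mirror.flush()
print(queued, home.comments)
```

A permanent outage is the case where `flush` never succeeds again: the mirror still has its local copy, but nothing new reaches the other instances.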

          • TheAmorphous@lemmy.world
            2 upvotes · 1 year ago

            But they could pick up where the now defunct community left off, right? Like, the cached copy from another server could be imported on a new server elsewhere?

            • BlameThePeacock@lemmy.ca
              2 upvotes · 1 year ago

              That functionality doesn’t currently exist, but migration of communities is something that’s being actively talked about for development.

          • 好かん@feddit.jp
            0 upvotes · 1 year ago

            Reading this and trying to visualize the big picture, I think this is where kbin’s magazine is going to win out in the end

    • zero_iq@lemm.ee
      11 upvotes · edited · 1 year ago

      Lots of traffic, lots of posts, lots of comments, … That’s going to need more storage, more bandwidth, more CPU power, higher running costs. The original instance hosting the community bears a higher load than the instances that duplicate it.

      Ideally, there would be a way to more evenly distribute this load across instances according to their resources, but from my (currently limited) knowledge, I don’t think Lemmy/ActivityPub is really geared for that kind of distributed computing, and currently I don’t believe that there’s a way to move subs between instances to offload them (although I believe some people may be working on that).

      Perhaps the Lemmy back-end could use a distributed architecture for serving requests and storage, such that anyone could run a backend server to donate resources without necessarily hosting an instance.

      For example, I currently have access to a fairly powerful spare server. I’m reluctant to host a Lemmy instance on it as I can’t guarantee its availability in the long term (so any communities/user accounts would be lost when it goes down), but while it’s available I’d happily donate CPU/storage/bandwidth to a Lemmy cloud, if such a thing existed.

      There are pros and cons to this approach, but it might be worth considering as Lemmy grows in popularity.

      • SQL_InjectMe@partizle.com
        0 upvotes · 1 year ago

        I don’t think it’s a problem. If you weren’t using ActivityPub and were instead running something like reddit, then as the reddit sysadmin you’d also have to scale if your community got really popular.

        Stuff that gets linked to also has the same problem:

        https://www.jwz.org/blog/2022/11/mastodon-stampede/

        (Btw I don’t like jwz but he mentions it here)

        • zero_iq@lemm.ee
          1 upvote · edited · 1 year ago

          Funny how you say it’s not a problem, then go on to describe the problem that needs to be dealt with. Dealing with scaling is a problem, and it’s a problem that costs money.

          Posts like this: https://lemm.ee/post/58472 suggest it is a problem. The rise in traffic seen by Lemmy in the last few days is absolutely tiny compared to a site like reddit, and already instances are struggling to cope. The recent growth in user registrations represents only about 0.007% of reddit’s active user base. (~60K new Lemmy users vs 861,000,000 active monthly reddit users). A site like reddit costs millions to run.
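
For anyone who wants to check the percentage, here is the arithmetic using the two figures quoted above (the variable names are just for illustration):

```python
# Figures as quoted in the comment above.
new_lemmy_users = 60_000
reddit_monthly_active = 861_000_000

# Share of reddit's monthly active users, expressed as a percentage.
share_percent = new_lemmy_users / reddit_monthly_active * 100
print(f"{share_percent:.3f}%")
```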

          There are 190+ Lemmy instances last time I checked, yet almost all the brunt of this load has been borne by a handful of servers, which see an inordinate amount of traffic while 100+ other servers sit around idle. Why should a handful of “lucky” servers have to pay all the hosting costs? What if a volunteer-run instance explodes to reddit-like levels of popularity? It will simply fold, unless the volunteer has serious money to throw at the problem.

          • SQL_InjectMe@partizle.com
            1 upvote · 1 year ago

            Lemmy in the last few days is absolutely tiny compared to a site like reddit, and already instances are struggling to cope.

            While this is true, 5 days ago lemmy.ml, the biggest instance, was on a 67 EUR server which is very small. https://news.ycombinator.com/item?id=36270094

            Posts like this: https://lemm.ee/post/58472 suggest it is a problem

            This is a scaling problem (having more users means you need more mods) but I disagree with how they handled it and it isn’t a money related thing. My thoughts on this are in an older post when this was first announced https://partizle.com/comment/64178

            Why should a handful of “lucky” servers have to pay all the hosting costs?

            My initial idea is to use the Something Awful model of paying a one-time fee to register an account. The problem is that people would just sign up on another instance that doesn’t charge a fee but still add load to the lucky instance. Another approach could be that, to participate in communities on one of those lucky servers, you need to pay a one-time fee to that server (comments would need to be removed by a bot if they’re not made by an approved user). I’m not saying that’s perfect, but it’s an idea. AdSense is another idea.

            • zero_iq@lemm.ee
              1 upvote · 1 year ago

              Again, you say it’s not a money problem… then go on to describe a money problem! 🤦‍♂️

              Also, did you read the link included in the post I linked to? ( https://beehaw.org/post/520044?scrollToComments=true )

              That’s a money problem and a time problem. (And time problems are money problems.)

              High traffic sites need lots of money and resources to run. That’s just a fact.

              We can solve this in many ways as Lemmy grows (and I think we will), but to just pretend there aren’t any problems to be solved is naive, IMO.

              If Lemmy grows to any significant percentage of reddit traffic, the Lemmy of tomorrow will (necessarily) look quite different to the Lemmy of today.

    • Mitchacho74@lemmy.world
      4 upvotes · 1 year ago

      I think they meant like “overloaded”, like a hose spraying water, but the water being users from all around the fediverse

  • codus@leby.dev
    1 upvote · 1 year ago

    I’d love to see feedback from admins on the scaling problems they are having. Hopefully that scales per server and not per user per server.
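
The difference between the two scaling models can be made concrete with a small sketch. It assumes, in line with the mirroring described earlier in the thread, that the home instance sends one copy of each activity to every instance with at least one subscriber; the instance names and subscriber counts are made up.

```python
def deliveries_per_activity(subscribers_by_instance, per_instance=True):
    """Count outbound deliveries the home instance makes for one new activity."""
    if per_instance:
        # One delivery per remote instance that has at least one subscriber.
        return sum(1 for n in subscribers_by_instance.values() if n > 0)
    # Hypothetical worst case: one delivery per individual subscriber.
    return sum(subscribers_by_instance.values())


subscribers = {"a.example": 20_000, "b.example": 29_500, "c.example": 0}
per_server = deliveries_per_activity(subscribers, per_instance=True)
per_user = deliveries_per_activity(subscribers, per_instance=False)
print(per_server, per_user)
```

If delivery really is per server, outbound cost grows with the number of federated instances rather than with the total number of subscribers.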