I wish Persian-letter usernames were allowed, maybe even symbols as usernames. It looks really cool and would increase the username pool as well.

  • SorteKaninA
    10 days ago

    “ActivityPub users need to be identified by some identifier in the URL, and Lemmy chose the user name to be that identifier. As a result, non-Latin usernames become… complicated.”

    Sorry but this is just false. URIs can easily encode UTF-8 characters and it’s perfectly standard to do so via percent-encoding. Example: https://en.wikipedia.org/wiki/😂. Your browser will even automatically convert that 😂 into the appropriate percent-encoding and will even display the emoji in the address bar, even if that is not the “true” URI.
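
    As a rough illustration of that conversion, here is a minimal Python sketch using only the standard library. The variable names are made up for the example; it turns the readable Wikipedia link into its percent-encoded form and back again.

```python
from urllib.parse import quote, unquote

# Readable form, as typed or as shown in the address bar.
readable = "https://en.wikipedia.org/wiki/😂"

# Percent-encode everything outside the safe set. ':' and '/' are kept
# so the URL structure survives. quote() encodes as UTF-8 by default.
encoded = quote(readable, safe=":/")
print(encoded)  # https://en.wikipedia.org/wiki/%F0%9F%98%82

# Decoding gives back the readable form.
print(unquote(encoded) == readable)  # True
```

    The percent-encoded string is what actually goes over the wire; the emoji form is only a display convenience.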

    This is, if you ask me, an unnecessary limitation in Lemmy.

    • Skull giver@popplesburger.hilciferous.nl
      9 days ago

      Link detection is flaky as hell, especially for special characters; links containing them rarely work reliably. URLs themselves don’t contain Unicode. They use basic ASCII, and anything beyond that needs to be encoded in some form. The link you posted isn’t a spec-compliant link; it only works because Lemmy apps and browsers are nice and do the conversion to the real URL for you. According to the spec (RFC 3986, section 2.5):

      When a new URI scheme defines a component that represents textual data consisting of characters from the Universal Character Set [UCS], the data should first be encoded as octets according to the UTF-8 character encoding [STD63]; then only those octets that do not correspond to characters in the unreserved set should be percent-encoded. For example, the character A would be represented as “A”, the character LATIN CAPITAL LETTER A WITH GRAVE would be represented as “%C3%80”, and the character KATAKANA LETTER A would be represented as “%E3%82%A2”.
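
      The two steps in that passage (UTF-8 octets first, then percent-encoding of everything outside the unreserved set) can be sketched by hand in Python. `percent_encode` is a hypothetical helper written for this example, not a library function.

```python
# Characters RFC 3986 calls "unreserved": these are never percent-encoded.
UNRESERVED = set(
    "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
    "abcdefghijklmnopqrstuvwxyz"
    "0123456789-._~"
)


def percent_encode(text: str) -> str:
    """Encode text as UTF-8 octets, then percent-encode every octet
    that does not correspond to an unreserved character."""
    out = []
    for octet in text.encode("utf-8"):
        ch = chr(octet)
        if ch in UNRESERVED:
            out.append(ch)
        else:
            out.append(f"%{octet:02X}")
    return "".join(out)


# The three examples from the quoted passage.
print(percent_encode("A"))       # A
print(percent_encode("\u00C0"))  # LATIN CAPITAL LETTER A WITH GRAVE -> %C3%80
print(percent_encode("\u30A2"))  # KATAKANA LETTER A -> %E3%82%A2
```

      In real code, `urllib.parse.quote(text, safe="")` from the standard library should give the same result on recent Python versions; the manual loop is only there to make the two steps visible.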

      If you use usernames as identifiers (which, again, are optional) like Lemmy does, databases and external entities will use the percent-encoded URLs, not the readable ones. Unicode domains will have their xn-- form stored as well. It’s up to apps and browsers to decode those and turn them back into Unicode. It’s not really relevant what apps and browsers show you when it comes to the technical interoperability of users.

      ActivityPub itself has wide support for various languages, including having different names and content for different languages. The username (actually preferredUsername) is transmitted through JSON, which is by definition UTF-8, so most encodings in use today (not that weird Japanese one and that other Asian encoding that’s not UTF compatible) will Just Work™ assuming the necessary URL encoding and decoding logic is added in the right places.
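
      To make that concrete, here is a minimal Python sketch of where the encoding and decoding would sit. Everything in it is an assumption for illustration: the username, the example.com host, the /u/ path layout and the trimmed-down actor object are invented, and this is not how Lemmy actually builds its actors.

```python
import json
from urllib.parse import quote, unquote

username = "کاربر"  # hypothetical Persian username ("user")

# Trimmed-down, illustrative actor object. The readable name goes into
# preferredUsername as-is; the id is a URL, so the name is percent-encoded.
actor = {
    "type": "Person",
    "preferredUsername": username,
    "id": "https://example.com/u/" + quote(username, safe=""),
}

# Serialize as UTF-8 JSON, as it would travel between servers.
wire = json.dumps(actor, ensure_ascii=False).encode("utf-8")

# The receiving side decodes the JSON and gets the same name back.
received = json.loads(wire.decode("utf-8"))
print(received["preferredUsername"] == username)  # True

# The id is plain ASCII; decoding its last path segment recovers the name.
print(received["id"].isascii())  # True
path_segment = received["id"].rsplit("/", 1)[1]
print(unquote(path_segment) == username)  # True
```

      The readable name and the ASCII-only identifier live side by side: software compares and stores the id, while people see preferredUsername.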

      I think Lemmy can be patched to accept unicode characters as usernames, as the current limitations in code and in the UI are just choices made during development. I don’t think it’ll add much, though.

    • Asudox@lemmy.world
      10 days ago

      Using ASCII in URLs is simple and is less error-prone than “supporting” unicode via percent encoding. It is also just a convention to use ASCII for usernames in many platforms. ASCII is also supported out of the box in major OSes while some unicode characters might not. What about impersonation? And what about people trying to type in the username of someone that uses unicode? It is not logical to use unicode in this case.

      • SorteKaninA
        10 days ago

        “It is also just a convention to use ASCII for usernames in many platforms.”

        That’s only true for platforms that only cater to the English-speaking world. The fediverse should be and is much broader than that.

        “ASCII is also supported out of the box in major OSes while some unicode characters might not.”

        What? There is no major OS that does not support Unicode out of the box.

        Percent encoding is perfectly fine and users won’t even see it.

        Also, please stop downvoting twice with your alt accounts, that’s not cool.

        • sznowicki@lemmy.world
          10 days ago

          Punycode would work better here, I think, as it’s plain ASCII with no special characters except a dash, if I recall correctly.

          • SorteKaninA
            10 days ago

            Punycode is not solving the same problem. Punycode solves Unicode in domain names. Percent encoding is for Unicode in URL paths. Lemmy only needs to worry about the paths; Punycode should be “supported” out of the box without any special handling.
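
            A short Python sketch of that split, using only the standard library. The host and path are invented for the example, and Python’s built-in `idna` codec stands in for whatever IDNA handling a real server would use.

```python
from urllib.parse import quote

host = "b\u00fccher.example"  # "bücher.example", hypothetical Unicode domain
path = "/u/b\u00fccher"       # "/u/bücher", hypothetical Unicode user path

# Domain names: each label is converted to its xn-- (Punycode) form.
ascii_host = host.encode("idna").decode("ascii")

# Paths: UTF-8 octets are percent-encoded; '/' is kept by default.
ascii_path = quote(path)

print("https://" + ascii_host + ascii_path)
# https://xn--bcher-kva.example/u/b%C3%BCcher
```

            The same word ends up in two different ASCII forms depending on which part of the URL it sits in, which is why the two mechanisms are not interchangeable.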