[Discussion] Let's talk about lemmy.ml

Cracks_InTheWalls@sh.itjust.works · edited 7 months ago

You raise some interesting points, and I don’t think they should be dismissed out of hand. I have some questions though (some of them are re: your other comments here):

[…] some evidence that they are running their own modified version of the code which seems to give them special tools to do things like instant mass bans and selective federation of content.

Could you speak to this in a little more detail? Does what you are seeing inherently require functionality beyond what Lemmy’s public release offers natively, or is it beyond the scope of something like an automod tool? Asked honestly; I am not an IT professional.

[…] if .ml were to be treated as a state espionage actor […] it would be trivial for them to collect identifying information via federation and to promote malicious or compromised websites by modifying their feeds, or even the feeds of individual users.

This is obviously a very serious accusation, but let’s put that aside for a moment.

My (limited) understanding is that, as a function of using the ActivityPub protocol, it is already trivial to collect identifying information on users of federated services. What makes lemmy.ml unique in this regard? Couldn’t a bad actor do this just as easily by other means? Is it simply its comparative size relative to other instances/services that can be leveraged for this purpose? Aren’t there lower-profile means of accomplishing the same thing?

I don’t know enough about how federation works from a technical perspective to speak to feed manipulation when viewing a ‘rogue actor’ instance from a place like sh.itjust.works, but I welcome comments/clarifying questions on this point from smarter people than myself. I want to know more, I just don’t know what to ask.

Socsa@sh.itjust.works · edited 7 months ago

Federation potentially exposes quite a bit of user telemetry through a few different vectors. For example, simply loading a thumbnail from another instance exposes a user’s IP to that host instance. How reliably a third instance can tie a specific web request or usage pattern to a specific user is unclear, but it is not a large leap. I am working through some specific exploit ideas on a test server I run, but I don’t have a ton of time these days, and it’s difficult to model some of these vectors without real traffic. I can say that so far, if a user interacts with a post soon after making the content request, it’s pretty easy to grab their IP, especially on low-traffic content. So if I can see that a user interacts with a niche community (because votes are federated for some strange reason), I can target them that way. I should also be able to set a cookie via the content request, as well as do all the typical browser fingerprinting tricks.

Once that association happens, it becomes trivial to serve malicious content to an individual user. This is a very serious threat vector specifically because it’s easy to hide what you are doing from the rest of the world, so it takes vigilance by the target to uncover. If it is done rarely, it would be all but impossible to spot.
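To make the timing correlation above concrete, here is a toy model on made-up data. It is only a sketch: the `MediaRequest` and `FederatedVote` shapes, the `candidate_ips` helper, and the five-minute window are all assumptions for illustration, not anything taken from Lemmy’s code. It just shows why low-traffic content narrows things down so quickly.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class MediaRequest:
    """One entry from the hosting instance's web log (hypothetical shape)."""
    ip: str
    post_id: str
    at: datetime


@dataclass(frozen=True)
class FederatedVote:
    """A vote as it arrives over federation (hypothetical shape)."""
    actor: str
    post_id: str
    at: datetime


def candidate_ips(vote, requests, window=timedelta(minutes=5)):
    """Distinct IPs that fetched media for the voted post within `window` before the vote."""
    earliest = vote.at - window
    return sorted({
        r.ip
        for r in requests
        if r.post_id == vote.post_id and earliest <= r.at <= vote.at
    })


# Synthetic data only; the addresses come from the reserved documentation ranges.
t0 = datetime(2024, 1, 1, 12, 0, 0)
requests = [
    MediaRequest("203.0.113.7", "niche-post", t0),
    MediaRequest("198.51.100.9", "niche-post", t0 - timedelta(hours=2)),
    MediaRequest("198.51.100.2", "popular-post", t0 + timedelta(seconds=10)),
    MediaRequest("198.51.100.9", "popular-post", t0 + timedelta(seconds=20)),
]
vote_niche = FederatedVote("alice@example.social", "niche-post", t0 + timedelta(seconds=30))
vote_pop = FederatedVote("bob@example.social", "popular-post", t0 + timedelta(seconds=40))

print(candidate_ips(vote_niche, requests))  # ['203.0.113.7']
print(candidate_ips(vote_pop, requests))    # ['198.51.100.2', '198.51.100.9']
```

On the niche post there is exactly one candidate, while on the busier post the vote could belong to either address. Real traffic is messier than this (shared IPs, caches, image proxies), so a match is a candidate rather than proof, which is consistent with the reliability being unclear.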

The broader point is that there is clear motive and plausible opportunity here. From a cyber security perspective, that’s enough to take preventative and protective measures.