Delusions of a Protocol

flamingos-cant (hopepunk arc)@feddit.uk · 2 days ago

chilicheeselies@lemmy.world · 2 days ago

I can see how the AT protocol is designed to scale. ActivityPub works fine now because the community is fairly small, but it will reach its limits as it is currently designed. It’s basically an event-driven model vs a push-and-pull model. Sure, a Docker image can more or less just be deployed, but that simplicity is a ticking time bomb.

Running a relay is way more powerful than the author states though. You could do stuff like selectively intercept and reject events before they make it into or out of the firehose.
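To make the relay point concrete, here is a minimal sketch of a relay-side filter. It is not how any real ATProto relay is implemented; the event shape (a dict with "author" and "text" keys) and the function names `filtered_firehose` and `block_authors` are made up for illustration.

```python
from typing import Callable, Dict, Iterable, Iterator, Set

# Hypothetical event shape: a dict with "author" and "text" keys.
Event = Dict[str, str]

def filtered_firehose(events: Iterable[Event],
                      allow: Callable[[Event], bool]) -> Iterator[Event]:
    """Pass along only the events that the relay operator's policy allows."""
    for event in events:
        if allow(event):
            yield event

def block_authors(blocked: Set[str]) -> Callable[[Event], bool]:
    """Build a policy that silently drops every event from the given authors."""
    return lambda event: event["author"] not in blocked

incoming = [
    {"author": "alice", "text": "hello"},
    {"author": "bob", "text": "something the operator dislikes"},
    {"author": "carol", "text": "hi"},
]

# Downstream consumers only ever see what survives the policy.
outgoing = list(filtered_firehose(incoming, block_authors({"bob"})))
```

Anyone reading `outgoing` has no way to tell that bob posted at all, which is the kind of power being described.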

poVoq@slrpnk.net · 1 day ago

This depends on what you think the purpose of ActivityPub is and subsequently the type of scale. ActivityPub is designed for horizontal scale in a “social network”. If you have lots of participating entities with a more or less similar number of interconnected subscriptions, ActivityPub scales extremely well, unlike ATProto, which needs to more or less ingest the entire network in its firehose.

But you are right that ATProto is better designed for “social media”, meaning that most subscriptions are one sided affairs with highly visible “influencers” being the main point around which the network operates. Obviously this is what most commercial networks are more interested in as it allows profitable advertisement and other forms of social influence.

I see these two types as entirely different forms of social interaction, and couldn’t care less about the latter. So I am not worried at all about scaling issues of ActivityPub, as it scales extremely well in the “social network” type of interaction.

General_Effort@lemmy.world · 8 hours ago

Yes. It’s only a problem if you expect or want the Fediverse to be the future of social media, which it isn’t.

naught101@lemmy.world · 1 day ago

I like this take, but I wonder if there’s eventually a combinatoric problem with having hundreds of thousands of small instances, each with thousands of connections to other instances? I have no idea how that relates to the network/computational constraints…

General_Effort@lemmy.world · 5 hours ago

That needs a longer explanation.

An instance does not interact with all other instances. It only syncs with other instances when users follow someone there, join a community, …

But that’s also a problem. It means you can’t search the entire Fediverse from a particular instance and find new and interesting discussions and people. There is no discovery feed. For that, you need something like Bluesky’s relay. That relay actually does keep up with what everyone is posting and archives it.

But that’s one aspect of Bluesky that draws a lot of criticism from Fedi people. A full relay is expensive to run and not something anyone can self-host. Pruned-down versions are doable, though. If everyone actually did run their own relay, then you would get the combinatorial problem.

In practice, large instances are the Fediverse solution to the discovery problem. You can see what the many users on that instance post. Also, the many users subscribe to many things and so a large instance will cache much content from elsewhere. That architecture encourages centralization.

There are other difficult issues. So you have a little server that serves your content to a few followers. Some celebrity with millions of followers would have to rent an entire server rack. But what if little old you interacts with a celeb and now all their followers try to fetch your content from your little server? Common problem. You just need caching. E.g., the celebrity rack also serves your content to their followers and takes the load off your server. But now whoever is doing the caching can also filter replies. There’s no simple solution there.
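A toy sketch of that trade-off, assuming a hypothetical `CachingServer` class and a made-up URL (this is not code from any Fediverse project): the big server fetches a remote post once and serves every later request from its own cache, and the same component can also decide to hide it.

```python
from typing import Callable, Dict, Optional, Set

class CachingServer:
    """Toy stand-in for a large instance that caches remote posts for its users."""

    def __init__(self, fetch_from_origin: Callable[[str], str]) -> None:
        self._fetch = fetch_from_origin
        self._cache: Dict[str, str] = {}
        self.hidden: Set[str] = set()   # URLs the cache operator chooses not to serve
        self.origin_hits = 0            # how often the small origin server was contacted

    def get(self, url: str) -> Optional[str]:
        if url in self.hidden:
            return None                 # the filtering side of the trade-off
        if url not in self._cache:
            self.origin_hits += 1       # only the first request reaches the origin
            self._cache[url] = self._fetch(url)
        return self._cache[url]

# The lambda stands in for an HTTP fetch from the small server.
big = CachingServer(lambda url: "post body for " + url)
reply = "https://small.example/post/1"

# A thousand followers ask for the same reply.
results = [big.get(reply) for _ in range(1000)]

# Later the operator hides it; followers now get nothing.
big.hidden.add(reply)
after_hide = big.get(reply)
```

The small server is contacted once instead of a thousand times, but whoever runs the cache also controls what those followers can see.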

poVoq@slrpnk.net · 1 day ago

Modern webservers don’t have a problem serving thousands of requests as long as they are spaced out a bit timewise. And since each AP instance only sees and interacts with a small part of the overall network, it should not become an issue to expand the network horizontally. It is probably better anyway to think of interconnected archipelagos and not of a singular network in the case of ActivityPub.

naught101@lemmy.world · 1 day ago

Is that really true though? Say we end up with 10k servers with 100-1000 users each. Even if only 10% of those users have a connection to a server that no one else on their server is connected to, that’s still a highly connected network.

Then add boosts from other servers (that incentivise cross-network follows)…

poVoq@slrpnk.net · 1 day ago

Mastodon already has those numbers you mention and there are no performance issues in the overall network.

naught101@lemmy.world · 14 hours ago

I don’t believe that’s true… It currently has around 9k servers, but I think the vast majority of those will have less than 10 users.

Anyway, there’s currently about 1m active users, so the real question is will it scale by 3 orders of magnitude? And my point being that I’d expect the network to become more connected as it scales (at least for the main archipelago, which is probably always going to house a majority of users).

poVoq@slrpnk.net · edit-2 8 hours ago

MAU is a very incomplete measure of active users as by far the most users lurk and post very little.

In total numbers Mastodon has about 10m users and only 30% of those are on mastodon.social, the rest is distributed on the 9k other instances. That’s pretty close to the scenario you stated.

INeedMana@piefed.zip · 1 day ago

ActivityPub works fine now because the community is fairly small, but it will reach its limits as it is currently designed. It’s basically an event-driven model vs a push-and-pull model. Sure, a Docker image can more or less just be deployed, but that simplicity is a ticking time bomb.

You mean that when there’s more traffic, the instances will start to DDOS each other?

wisdomchicken@piefed.social · 1 day ago

i ddos my wordpress-activitypub-enabled website every time i boost a post made from there to my 10k followers. tried every single caching plugin for it as well.

activitypub scaling is a very real issue

Rimu@piefed.social · 14 hours ago

Cloudflare caching can solve this. Cache based on the user agent.

poVoq@slrpnk.net · edit-2 1 day ago

ActivityPub is designed to scale well for millions of users with a low number of subscribers each (Dunbar’s number and so on). It is not designed as a mass media publishing tool where a few have tens of thousands or even millions of followers.

I consider this a feature, but feel free to disagree.

INeedMana@piefed.zip · 1 day ago

It gets DDOSed because after the boost, all the subscribers’ instances are calling it to retrieve the content?

Do you think a load balancer might help?

chilicheeselies@lemmy.world · 1 day ago

OP would have to have their wordpress site running on multiple instances to leverage that. It would work, but now they are paying for more infrastructure.

INeedMana@piefed.zip · 1 day ago

Yes, but those could be just “read instances” spun up by a peak in requests at the load balancer. Not running all the time.

chilicheeselies@lemmy.world · 1 day ago

Fair point

chilicheeselies@lemmy.world · 1 day ago

Yeah, I guess you could look at it that way. Each instance would have to scale horizontally to handle the load, which is wasteful.

I could be wrong, but my understanding is that ActivityPub is just a REST API contract that one can implement in order to communicate with the rest of the “network”. It’s simple, but it’s such massive overhead to do this all via HTTP. Pushing all your instance’s events to a dedicated stream and letting the other instances read it can be more performant and handle the load better. The downside though is: who controls the streams?

IRC is the OG of federation. I am sure we could learn something from it and have federated networks that are in turn federated with each other. I dunno, just thinking out loud here.

INeedMana@piefed.zip · 1 day ago

But is that a limitation of AP?

As far as I understand, one could split a fediverse instance into three parts: data, backend and UI. The data is not shared 1:1. Each instance gets a copy of the activity and from that creates its own copy. Hence the same post on different instances will have a different ID. The problem we are speaking about is the capability of the backend to process incoming copies. Meaning, I also understand that the part that serves the local data to the UI should not be the problem.

What if there was a queue at the front, and a scalable ingestion worker was split off from the backend? Those workers would only apply the incoming actions to the data. Probably with per-community(?) FIFO topics/partitions, so we can process data in parallel and not worry about an updoot for a post that does not exist yet.
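A rough sketch of that idea, using in-memory deques in place of a real queue. The activity fields ("type", "community", "post") and the function names are assumptions for illustration, not ActivityPub vocabulary, and the ordering guarantee only holds if activities arrive at the queue in the order they were sent.

```python
import zlib
from collections import deque
from typing import Deque, Dict, List

Activity = Dict[str, str]

def partition_for(community: str, num_partitions: int) -> int:
    """Stable hash, so one community always lands in the same partition."""
    return zlib.crc32(community.encode("utf-8")) % num_partitions

def enqueue(partitions: List[Deque[Activity]], activity: Activity) -> None:
    """Front queue: route each incoming activity by its community."""
    idx = partition_for(activity["community"], len(partitions))
    partitions[idx].append(activity)

def drain(partition: Deque[Activity], scores: Dict[str, int]) -> None:
    """One ingestion worker per partition, applying activities in FIFO order."""
    while partition:
        activity = partition.popleft()
        if activity["type"] == "create":
            scores[activity["post"]] = 0
        elif activity["type"] == "upvote":
            # Same community means same partition, so the create came first.
            scores[activity["post"]] += 1

partitions: List[Deque[Activity]] = [deque() for _ in range(4)]
incoming = [
    {"type": "create", "community": "tech", "post": "p1"},
    {"type": "upvote", "community": "tech", "post": "p1"},
    {"type": "create", "community": "cats", "post": "p2"},
    {"type": "upvote", "community": "tech", "post": "p1"},
]
for activity in incoming:
    enqueue(partitions, activity)

scores: Dict[str, int] = {}
for partition in partitions:   # in a real deployment these would run in parallel
    drain(partition, scores)
```

Workers for different partitions never touch the same community, so they can run independently without reordering votes relative to the posts they refer to.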

Those would still be fairly easy to deploy and be vertically scalable, right? Or is there some bottleneck in the protocol itself?

chilicheeselies@lemmy.world · 1 day ago

So let’s say there are 100 instances. My instance needs to issue API requests to each instance to sync with the network. They in turn need to issue 100 requests to me to sync (and each other). What about when there are 100k instances? It’s exponential.

From the looks of AT, it’s fairly linear because it’s really just operating on a set of giant event streams (like Kafka).

To me, ActivityPub being based on REST APIs was always a problem. On the upside it makes it approachable, but it’s not really the right tech imo. Use something without the overhead of HTTP headers and whatnot.

poVoq@slrpnk.net · 1 day ago

So let’s say there are 100 instances. My instance needs to issue API requests to each instance to sync with the network. They in turn need to issue 100 requests to me to sync (and each other). What about when there are 100k instances? It’s exponential.

This falsely assumes that everything gets federated to everyone, which isn’t the case for ActivityPub. You only get what you actually subscribe to with it.
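Some back-of-the-envelope arithmetic for the two cases. The function names are made up, and the figure of 1,000 peers per instance is only an assumption borrowed from the “thousands of connections” guess earlier in the thread. Note that even the everyone-syncs-with-everyone case grows quadratically, not exponentially.

```python
def full_mesh_links(instances: int) -> int:
    """Directed instance-to-instance links if everyone synced with everyone."""
    return instances * (instances - 1)

def subscription_links(instances: int, avg_peers: int) -> int:
    """Directed links if each instance only talks to the peers it subscribes to."""
    return instances * min(avg_peers, instances - 1)

small_mesh = full_mesh_links(100)                      # 9,900
large_mesh = full_mesh_links(100_000)                  # 9,999,900,000
large_subscribed = subscription_links(100_000, 1_000)  # 100,000,000 (assumed 1,000 peers each)
```

With a fixed number of peers per instance the total grows linearly with the number of instances, so the open question is whether the average number of peers stays bounded as the network grows.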

INeedMana@piefed.zip · 1 day ago

From the looks of AT, it’s fairly linear because it’s really just operating on a set of giant event streams (like Kafka).

Wouldn’t that mean that this stream will have to scale horizontally?

chilicheeselies@lemmy.world · 1 day ago

Yes, eventually, just like the instances do once enough users are hitting them. It’s a matter of how much all servers in the network need to scale, but also the nature of the protocol itself. Streaming binary data is more performant than individual HTTP API requests, for instance. Event streams are the way to go in a decentralized network for sure.

INeedMana@piefed.zip · 1 day ago

With AP does the exchange have to be http requests? If every instance had a stream instead, would that break the protocol?

With decentralized AT, who would be maintaining the stream? In the image I don’t see a connection between the alternative firehose and the right-side PDSes.

chilicheeselies@lemmy.world · 1 day ago

I suppose technically someone could implement streaming using AP payloads. So long as the format of the payloads is the same, they could translate. It would be a different thing though without the pull part of it.