Can we shave 60% off of wallet sync times by compressing blocks from remote nodes?

tusker@monero.town · edit-2 7 months ago


mister_monster@monero.town · 8 months ago

So you’ve got 2 components to sync time: bandwidth and processing. In Monero we already have to attempt to decrypt the transactions in each block to see if they’re ours. This is what really takes time with regard to syncing.

If you compressed blocks, you’d save some bandwidth, but you’d spend time client side decompressing before sync. This adds to sync time. A user with high processing power using a node with low bandwidth might see a benefit, but for most people the bottleneck isn’t bandwidth, it’s processing power. Most people wouldn’t see a sync time improvement with your proposed scheme.

tusker@monero.town · 8 months ago

Decompression is a very fast operation. There are many locations where bandwidth is 1 Mbit/s and maxes out at 10 Mbit/s, not to mention that bandwidth is often metered as well. With blocks now 3x the size they were a month ago, it would be a significant improvement in terms of speed and cost. Blocks will only get bigger going forward.
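One way to put numbers on this claim is to time decompression directly and compare it with the download time saved. The following is a minimal sketch, assuming zlib as a stand-in codec and a 1 Mbit/s link; the function name `measure` and the sample payload are illustrative only, and placeholder bytes say nothing about how well real Monero blocks compress.

```python
import time
import zlib


def measure(payload: bytes, level: int, link_mbit_s: float = 1.0) -> dict:
    """Compare download time saved with time spent decompressing.

    `link_mbit_s` is an assumed link speed, not a measured one.
    """
    compressed = zlib.compress(payload, level)

    start = time.perf_counter()
    restored = zlib.decompress(compressed)
    decompress_s = time.perf_counter() - start
    if restored != payload:
        raise ValueError("round trip was not lossless")

    bytes_per_s = link_mbit_s * 1_000_000 / 8
    return {
        "ratio": len(compressed) / len(payload),
        "download_saved_s": (len(payload) - len(compressed)) / bytes_per_s,
        "decompress_s": decompress_s,
    }


if __name__ == "__main__":
    # Placeholder data: swap in real serialized blocks for a meaningful answer.
    sample = b"placeholder block bytes " * 50_000
    for level in (1, 6, 9):
        print(level, measure(sample, level))
```

Compression pays off for a given client whenever `download_saved_s` is larger than `decompress_s`; running this against real block data on a low-end device would settle the question for that device.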

mister_monster@monero.town · edit-2 8 months ago

The number of operations required to decompress depends on the compression algorithm and on how heavily the data is compressed, so it can be fast or slow. More importantly, the relationship between the compute required to decompress and the amount of compression is not linear: 10% more compression does not translate to 10% more computation to decompress, it takes more than that. So at some point you’re taking more time to decompress than you saved downloading under your bandwidth constraint. That point is different for every node (or more accurately, every pair of nodes, since max bandwidth is the lower of the two communicating), so the more compression you use, the more you favor low bandwidth, high power nodes. I don’t know what the median or mean processing power is for nodes, and I don’t know what the median or mean bandwidth is. I’m sure some compression would benefit the network overall, but you’re always benefitting some nodes at the expense of others in doing it, and there’s no optimal scheme for all nodes on the network. This optimum is also ever changing as people upgrade hardware and connections.

It might make sense to allow nodes to request compressed blocks from each other over RPC, with a field in the request that says “send compressed blocks”, so that high power, low bandwidth nodes can ask for it. But compression also has a processing cost, and the node being asked might not want to pay it. It could cache compressed blocks, since blocks don’t change, but then it either has to decompress them every time it needs to access them, or store both a compressed and an uncompressed version of each block if it needs constant access but wants to send compressed blocks. It’s trade-offs all the way down. There are considerations that can be made. But is it worth it? I don’t know. Also consider that adding a field to the request can be used for fingerprinting: the more granular you make RPC requests, the more data points can be used to fingerprint the node, which is a problem over Tor or I2P.
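The opt-in idea above could look roughly like the sketch below. Everything here is hypothetical: `BlockServer`, `get_block`, `want_compressed` and `read_block` are made-up names for illustration and are not part of Monero's actual RPC, and zlib stands in for whatever codec would really be chosen. It keeps raw blocks and a lazily filled cache of compressed copies, which is the "store both versions" trade-off mentioned above.

```python
import zlib
from dataclasses import dataclass, field
from typing import Dict, Tuple


@dataclass
class BlockServer:
    """Hypothetical node-side handler, not Monero's real RPC."""

    blocks: Dict[int, bytes]            # height -> raw block bytes
    allow_compression: bool = True      # the operator may decline the extra work
    cache: Dict[int, bytes] = field(default_factory=dict, init=False)

    def get_block(self, height: int, want_compressed: bool = False) -> Tuple[bytes, bool]:
        """Return (payload, is_compressed) for the requested height."""
        raw = self.blocks[height]
        if not (want_compressed and self.allow_compression):
            return raw, False
        # Blocks never change, so each one is compressed at most once.
        if height not in self.cache:
            self.cache[height] = zlib.compress(raw)
        return self.cache[height], True


def read_block(payload: bytes, is_compressed: bool) -> bytes:
    """Client side: undo compression only if the node actually applied it."""
    return zlib.decompress(payload) if is_compressed else payload
```

The response carries an explicit `is_compressed` flag so a node that refuses can still answer with raw bytes. The fingerprinting concern remains: the request flag is one more observable difference between clients.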

tusker@monero.town · edit-2 8 months ago

There are established compression standards which should avoid all of the issues you mention. Obviously we would not compress to the point where it takes longer to decompress than to download over a 1 Mbit/s connection, or to the point of causing data loss.

Most software distributed over the internet is compressed despite all the “unknowns” being present. Data stream compression is likewise beneficial and established when transferring large amounts of data to remote locations, such as backups.

Let us not get caught up in analysis paralysis and instead stick to practical solutions that will benefit the majority of users.

SummerBreeze@monero.town · 8 months ago

How come the fees and which node (public or not) made such a difference if the issue was processing power? The bottleneck?

mister_monster@monero.town · edit-2 8 months ago

Well, the bandwidth of the remote node is a potential bottleneck, as well as the bandwidth of the person syncing. Whichever is smaller is going to be the max rate at which data is sent, ignoring the connection path for simplicity. It can affect the speed of sync significantly. If you’ve got a powerful computer that can do a ton of operations per second and check a ton of blocks for transactions, your bottleneck is going to be bandwidth. But if we decide to compress the blocks as you get them, you can alleviate that, at the cost of decompressing the blocks and so slowing your processing of them. Compression is an NP problem, so the more you compress the blocks, the longer it takes to decompress the data, and this relationship is not linear; 10% more compression requires more than 10% more processing time to decompress. Compressing too much eats up the bandwidth benefit you’re going to get, and there’s a point of equilibrium that’s different for each node on the network, based on its bandwidth and processing power. Obviously, we cannot compress differently for each node, so compressing is necessarily a trade-off between bandwidth and hardware capability: any compression favors low bandwidth, higher power nodes, and no compression favors higher bandwidth, lower power nodes. Further, your compression scheme cannot compress beyond certain limits without becoming lossy, so there’s a practical limit even ignoring processing time. You also have to consider the processing power of the remote node, since it has to compress the blocks.
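The equilibrium described above can be written down as a back-of-the-envelope model. This is only a sketch under simplifying assumptions: download, decompression and scanning run one after another, throughputs are constant, and the remote node's compression cost is ignored (for example because it caches compressed blocks). All parameter names and the sample figures are illustrative, not measurements.

```python
def sync_seconds(raw_bytes, ratio, client_bps, node_bps, decompress_bps, scan_bps):
    """Estimated sync time; rates are in bytes/s, ratio = compressed/raw size."""
    link_bps = min(client_bps, node_bps)   # the slower side limits the transfer
    download = raw_bytes * ratio / link_bps
    decompress = 0.0 if ratio >= 1.0 else raw_bytes / decompress_bps
    scan = raw_bytes / scan_bps            # trying to decrypt outputs
    return download + decompress + scan


def compression_helps(ratio, link_bps, decompress_bps):
    """True when download time saved per raw byte exceeds decompression time."""
    return (1.0 - ratio) / link_bps > 1.0 / decompress_bps


# Assumed figures: 1 Mbit/s link (125_000 B/s), blocks shrinking to 40% of
# their size, decompression at 100 MB/s.
print(compression_helps(0.4, 125_000, 100_000_000))
```

Under this model the break-even depends only on the link speed, the compression ratio and the client's decompression throughput, which is why it differs for each pair of nodes.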