[BIP Draft] Datastream compression of Blocks and Transactions



Summary:

The Lempel-Ziv family of compressors, commonly used in text-based data compression, assigns a unique number to every new subsequence it encounters and replaces it with that short integer when it re-encounters the same sequence. While this works well for English text, it is less efficient for binary data such as the Bitcoin wire protocol, which has opportunities for much better compression due to structured reuse of certain byte sequences. The current LZ variants require a 200-byte sequence to be repeated 199 times before developing a single reusable subsequence in the compression table, but a custom, Bitcoin-specific compressor could potentially achieve much higher compression ratios. However, there are also cons to creating a Bitcoin-specific compressor. It could make it difficult to change the wire format later on and brand new code may lack the required maturity. Additionally, compression algorithms should be thoroughly tested before inclusion. On the other hand, compressing Bitcoin at zero cost can provide additional throughput and leaving potentially free improvements on the table would be criminal. A programming challenge/contest to find the best possible, Bitcoin-specific compressor could be an effective way to discover the limits of compressibility for Bitcoin bits on a wire. This exercise would bring new programmers into the ecosystem and even if the final compression engine is not enabled by default or merged, the results would still be interesting.


Updated on: 2023-06-11T01:31:54.489647+00:00