Minimizing the redundancy in Golomb Coded Sets



Summary:

The author has conducted an analysis on selecting a good P value to reduce total data downloaded considering both filters themselves and blocks in the case of false positive matches, using data from mainnet. The quantity it minimizes is based on the number of filter elements per block (N), Golomb-Rice coding parameter (B), and the number of filter elements watched by the client (C). The main result shows that for different values of C, the optimal value of B ranges from 13 to 23. Any value of B in the range 16 to 20 seems reasonable, with M=1.4971 * 2^B for optimal compression. The selection of the parameter depends on the target number of elements that a client may watch. Gregory Maxwell via bitcoin-dev corrected a previous statement regarding configuration and stated that M=784931 and rice parameter 19 should be used. The author also provided some results and offered to share the CSV and raw notebook if people are interested.


Updated on: 2023-06-13T02:56:45.562289+00:00