Merkle trees and mountain ranges



Summary:

In a discussion about merkle trees and optimizations, Bram Cohen considered committing to the prefix as a way of handling cases where multiple keys share parts of the same prefix. However, he concluded that it would only save about 10% of memory in his implementation, so he put it in the 'maybe later' bucket. Peter Todd explained that committing to the prefix is a simple but possibly not-optimal way of committing to what data should be accessible under a given node. An alternative could have been ensuring that in terms of cryptographic tree position. Todd suggested that merkbiner trees always have maximum depths in proportion to log2(n) of the actual number of items in the tree against attackers who are choosing keys by grinding hashes. Cohen agreed that an attacker can force the tree to be deeper in places but said that it's mitigated in several ways such as using memory which won't cause a new block to be allocated so that only log(attack strength) - log(n) nodes are used. Todd also suggested that making things smaller than a single block (64 bytes) won't speed up hashing time, and making things a single byte longer than that doubles it. Cohen mentioned BLAKE2, which omits padding when the data to be hashed happens to be exactly one block in size and is significantly faster than SHA256. Cohen described how his implementation of merkle trees works, saying that each node contains a metadata byte followed by fixed-size secure hashes and (in some cases) pointers to where the children are. He said that he is taking pains to make all the hashing be of fixed-size things so that a non-padding variant of a secure hashing algorithm can be used. Cohen went on to explain that at the root of his implementation of merkle trees is a branch block consisting of all nodes up to some fixed depth with that depth set so that it roughly fits within a single memory page. Branch blocks are arranged with the nodes in fixed position defined by the prefix they correspond to, and the terminals have outpointers to other blocks. Below the root block are other branch blocks, each of which has a fixed 12-bit prefix it is responsible for. Cohen said that below the second level of the root block at Bitcoin UTXO set scale, there are leaf blocks consisting of nodes with outpointers to its own children which must be within the same leaf block. When a leaf block overflows the entry point into it which overflowed is moved into the active leaf for that branch, and if that's full a new one is allocated. Todd expressed confusion about how Cohen thought he could get practical Bitcoin updates down to a little bit less than one l2 cache miss per update. Cohen clarified that all of the cleverness is in where in memory these nodes are stored so that tracing down the tree causes very few cache misses. He said that at Bitcoin scale, there will probably only be 3 cache misses for a lookup, and that's a worst-case scenario. Todd argued that hashing is slow, and even if you could eliminate L2/L3 cache-misses, it wouldn't be much of an improvement due to overheads from short message lengths.


Updated on: 2023-06-11T05:43:52.510325+00:00