Author: Eric Voskuil 2017-04-08 22:37:50
Published on: 2017-04-08T22:37:50+00:00
In a recent conversation, an individual questioned the optimization choices of bitcrust, particularly its reliance on Core's UTXO store to implement script validation. They questioned whether it is truly "a storage engine without a UTXO-index" given its dependency on Core's UTXO index, and stated that the assumptions about the storage (and performance) superiority of the design, or at least of its implementation, seem unfounded.

The email thread also discusses a Bitcoin implementation that takes a different approach to indexing for verifying the order of transactions: a spend-tree in which spends are scanned against spent outputs rather than unspent outputs. A similar approach was used by genjix in libbitcoin version2, but it had downsides, namely a higher storage requirement due to the indexing data needed to correlate spends, and higher validation complexity and cost in computing the spent-ness (including spender height) of an output.

To resolve these problems, the version3 store uses no spends table/index and no table of UTXOs. Instead, it uses the tx hash table, augmented with 32 bits per output for spender height. Validation is highly parallelized and runs much faster than with the spends table, without the storage cost or complexity disadvantages. It also scales with available hardware, as the memory-mapped files become in-memory hash tables. For low-memory machines it was important to implement an opaque UTXO cache to limit paging, but for higher-end systems zero cache is optimal.

The thread also discusses the need for a clear separation of protocol and implementations, and why protocol improvements might not be worth considering in this specific case. It further clarifies some details about the spends index, which grows with the size of the spend set and therefore comes to exceed the size of the UTXO set.

The individual suggests that their approach has real advantages over interleaving queries and validation, or over splitting input vs. output validation queries into two steps. They also suggest that the implementation of the ordering validation should not be contemplated in isolation from the output lookup requirement. If one is storing more index data (5.6 GB) than 32 bits per output, one is using more space than production implementations. As for complexity, the individual argues that a loop to populate spend heights from a hash table and a loop to test their integer values is much simpler than the alternatives (a sketch of these two loops follows below).

Regarding the detection of double spends, the individual states that it is not useful in isolation: one must also validate scripts, which requires outputs. They suggest there is an opportunity to reject blocks faster by validating for double spends before validating scripts. But unconfirmed transactions do not exist in a branch, and are therefore not truly conflicting, until they are mined. Even after they are mined, conflicting txs remain potentially valid in other branches. So rejecting txs due to conflict comes down to a denial-of-service policy, which ultimately must be based on fee increment (e.g. RBF). But fees are based on the amount of the output value that remains unspent in the transaction, which in turn requires the retrieval of outputs.

The individual encourages checking out the results at https://bitcrust.org, but notes that if by results one means performance numbers, it is hard to draw any conclusions without a full benchmark.
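To make the described two-loop validation concrete, the following is a minimal C++ sketch. It assumes a simplified in-memory transaction table keyed by tx hash, with each output slot carrying a 32-bit spender height; the type and function names are illustrative and are not taken from libbitcoin's actual store.

    #include <cstdint>
    #include <string>
    #include <unordered_map>
    #include <vector>

    // Illustrative only: a simplified stand-in for a tx hash table in which
    // each output is augmented with a 32-bit spender height.
    constexpr uint32_t not_spent = 0xffffffff;

    struct output_record
    {
        uint64_t value;          // satoshis, needed later for script/fee checks
        uint32_t spender_height; // height of the confirmed spender, or not_spent
    };

    struct tx_record
    {
        uint32_t height;                    // confirming block height
        std::vector<output_record> outputs; // one slot per output
    };

    using tx_table = std::unordered_map<std::string, tx_record>; // keyed by tx hash

    struct input_point
    {
        std::string hash; // hash of the previous transaction
        uint32_t index;   // index of the spent output
    };

    // Loop 1: populate the spender height of each input's previous output.
    // Returns false if a previous output cannot be found.
    bool populate_spender_heights(const tx_table& table,
        const std::vector<input_point>& inputs, std::vector<uint32_t>& heights)
    {
        heights.clear();
        for (const auto& in : inputs)
        {
            const auto tx = table.find(in.hash);
            if (tx == table.end() || in.index >= tx->second.outputs.size())
                return false;

            heights.push_back(tx->second.outputs[in.index].spender_height);
        }

        return true;
    }

    // Loop 2: test the populated integer values. An output whose recorded
    // spender height is that of an earlier confirmed block has already been
    // spent in this chain, so spending it again is a double spend.
    bool is_unspent(const std::vector<uint32_t>& heights, uint32_t candidate_height)
    {
        for (const auto height : heights)
            if (height != not_spent && height < candidate_height)
                return false;

        return true;
    }

A real store would parallelize the first loop across inputs and reuse the same output records for script validation; the point of the sketch is only that correlating spends reduces to a hash-table lookup plus an integer comparison.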
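The point that conflict policy ultimately reduces to fees, and therefore cannot avoid output retrieval, can also be illustrated. A transaction's fee is the value of the previous outputs it spends minus the value of the outputs it creates, so the spent outputs must be fetched from the store before any fee-increment (RBF-style) comparison is possible. The sketch below is a self-contained illustration with hypothetical names, not an implementation from either project.

    #include <cstdint>
    #include <optional>
    #include <string>
    #include <unordered_map>
    #include <vector>

    // Illustrative only: previous outputs keyed by "txhash:index", mapped to value.
    using output_store = std::unordered_map<std::string, uint64_t>;

    // Fee = sum of spent (previous) output values - sum of new output values.
    // The previous values must be retrieved from the store, which is why a
    // fee-based conflict policy depends on output lookup.
    std::optional<uint64_t> compute_fee(const output_store& store,
        const std::vector<std::string>& spent_points,
        const std::vector<uint64_t>& new_output_values)
    {
        uint64_t inputs_total = 0;
        for (const auto& point : spent_points)
        {
            const auto it = store.find(point);
            if (it == store.end())
                return std::nullopt; // previous output not found

            inputs_total += it->second;
        }

        uint64_t outputs_total = 0;
        for (const auto value : new_output_values)
            outputs_total += value;

        if (outputs_total > inputs_total)
            return std::nullopt; // invalid: creates more value than it spends

        return inputs_total - outputs_total;
    }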
They also argue that parallel block validation is difficult to justify and that bitcrust's optimization choices may not have accounted for soft forks. In conclusion, the individual suggests that their approach has significant advantages over bitcrust's, and that certain of bitcrust's optimization choices may not be justified.
Updated on: 2023-06-11T23:54:30.497542+00:00