Improve Lightning payment reliability through better error attribution [combined summary]

Individual post summaries:

Improve Lightning payment reliability through better error attribution Joost Jager 2019-06-15 06:52:30+00:00
Improve Lightning payment reliability through better error attribution ZmnSCPxj 2019-06-15 02:53:16+00:00
Improve Lightning payment reliability through better error attribution Joost Jager 2019-06-14 13:50:16+00:00
Improve Lightning payment reliability through better error attribution Joost Jager 2019-06-14 10:59:26+00:00
Improve Lightning payment reliability through better error attribution ZmnSCPxj 2019-06-14 09:38:31+00:00
Improve Lightning payment reliability through better error attribution Joost Jager 2019-06-14 09:10:26+00:00
Improve Lightning payment reliability through better error attribution ZmnSCPxj 2019-06-14 08:24:30+00:00
Improve Lightning payment reliability through better error attribution Bastien TEINTURIER 2019-06-13 11:14:39+00:00
Improve Lightning payment reliability through better error attribution Joost Jager 2019-06-12 12:59:40+00:00

Click here to read the original discussion on the lightning-dev mailing list

Published on: 2019-06-15T06:52:30+00:00

Summary:

When a Lightning Network node encounters an error, it must wait for `revoke_and_ack` before reporting the error upstream to ensure safe transmission. However, `update_fulfill_htlc` does not require waiting for `revoke_and_ack` as it is immediately reported upstream. Background probing can provide more information before problems occur for "real" payments, but may not scale well and could be limited for private channels. Fat errors are still useful to maximize the value of probes and minimize the number required.The accuracy of payment times in the Lightning Network is affected by the potential for nodes to drop previous channel states. A node can only propagate an `update_fail_htlc` if the downstream `update_fail_htlc` has been committed by `revoke_and_ack`. To gather more information about these issues before they cause problems for real payments, background probing is suggested. Adding padding along the way with the erring node's padding being bigger than intermediate nodes' padding can mitigate the nodes' ability to determine their position. However, adding padding in a way that hops don't learn anything about their position remains uncertain.Penalizing channels when a payment is delayed is another topic of discussion. One approach is to tolerate slight variations or abandon the synchronization requirement altogether. Another approach involves using active probing and recording total imposed delay and number of attempts for each node. The cost of traversing a node for a "real" payment would then be adjusted based on the ratio of total imposed delay to number of attempts. However, this approach may lead to less accurate results and more non-ideal payment attempts.The synchronization requirement between nodes in the Lightning Network is a concern. Active probing and tracking persistent data per node, such as total imposed delay and number of attempts, is suggested as an alternative solution. This would allow for more accurate cost adjustments when finding routes for real payments and penalizing specific nodes for failures. Penalizing channels between nodes if a delay is detected on a payment route is also mentioned. The suggested changes would not require any alterations to the current spec.The conversation explores the proposal of adding timestamps for certain events to ensure accuracy in recording node events. However, concerns are raised about strong synchronization of all nodes' clocks to a global clock and the possibility of error if any node's clock is off. The held duration is suggested as enough to identify a pair of nodes responsible for the delay. It is proposed that either B or C is delaying the payment and the channel between them would be penalized.Reliability of payments in the Lightning Network depends on the reliability of chosen routes. Previous payment attempts can help select better routes, but non-ideal payment attempts make it difficult to determine which node should be penalized. One potential solution is to change the failure message format so that every hop adds an HMAC and timestamps to narrow down the source of corruption to a pair of nodes. Designing the failure message format to prevent hops from learning their position is a challenge, and alternative solutions involving padding and variable-length messages are discussed. Probing is also suggested as a way to locate bad nodes, but it requires more attempts and complex processing of outcomes.In order to improve payment experience, implementations track past performance of nodes and channels in the Lightning Network. Non-ideal payment attempts, such as failed payments or those that take a long time to complete, make it difficult to determine the node that should be penalized. One solution is to change the failure message format so that every hop adds an HMAC and timestamps, allowing the source of corruption to be narrowed down to a pair of nodes. The challenge lies in designing the format to prevent hops from learning their position. Alternative solutions involving padding, variable-length messages, and probing are also discussed. The importance of being able to locate bad nodes irrefutably is emphasized.

Updated on: 2023-07-31T21:40:57.126845+00:00