Jamming Mitigation Dry Run [combined summary]



Individual post summaries: see the original discussion on the lightning-dev mailing list

Published on: 2023-08-10T19:13:38+00:00


Summary:

Vincent clarifies that he did not intend to suggest Elias' points were incorrect and apologizes if he gave that impression. He wants to understand Carla and Clara's dry-run proposal: its purpose is to test whether the theory holds up in practice by selecting a few nodes and sharing their data. Because reputation is local and each node builds its own, Vincent suggests a small number of nodes would suffice. He understands the dry run involves selected nodes; once the direction is clear, privacy can be prioritized and data collection expanded over time. Vincent agrees with Elias that privacy protection for nodes and channels should be considered, and he acknowledges Elias' point about continuous data collection for tracking the network's evolution. He notes that reverse engineering can reconstruct missing parts of the graph, but his main point is that running data collection for several months may make the privacy concerns seem overly complex. He assumes that trust in the source of the data is necessary, agrees that goals, underlying purposes, and potential drawbacks should be conveyed clearly, and appreciates Elias' reflection points.

Another email raises concerns about collecting data from real-world forwarding nodes rather than building synthetic research datasets. Even if certain fields are obfuscated, node IDs and channels can still be re-identified by correlating the dataset with publicly available data. A long collection period can be used to approximate the number of channels each observation point has with its neighbors, which helps determine the corresponding obfuscated node IDs. Timestamps can be used to exclude nodes and channels that could not have been in use when an HTLC was sent, and datasets from neighboring nodes can help match anonymized clusters to real-world clusters. By deriving the HTLC amount from the gathered fees and analyzing the HTLC resolution time delta, conclusions can be drawn about liquidities and the network distance of the HTLC destination. These estimations carry some error probability, and fuzzing timestamps would make them harder for adversaries. The author is familiar with conducting Lightning research without real-world datasets and stresses the importance of sharing only aggregated results and clearly communicating the framework and associated risks to node operators considering sharing their data.

Carla and Clara clarified that sharing only aggregated results would be the default plan. Elias appreciates the use of timestamp fuzzing to obfuscate payment timing, but notes that the resolution time deltas are the relevant data for timing-based inference.

Vincent is not concerned about privacy issues for the selected node running the dry run: he is not buying anything significant with his research node, so there is no real privacy threat. The only potential privacy leak he identifies is in certain fields, but those are irrelevant from an analysis perspective and can be faked using Core Lightning. He emphasizes that the fake channel ID must remain constant to achieve 100% reproducibility and allow more people to verify the results and identify any mistakes. Vincent worries about being confined to a limited dataset from real nodes and real Bitcoin transactions, which may result in a lack of certainty, and he asks whether he is missing something about faking the channel ID and node pubkey. He also mentions that he has seen PhD programs fail to start due to a lack of real data examples.
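To illustrate Vincent's point about keeping fake identifiers stable across runs, here is a minimal sketch, not taken from the thread, of keyed deterministic pseudonymization; the key name and truncation length are assumptions for illustration only.

```python
import hashlib
import hmac

# Hypothetical illustration (not part of the proposal): a keyed, deterministic
# pseudonymizer. With a fixed secret key, a given short channel id or node
# pubkey always maps to the same opaque token, so exports stay reproducible
# and comparable across runs while the key itself is never published.
SECRET_KEY = b"replace-with-a-per-dataset-secret"  # assumed, per-project value

def pseudonymize(identifier: str, key: bytes = SECRET_KEY) -> str:
    """Map a channel id or node pubkey to a stable pseudonym."""
    digest = hmac.new(key, identifier.encode(), hashlib.sha256).hexdigest()
    return digest[:16]  # truncated only to keep exported records readable

# The same input yields the same pseudonym on every run with the same key.
assert pseudonymize("744x1x0") == pseudonymize("744x1x0")
```

Timestamp fuzzing, as suggested in the thread, would be a separate step applied on top of this kind of field anonymization.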
A further email discusses privacy and data handling for the research project. The collection period would ideally be limited to six months, and the aim is to provide node operators with local tooling so that they do not need to export the data at all. If people are comfortable sharing their data, it will be handled following best practices and not shared further, and the fields will be anonymized as described in the original email. In response to the concerns about timestamps, they can be fuzzed, since only the resolution period matters. The author believes that conducting research based on real-world data is challenging but worthwhile.

The lack of real-world datasets for running simulations and experiments on Lightning is limiting. However, collecting the proposed fields over a long period could allow re-identification of anonymized channel counterparties through heuristics correlated with public graph data, and combining datasets from multiple collection points could further allow conclusions to be drawn about transferred amounts, channel liquidities, and even the payment destination's identity. Surrendering this data therefore requires trust in the researchers. It is recommended to clarify upfront the time-boxed collection period, how the data will be stored, and who will have access; defining the collection period also avoids incentivizing node operators to store HTLC data long-term.

In the original proposal, Carla and Clara write that they are proceeding with the plan discussed at the summit to conduct a "dry run" of HTLC endorsement and local reputation tracking. The objectives are to validate the behavior of local reputation algorithms using real-world data, gather liquidity and slot utilization data for resource bucketing, and establish a common data export format for analysis. The plan consists of several phases.
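As a purely illustrative sketch of what a record in such a common export format might contain, assuming it carries only fields mentioned in this thread (pseudonymized channel IDs, the endorsement signal, the fee earned, and fuzzed timing), consider:

```python
from dataclasses import dataclass

# Hypothetical sketch of one record in a "common data export format"; the real
# format is whatever the proposal defines, not this example.
@dataclass
class HtlcObservation:
    incoming_channel: str    # pseudonymized short channel id
    outgoing_channel: str    # pseudonymized short channel id
    endorsed: bool           # endorsement signal on the incoming HTLC
    fee_msat: int            # fee earned by the forwarding node
    added_at_ns: int         # fuzzed timestamp when the HTLC was forwarded
    resolved_at_ns: int      # fuzzed timestamp when it settled or failed
    settled: bool            # True if settled, False if failed

    @property
    def resolution_delta_ns(self) -> int:
        """Resolution period, the quantity that matters for timing analysis."""
        return self.resolved_at_ns - self.added_at_ns
```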

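Relating back to the amount-inference concern raised earlier in the thread, the following rough sketch (an illustration under stated assumptions, not anything proposed in the discussion) shows how an exported fee, combined with a channel's public fee policy from gossip, could be inverted to approximate the forwarded amount.

```python
# Rough illustration of deriving an HTLC amount from the gathered fee: if the
# channel's public fee policy (base fee plus proportional ppm rate) is known,
# the fee formula fee = base + amount * ppm / 1e6 can be inverted.
def estimate_amount_msat(fee_msat: int, base_fee_msat: int, fee_rate_ppm: int) -> int:
    """Approximate the forwarded amount implied by an observed fee."""
    if fee_rate_ppm == 0:
        raise ValueError("a purely flat fee policy reveals nothing about the amount")
    return (fee_msat - base_fee_msat) * 1_000_000 // fee_rate_ppm

# Example: a 1,100 msat fee under a 1,000 msat base fee and a 1 ppm rate
# implies a forward of roughly 100,000,000 msat (0.001 BTC).
print(estimate_amount_msat(fee_msat=1_100, base_fee_msat=1_000, fee_rate_ppm=1))
```

As the thread notes, such estimations carry some error probability, which is part of why sharing only aggregated results and fuzzing timestamps are emphasized.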

Updated on: 2023-08-12T01:49:13.849540+00:00