Author: Elias Rohrer 2023-08-07 10:31:11+00:00
Published on: 2023-08-07T10:31:11+00:00
The email raises concerns about collecting data from real-world forwarding nodes rather than creating synthetic research data sets. The sender argues that even if certain fields are obfuscated, node IDs and channels can still be re-identified by correlating the data set with publicly available data. For example, over a long collection period an adversary could approximate how many channels each observation point has with each of its neighbors, which would help map obfuscated node IDs to real ones. Timestamps can be used to exclude nodes and channels that could not have been used at the time an HTLC was sent, and data sets from neighboring nodes can help determine which anonymized clusters correspond to which real-world clusters. Furthermore, HTLC amounts can be derived from the collected fees, and together with the time it takes an HTLC to resolve, this allows conclusions about channel liquidity and the network distance to the HTLC's destination.
The sender acknowledges that these estimates carry some probability of error, and suggests fuzzing timestamps to make correlation harder for an adversary. They note that they are familiar with conducting Lightning research without real-world data sets, and emphasize the importance of sharing aggregated results and of clearly communicating the framework and its associated risks to node operators considering sharing their data.
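The correlation steps described above can be sketched in Python. This is a minimal illustration under stated assumptions, not the sender's method: the `PublicChannel` record, all helper names, and the toy data are hypothetical, and the fee inversion assumes the forwarding node's public BOLT #7 fee policy (a base fee plus a proportional fee in parts per million) is known to the adversary. It shows recovering an approximate HTLC amount from a collected fee, excluding channels that were not open at the observed HTLC timestamps or were too small to carry that amount, matching anonymized peers to public peers by channel count, and the timestamp-fuzzing mitigation.

```python
import random
from collections import Counter
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class PublicChannel:
    """Hypothetical record of a channel as learned from public gossip data."""
    scid: str                 # short channel id
    peer: str                 # public node id of the counterparty
    capacity_sat: int
    opened_at: int            # unix time the channel became usable
    closed_at: Optional[int]  # unix time it closed, or None if still open


def implied_amount_msat(fee_msat: int, base_fee_msat: int, fee_rate_ppm: int) -> int:
    """Invert the BOLT #7 forwarding fee formula
    fee = base_fee + amount * fee_rate_ppm / 1_000_000.
    Implementations round the fee, so the result is only an estimate."""
    if fee_rate_ppm <= 0:
        raise ValueError("amount is not recoverable without a proportional fee")
    return (fee_msat - base_fee_msat) * 1_000_000 // fee_rate_ppm


def candidate_channels(channels: List[PublicChannel], htlc_times: List[int],
                       max_amount_msat: int) -> List[str]:
    """Keep public channels that were open at every observed HTLC time and
    whose capacity could have carried the largest implied HTLC amount."""
    result = []
    for ch in channels:
        open_throughout = all(
            ch.opened_at <= t and (ch.closed_at is None or t < ch.closed_at)
            for t in htlc_times
        )
        if open_throughout and ch.capacity_sat * 1000 >= max_amount_msat:
            result.append(ch.scid)
    return result


def match_by_channel_count(channels: List[PublicChannel],
                           anon_counts: Dict[str, int]) -> Dict[str, List[str]]:
    """Map each anonymized peer id to the public peers that share the same
    number of channels with the observation point (ambiguous if several)."""
    public_counts = Counter(ch.peer for ch in channels)
    return {
        anon_id: sorted(p for p, c in public_counts.items() if c == n)
        for anon_id, n in anon_counts.items()
    }


def fuzz_timestamp(ts: int, max_jitter_s: int, rng: random.Random) -> int:
    """Mitigation: add uniform integer jitter in [-max_jitter_s, max_jitter_s]."""
    return ts + rng.randint(-max_jitter_s, max_jitter_s)


if __name__ == "__main__":
    # Toy data, invented for illustration only.
    chans = [
        PublicChannel("a", "peerA", 100_000, 0, None),
        PublicChannel("b", "peerA", 1_000, 0, None),
        PublicChannel("c", "peerB", 100_000, 50, None),
        PublicChannel("d", "peerB", 100_000, 0, 20),
    ]
    amount = implied_amount_msat(fee_msat=1_500, base_fee_msat=1_000, fee_rate_ppm=100)
    print(amount)                                       # 5000000
    print(candidate_channels(chans, [10, 30], amount))  # ['a']
    print(match_by_channel_count(chans, {"anon1": 2, "anon2": 3}))
```

In this toy run a single channel survives the timestamp and capacity filters. Fuzzing timestamps widens the windows an adversary must treat as uncertain, so fewer channels can be excluded that way; it does not by itself address the channel-count correlation.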
Updated on: 2023-08-11T15:57:28.006727+00:00