
Dandelion++ Rewrite #2628

Merged (23 commits) Mar 20, 2019

Conversation

@antiochp (Member) commented Feb 25, 2019

Resolves #2176.

Replaces #2344. Minimizing unrelated changes and keeping focused on Dandelion++.

This is a minimal implementation of Dandelion++ to replace our existing minimal implementation of the original Dandelion.

  • Introduce DandelionEpoch to track the current epoch (stem or fluff) and the current outbound relay peer
  • Simplify existing Dandelion monitor
  • Get rid of PoolEntryState
  • Each epoch lasts for (by default) 10 mins.
    • Configurable via epoch_secs
  • If the node is in a "stem" epoch then we send stem txs on to the outbound relay peer
  • If the node is in a "fluff" epoch then we hold stem txs (allowing an opportunity to aggregate)
  • Dandelion monitor runs every 10s; if the current epoch is "fluff" -
    • If the epoch has expired, aggregate and broadcast all stem txs
    • If any stem txs are older than 30s (aggregation_secs), aggregate and fluff all stem txs
    • Otherwise wait, giving stem txs an opportunity to aggregate
  • Handle expired embargo timers (3 mins, embargo_secs) by broadcasting individual txs (a rough sketch of this epoch logic follows below)
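
The epoch handling boils down to something like the following - for illustration only; the struct and method names here are simplified and are not the exact grin API:

use rand::{thread_rng, Rng};
use std::time::{Duration, Instant};

// Illustrative only: simplified epoch state, not the exact grin types.
struct DandelionEpoch {
    start: Instant,
    epoch_secs: u64,
    stem: bool,
}

impl DandelionEpoch {
    // At the start of each epoch the node rolls stem vs fluff once
    // (stem_probability is a percentage, 90 by default).
    fn new(epoch_secs: u64, stem_probability: u64) -> DandelionEpoch {
        DandelionEpoch {
            start: Instant::now(),
            epoch_secs,
            stem: thread_rng().gen_range(0..100) < stem_probability,
        }
    }

    // Epochs last epoch_secs (10 mins by default); once expired, a new epoch
    // is started and a new outbound relay peer is chosen.
    fn is_expired(&self) -> bool {
        self.start.elapsed() >= Duration::from_secs(self.epoch_secs)
    }

    // Stem epoch: forward stem txs to the outbound relay peer.
    // Fluff epoch: hold stem txs locally so they have a chance to aggregate.
    fn is_stem(&self) -> bool {
        self.stem
    }
}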

@antiochp antiochp changed the base branch from master to milestone/1.1.0 March 11, 2019 13:02
@antiochp antiochp changed the title from "[WIP] reworked the dandelion rewrite (dandelion++)" to "Dandelion++ Rewrite" Mar 11, 2019
@antiochp antiochp marked this pull request as ready for review March 11, 2019 13:29
@antiochp antiochp added this to the 1.1.0 milestone Mar 11, 2019
@antiochp (Member Author)

Default Dandelion config -

#########################################
### DANDELION CONFIGURATION           ###
#########################################
[server.dandelion_config]

#dandelion epoch duration
epoch_secs = 600

#fluff and broadcast after embargo expires if tx not seen on network
embargo_secs = 180

#dandelion aggregation period in secs
aggregation_secs = 30

#dandelion stem probability (stem 90% of the time, fluff 10% of the time)
stem_probability = 90
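
For reference, roughly how these keys could map onto a serde-deserialized config struct - a sketch only, not necessarily the exact grin type:

use serde::Deserialize;

// Sketch of a [server.dandelion_config] mapping; field names follow the TOML
// keys above, but this is illustrative rather than the exact grin struct.
#[derive(Debug, Clone, Deserialize)]
pub struct DandelionConfig {
    // Length of a Dandelion epoch in seconds (default 600).
    pub epoch_secs: u64,
    // Fluff and broadcast a stem tx if it has not been seen on the network
    // once this many seconds have passed (default 180).
    pub embargo_secs: u64,
    // How long the Dandelion monitor lets stem txs wait in the stempool to
    // aggregate before fluffing them (default 30).
    pub aggregation_secs: u64,
    // Probability (in percent) of a node entering a stem epoch rather than a
    // fluff epoch (default 90).
    pub stem_probability: u64,
}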

@antiochp (Member Author) commented Mar 12, 2019

This is from a node running on mainnet -

20190312 08:40:19.360 DEBUG grin_p2p::protocol - handle_payload: received stem tx: msg_len: 1638
20190312 08:40:19.396 DEBUG grin_pool::pool - add_to_pool [stempool]: d346119cd8b5 (p2p) [in/out/kern: 1/2/1] pool: 1 (at block 000000a50b26)
20190312 08:40:19.396 INFO grin_servers::common::adapters - Fluff epoch. Aggregating stem tx(s). Will fluff via Dandelion monitor.
20190312 08:40:24.161 DEBUG grin_pool::pool - add_to_pool [stempool]: a3498982130f (p2p) [in/out/kern: 6/2/1] pool: 0 (at block 000000a50b26)
20190312 08:40:24.177 DEBUG grin_pool::pool - add_to_pool [stempool]: d346119cd8b5 (p2p) [in/out/kern: 1/2/1] pool: 1 (at block 000000a50b26)
20190312 08:40:26.166 DEBUG grin_servers::grin::dandelion_monitor - dand_mon: Found 2 txs in local stempool to fluff
20190312 08:40:26.203 DEBUG grin_pool::pool - add_to_pool [txpool]: 6f639bfad0fd (fluff) [in/out/kern: 7/4/2] pool: 21 (at block 000000a50b26)

This was an "in the wild" tx aggregation via Dandelion++, which I don't think we saw very often with the existing Dandelion impl.

2 txs arrived within 30s of each other on this node which was in a "fluff" epoch.
Dandelion monitor ran, aggregated them and "fluffed" the resulting tx, broadcasting out to the network.

Note: Node was running in "fluff" mode 100% of the time here as part of my testing (normally in stem mode 90% of the time). So this node is basically always acting as a "sink" for stem txs, collecting them and periodically fluffing them.
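
The monitor's fluff decision boils down to something like the following - a sketch with hypothetical names (StemTx, should_fluff), not the actual grin code:

use std::time::{Duration, Instant};

// Hypothetical stand-in for a stempool entry; records when the tx arrived.
struct StemTx {
    added: Instant,
}

// Run every 10s while in a fluff epoch: fluff everything if the epoch has
// expired or any stem tx has waited longer than aggregation_secs (30s by
// default); otherwise keep waiting so more stem txs can aggregate.
fn should_fluff(stem_txs: &[StemTx], epoch_expired: bool, aggregation_secs: u64) -> bool {
    if stem_txs.is_empty() {
        return false;
    }
    epoch_expired
        || stem_txs
            .iter()
            .any(|tx| tx.added.elapsed() >= Duration::from_secs(aggregation_secs))
}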

Given we are "fluff" all the time, we still see a decent amount of actual tx aggregation, and not just fluffing of individual txs (note the kernel counts > 1) -

20190306 08:27:27.373 DEBUG grin_pool::pool - add_to_pool [txpool]: eef3499b5e4e (fluff) [in/out/kern: 1/2/1] pool: 5 (at block 000000f4f76d)
20190306 08:27:37.401 DEBUG grin_pool::pool - add_to_pool [txpool]: 83fa903cbdf3 (fluff) [in/out/kern: 1/2/1] pool: 7 (at block 000000f4f76d)
20190306 20:14:59.857 DEBUG grin_pool::pool - add_to_pool [txpool]: b4f7992bb92b (fluff) [in/out/kern: 1/2/1] pool: 0 (at block 000002aad999)
20190306 21:01:29.921 DEBUG grin_pool::pool - add_to_pool [txpool]: 00773be711d9 (fluff) [in/out/kern: 3/2/1] pool: 1 (at block 00000a78a23f)
20190306 21:01:31.109 DEBUG grin_pool::pool - add_to_pool [txpool]: 00773be711d9 (fluff) [in/out/kern: 3/2/1] pool: 0 (at block 000001943fc7)
20190307 01:13:20.775 DEBUG grin_pool::pool - add_to_pool [txpool]: ad4f73bb11c1 (fluff) [in/out/kern: 1/2/1] pool: 1 (at block 0000007c80ca)
20190307 01:14:03.332 DEBUG grin_pool::pool - add_to_pool [txpool]: ad4f73bb11c1 (fluff) [in/out/kern: 1/2/1] pool: 0 (at block 000001fccd0e)
20190307 07:32:21.303 DEBUG grin_pool::pool - add_to_pool [txpool]: 31451d013891 (fluff) [in/out/kern: 6/4/1] pool: 8 (at block 0000039d6e31)
20190308 02:17:42.954 DEBUG grin_pool::pool - add_to_pool [txpool]: c2f978ebddf7 (fluff) [in/out/kern: 3/2/1] pool: 0 (at block 00000536ea34)
20190308 02:18:42.977 DEBUG grin_pool::pool - add_to_pool [txpool]: 7ee2c518d0a8 (fluff) [in/out/kern: 2/2/1] pool: 4 (at block 00000536ea34)
20190311 11:41:41.644 DEBUG grin_pool::pool - add_to_pool [txpool]: 2b411ecc93ef (fluff) [in/out/kern: 3/3/1] pool: 0 (at block 000022ae3371)
20190311 11:41:48.316 DEBUG grin_pool::pool - add_to_pool [txpool]: 2b411ecc93ef (fluff) [in/out/kern: 3/3/1] pool: 0 (at block 000000854aec)
20190311 16:47:53.798 DEBUG grin_pool::pool - add_to_pool [txpool]: eb02eb69fa18 (fluff) [in/out/kern: 1/2/1] pool: 1 (at block 00000315f64d)
20190311 17:30:53.875 DEBUG grin_pool::pool - add_to_pool [txpool]: 775731bc0971 (fluff) [in/out/kern: 1/2/1] pool: 4 (at block 00001276f1c9)
20190311 17:34:03.900 DEBUG grin_pool::pool - add_to_pool [txpool]: b966e9f951e1 (fluff) [in/out/kern: 1/2/1] pool: 2 (at block 000002cfd2ed)
20190311 22:30:34.080 DEBUG grin_pool::pool - add_to_pool [txpool]: 8d4e95c090d6 (fluff) [in/out/kern: 1/2/1] pool: 2 (at block 0000031a4645)
20190312 00:28:34.162 DEBUG grin_pool::pool - add_to_pool [txpool]: eb6d1367f2bb (fluff) [in/out/kern: 2/2/1] pool: 0 (at block 000000eae474)
20190312 02:17:54.240 DEBUG grin_pool::pool - add_to_pool [txpool]: 87ac769638de (fluff) [in/out/kern: 1/2/1] pool: 2 (at block 00000049ed68)
20190312 02:18:22.474 DEBUG grin_pool::pool - add_to_pool [txpool]: 87ac769638de (fluff) [in/out/kern: 1/2/1] pool: 0 (at block 000002e8ada0)
20190312 08:25:15.713 DEBUG grin_pool::pool - add_to_pool [txpool]: 8db819326431 (fluff) [in/out/kern: 3/4/2] pool: 12 (at block 00000266245b)
20190312 08:25:18.501 DEBUG grin_pool::pool - add_to_pool [txpool]: 8db819326431 (fluff) [in/out/kern: 3/4/2] pool: 1 (at block 000004be37dd)
20190312 08:26:25.763 DEBUG grin_pool::pool - add_to_pool [txpool]: 9bd9a8d165c6 (fluff) [in/out/kern: 2/4/2] pool: 4 (at block 000002f686d7)
20190312 08:27:35.816 DEBUG grin_pool::pool - add_to_pool [txpool]: 8079c0285e0b (fluff) [in/out/kern: 2/4/2] pool: 13 (at block 000002f686d7)
20190312 08:28:35.896 DEBUG grin_pool::pool - add_to_pool [txpool]: 9f0587885702 (fluff) [in/out/kern: 1/2/1] pool: 19 (at block 000002f686d7)
20190312 08:29:10.862 DEBUG grin_pool::pool - add_to_pool [txpool]: 9f0587885702 (fluff) [in/out/kern: 1/2/1] pool: 0 (at block 00000256c097)
20190312 08:29:15.952 DEBUG grin_pool::pool - add_to_pool [txpool]: 1ff299cbcc50 (fluff) [in/out/kern: 2/4/2] pool: 4 (at block 00000256c097)
20190312 08:29:23.965 DEBUG grin_pool::pool - add_to_pool [txpool]: 1ff299cbcc50 (fluff) [in/out/kern: 2/4/2] pool: 0 (at block 0000047f2631)
20190312 08:38:06.059 DEBUG grin_pool::pool - add_to_pool [txpool]: e1394bce43f6 (fluff) [in/out/kern: 1/2/1] pool: 19 (at block 0000029b97e8)
20190312 08:40:26.203 DEBUG grin_pool::pool - add_to_pool [txpool]: 6f639bfad0fd (fluff) [in/out/kern: 7/4/2] pool: 21 (at block 000000a50b26)
20190312 08:41:36.306 DEBUG grin_pool::pool - add_to_pool [txpool]: f665378363e3 (fluff) [in/out/kern: 2/4/2] pool: 30 (at block 000000a50b26)

@antiochp (Member Author)

If anybody does want to test this out, feel free to set stem_probability = 0 in the config file to force the node to always aggregate and fluff (rather than stem and relay).
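
i.e. using the same config section shown above:

[server.dandelion_config]
#never stem: always act as a fluff "sink", aggregating and broadcasting
stem_probability = 0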

@ignopeverell (Contributor) left a comment:

Looking good, just a few minor comments.

Review threads (outdated, resolved): servers/src/grin/dandelion_monitor.rs (×2), servers/src/common/types.rs, pool/src/transaction_pool.rs
@lehnberg (Collaborator)

@antiochp is the node able to set its own fluff/stem status? Or was that just something for testing here, where a node was set to “100% fluff mode”? Could nodes decide on this individually in the wild?

If so, is that a problem? Could it be used to attack the network somehow? By deploying lots of "100% fluff, never broadcast" nodes that "drain" all the transactions? I know we use embargo timers, but would that then lead to the originating nodes sending out the transactions at expiry of the embargo?

@antiochp (Member Author)

@lehnberg nodes can choose any value between 0% and 100% in terms of stem probability.

What you are suggesting is not limited to Dandelion in any way. I could run a modified node that simply did not broadcast any txs (or blocks for that matter). There's nothing we could do to prevent that.

@lehnberg (Collaborator)

Understood. In such an event and at the embargo timer expiration, would it lead to the originator broadcasting their transaction? Or could it plausibly also be other nodes doing that?

Given that the originator will have the shortest time left on their embargo timer, it would be them doing that, right?

@antiochp (Member Author)

There's some randomness in terms of the timer length - so even though the originator starts their timer first, it's still reasonably random who would actually broadcast.
With Dandelion++ we're not introducing a delay at each stem hop, so all the timers are running pretty close together. With the original Dandelion we were adding a delay at each hop, and in that scenario the originator was far more likely to be the one to broadcast as their timer was more likely to expire first.
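
As an illustration of the randomized timers, something along these lines - the jitter bound and function name here are made up and the actual grin values may differ:

use rand::{thread_rng, Rng};
use std::time::Duration;

// Hypothetical jitter bound; illustrative only.
const EMBARGO_JITTER_SECS: u64 = 30;

// Each node computes its own embargo independently: the configured base
// (embargo_secs, 180s by default) plus a random offset, so which node's
// timer fires first is not deterministic even though the originator
// started counting first.
fn random_embargo(embargo_secs: u64) -> Duration {
    Duration::from_secs(embargo_secs + thread_rng().gen_range(0..=EMBARGO_JITTER_SECS))
}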

@antiochp (Member Author)

Note: 40 lines of code removed! 🚀

@antiochp (Member Author)

ok merging this into 1.1.0 🤞

@antiochp antiochp merged commit a2adf2d into mimblewimble:milestone/1.1.0 Mar 20, 2019
@antiochp antiochp deleted the dandelion_rewrite branch March 20, 2019 13:09
@0xmichalis (Contributor)

@quentinlesceller (Member)

Indeed @Kargakis, this doc reflects the previous version of Dandelion.

@antiochp (Member Author) commented Apr 1, 2019

Yes it does and should. I totally forgot about that. I'll open an issue to track this.

@antiochp (Member Author) commented Apr 1, 2019

Note my Help Wanted tag on that issue 😄

@lehnberg (Collaborator) commented Apr 4, 2019

Thinking about this some more (belatedly), I'm becoming less convinced of the privacy benefits of aggregation before fluffing.

With the probability of a node being in stem mode set to 90% and fluff mode to 10%, only about 1/10th of the nodes on the network will be fluff nodes at any given time, i.e. "sinks" where transactions become aggregated.

If this is the case, deploying malicious surveillance nodes with fluff mode = 100% will give them a relatively high weight in the pool of total fluff nodes. These can then start collecting data on stem transactions. Even if the nodes end up aggregating and broadcasting the transactions correctly (thus disguising any issue on the network), they will be able to deduce the original unaggregated transactions (as they will be receiving them piecemeal, one by one).

Wouldn't some aggregation by honest nodes be better than nothing, i.e. than not even attempting to aggregate? Maybe. But I think there's a problem with having a system that users believe achieves a certain degree of privacy if it can easily be circumvented.

In the dandelion++ paper, the node's choice between stemming or fluffing is pseudorandom [edit: but the paper still recommends a 10% or 20% fluff probability].

Perhaps the entire objective of aggregation should be removed from our thinking around Dandelion?
Not sure if any of this makes any sense. But I'm hoping someone could convince me why aggregation pre-fluffing makes a big difference from a privacy perspective (or from any perspective at all for that matter).

@antiochp (Member Author) commented Apr 4, 2019

If this is the case, deploying malicious surveillance nodes with fluff mode = 100% will give them a relatively high weight in the pool of total fluff nodes. These can then start collecting data on stem transactions. Even if the nodes end up aggregating and broadcasting the transactions correctly (thus disguising any issue on the network), they will be able to deduce the original unaggregated transactions (as they will be receiving them piecemeal, one by one).

If a malicious actor controlled 10% of the nodes on the network then they would have 50% of the "sinks" and would therefore see roughly 50% of all stem txs. Is that your thinking?

How likely is it that a malicious actor would be able to do this?

Kind of feels like if we are in this situation there is not a lot anybody can do to prevent them from observing txs and constructing a reasonably complete tx graph.

@antiochp (Member Author) commented Apr 4, 2019

In the dandelion++ paper, the node's choice between stemming or fluffing is pseudorandom, implying a 50% coin flip between the two choices.

Is that true? I thought it was still 90/10.

@lehnberg (Collaborator) commented Apr 4, 2019

If a malicious actor controlled 10% of the nodes on the network then they would have 50% of the "sinks" and would therefore see roughly 50% of all stem txs. Is that your thinking?

Yes, how many nodes are there on the network today? ~100? This means that 10 nodes will be fluffing nodes at any given time. If a malicious actor deploys 10 always-fluffing nodes as well, it means they will see roughly 50% of all stem transactions.
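
To make the arithmetic explicit (assuming, as a simplification, that each stem path terminates at a uniformly random fluff-mode node):

% N = 100 honest nodes, q = 0.1 fluff probability, M = 10 malicious always-fluff nodes
P(\text{stem tx ends at a malicious sink}) \approx \frac{M}{qN + M} = \frac{10}{0.1 \cdot 100 + 10} = \frac{10}{20} = 50\%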

How likely is it that a malicious actor would be able to do this?

It's 5x more likely than if the probability were a fair coin flip and we had 50 honest fluff nodes in my example above.

Is that true? I thought it was still 90/10.

[Edit: yes, digging deeper you're right, q is recommended to be 0.1 or 0.2]

I struggle to understand what the benefits of aggregation are. It's not a bad thing, but it's not clear to me why we should strive for it in our design objectives. If it happens, then great. [truncated]

@lehnberg (Collaborator) commented Apr 4, 2019

Please disregard the statement about 90/10; I dug deeper into the papers and now see that q (the path-length parameter) is recommended to be 0.1 or 0.2 (i.e. a 10% or 20% likelihood of fluffing). I'm confused but nevertheless give up. Don't really feel any wiser.

@lehnberg (Collaborator) commented Apr 4, 2019

@antiochp in the Grin implementation, does a peer always stem their own transaction? This is what's instructed in the paper, i.e. to only ever fluff stem transactions received from other peers, but never their own.

@antiochp (Member Author) commented Apr 5, 2019

in the Grin implementation, does a peer always stem their own transaction?

Good question.
I actually thought we did, yes. It was part of the original design and what I had intended to implement. But - I think we actually forgot about this (and I remember now it being kind of tricky, because we lose the "is this our tx?" knowledge before we get to the point of deciding to stem/fluff).

The downside here is we may end up aggregating our own txs if our node is in a fluff epoch (and not a stem epoch).

I'm not actually sure what kind of downside we see with this implementation.

Some txs will end up appearing on the network from the nodes where they actually originated - but we won't know which txs these are, so I'm not entirely convinced we actually lose any privacy here?
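
For what it's worth, the kind of check being discussed would look roughly like this - TxSource and should_stem here are illustrative, not necessarily the grin types:

// Illustrative only: distinguish locally-submitted txs from relayed ones so
// the stem/fluff decision can special-case our own txs.
enum TxSource {
    PushApi,   // submitted via our own wallet / local API
    Broadcast, // received from a peer
}

// Always stem our own txs regardless of the current epoch; relayed stem txs
// follow the node's epoch (stem vs fluff) as usual.
fn should_stem(source: &TxSource, epoch_is_stem: bool) -> bool {
    match source {
        TxSource::PushApi => true,
        TxSource::Broadcast => epoch_is_stem,
    }
}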

@lehnberg (Collaborator) commented Apr 6, 2019

But - I think we actually forgot about this (and I remember now it being kind of tricky, because we lose the "is this our tx?" knowledge before we get to the point of deciding to stem/fluff).

Why is it important to keep the "is this our tx?" knowledge? In my naive interpretation of how D++ was being implemented in Grin, I was presuming that nodes add any tx, whether received or originating from themselves, to their own stempool and set an embargo timer on them. If you stem it and don't see it within the embargo, you will fluff it either way.

I'm not actually sure what kind of downside we see with this implementation.

I think they added this requirement for a reason. Let's say we have a network of 100 honest nodes that are being observed. 10 of those will be fluff nodes, and 90 will be stem nodes. Let's also assume that each fluff node fluffs 5 transactions each epoch (which is a fairly high estimate I would imagine).

If a fluffing node fluffs its own transaction, an attacker will know that "10% of the time, the anonymity set of a node's own transaction will be 1 in 5". If they manage to figure out which one that was, then they would also reduce the anonymity set of the other transactions being fluffed to 1 in 4. This might sound like little additional information to give up, but if you monitor a network over time and build up a lot of data, it will have an impact on your model's prediction accuracy.

On the other hand, if a node always stems their own transaction regardless of whether it's in fluff or stem mode, the attacker will know that "100% of the time, the anonymity set of a node's own transaction will be 1 in 90", which would make it much harder to pinpoint the originating source, as you're hiding amongst all the other stem nodes.
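
Putting rough numbers on that comparison (100 nodes, 10% in fluff mode, ~5 txs fluffed per sink per epoch, as assumed above):

% own tx fluffed directly by a fluff-mode node (happens with probability q = 0.1):
|A_{\text{own-fluff}}| \approx \text{txs fluffed by that node per epoch} = 5

% own tx always stemmed first, whatever the epoch:
|A_{\text{always-stem}}| \approx \text{number of stem-mode nodes} = 0.9 \cdot 100 = 90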

@antiochp antiochp added the "release notes" label (to be included in release notes of the relevant milestone) Jun 5, 2019
Labels: enhancement, release notes