Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

checklist of 3.6.0 PRs that will cause FORK #2910

Closed
5 tasks
vang1ong7ang opened this issue Sep 16, 2023 · 41 comments
Closed
5 tasks

checklist of 3.6.0 PRs that will cause FORK #2910

vang1ong7ang opened this issue Sep 16, 2023 · 41 comments

Comments

@vang1ong7ang
Copy link
Contributor

vang1ong7ang commented Sep 16, 2023

hope all the FORK points can be resolved before the upgrade

by submitting a transaction with script FORKTRIGGER(); NEO.transfer(self, victim, amount) inside when v3.5.0, the attacker will get his money back when v3.6.0.

ALL the PRs listed above should not be blamed, since looking at these pull requests individually, there is no problem with their implementation. while in the context of upgrading from v3.5.0, more historical baggage is introduced. maintainers should create a new PR to handle all possible fork points to ensure mainnet state consistency.

I'm fine if the final decision is to ignore the fork risks and proceed to upgrade to v3.6.0. we will still try to monitor, warn and prevent possible forks as much as possible to keep the mainnet safe though I don't think we should let the mainnet take the risk.

hardfork PRs should learn from #2810

if (!IsHardforkEnabled(Hardfork.HF_Basilisk))
{
    RuntimeNotifyV1(eventName, state);
    return;
}
@dusmart
Copy link

dusmart commented Sep 16, 2023

(Optional) Additional context

  • there is a version-control method for native contract upgrade
  • there is also a hard-fork tag for upgrade

Based on NEO's current situation, I also understand you if the final decision is to ignore the fork risks and proceed to upgrade to v3.6.0.

@roman-khimov
Copy link
Contributor

#2849 shouldn't be an issue, its behavior is controlled via the hardfork (see #2881). But I agree that many other can be problematic. Unfortunately, we have an unwritten convention of relying on resyncs (at least since #2560) and to be fair it even works most of the time. Anyway, using proper upgrade mechanisms would make Neo a much better place, especially since we have them (both native activations and hardforks, but likely HFs are easier to use and can cover any situation).

@roman-khimov
Copy link
Contributor

I'll also add that the scenario of "submitting a transaction with script FORKTRIGGER(); NEO.transfer(self, victim, amount) inside when v3.5.0, the attacker will get his money back when v3.6.0." is not always possible. If some PR is only adding some new native contract API then any attempt to call it will make transaction FAIL miserably. This means that after the upgrade it can do something instead of failing, but it's a bit harder to leverage that (not impossible though).

@dusmart
Copy link

dusmart commented Sep 16, 2023

I'll also add that the scenario of "submitting a transaction with script FORKTRIGGER(); NEO.transfer(self, victim, amount) inside when v3.5.0, the attacker will get his money back when v3.6.0." is not always possible.

They are basically the same. You can send a transaction first, and then use the second to check it's state in FORKTRIGGER().

Any change including add, delete and modify should all be treated seriously.

@vang1ong7ang
Copy link
Contributor Author

I'll also add that the scenario of "submitting a transaction with script FORKTRIGGER(); NEO.transfer(self, victim, amount) inside when v3.5.0, the attacker will get his money back when v3.6.0." is not always possible. If some PR is only adding some new native contract API then any attempt to call it will make transaction FAIL miserably. This means that after the upgrade it can do something instead of failing, but it's a bit harder to leverage that (not impossible though).

it's not right.

by calling Ledger.GetTransactionState, the attackers can still succeed.

@dusmart
Copy link

dusmart commented Sep 16, 2023

by calling Ledger.GetTransactionState, the attackers can still succeed.

I also figure out a new method to do the same if there is no such a native contract. If one tx in 3.5.0 will fail, you can add GAS.burn(GAS.balance(sender)) in it. And then send a second tx that will cost some GAS. When it comes to 3.6.0, the second will be an invalid tx and stuck the upgrade process.

@vncoelho
Copy link
Member

vncoelho commented Sep 16, 2023

In fact, @igormcoelho and I have been in favor of not changing tx state after any modification.

That was the reason we created that simple plugin on neo-modules for neo2x and it is still used nowdays.

We need to preserve states, MPT or Graphs are mostly for this purpose of having a solid compact state representation.

A fork tag can change the behavior just for the next events. I do not see much options instead of a general tag and ensurance of past states.
As we have observed in other protocols, we do not need witness itself, we just need the result of that verification at that time. And, for sure, that should never change.

@vncoelho
Copy link
Member

My suggestion is to:

  • Abolish resync during upgrades.

    • We need continous compatibility;
  • Create a general fork tag.

    • include client version on local storage in a way that in case of resync we can use exactly the version used at the time of data creation.

Surely there could be options to flexibilize that, maybe we have a patch that makes sync faster and it is compatible.

@dusmart
Copy link

dusmart commented Sep 16, 2023

  • Abolish resync during upgrades.

Actually there are some PRs rely on resync heavily. The PR for optimizing the vote reward data requires a full resync to help future reward calculating.

My advice is to drop these PRs from 3.6.0 if we need continuous compatibility. For #2841, there are only about 4GB data for a normal node. I don't see any significant reason to do it and make the upgrade risky. If you have ever run a ethereum node or some others, you will know NEO is far more better on that.

@Jim8y
Copy link
Contributor

Jim8y commented Sep 16, 2023

What resync actually means? Reexecution or re-downloading everything?@dusmart

@dusmart
Copy link

dusmart commented Sep 16, 2023

What resync actually means? Reexecution or re-downloading everything?@dusmart

I mean re-creating the previous state here based on whatever makes it happen. It makes the MTP useless on checking whether the upgrade is abused or not.

@vncoelho
Copy link
Member

I fully agree @dusmart.
That enters in the case when full resync is just recommended if there is a relevant improvement on the sync process and data.
4GB is not really relevant now.

@dusmart
Copy link

dusmart commented Sep 16, 2023

BTW, Vang has done a POC that makes it even harder to identify whether a TX abuses the upgrade or not. I believe advanced patten matching could also fail here if you've seen that OpCodes.

If the previous state is kept, then a simple MTP state root check will be very helpful for identifying it.

@vncoelho
Copy link
Member

vncoelho commented Sep 16, 2023

BTW, Vang has done a POC that makes it even harder to identify whether a TX abuses the upgrade or not. I believe advanced patten matching could also fail here if you've seen that OpCodes.

If the previous state is kept, then a simple MTP state root check will be very helpful for identifying it.

This guy has no limits...ahahaha

The state preservation is the correct way to go.
For now, revert those that require resync if they are not critical or with a considerable relevance to optimization.

I can push these PRs next week if you want, but I think you guys from NGD are more expert on that, count on us for support and review in any case.

@vang1ong7ang
Copy link
Contributor Author

The state preservation is the correct way to go.

I fully agree with that. it finally ensures the block finality feature and provides necessary guarantees to applications and exchanges.

@dusmart
Copy link

dusmart commented Sep 16, 2023

This guy has no limits...ahahaha

Ahahaha, if you notice his simple POC here in detail you will find that he use NEO.transfer instead of GAS.transfer. NEO contract's state would have already been changed by the #2841. Therefore a simple change to the MTP plugin (that bypass the NEO contracts' state) can not identify it, either.

If they still push it on mainnet and want to make sure the success, they will have to bypass all the state changes made by the new PRs and calculate the state root, then compare it with the old one. I'm not sure whether someone has had a plan on doing it. Or just want to pretend that there should have no hacker making use of it. Sorry that I made a malicious assumptions.

@vang1ong7ang
Copy link
Contributor Author

Unfortunately, we have an unwritten convention of relying on resyncs (at least since #2560) and to be fair it even works most of the time.

Originally posted by @roman-khimov in #2910 (comment)

please don’t take chances and put mainnet users at risk.

Having sex without a condom for several years without getting pregnant is not something worth showing off.

@roman-khimov
Copy link
Contributor

@vang1ong7ang, @dusmart, I'm explaining the status quo and the way it came to be. These problems are known, they're not new. There are bad things with the current approach, there are good ones as well. The topic of state finality had been beaten to death (literally, as in state finality death) in #1526.

@dusmart
Copy link

dusmart commented Sep 16, 2023

There are bad things with the current approach, there are good ones as well. The topic of state finality had been beaten to death (literally, as in state finality death) in #1526.

Thanks always for lots of links. They really help. Great to see the context.

Then I think that's what we should pay more attention to instead of optimizing the storage (which also bring dAPPs relying on old interfaces a bad experience).

I doubt that someone has exploited it in a small amount. Could you please share us more of the details on how NEO check the upgrade's safety before?

@roman-khimov
Copy link
Contributor

that's what we should pay more attention to

Likely so. Some Neo implementations even have StateRootInHeader extensions.

optimizing the storage (which also bring dAPPs relying on old interfaces a bad experience).

#2841 is trivial to fix keeping exact old behavior for NEO API. At least while we're operating in a "floating state" paradigm. And it solves #2815, which is about specific use cases for this data, not exactly about performance.

Could you please give us more of the details on how you check the upgrade's safety before?

I guess that's a bit out of my scope. But in general state changes (as in https://github.com/neo-project/neo-modules/tree/master/src/StorageDumper) can be compared, usually it's just the genesis that changes (new methods are added). #2841 is like the first real exception to this in two years, there NEO/GAS balances were compared directly IINM. And yes, of course this is racy (check at height N says nothing about height N+1, upgrades take time) and it's not just new methods, there can be new opcodes/flags/whatever.

@dusmart
Copy link

dusmart commented Sep 16, 2023

I guess that's a bit out of my scope. But in general state changes (as in https://github.com/neo-project/neo-modules/tree/master/src/StorageDumper) can be compared, usually it's just the genesis that changes (new methods are added).

If you haven’t pay attention to it, I think most of developers didn’t, either. We all count on others to do these trivial things.

Actually, I proposed to Vang that a simple abuse on mainnet should be done, sending NEP17 to poly network using the method mentioned in this issue to test if someone of NEO or Poly could catch it. The result will probably be bad.

He took his responsibility on opening an issue here to correct our upgrading method. Even if the previous one could draw more attention and win him more reputation. Even if he knows most of developers may have already known this issue and could ignore his effort.

@roman-khimov
Copy link
Contributor

roman-khimov commented Sep 16, 2023

If you haven’t pay attention to it

I'm paying a bit more attention to state matches between NeoGo/NeoC# (they usually match whatever they are and yes, we compare state changes the way described above (at least until new signed stateroots start floating around which NeoGo always catches and compares)) and NeoFS sidechains (they usually work and requirements there are somewhat different).

opening an issue here to correct our upgrading method

That's good. But it's a paradigm shift. So I'd prefer to do this post-3.6, at least to be done with some current set of changes.

Even if the previous one could draw more attention and win him more reputation.

Can't really comment on it. I wouldn't want that reputation.

@vang1ong7ang
Copy link
Contributor Author

vang1ong7ang commented Sep 17, 2023

I DO NOT WANT ANY KIND OF REPUTATION.

Objectively existing problems and facts should be given primary attention.

fork is far more harmful than deny of service.

To be honest, I don't have any hope that anyone will fix the problems mentioned in the current issue before v3.6.0. So we are improving network monitoring and some other emergency response plans to reduce losses when extreme situations occur.

@vang1ong7ang
Copy link
Contributor Author

vang1ong7ang commented Sep 17, 2023

I observed both neoc# and neogo lack rigorous status checking when upgrading. Although I don't know much about neogo, I know that its behavior is different from the c# version in many implementation details. This difference in implementation can easily lead to inconsistency between the two states, which is a dangerous thing on the blockchain (1. So far, no one has deliberately used this phenomenon to create problems. 2. The absolute mainstream of c# impl in the neo ecosystem alleviates covers the severity of this problem).

@vang1ong7ang
Copy link
Contributor Author

vang1ong7ang commented Sep 17, 2023

@vang1ong7ang, @dusmart, I'm explaining the status quo and the way it came to be. These problems are known, they're not new.

I don't think it is the most responsible way if you have already known these problems but still decided (or tacitly allow) to upgrade to 3.6.0 without any protection.

There are bad things with the current approach, there are good ones as well.

of course, everything has both sides, even the deny of service bug has good things, at least, it's one of the most efficient way to pause the network and reduce loss spread if attackers make use of the vulnerability mentioned in this issue.

optimizing the storage (which also bring dAPPs relying on old interfaces a bad experience).

#2841 is trivial to fix keeping exact old behavior for NEO API. At least while we're operating in a "floating state" paradigm. And it solves #2815, which is about specific use cases for this data, not exactly about performance.

  • I don't see the additional benefit Optimize vote reward data #2841 brings other than performance optimization.
  • I don't think neo was intended to be in a "floating state" paradigm. If you read the discussion of How to sign the stateroot? #1526 carefully, you will find that the "floating state" paradigm is not intended to be abused as it is now, it just provides the ability to repair the original error state in extreme cases.

@roman-khimov
Copy link
Contributor

The absolute mainstream of c# impl in the neo ecosystem alleviates the severity of this problem

It's not. It just:

  • sweeps under the rug the fact that we're lacking a complete behavior specification
  • de-facto standardizes many of C# (dotNET) behavior specifics that make it really hard to have a 100% compatible independent implementation in any other language (once upon a time we even had a neo-python implementation. No, it didn't survive)
  • creates a monoculture that can also be targeted more easily (not linking C#-specific bugs now), a more diverse environment wouldn't allow some things to happen (and yes, via a DOS)

decided (or tacitly allow) to upgrade to 3.6.0 without any protection

Please, check 3.1.0, 3.2.0, 3.3.0, 3.4.0, 3.5.0 changes.

it just provides the ability to repair the original error state in extreme cases

Yeah, sure. But once you have it's soooo convenient to use it for other purposes. Good ones, btw. And so it's used this way. At least for the past two years. With every minor release (and sometimes a bit more often).

@vang1ong7ang
Copy link
Contributor Author

vang1ong7ang commented Sep 17, 2023

decided (or tacitly allow) to upgrade to 3.6.0 without any protection

Please, check 3.1.0, 3.2.0, 3.3.0, 3.4.0, 3.5.0 changes.

that's what "tacitly" means.

every time i was expecting it should be the last "floating state" fork, even till the basilisk was introduced.

@vncoelho
Copy link
Member

We must not upgrade to 3.6.0, this is not correct to put mainnet state at risk.

Let's revert the ones that need resync, like voting reward. We, then, solve the others with tags.

The voting reward could be merged with a tag as well, and we start to improve it from now.
Later, we do a tentative to resync past states in a safe manner in order to optimize clients' data.

@roman-khimov
Copy link
Contributor

I'm sure we don't need any reverts. In almost all cases (probably with the only exception of #2892) API changes can be tied to the Basilisk hardfork. Of those I'd say NeoToken.GetAccountState and NeoToken.UnclaimedGAS are the most problematic. We need 3.6.1 to fix #2907 anyway, so we can think of basiliskifying our code a bit more (it doesn't take a lot of time either, can be rolled out next week).

@dusmart
Copy link

dusmart commented Sep 18, 2023

First of all. I want apologies to core developers whose work are listed in this checklist. I want to be absolutely clear that I fully comprehend and firmly stand behind your tireless efforts.

Since this issue's name is FORK instead of HARD FORK, I think we should add #2818 to the list cause it's a soft fork.

Soft fork will not cause a real fork on the mainnet and do not have that large damage. But it will bring obstacles to nodes that will be upgraded a little later.

For soft fork, it will also be better to activate it after the new fork tag. Then we can give all nodes (wallet's, CEX's, crosschain's, dAPP's ...) a few more time to do a leisurely upgrade.

@vncoelho
Copy link
Member

image

image

@dusmart @vang1ong7ang

Now it can be replicated with NeoCompiler Eco in a kind of a simple manner.
Here you can see two clients with different balances. One was re-sync with 3.6.0.

@vang1ong7ang
Copy link
Contributor Author

vang1ong7ang commented Sep 21, 2023

Mainnet is at 3.6 now, which makes 3.6-specific part of #2910 irrelevant (not the underlying problem, of course), removed from the list.

Originally posted by @roman-khimov in #2914 (comment)

cheers :)

this is not the most responsible way. let's exercise some judgment, please.


there are 3 blockchain explorers i know on neo network which is https://neotube.io/, https://explorer.onegate.space/ and https://dora.coz.io/

after sending the upgrade detection transaction on block 4125581 https://neotube.io/transaction/0x69fa0b479e4d318e49640f240664c20fafe87005211818b8b6e08b80294a698a

two of the only there explorers https://explorer.onegate.space/ and https://dora.coz.io/ blocked at block 4125581.

THE TRANSACTION IS A LEGAL TRANSACTION WITHOUT ANY DAMAGE

I apologize to all users and projects affected by this transaction but do understand why I have to do this:

  • the state conflict issue between v3.5.0 and v3.6.0 is public and may be used by any hackers
  • by pausing the outdated nodes, those nodes won't be cheated by those malicious transactions. otherwise all the service with the outdated neo nodes may show false information
  • to my surprise, there are so many nodes that have not yet upgraded to v3.6.0, even the blockchain explorers. it would be guilty if users are misled due to incorrect information displayed on the blockchain explorers.
  • nodes updated to v3.6.0 are not affected by this transacction
  • the transaction is legal and harmless. it is a normal transaction without deliberate construction
  • if someone can reach their maintainers, tell them to upgrade to v3.6.0 as soon as possible to recover the service

apologize again for the sudden operation.

@roman-khimov
Copy link
Contributor

cheers :)

That's just a fact and me trying to keep track of at least some patch version. You can blame me for 1/7 of this fact (according to https://governance.neo.org/), if you want to. But 1/7th is not enough to change facts. And not following the scheduled upgrade will soon result in consensus problems. Go figure.

@vang1ong7ang
Copy link
Contributor Author

vang1ong7ang commented Sep 21, 2023

  • urgent patches should be deployed earlier, rather than waiting until 3.6.0
  • your behavior of upgrading the node before fix is inconsistent with what you have said on the current page, without any announcement or warning
  • you don’t have to justify it, and you don’t have to put the blame on others. this is the problem faced by you, me and all our neo developers, maintainers and all communities. take the 1/7, 1/21 or 1/1024 whatever

just as @shargon said:

YOU SHOULD BE CLEAR THAT IT COULD LEAD TO A FORK OR WORSE STILL, THAT A THIRD PARTY ABUSES SAID ATTACK AND YOU ARE DIRECTLY OR INDIRECTLY RESPONSIBLE FOR ITS EXPLOITATION.

remember it is mainnet which should be given 1000x attention to than testnet

@Jim8y
Copy link
Contributor

Jim8y commented Sep 21, 2023

All I can say is thank god we have @vang1ong7ang here on our side.

@dusmart
Copy link

dusmart commented Sep 21, 2023

by pausing the outdated nodes, those nodes won't be cheated by those malicious transactions. otherwise all the service with the outdated neo nodes may show false information

Genius. I thought that we will have to compare the token states all the way to 3.6.1 to safeguard the chain. Using soft fork to protect us with a few inconvenience is totally reasonable to me.

MAY THE SAFETY BE ALWAYS WITH US

@vncoelho
Copy link
Member

vncoelho commented Sep 21, 2023

For me it is unbelievable that CNs were upgrade.
@vang1ong7ang warned and I replicated the attack.
This is irresponsibility if there is any state that has been affected. I imagine that you did a complete verification before upgrading the node.

My friend, @vang1ong7ang, were you able to verify that? @superboyiii

@vang1ong7ang
Copy link
Contributor Author

I confirmed with @superboyiii that the state verification is done, though it cannot completely ensure the safety, it is a good and important protective measurement.

however unfortunately I didn't do the verification yet.

@Jim8y
Copy link
Contributor

Jim8y commented Oct 10, 2023

@vang1ong7ang @vncoelho @dusmart I have a question, when a new block is generated and my transaction is included, how do i know whether a node has the correct execution result. How does a node know that he has the correct execution results that have the same states with other nodes. For neo of course

@dusmart
Copy link

dusmart commented Oct 10, 2023

@roman-khimov
Copy link
Contributor

roman-khimov commented Nov 14, 2023

I propose closing this because 3.6.0 is already an old story, the specific list is outdated. For the future versions the real underlying problem will be fixed by #1526 (and associated technologies like #2932).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

6 participants