CIP 100 | Provide directions on how to create signatures for the body without circular dependencies. #783

Closed
kderme opened this issue Mar 14, 2024 · 21 comments

Comments

@kderme
Contributor

kderme commented Mar 14, 2024

CIP 100 provides an algorithm to find and hash the body of governance metadata:

1. Canonicalize the whole document according to [this](https://w3c-ccg.github.io/rdf-dataset-canonicalization/spec/) specification.
2. Identify the node-ID of the body node
3. Filter the canonicalized document to include the body node, and all its descendents
4. Ensure the file ends in a newline
5. Hash the resulting file with blake2b-256
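
As a rough illustration, the filter, newline and hash steps could look like the Python sketch below. It assumes the document has already been canonicalized into N-Quads by some RDF library (not shown), and that descendants are reached by following blank-node objects. The function name `hash_body`, the `example.org` IRIs and the `_:c14n` labels are hypothetical, and the line parsing is deliberately simplified. The sketch does not address whether the body's node-ID stays stable when content outside the body changes.

```python
import hashlib


def hash_body(canonical_nquads: str, body_id: str) -> str:
    """Filter canonical N-Quads to the body node and its descendants, then hash."""
    lines = [ln for ln in canonical_nquads.splitlines() if ln.strip()]

    # Walk from the body node through blank-node objects to collect descendants.
    keep = {body_id}
    frontier = [body_id]
    while frontier:
        node = frontier.pop()
        for ln in lines:
            # Simplified parsing: subject, predicate, then the rest of the line.
            subject, _predicate, rest = ln.split(" ", 2)
            obj = rest.split(" ", 1)[0]
            if subject == node and obj.startswith("_:") and obj not in keep:
                keep.add(obj)
                frontier.append(obj)

    # Keep only statements whose subject is the body or one of its descendants.
    filtered = [ln for ln in lines if ln.split(" ", 1)[0] in keep]

    # Ensure the result ends in a newline, then hash with blake2b-256.
    text = "\n".join(filtered) + "\n"
    return hashlib.blake2b(text.encode("utf-8"), digest_size=32).hexdigest()


# Hypothetical canonical output: the body is _:c14n1, the signature sits outside it.
doc = (
    '_:c14n0 <https://example.org/authors> _:c14n2 .\n'
    '_:c14n0 <https://example.org/body> _:c14n1 .\n'
    '_:c14n1 <https://example.org/title> "Example" .\n'
    '_:c14n2 <https://example.org/signature> "placeholder" .\n'
)
signed = doc.replace("placeholder", "abc123")
print(hash_body(doc, "_:c14n1") == hash_body(signed, "_:c14n1"))  # True
```

In this toy input the signature value does not affect the body hash, but only because the blank-node labels were fixed by hand.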

However, this assumes that in order to hash the body (last step), you first need to canonicalize the whole document (first step). This creates a circular dependency problem, since the whole document already contains the signatures.

@Ryun1
Collaborator

Ryun1 commented Mar 14, 2024

> This creates a circular dependency problem, since the whole document already contains the signatures.

Agreed, this isn't clear; I will improve it in my PR #782.

But I don't think there is a dependency. Since you filter out the signatures via "Filter the canonicalized document to include the body node, and all its descendents", you can just use placeholders when you canonicalize the whole document.

@kderme
Contributor Author

kderme commented Mar 14, 2024

The canonicalization algorithm depends in some way on sorting the generated N-Quads and also on the number of these N-Quads. Without knowing at least the context of the whole text it's not possible to have placeholders, because it's impossible to know which keys will be sorted before or after the body.

I must say my understanding of the algorithm is too limited to reach safe conclusions.

@Quantumplation
Copy link
Contributor

You know all of the keys ahead of time, so you should be able to generate all of the N-Quads. You just don't know the values for those keys.

@kderme
Contributor Author

kderme commented Mar 14, 2024

> You know all of the keys ahead of time

So each author is expected to know the total number of authors and other information to predict the correct node-id of the body? It's a bit counter-intuitive that data that live outside the body actually affect the normal form of the body and as a result the signature.

@Quantumplation
Copy link
Contributor

If you have an alternative suggestion, this is one of the things we struggled with; there's not really a good solution. Just taking the body for canonicalization doesn't quite work because it doesn't have the context, so you have to modify the body to inject a context, etc.

@kderme
Contributor Author

kderme commented Apr 22, 2024

Sorry for the late response. My suggestion would be to sign the body content as is, or to follow the much simpler JSON canonicalisation algorithm. The JSON-LD canonicalisation algorithm has no protection against collision attacks, so even after fixing the circular logic problem, it's always possible for two different JSON-LD files to have the same canonical form. For example, all three texts below have the same canonical form:

initial.txt

modified 1.txt

modified 2.txt

but they're clearly not what the initial signers wanted (note the injected comments). This calls into question the whole mechanism of signature and anchor hash validation and breaks the "humanly readable" property.

@kderme
Contributor Author

kderme commented Apr 26, 2024

IMO JSON-LD could be an optional extension for metadata creators that want to follow the RDF/semantic web, or for tools that want to do RDF-style queries. Its usage should not be forced, especially for the validation algorithm.

@Quantumplation
Contributor

Usage isn't forced; there's no way the ledger can enforce that.

This is a metadata standard, but actual users are free to do whatever they want.

@kderme
Contributor Author

kderme commented Apr 26, 2024

Right, the ledger cannot enforce it, but the CIP-100 specification requires it.

@Quantumplation
Contributor

Yes; to be compliant with the CIP you have to comply with the CIP. But nothing requires that people be compliant with the CIP. It just makes it easier for tooling developers to index the metadata.

If the CIP was just "any json document", it wouldn't provide any value to those building tools. On the other end of the spectrum, if it was incredibly prescriptive, it'd lead to lots of churn and argumentation over what should go into the CIP. So it tries to strike a balance for an incredibly nascent and exploratory ecosystem.

@kderme
Contributor Author

kderme commented Apr 29, 2024

I agree with most parts of CIP-100, except the validation algorithms. CIP-1694 defines:

> An anchor is a pair of:
>
> - a URL to a JSON payload of metadata
> - a hash of the contents of the metadata URL

Instead of a well-established hash function with good properties, CIP-100 uses the composition of RDF canonicalisation plus blake2b-256. Not only is this not collision resistant, it also allows malicious content to be constructed with the same hash, as shown at #783 (comment). Imho using a simpler hash directly, like blake2b-256, would increase its adoption.

@Quantumplation
Contributor

Quantumplation commented May 4, 2024

@kderme I'm totally fine to change the standard; I think the idea behind the canonicalization was to make it easier to validate; for example, some languages don't preserve the order of keys, newlines, etc. The whole canonicalization debate on the ledger cbor, where you have to carry around the original bytes, has been a nightmare, so we were trying to avoid that. And since this metadata isn't driving any logic (i.e. it's ultimately just for human consumption and display, not powering any ledger decisions etc.) the collision resistance wasn't deemed to be a big deal at the time.

That being said, perhaps it is simpler to just say "the raw bytes you receive over the wire are what get hashed, end of story". I'd be totally fine with that change.
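
A minimal Python sketch of that "raw bytes" rule, assuming blake2b-256 as the hash and hex encoding for the anchor hash; the function names `anchor_hash` and `matches_anchor` are made up for illustration. It also shows the trade-off mentioned above: two byte sequences that encode the same JSON value hash differently, so the original bytes must be preserved.

```python
import hashlib


def anchor_hash(raw: bytes) -> str:
    """blake2b-256 over the exact bytes fetched from the anchor URL."""
    return hashlib.blake2b(raw, digest_size=32).hexdigest()


def matches_anchor(raw: bytes, expected_hex: str) -> bool:
    """Compare the computed digest with the hash stored in the anchor."""
    return anchor_hash(raw) == expected_hex.lower()


compact = b'{"body":{"title":"Example"}}'
pretty = b'{\n  "body": {\n    "title": "Example"\n  }\n}\n'

# Same JSON value, different bytes, therefore different hashes.
print(anchor_hash(compact) == anchor_hash(pretty))  # False
print(matches_anchor(compact, anchor_hash(compact)))  # True
```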

@Ryun1 @scarmuega @KtorZ Any particular thoughts?

@Quantumplation
Contributor

Also, that's interesting; CIP-1694 shouldn't be defining the content type of the metadata (i.e. it should just say a URL to the metadata payload). Since it can't enforce anything anyway, and we want to leave it flexible to other formats.

@kderme
Contributor Author

kderme commented May 20, 2024

> Also, that's interesting; CIP-1694 shouldn't be defining the content type of the metadata (i.e. it should just say a URL to the metadata payload). Since it can't enforce anything anyway, and we want to leave it flexible to other formats.

I agree with that.

> the raw bytes you receive over the wire are what get hashed

This has the extra benefit that tools can validate and serve the data without having to parse them.

> The whole canonicalization debate on the ledger cbor, where you have to carry around the original bytes, has been a nightmare, so we were trying to avoid that.

Since with the above design parsing is not necessary for the full metadata, this won't be an issue. However, for the "body" part, which needs to be parsed and hashed separately, it can be an issue, and a canonicalisation algorithm may be a good idea. Imo https://www.rfc-editor.org/rfc/rfc8785 is a better and more established candidate.

@kderme
Contributor Author

kderme commented May 20, 2024

I'd be happy to prepare a PR for CIP-100 with the above.

@rphair
Collaborator

rphair commented Jun 25, 2024

@kderme would #835 close this issue? To help this along I'm including it on the CIP agenda tonight (https://hackmd.io/@cip-editors/91), since we're pressed for a decision on that PR and should have related stakeholders present.

@rphair
Collaborator

rphair commented Jun 26, 2024

p.s. to #783 (comment) - @kderme now that #835 has been merged, if the remaining issue(s) look any different than you originally described in your OP then please update here so we can stay properly focused. cc @Crypto2099 @disassembler

@kderme
Contributor Author

kderme commented Jun 28, 2024

IMO a remaining piece for this ticket is replacing URDNA2015 with a simpler canonicalisation algorithm such as https://www.rfc-editor.org/rfc/rfc8785 or https://wiki.laptop.org/go/Canonical_JSON for the canonicalisation of the body, which is signed by the authors (this doesn't apply to CIP-119, only to CIP-108 and possibly future extensions of CIP-100).

For the body signing, a canonicalisation algorithm is necessary; however, I firmly believe that rfc8785 or Canonical_JSON are much better and simpler candidates. In practice they canonicalise a JSON structure by removing whitespace and ordering the keys. They suffer much less from collisions.
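
To make "removing whitespace and ordering the keys" concrete, here is a small Python sketch. It only approximates RFC 8785 using the standard `json` module: the RFC additionally fixes number serialization and sorts keys by UTF-16 code units, which this sketch does not reproduce. The helper names `canonical_json` and `body_hash`, and the choice to hash the `"body"` member with blake2b-256, are assumptions made for illustration.

```python
import hashlib
import json


def canonical_json(value) -> bytes:
    """Approximate canonical form: sorted keys, no insignificant whitespace, UTF-8."""
    text = json.dumps(value, sort_keys=True, separators=(",", ":"), ensure_ascii=False)
    return text.encode("utf-8")


def body_hash(document_text: str) -> str:
    """Hash only the "body" member of a metadata document (hypothetical layout)."""
    body = json.loads(document_text)["body"]
    return hashlib.blake2b(canonical_json(body), digest_size=32).hexdigest()


# Two documents whose bodies differ only in whitespace and key order.
a = '{"authors": [], "body": {"title": "T", "abstract": "A"}}'
b = '{\n "body": {"abstract": "A",\n  "title": "T"},\n "authors": [{"name": "x"}]\n}'

print(canonical_json(json.loads(a)["body"]))  # b'{"abstract":"A","title":"T"}'
print(body_hash(a) == body_hash(b))  # True
```

Content outside the body (here, `authors`) does not influence the result, and every key and value inside the body remains part of the hashed bytes.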

Speaking as the maintainer of db-sync, we have a full implementation for these CIPs except for the signature validation, and it would be a big pain to support URDNA2015. We probably won't do any signature validation if it remains as is. I'd be happy to open a PR with these changes in the next few days.

@gitmachtl
Contributor

gitmachtl commented Aug 7, 2024

I am linking this post here from the Intersect Discord Server, because we stumbled upon it while testing Koios/SPO-Scripts:
https://discord.com/channels/1136727663583698984/1239888910537064468/1270651361570193430

cardano-signer 1.17.0 now also supports the canonicalized hashing of the @context+body content for further signing by the document authors.

@kderme kderme closed this as completed Sep 6, 2024
@kderme kderme reopened this Sep 6, 2024
@gitmachtl
Contributor

Short update: cardano-signer 1.19.0 can also directly sign JSON-LD governance metadata files.

@kderme
Contributor Author

kderme commented Sep 24, 2024

It seems that many tools already support the author canonicalisation as described in CIP-100, and given the improvements made to it, I think this can be closed.

@kderme kderme closed this as completed Sep 24, 2024