Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: store equals for both content and equals multihash #23

Merged
merged 4 commits into from
Sep 19, 2023

Conversation

olizilla
Copy link
Contributor

We want to return equals claims when quired with either the content cid or the equals cid, so store 2 records in dynamo for equals claims.

assert/equals claims provide 2 CIDs. The content CID and the equals CID that we claim the content CID is equivelant to. (e.g. different hash fn for same bytes)

The dynamo records are { content: "base58btc-multihash", claim: CID }. The actual bytes of the claim are stored in s3, only once per claim.

For equals claims we store an additional record that sets the content to the multihash of the provided equals CID.

Alternatively we could have stored an equals property on dynamo records for equals claims, and added a global secondary index for that field, and updated the query to check both the content field and the equals field (and the secondary index) for every query, but this would incur additonal read costs for every query. It is assumed that storing a extra tuple of [string,cid] per equals claim will be cheap, and avoids a larger refactor of the claims db.

License: MIT

We want to return equals claims when quired with either the content cid or the equals cid, so store 2 records in dynamo for equals claims.

`assert/equals` claims provide 2 CIDs. The `content` CID and the `equals` CID that we claim the `content` CID is equivelant to. (e.g. different hash fn for same bytes)

The dynamo records are `{ content: "base58btc-multihash", claim: CID }`. The actual bytes of the claim are stored in s3, only once per claim.

For equals claims we store an additional record that sets the `content` to the multihash of the provided `equals` CID.

**Alternatively** we could have stored an `equals` property on dynamo records for equals claims, and added a global secondary index for that field, and updated the query to check both the content field and the equals field (and the secondary index) for every query, but this would incur additonal read costs for *every* query. It is assumed that storing a extra tuple of [string,cid] per equals claim will be cheap, and avoids a larger refactor of the claims db.

License: MIT
Signed-off-by: Oli Evans <oli@protocol.ai>
@olizilla
Copy link
Contributor Author

I'm looking at the walk api now to see how that needs to change to deal with bi-directional equals claims now.

@olizilla
Copy link
Contributor Author

Walk api is not critical to this work. We will be querying by content cid only. There is a more general problem of "the walk api needs to guard against loops" that should be tackled before we go live with that api. Issue raised #24

async put ({ claim, bytes, content, expiration }) {
const hasExpiration = expiration && isFinite(expiration)
const cidstr = claim.toString(base32)
async put (claim) {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

factored out the s3 put and the db upsert so i can call upsertClaims twice for equals claims.


export interface Claim {
claim: Link
bytes: Uint8Array
content: MultihashDigest
expiration?: number
expiration?: number,
value: AnyAssertCap
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

passing the full capability object through to the claims store allows us to inspect the capability name, e.g. assert/equals and special case it, and we also get the nb property on the capability, so we can extract the equals cid.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

also it has a pleasing symmetry with the mutliformats block api which has the bytes and the decoded value of a block.

a CIDv0 can be equivalent to other CIDs, so fix the types

License: MIT
Signed-off-by: Oli Evans <oli@protocol.ai>
@@ -69,9 +69,9 @@ export interface RelationPartInclusion {
/** A claim that the same data is referred to by another CID and/or multihash */
export interface EqualsClaim extends ContentClaim<typeof Assert.equals.can> {
/** Any CID e.g a CAR CID */
readonly content: Link
readonly content: UnknownLink
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

type fix... a v0 CID can be equivalent to another CID, so the types should allow UnknownLink here.

Copy link
Contributor

@vasco-santos vasco-santos left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code changes look good to me! Thanks for putting it together

Defer the decision if this is desirable, or if to be desirable client should perform 2 calls instead fo @alanshaw as I am not familiar with the protocol details.

Out of scope, but it would be great to consider for this project to improve error handling in the same lines that we do in w3filecoin / ucanto, where we type and move around the potential errors, instead of relying on errors to be thrown

return dynamoClient.send(cmd)
}, {
minTimeout: 100,
onFailedAttempt: err => console.warn(`failed DynamoDB update for: ${mh}`, err)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we keep the error as it was, or perhaps extend it with mh as well?

Perhaps would make easier to see where the error comes if it logs the cidstr like the other error

}
// also add the claim with the `equals` cid as the `content` cid,
// so we can provide the equivalent claim for look ups for either.
await upsertClaim({ ...claim, content: equivalent }, this.#table)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder if we should add any metadata to this? Otherwise, might get complex if we in the future want to change something and not know what was added on the other way. Thoughts?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm in favour of adding the capability name to the dynamo record, so we could in future we could let user ask for "just assert/equals or whichever flavour they want, and it has the side benefit of letting us reduce the number of records we'd have to explore to back out this choice, without adding a explicit "this is one of those reverse claims"

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

#26

@seed-deploy seed-deploy bot temporarily deployed to pr23 September 18, 2023 14:39 Inactive
@seed-deploy
Copy link

seed-deploy bot commented Sep 18, 2023

View stack outputs

Copy link
Member

@alanshaw alanshaw left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks great! I added a few non-blocking feedbacks.

packages/lambda/src/lib/store/index.js Outdated Show resolved Hide resolved
packages/lambda/src/lib/store/index.js Outdated Show resolved Hide resolved
packages/lambda/src/lib/store/index.js Outdated Show resolved Hide resolved
Co-authored-by: Alan Shaw <alan.shaw@protocol.ai>
@seed-deploy seed-deploy bot temporarily deployed to pr23 September 19, 2023 10:23 Inactive
License: MIT
Signed-off-by: Oli Evans <oli@protocol.ai>
@seed-deploy seed-deploy bot temporarily deployed to pr23 September 19, 2023 10:55 Inactive
@olizilla olizilla merged commit 715fcd5 into main Sep 19, 2023
3 checks passed
@olizilla olizilla deleted the return-equals-for-content branch September 19, 2023 11:10
olizilla pushed a commit that referenced this pull request Sep 19, 2023
🤖 I have created a release *beep* *boop*
---


##
[3.1.0](content-claims-v3.0.1...content-claims-v3.1.0)
(2023-09-19)


### Features

* add `assert/equals`
([#22](#22))
([bddd948](bddd948))
* ensure claim API is CID version agnostic
([#18](#18))
([1690c1f](1690c1f))
* store equals for both content and equals multihash
([#23](#23))
([715fcd5](715fcd5))


### Bug Fixes

* key content on multihash not CID
([#21](#21))
([7e737a7](7e737a7))

---
This PR was generated with [Release
Please](https://github.com/googleapis/release-please). See
[documentation](https://github.com/googleapis/release-please#release-please).

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants