Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

WIP: arrow method validation #6036

Draft
wants to merge 4 commits into
base: next
Choose a base branch
from

Conversation

pubmodmatt
Copy link

@pubmodmatt pubmodmatt commented Sep 21, 2024

Validation changes:

  • Implement a JSON Selection visitor that can visit each part of a selection, including arrow methods
  • Adds validation of arrow method names against the known list of public arrow methods
  • Adds validation of arrow method arity
  • Add a unit test that valid uses of arrow methods do not cause any validation errors
  • Add a unit test with use of a non-existent arrow method name
  • Add a unit test with use of an arrow method with incorrect arity

JSON Selection changes:

  • Makes some more of the JSON selection code pub(crate) so the validation code is able to visit all the places where an arrow method name can appear

Known Issues:

  • There is currently an issue with another validation that reports on fields that do not have an associated connector. This keeps track of an already_seen list as the selection is validated. However, if any other validation fails and returns early, this validation will not see any remaining fields and incorrectly reports errors. This will need to be addressed separately, but the result can be seen in the test snapshots for this PR.
  • The validation code requires significant knowledge about arrow methods, such as their names and arity (and in the future, type information and how it applies to both the HTTP response and GraphQL schema). This is currently just hard coded. There will need to be some mechanism to provide this information, ideally one that is extensible as new arrow methods are added.

Checklist

Complete the checklist (and note appropriate exceptions) before the PR is marked ready-for-review.

  • Changes are compatible1
  • Documentation2 completed
  • Performance impact assessed and acceptable
  • Tests added and passing3
    • Unit Tests
    • Integration Tests
    • Manual Tests

Exceptions

Note any exceptions here

Notes

Footnotes

  1. It may be appropriate to bring upcoming changes to the attention of other (impacted) groups. Please endeavour to do this before seeking PR approval. The mechanism for doing this will vary considerably, so use your judgement as to how and when to do this.

  2. Configuration is an important part of many changes. Where applicable please try to document configuration examples.

  3. Tick whichever testing boxes are applicable. If you are adding Manual Tests, please document the manual testing (extensively) in the Exceptions.

Copy link
Member

@benjamn benjamn left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I know this PR is still a draft, but I wanted to give some feedback early enough for you to handle it easily.

Broadly speaking, I'm finding it's difficult to defend abstraction boundaries in a Rust codebase, to prevent coupling and allow for implementation details to evolve in (relative) isolation. Tools like keeping PathList as pub(super) are sometimes all we have, so please don't change visibility without discussion.

Comment on lines +90 to +104
fn try_get_group_for_field(
&self,
field: &SelectionPart<'schema>,
) -> Result<Option<SelectionGroup<'schema>>, Self::Error> {
// Leaf nodes should return [`SelectionGroup::Empty`] rather than `None` to ensure that
// `exit_group` is called on the empty group, which in turn exits the visitor.
let field = field.clone();
let result = Ok(match field {
SelectionPart::JSONSelection(json_selection) => match json_selection {
JSONSelection::Named(sub_selection) => Some(SelectionGroup::new(
field,
vec![SelectionPart::SubSelection(sub_selection)],
)),
JSONSelection::Path(path_selection) => Some(SelectionGroup::new(
field,
vec![SelectionPart::PathList(&path_selection.path)],
)),
},
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This code (and what follows) seems to be reaching into and depending on the details of the JSONSelection AST structures. This is worrisome because those structures are actively changing and not at all stable (think of them as an implementation detail, not an interface to interact with directly). To my mind, that means code like this should live within the json_selection directory, at least. Can we find a better way of organizing this code, to avoid the sprawl?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The plan was to move this visitor into the JSON Selection module once we align on what this visitor looks like. I just put it here temporarily. I also think @nicholascioli also has some work going on in this area that might supersede this.

Comment on lines 205 to 237
// TODO: validation requires significant knowledge about arrow methods - need a better mechanism
// to provide it
lazy_static! {
static ref ARROW_METHODS: IndexMap<&'static str, Option<usize>> = {
let mut arrow_methods = IndexMap::default();
arrow_methods.insert("echo", Some(1));
arrow_methods.insert("map", Some(1));
arrow_methods.insert("match", None);
arrow_methods.insert("first", Some(0));
arrow_methods.insert("last", Some(0));
arrow_methods.insert("slice", None);
arrow_methods.insert("size", Some(0));
arrow_methods.insert("entries", Some(0));
arrow_methods
};
}
Copy link
Member

@benjamn benjamn Sep 26, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is another example of code that should, at the very least, reside nearer to the implementations of the arrow methods.

I'm also not entirely comfortable saying that methods like ->slice have unknown arity, since the arity is specifically required to be between 0 and 2 arguments (all integers). If we're going to be validating this, it seems worthwhile to distinguish a case like this from (e.g.) the ->match case, which needs at least two arguments.

More generally, though, I don't think arity really covers what we want to check about method parameters. We need a way of validating the sequence of types of the possible parameters, not just a check on how many are provided. But that can wait for future PRs.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, could we call this static map something like ARROW_METHOD_ARITIES rather than just ARROW_METHODS (since that's the name of the actual map of arrow methods we use elsewhere)?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Entirely agree - as indicated in the TODO comment here, this is just a dummy placeholder for some better mechanism, and the discussion I want to have is what that better mechanism should look like. The arity is just meant as a simple example of something we could validate, but the real validation will need to look at type information among other things.

There are a couple of ways I could see this going:

  1. Expose enough metadata about arrow methods that external components can understand how to handle them. Making that metadata rich enough to describe what is needed would be challenging. Consider the match method - it requires knowing that the arguments are arrays of 2 items, where the first item type needs to match the input type and the second item type needs to match the output type.

  2. Have each arrow method be responsible for implementing a set of traits to do the things required by external components. So there could be a trait for validation another to provide code completion, etc. This co-locates these implementations with the arrow method implementations, but it creates some coupling - the JSON Selection code has to be aware of external things like validation, code completion, etc. and implement those for each arrow method.

Comment on lines 339 to +376
#[derive(Debug, PartialEq, Eq, Clone)]
pub(super) enum PathList {
pub(crate) enum PathList {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This enum is not intended to be public, since the construction of a PathList should only ever be performed through parsing, to enforce invariants on the structure of the list. Please find another way of handling this, such as using the PathSelection struct instead, and (if necessary) adding pub(crate) methods to PathSelection that delegate to PathList methods.

Comment on lines 54 to 65
enum SelectionGroup<'schema> {
Root {
children: Vec<SelectionPart<'schema>>,
},
Child {
parent: SelectionPart<'schema>,
children: Vec<SelectionPart<'schema>>,
},
Empty {
parent: SelectionPart<'schema>,
},
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could this be simplified to avoid the Root/Child distinction by having a single parent: Option<SelectionPart<'schema>> that would be None in the Root case?

It also seems like Empty is just the case where children is empty. If that's true, I don't think this type benefits much from being an enum with three different cases, when it could be a single struct.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The Empty one was just vestigial and related to something I abandoned - I'll remove that. The Root vs Child is a way to use Rust type system to ensure correctness of data. In other languages, you might have a parent that is Optional or can be null. This allows creation of instances that are incorrect. For example, you could have a node return a child that has its parent set to Empty or null, which would be invalid. Every bit of code that accesses the parent field then needs to check for this and somehow handle it to avoid hitting a NullPointerExdeption or the like - but there isn't really anything sensible they can do if they encounter it. In Rust, we can avoid this situation entirely by easily creating type variants that enforce correctness of data. A Root is the only node that has no parent, and everything else requires a parent.

Comment on lines 265 to +298
pub struct PathSelection {
pub(super) path: WithRange<PathList>,
pub(crate) path: WithRange<PathList>,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As far as I know, the only way to keep the PathList private in Rust is to avoid opening up its visibility like this. Please reconsider this change.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There should be ways to prevent external construction of the PathList type, but they may or may not be what you want here (moving JSON Selection to its own crate and marking the type as non_exhaustive is one, or adding fields with private types is another).

I'm likely missing something, but I'm not seeing how to proceed if this type is not exposed. The only place methods and method arguments show up is in PathList::Method. So if we need to visit methods and arguments in a visitor, we'll need to access that, or maybe have some new type for the visitor that contains the same data (though that seems like duplication).

Comment on lines 1022 to +1036
pub struct MethodArgs {
pub(super) args: Vec<WithRange<LitExpr>>,
pub(crate) args: Vec<WithRange<LitExpr>>,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't see an immediate problem here (compared to PathList), but I would note that the WithRange representation was relatively recently added and probably should not be considered stable yet.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm fine with it being unstable and the validation code needing to change. There is significant value here for customers if we can get something in relatively soon - being able to point customers to a specific range associated with a validation error (such as an arrow method) is a way better experience that just highlighting the entire selection.

Comment on lines +26 to +35
/// A part of a JSON Selection to be visited
#[derive(Clone, Debug)]
pub(super) enum SelectionPart<'schema> {
JSONSelection(&'schema JSONSelection),
LitExpr(&'schema LitExpr),
MethodArgs(&'schema MethodArgs),
NamedSelection(&'schema NamedSelection),
PathList(&'schema PathList),
SubSelection(&'schema SubSelection),
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What qualifies a given AST structure as a part of the selection that should be visited, since this list is not exhaustive? I'd suggest adding more detail to the comment.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just that I needed it to locate arrow methods. This is really just the beginning of a proper AST visitor to allow validating arrow methods. I think @nicholascioli is working on designing something more extensive.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we decide that there should be a way to externally visit the JSON Selection AST (as opposed to implementing anything related to JSON Selection internally to the JSON Selection module), then it seems like we'd need to be able to visit every kind of node.

@pubmodmatt
Copy link
Author

I know this PR is still a draft, but I wanted to give some feedback early enough for you to handle it easily.

Broadly speaking, I'm finding it's difficult to defend abstraction boundaries in a Rust codebase, to prevent coupling and allow for implementation details to evolve in (relative) isolation. Tools like keeping PathList as pub(super) are sometimes all we have, so please don't change visibility without discussion.

@benjamn - this PR is specifically created to have that discussion. I was waiting to ask for your input until I have a few more changes in, but thanks for the early feedback! I'll add individual responses on the comments, and we can also meet to discuss. Implementing JSON Selection validation definitely brings up some important topics.

@pubmodmatt pubmodmatt force-pushed the pubmodmatt/connectors/arrow_method_validation branch from 57e0ae0 to fe83d25 Compare September 30, 2024 18:32
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants