WIP: arrow method validation #6036

pubmodmatt · 2024-09-21T19:11:41Z

Validation changes:

Implements a JSON Selection visitor that can visit each part of a selection, including arrow methods
Adds validation of arrow method names against the known list of public arrow methods
Adds validation of arrow method arity
Adds a unit test checking that valid uses of arrow methods do not cause any validation errors
Adds a unit test that uses a non-existent arrow method name
Adds a unit test that uses an arrow method with incorrect arity

JSON Selection changes:

Makes more of the JSON Selection code pub(crate) so the validation code can visit every place where an arrow method name can appear

Known Issues:

There is currently an issue with another validation that reports on fields that do not have an associated connector. That validation keeps track of an already_seen list as the selection is validated. However, if any other validation fails and returns early, it never sees the remaining fields and incorrectly reports errors for them. This will need to be addressed separately, but the result can be seen in the test snapshots for this PR.
The validation code requires significant knowledge about arrow methods, such as their names and arity (and in the future, type information and how it applies to both the HTTP response and the GraphQL schema). This is currently just hard-coded. There will need to be some mechanism to provide this information, ideally one that is extensible as new arrow methods are added.
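For the first known issue, one common remedy is to record every field as seen and accumulate errors instead of returning at the first failure. A minimal sketch, assuming hypothetical `Field`, `ValidationError`, and `validate_fields` names (not the PR's actual types) and a hard-coded method list:

```rust
use std::collections::HashSet;

/// Hypothetical validation error; stands in for the real error type.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct ValidationError {
    pub message: String,
}

/// Hypothetical stand-in for a selection field that may use an arrow method.
pub struct Field {
    pub name: &'static str,
    pub method: Option<&'static str>,
}

// Hard-coded list, mirroring the placeholder approach described above.
const KNOWN_METHODS: &[&str] = &["echo", "map", "match", "first", "last", "slice", "size", "entries"];

/// Validates every field, marking each as seen before checking it and
/// collecting errors instead of returning on the first failure.
pub fn validate_fields(
    fields: &[Field],
    already_seen: &mut HashSet<&'static str>,
) -> Vec<ValidationError> {
    let mut errors = Vec::new();
    for field in fields {
        // Mark the field as seen first, so a later failure never hides it.
        already_seen.insert(field.name);
        if let Some(method) = field.method {
            if !KNOWN_METHODS.contains(&method) {
                // Record the error and keep going rather than using `?` or `return`.
                errors.push(ValidationError {
                    message: format!("unknown arrow method `->{method}` on field `{}`", field.name),
                });
            }
        }
    }
    errors
}
```

Because nothing returns early, the `already_seen` set ends up containing every field even when one of them fails validation.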


benjamn

I know this PR is still a draft, but I wanted to give some feedback early enough for you to handle it easily.

Broadly speaking, I'm finding it's difficult to defend abstraction boundaries in a Rust codebase, to prevent coupling and allow for implementation details to evolve in (relative) isolation. Tools like keeping PathList as pub(super) are sometimes all we have, so please don't change visibility without discussion.

benjamn · 2024-09-26T14:48:12Z

apollo-federation/src/sources/connect/validation/selection/visitor.rs

+    fn try_get_group_for_field(
+        &self,
+        field: &SelectionPart<'schema>,
+    ) -> Result<Option<SelectionGroup<'schema>>, Self::Error> {
+        // Leaf nodes should return [`SelectionGroup::Empty`] rather than `None` to ensure that
+        // `exit_group` is called on the empty group, which in turn exits the visitor.
+        let field = field.clone();
+        let result = Ok(match field {
+            SelectionPart::JSONSelection(json_selection) => match json_selection {
+                JSONSelection::Named(sub_selection) => Some(SelectionGroup::new(
+                    field,
+                    vec![SelectionPart::SubSelection(sub_selection)],
+                )),
+                JSONSelection::Path(path_selection) => Some(SelectionGroup::new(
+                    field,
+                    vec![SelectionPart::PathList(&path_selection.path)],
+                )),
+            },


This code (and what follows) seems to be reaching into and depending on the details of the JSONSelection AST structures. This is worrisome because those structures are actively changing and not at all stable (think of them as an implementation detail, not an interface to interact with directly). To my mind, that means code like this should live within the json_selection directory, at least. Can we find a better way of organizing this code, to avoid the sprawl?

The plan was to move this visitor into the JSON Selection module once we align on what it should look like; I just put it here temporarily. I also think @nicholascioli has some work going on in this area that might supersede this.

benjamn · 2024-09-26T14:52:14Z

apollo-federation/src/sources/connect/validation/selection.rs

+// TODO: validation requires significant knowledge about arrow methods - need a better mechanism
+//   to provide it
+lazy_static! {
+    static ref ARROW_METHODS: IndexMap<&'static str, Option<usize>> = {
+        let mut arrow_methods = IndexMap::default();
+        arrow_methods.insert("echo", Some(1));
+        arrow_methods.insert("map", Some(1));
+        arrow_methods.insert("match", None);
+        arrow_methods.insert("first", Some(0));
+        arrow_methods.insert("last", Some(0));
+        arrow_methods.insert("slice", None);
+        arrow_methods.insert("size", Some(0));
+        arrow_methods.insert("entries", Some(0));
+        arrow_methods
+    };
+}


This is another example of code that should, at the very least, reside nearer to the implementations of the arrow methods.

I'm also not entirely comfortable saying that methods like ->slice have unknown arity, since the arity is specifically required to be between 0 and 2 arguments (all integers). If we're going to be validating this, it seems worthwhile to distinguish a case like this from (e.g.) the ->match case, which needs at least two arguments.

More generally, though, I don't think arity really covers what we want to check about method parameters. We need a way of validating the sequence of types of the possible parameters, not just a check on how many are provided. But that can wait for future PRs.

Also, could we call this static map something like ARROW_METHOD_ARITIES rather than just ARROW_METHODS (since that's the name of the actual map of arrow methods we use elsewhere)?
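A sketch of how the arity table could distinguish these cases, using a hypothetical `Arity` enum and `arity_of` helper, the suggested `ARROW_METHOD_ARITIES` name, and a plain static slice instead of `lazy_static`/`IndexMap`. The bounds for `->slice` and `->match` follow the comment above; this still checks only argument counts, not argument types.

```rust
/// How many arguments an arrow method accepts (hypothetical type, not the PR's).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Arity {
    Exact(usize),
    Between(usize, usize), // inclusive bounds
    AtLeast(usize),
}

impl Arity {
    /// Returns true if `count` arguments satisfy this arity.
    pub fn accepts(self, count: usize) -> bool {
        match self {
            Arity::Exact(n) => count == n,
            Arity::Between(min, max) => (min..=max).contains(&count),
            Arity::AtLeast(min) => count >= min,
        }
    }
}

/// Arity table keyed by method name, under the name suggested in review.
pub static ARROW_METHOD_ARITIES: &[(&str, Arity)] = &[
    ("echo", Arity::Exact(1)),
    ("map", Arity::Exact(1)),
    ("match", Arity::AtLeast(2)), // per the review comment: at least two arguments
    ("first", Arity::Exact(0)),
    ("last", Arity::Exact(0)),
    ("slice", Arity::Between(0, 2)), // zero to two integer arguments
    ("size", Arity::Exact(0)),
    ("entries", Arity::Exact(0)),
];

/// Looks up the arity for a method name; `None` means the method is unknown.
pub fn arity_of(name: &str) -> Option<Arity> {
    ARROW_METHOD_ARITIES
        .iter()
        .find(|(method, _)| *method == name)
        .map(|(_, arity)| *arity)
}
```

With this shape, `None` is reserved for "unknown method" instead of doubling as "unknown arity".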

Entirely agree - as indicated in the TODO comment here, this is just a dummy placeholder for some better mechanism, and the discussion I want to have is what that better mechanism should look like. The arity is just meant as a simple example of something we could validate, but the real validation will need to look at type information among other things.

There are a couple of ways I could see this going:

Expose enough metadata about arrow methods that external components can understand how to handle them. Making that metadata rich enough to describe what is needed would be challenging. Consider the match method - it requires knowing that the arguments are arrays of 2 items, where the first item type needs to match the input type and the second item type needs to match the output type.

Have each arrow method be responsible for implementing a set of traits to do the things required by external components: for example, one trait for validation, another to provide code completion, etc. This co-locates these implementations with the arrow method implementations, but it creates some coupling: the JSON Selection code has to be aware of external concerns like validation and code completion, and implement them for each arrow method.
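To make the second option concrete, here is a minimal sketch with a hypothetical `ValidateArgs` trait and a simplified `Arg` type standing in for `LitExpr`. Each arrow method would implement the trait next to its runtime code, and validation looks methods up through trait objects. It illustrates the coupling trade-off rather than any design the team has settled on.

```rust
/// Hypothetical argument representation for this sketch (the real AST uses `LitExpr`).
#[derive(Debug, Clone, PartialEq)]
pub enum Arg {
    Int(i64),
    Str(String),
}

/// Hypothetical trait each arrow method could implement, so validation
/// logic lives next to the method's runtime implementation.
pub trait ValidateArgs {
    fn name(&self) -> &'static str;
    fn validate_args(&self, args: &[Arg]) -> Result<(), String>;
}

pub struct Slice;

impl ValidateArgs for Slice {
    fn name(&self) -> &'static str {
        "slice"
    }

    // ->slice takes zero to two integer arguments.
    fn validate_args(&self, args: &[Arg]) -> Result<(), String> {
        if args.len() > 2 {
            return Err(format!("->slice takes at most 2 arguments, got {}", args.len()));
        }
        match args.iter().position(|arg| !matches!(arg, Arg::Int(_))) {
            Some(i) => Err(format!("->slice argument {} must be an integer", i + 1)),
            None => Ok(()),
        }
    }
}

/// Finds a method by name in a registry of trait objects and validates its arguments.
pub fn validate_call(registry: &[&dyn ValidateArgs], name: &str, args: &[Arg]) -> Result<(), String> {
    let method = registry
        .iter()
        .find(|method| method.name() == name)
        .ok_or_else(|| format!("unknown arrow method ->{name}"))?;
    method.validate_args(args)
}
```

Adding a new arrow method then means adding one more trait implementation and registry entry, with no change to `validate_call`.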

benjamn · 2024-09-26T14:56:20Z

apollo-federation/src/sources/connect/json_selection/parser.rs

 #[derive(Debug, PartialEq, Eq, Clone)]
-pub(super) enum PathList {
+pub(crate) enum PathList {


This enum is not intended to be public, since the construction of a PathList should only ever be performed through parsing, to enforce invariants on the structure of the list. Please find another way of handling this, such as using the PathSelection struct instead, and (if necessary) adding pub(crate) methods to PathSelection that delegate to PathList methods.
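One possible shape for this suggestion, sketched with hypothetical names (`MethodCall`, `method_calls`, `parse_demo`) and a heavily simplified `PathList`: the enum stays private to the `parser` module, and `PathSelection` exposes a `pub(crate)` method returning read-only views of each arrow method call, so a visitor never touches `PathList` directly.

```rust
mod parser {
    // `PathList` stays private to `parser`; only parsing constructs it.
    #[derive(Debug, Clone, PartialEq, Eq)]
    enum PathList {
        Key(String, Box<PathList>),
        Method(String, Vec<i64>, Box<PathList>),
        Empty,
    }

    /// Read-only view of a single arrow method call (hypothetical type).
    #[derive(Debug, Clone, PartialEq, Eq)]
    pub(crate) struct MethodCall<'a> {
        pub(crate) name: &'a str,
        pub(crate) arg_count: usize,
    }

    #[derive(Debug, Clone, PartialEq, Eq)]
    pub(crate) struct PathSelection {
        path: PathList,
    }

    impl PathSelection {
        /// Stand-in for the real parser (hypothetical): builds `.key->method(args...)`.
        pub(crate) fn parse_demo(key: &str, method: &str, args: Vec<i64>) -> Self {
            let tail = PathList::Method(method.to_string(), args, Box::new(PathList::Empty));
            PathSelection {
                path: PathList::Key(key.to_string(), Box::new(tail)),
            }
        }

        /// Delegating accessor: walks the private `PathList` and reports each method call.
        pub(crate) fn method_calls(&self) -> Vec<MethodCall<'_>> {
            let mut calls = Vec::new();
            let mut current = &self.path;
            loop {
                match current {
                    PathList::Key(_, rest) => current = rest,
                    PathList::Method(name, args, rest) => {
                        calls.push(MethodCall { name: name.as_str(), arg_count: args.len() });
                        current = rest;
                    }
                    PathList::Empty => break,
                }
            }
            calls
        }
    }
}
```

Because `MethodCall` only borrows from the selection, validation code can read method names and argument counts without being able to build or alter a `PathList`.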

benjamn · 2024-09-26T15:02:31Z

apollo-federation/src/sources/connect/validation/selection/visitor.rs

+enum SelectionGroup<'schema> {
+    Root {
+        children: Vec<SelectionPart<'schema>>,
+    },
+    Child {
+        parent: SelectionPart<'schema>,
+        children: Vec<SelectionPart<'schema>>,
+    },
+    Empty {
+        parent: SelectionPart<'schema>,
+    },
+}


Could this be simplified to avoid the Root/Child distinction by having a single parent: Option<SelectionPart<'schema>> that would be None in the Root case?

It also seems like Empty is just the case where children is empty. If that's true, I don't think this type benefits much from being an enum with three different cases, when it could be a single struct.

The Empty variant was vestigial, left over from something I abandoned; I'll remove it. The Root vs. Child distinction is a way to use Rust's type system to ensure the data is correct. In other languages, you might have a parent that is optional or can be null. That allows creating instances that are incorrect: for example, a node could return a child whose parent is set to Empty or null, which would be invalid. Every bit of code that accesses the parent field then needs to check for this and somehow handle it to avoid hitting a NullPointerException or the like, but there isn't really anything sensible it can do if it encounters that case. In Rust, we can avoid this situation entirely by creating variants that enforce correct data: a Root is the only node that has no parent, and everything else requires a parent.
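A simplified sketch of the trade-off being discussed, using a hypothetical `Part` type in place of `SelectionPart`: the `Root`/`Child` enum makes a parentless child unrepresentable, while accessor methods give callers the same convenience a single struct with `parent: Option<...>` would.

```rust
/// Hypothetical simplified node type standing in for `SelectionPart`.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct Part(pub &'static str);

/// Root has no parent; every Child must carry one, so a
/// "child without a parent" value cannot be constructed.
#[derive(Debug, Clone)]
pub enum SelectionGroup {
    Root { children: Vec<Part> },
    Child { parent: Part, children: Vec<Part> },
}

impl SelectionGroup {
    /// Callers that only need the children don't care which variant they hold.
    pub fn children(&self) -> &[Part] {
        match self {
            SelectionGroup::Root { children } | SelectionGroup::Child { children, .. } => children,
        }
    }

    /// `None` only ever means "this is the root", never "parent missing by mistake".
    pub fn parent(&self) -> Option<&Part> {
        match self {
            SelectionGroup::Root { .. } => None,
            SelectionGroup::Child { parent, .. } => Some(parent),
        }
    }
}
```

The invariant lives in the type, while code that just wants children or an optional parent reads as simply as it would against a single struct.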

benjamn · 2024-09-26T15:06:51Z

apollo-federation/src/sources/connect/json_selection/parser.rs

 pub struct PathSelection {
-    pub(super) path: WithRange<PathList>,
+    pub(crate) path: WithRange<PathList>,


As far as I know, the only way to keep the PathList private in Rust is to avoid opening up its visibility like this. Please reconsider this change.

There should be ways to prevent external construction of the PathList type, but they may or may not be what you want here (moving JSON Selection to its own crate and marking the type as non_exhaustive is one, or adding fields with private types is another).
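A minimal sketch of a variation on the private-field idea that works within a single crate, with hypothetical `Sealed`, `parse_demo`, and `first_method` names and a simplified `PathList`. Every variant carries a `Sealed` token whose only field is private to the `parser` module, so code elsewhere can match on a `PathList` and read its fields but cannot build one. By contrast, `#[non_exhaustive]` has no effect inside the defining crate, which is why that route would mean moving JSON Selection into its own crate.

```rust
mod parser {
    /// Zero-sized construction token. Its field is private, so only code in
    /// `parser` can write `Sealed(())`, even though the type is `pub(crate)`.
    #[derive(Debug, Clone, PartialEq, Eq)]
    pub(crate) struct Sealed(());

    #[derive(Debug, Clone, PartialEq, Eq)]
    pub(crate) enum PathList {
        Key(String, Box<PathList>, Sealed),
        Method(String, usize, Box<PathList>, Sealed),
        Empty(Sealed),
    }

    /// Stand-in for the real parser (hypothetical): builds `.key->method(...)`.
    pub(crate) fn parse_demo(key: &str, method: &str, arg_count: usize) -> PathList {
        let tail = PathList::Method(
            method.to_string(),
            arg_count,
            Box::new(PathList::Empty(Sealed(()))),
            Sealed(()),
        );
        PathList::Key(key.to_string(), Box::new(tail), Sealed(()))
    }
}

/// Code outside `parser` can still match on a `PathList` and read its fields.
fn first_method(path: &parser::PathList) -> Option<(&str, usize)> {
    let mut current = path;
    loop {
        match current {
            parser::PathList::Key(_, rest, _) => current = rest,
            parser::PathList::Method(name, count, _, _) => return Some((name.as_str(), *count)),
            parser::PathList::Empty(_) => return None,
        }
    }
}

// Writing `parser::PathList::Empty(parser::Sealed(()))` out here would fail to
// compile: the tuple field of `Sealed` is private to `parser`.
```

The cost is an extra field in every variant and in every pattern that matches one.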

I'm likely missing something, but I'm not seeing how to proceed if this type is not exposed. The only place methods and method arguments show up is in PathList::Method. So if we need to visit methods and arguments in a visitor, we'll need to access that, or maybe have some new type for the visitor that contains the same data (though that seems like duplication).

benjamn · 2024-09-26T15:07:54Z

apollo-federation/src/sources/connect/json_selection/parser.rs

 pub struct MethodArgs {
-    pub(super) args: Vec<WithRange<LitExpr>>,
+    pub(crate) args: Vec<WithRange<LitExpr>>,


I don't see an immediate problem here (compared to PathList), but I would note that the WithRange representation was relatively recently added and probably should not be considered stable yet.

I'm fine with it being unstable and the validation code needing to change. There is significant value here for customers if we can get something in relatively soon: being able to point customers to the specific range associated with a validation error (such as an arrow method) is a far better experience than just highlighting the entire selection.
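To illustrate the benefit described here, a sketch with hypothetical `WithRange`, `ValidationError`, `check_method_name`, and `underline` definitions (the real `WithRange` may differ): the error carries the byte range of the offending method name, so a caller can underline just that name rather than the whole selection.

```rust
use std::ops::Range;

/// Hypothetical stand-in for the parser's `WithRange<T>`: a node plus the
/// byte range it came from in the selection source, if known.
#[derive(Debug, Clone, PartialEq)]
pub struct WithRange<T> {
    pub node: T,
    pub range: Option<Range<usize>>,
}

/// Hypothetical validation error that carries a location when one is available.
#[derive(Debug, Clone, PartialEq)]
pub struct ValidationError {
    pub message: String,
    pub range: Option<Range<usize>>,
}

/// Flags an unknown method name, pointing at just the method's range.
pub fn check_method_name(method: &WithRange<String>, known: &[&str]) -> Option<ValidationError> {
    if known.contains(&method.node.as_str()) {
        return None;
    }
    Some(ValidationError {
        message: format!("unknown arrow method `->{}`", method.node),
        range: method.range.clone(),
    })
}

/// Renders a caret underline beneath `range` (assumes a single-line ASCII
/// source, so byte offsets equal columns).
pub fn underline(source: &str, range: &Range<usize>) -> String {
    format!("{source}\n{}{}", " ".repeat(range.start), "^".repeat(range.end - range.start))
}
```

When the range is `None`, a caller can fall back to highlighting the whole selection, as happens today.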

benjamn · 2024-09-26T15:14:19Z

apollo-federation/src/sources/connect/validation/selection/visitor.rs

+/// A part of a JSON Selection to be visited
+#[derive(Clone, Debug)]
+pub(super) enum SelectionPart<'schema> {
+    JSONSelection(&'schema JSONSelection),
+    LitExpr(&'schema LitExpr),
+    MethodArgs(&'schema MethodArgs),
+    NamedSelection(&'schema NamedSelection),
+    PathList(&'schema PathList),
+    SubSelection(&'schema SubSelection),
+}


What qualifies a given AST structure as a part of the selection that should be visited, since this list is not exhaustive? I'd suggest adding more detail to the comment.

Just that I needed it to locate arrow methods. This is really just the beginning of a proper AST visitor to allow validating arrow methods. I think @nicholascioli is working on designing something more extensive.

If we decide that there should be a way to externally visit the JSON Selection AST (as opposed to implementing anything related to JSON Selection internally to the JSON Selection module), then it seems like we'd need to be able to visit every kind of node.
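If external visiting is the route taken, a visitor typically needs one callback per node kind. The sketch below uses a hypothetical, simplified `Node` enum rather than the real JSON Selection types; each `Visitor` method has a no-op default, and `walk` descends into every node, including method arguments, so a validator overrides only the callbacks it needs.

```rust
/// Hypothetical, simplified AST; the real JSON Selection types differ.
pub enum Node {
    Key(String),
    Method { name: String, args: Vec<Node> },
    Literal(i64),
    Path(Vec<Node>),
}

/// One callback per node kind, each with a no-op default.
pub trait Visitor {
    fn visit_key(&mut self, _name: &str) {}
    fn visit_method(&mut self, _name: &str, _args: &[Node]) {}
    fn visit_literal(&mut self, _value: i64) {}
}

/// Walks every node depth-first, including method arguments.
pub fn walk(node: &Node, visitor: &mut impl Visitor) {
    match node {
        Node::Key(name) => visitor.visit_key(name),
        Node::Method { name, args } => {
            visitor.visit_method(name, args);
            for arg in args {
                walk(arg, visitor);
            }
        }
        Node::Literal(value) => visitor.visit_literal(*value),
        Node::Path(parts) => {
            for part in parts {
                walk(part, visitor);
            }
        }
    }
}

/// Example visitor that only collects arrow method names.
#[derive(Default)]
pub struct MethodNames(pub Vec<String>);

impl Visitor for MethodNames {
    fn visit_method(&mut self, name: &str, _args: &[Node]) {
        self.0.push(name.to_string());
    }
}
```

Keeping `walk` exhaustive over `Node` means that adding a node kind forces the traversal to be updated, while existing visitors keep compiling thanks to the default methods.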

pubmodmatt · 2024-09-26T16:05:29Z

I know this PR is still a draft, but I wanted to give some feedback early enough for you to handle it easily.

Broadly speaking, I'm finding it's difficult to defend abstraction boundaries in a Rust codebase, to prevent coupling and allow for implementation details to evolve in (relative) isolation. Tools like keeping PathList as pub(super) are sometimes all we have, so please don't change visibility without discussion.

@benjamn - this PR was created specifically to have that discussion. I was waiting to ask for your input until I had a few more changes in, but thanks for the early feedback! I'll add individual responses on the comments, and we can also meet to discuss. Implementing JSON Selection validation definitely brings up some important topics.

benjamn requested changes Sep 26, 2024


benjamn reviewed Sep 26, 2024


pubmodmatt added 4 commits September 30, 2024 12:31

Test no errors for valid arrow method usage

e9ece3b

Validate arrow method names and arity

10806bf

Resolve merge conflicts from next

8a5807a

JSON selection locations

fe83d25

pubmodmatt force-pushed the pubmodmatt/connectors/arrow_method_validation branch from 57e0ae0 to fe83d25 on September 30, 2024 18:32
