-
Notifications
You must be signed in to change notification settings - Fork 268
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
WIP: arrow method validation #6036
base: next
Are you sure you want to change the base?
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I know this PR is still a draft, but I wanted to give some feedback early enough for you to handle it easily.
Broadly speaking, I'm finding it's difficult to defend abstraction boundaries in a Rust codebase, to prevent coupling and allow for implementation details to evolve in (relative) isolation. Tools like keeping PathList
as pub(super)
are sometimes all we have, so please don't change visibility without discussion.
fn try_get_group_for_field( | ||
&self, | ||
field: &SelectionPart<'schema>, | ||
) -> Result<Option<SelectionGroup<'schema>>, Self::Error> { | ||
// Leaf nodes should return [`SelectionGroup::Empty`] rather than `None` to ensure that | ||
// `exit_group` is called on the empty group, which in turn exits the visitor. | ||
let field = field.clone(); | ||
let result = Ok(match field { | ||
SelectionPart::JSONSelection(json_selection) => match json_selection { | ||
JSONSelection::Named(sub_selection) => Some(SelectionGroup::new( | ||
field, | ||
vec![SelectionPart::SubSelection(sub_selection)], | ||
)), | ||
JSONSelection::Path(path_selection) => Some(SelectionGroup::new( | ||
field, | ||
vec![SelectionPart::PathList(&path_selection.path)], | ||
)), | ||
}, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This code (and what follows) seems to be reaching into and depending on the details of the JSONSelection AST structures. This is worrisome because those structures are actively changing and not at all stable (think of them as an implementation detail, not an interface to interact with directly). To my mind, that means code like this should live within the json_selection
directory, at least. Can we find a better way of organizing this code, to avoid the sprawl?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The plan was to move this visitor into the JSON Selection module once we align on what this visitor looks like. I just put it here temporarily. I also think @nicholascioli also has some work going on in this area that might supersede this.
// TODO: validation requires significant knowledge about arrow methods - need a better mechanism | ||
// to provide it | ||
lazy_static! { | ||
static ref ARROW_METHODS: IndexMap<&'static str, Option<usize>> = { | ||
let mut arrow_methods = IndexMap::default(); | ||
arrow_methods.insert("echo", Some(1)); | ||
arrow_methods.insert("map", Some(1)); | ||
arrow_methods.insert("match", None); | ||
arrow_methods.insert("first", Some(0)); | ||
arrow_methods.insert("last", Some(0)); | ||
arrow_methods.insert("slice", None); | ||
arrow_methods.insert("size", Some(0)); | ||
arrow_methods.insert("entries", Some(0)); | ||
arrow_methods | ||
}; | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is another example of code that should, at the very least, reside nearer to the implementations of the arrow methods.
I'm also not entirely comfortable saying that methods like ->slice
have unknown arity, since the arity is specifically required to be between 0 and 2 arguments (all integers). If we're going to be validating this, it seems worthwhile to distinguish a case like this from (e.g.) the ->match
case, which needs at least two arguments.
More generally, though, I don't think arity really covers what we want to check about method parameters. We need a way of validating the sequence of types of the possible parameters, not just a check on how many are provided. But that can wait for future PRs.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also, could we call this static map something like ARROW_METHOD_ARITIES
rather than just ARROW_METHODS
(since that's the name of the actual map of arrow methods we use elsewhere)?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Entirely agree - as indicated in the TODO
comment here, this is just a dummy placeholder for some better mechanism, and the discussion I want to have is what that better mechanism should look like. The arity is just meant as a simple example of something we could validate, but the real validation will need to look at type information among other things.
There are a couple of ways I could see this going:
-
Expose enough metadata about arrow methods that external components can understand how to handle them. Making that metadata rich enough to describe what is needed would be challenging. Consider the
match
method - it requires knowing that the arguments are arrays of 2 items, where the first item type needs to match the input type and the second item type needs to match the output type. -
Have each arrow method be responsible for implementing a set of traits to do the things required by external components. So there could be a trait for validation another to provide code completion, etc. This co-locates these implementations with the arrow method implementations, but it creates some coupling - the JSON Selection code has to be aware of external things like validation, code completion, etc. and implement those for each arrow method.
#[derive(Debug, PartialEq, Eq, Clone)] | ||
pub(super) enum PathList { | ||
pub(crate) enum PathList { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This enum
is not intended to be public, since the construction of a PathList
should only ever be performed through parsing, to enforce invariants on the structure of the list. Please find another way of handling this, such as using the PathSelection
struct instead, and (if necessary) adding pub(crate)
methods to PathSelection
that delegate to PathList
methods.
enum SelectionGroup<'schema> { | ||
Root { | ||
children: Vec<SelectionPart<'schema>>, | ||
}, | ||
Child { | ||
parent: SelectionPart<'schema>, | ||
children: Vec<SelectionPart<'schema>>, | ||
}, | ||
Empty { | ||
parent: SelectionPart<'schema>, | ||
}, | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could this be simplified to avoid the Root
/Child
distinction by having a single parent: Option<SelectionPart<'schema>>
that would be None
in the Root
case?
It also seems like Empty
is just the case where children
is empty. If that's true, I don't think this type benefits much from being an enum with three different cases, when it could be a single struct.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The Empty
one was just vestigial and related to something I abandoned - I'll remove that. The Root
vs Child
is a way to use Rust type system to ensure correctness of data. In other languages, you might have a parent that is Optional
or can be null
. This allows creation of instances that are incorrect. For example, you could have a node return a child that has its parent set to Empty
or null
, which would be invalid. Every bit of code that accesses the parent field then needs to check for this and somehow handle it to avoid hitting a NullPointerExdeption
or the like - but there isn't really anything sensible they can do if they encounter it. In Rust, we can avoid this situation entirely by easily creating type variants that enforce correctness of data. A Root
is the only node that has no parent, and everything else requires a parent.
pub struct PathSelection { | ||
pub(super) path: WithRange<PathList>, | ||
pub(crate) path: WithRange<PathList>, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As far as I know, the only way to keep the PathList
private in Rust is to avoid opening up its visibility like this. Please reconsider this change.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There should be ways to prevent external construction of the PathList
type, but they may or may not be what you want here (moving JSON Selection to its own crate and marking the type as non_exhaustive
is one, or adding fields with private types is another).
I'm likely missing something, but I'm not seeing how to proceed if this type is not exposed. The only place methods and method arguments show up is in PathList::Method
. So if we need to visit methods and arguments in a visitor, we'll need to access that, or maybe have some new type for the visitor that contains the same data (though that seems like duplication).
pub struct MethodArgs { | ||
pub(super) args: Vec<WithRange<LitExpr>>, | ||
pub(crate) args: Vec<WithRange<LitExpr>>, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't see an immediate problem here (compared to PathList
), but I would note that the WithRange
representation was relatively recently added and probably should not be considered stable yet.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm fine with it being unstable and the validation code needing to change. There is significant value here for customers if we can get something in relatively soon - being able to point customers to a specific range associated with a validation error (such as an arrow method) is a way better experience that just highlighting the entire selection.
/// A part of a JSON Selection to be visited | ||
#[derive(Clone, Debug)] | ||
pub(super) enum SelectionPart<'schema> { | ||
JSONSelection(&'schema JSONSelection), | ||
LitExpr(&'schema LitExpr), | ||
MethodArgs(&'schema MethodArgs), | ||
NamedSelection(&'schema NamedSelection), | ||
PathList(&'schema PathList), | ||
SubSelection(&'schema SubSelection), | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What qualifies a given AST structure as a part of the selection that should be visited, since this list is not exhaustive? I'd suggest adding more detail to the comment.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just that I needed it to locate arrow methods. This is really just the beginning of a proper AST visitor to allow validating arrow methods. I think @nicholascioli is working on designing something more extensive.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we decide that there should be a way to externally visit the JSON Selection AST (as opposed to implementing anything related to JSON Selection internally to the JSON Selection module), then it seems like we'd need to be able to visit every kind of node.
@benjamn - this PR is specifically created to have that discussion. I was waiting to ask for your input until I have a few more changes in, but thanks for the early feedback! I'll add individual responses on the comments, and we can also meet to discuss. Implementing JSON Selection validation definitely brings up some important topics. |
57e0ae0
to
fe83d25
Compare
Validation changes:
JSON Selection changes:
pub(crate)
so the validation code is able to visit all the places where an arrow method name can appearKnown Issues:
already_seen
list as the selection is validated. However, if any other validation fails and returns early, this validation will not see any remaining fields and incorrectly reports errors. This will need to be addressed separately, but the result can be seen in the test snapshots for this PR.Checklist
Complete the checklist (and note appropriate exceptions) before the PR is marked ready-for-review.
Exceptions
Note any exceptions here
Notes
Footnotes
It may be appropriate to bring upcoming changes to the attention of other (impacted) groups. Please endeavour to do this before seeking PR approval. The mechanism for doing this will vary considerably, so use your judgement as to how and when to do this. ↩
Configuration is an important part of many changes. Where applicable please try to document configuration examples. ↩
Tick whichever testing boxes are applicable. If you are adding Manual Tests, please document the manual testing (extensively) in the Exceptions. ↩