Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

WIP: Add copying where it might be necessary #4118

Closed
wants to merge 0 commits into from

Conversation

sternj
Copy link

@sternj sternj commented Sep 27, 2024

NOTE: This is not ready for review, I have a significant amount of documentation to add, I need to update the test suite, and I missed several cases that I am writing out below. I am posting this early to receive feedback on the broad strokes of the approach, the implementation is (right now) entirely incorrect.

Description

A proposed fix for #4113 that identifies what parameters might be mutated within a function that Ghostwriter is called on. To prevent side effects (particularly in equivalence mode), adds a call to copy.deepcopy to the parameter in the test function.

Right now, I mark a variable as potentially modified if:

  • there is an instance of is on the left hand side of an ast.Assign or ast.AugAssign that is not inside an ast.Subscript or ast.Slice. This includes when it is being indexed into or an attribute of it is being assigned
  • It or any of its attributes are being Called. Even if this isn't on the left-hand side of an assignment, it is safe to assume that any call may create side effects.

Todo

  • Ignore names used inside value of ast.Subscript or slice in ast.Slice
  • Get unit test passage to parity with main branch
  • Correct documentation for potentially_modified_vars
  • Rename visitors to have more intention-revealing names

Copy link
Member

@Zac-HD Zac-HD left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall approach looks good to me - thanks for working on this!

One quick tip - please make your pull request(s) from a branch, not from master - https://hynek.me/articles/pull-requests-branch/ explains why this is important. I think in this case you'll need to close the PR, make a branch with your commits, and open a new one from that branch.

Comment on lines 807 to 814
We want to get any variables that might be modified in a way that might
cause side-effects so that we can copy them.

A variable are potentially modified if:
- a method is called on it, an attribute under it, or an index
of it is anywhere on the LHS of an ast.Assign or an ast.AugAssign
- a method called on it, an attribute of it, or an index of it
- Do we want to count methods on the RHS?
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd only count method calls if they're top-level statements, not part of an assignment or inside an index or whatever - the heuristic being that methods typically either mutate state, xor have a return value.

arg[index] = ...
arg.attr = ...
arg.method()

result = arg.other_method()  # probably safe

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not entirely convinced that this heuristic works-- I think it is a safe assumption that a standard accessor doesn't modify anything (which is not necessarily true, but is close enough to true) but the pattern of a method that mutates and then returns is one that's relatively common-- for instance, even in the standard library, IO.read returns a value and modifies the offset.

Files are, of course, a case where side effects exist outside of the program where copying wouldn't help, but the pattern itself is very well established in Python, and this was a simple example!

Comment on lines 860 to 865
if kind is inspect.Parameter.VAR_POSITIONAL:
# TODO: I don't know what correct behavior is here
pass
if kind is inspect.Parameter.VAR_KEYWORD:
# TODO: I don't know what correct behavior is here
pass
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We never ghostwrite inputs for these, so no need to support them 🙂

Comment on lines 884 to 887
if copy_args:
potentially_modified = potentially_modified_vars(ast.parse(inspect.getsource(func)).body[0], set(params_dd.keys()))
else:
potentially_modified = set()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we probably want to deepcopy all arguments unconditionally if copy_args=True, and autodetect on =None.

The code would be a bit cleaner if we also do a from copy import deepcopy, meaning this function would report a str, set-of-imports tuple - there are plenty of examples you can see of how that works.

@sternj
Copy link
Author

sternj commented Sep 30, 2024

I'm making the new PR right now! I'll act on the comments here, no need to rewrite them, but I will be copying over the whole description

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants