Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add support to resolve bindings to built-in functions, types and variables #1101

Open
wants to merge 31 commits into
base: main
Choose a base branch
from

Conversation

ggiraldez
Copy link
Collaborator

This PR adds the infrastructure to the Bindings implementation to be able to:

  • define built-in functions, types and global variables, optionally specifying version ranges for those
  • generate an internal built-ins file that can be consumed by the implementation when creating a Bindings object for a specific language version
  • resolve references to built-ins in Solidity by adding some rules to the graph builder file to connect the built-ins parsed file and make all definitions there available to all other ingested source units

So far, this contains just a few built-ins to test the infrastructure and verify that all functions, types and variables can be resolved.

Copy link

changeset-bot bot commented Sep 20, 2024

⚠️ No Changeset found

Latest commit: d782d33

Merging this PR will not cause a version bump for any packages. If these changes should not result in a new version, you're good to go. If these changes should result in a version bump, you need to add a changeset.

This PR includes no changesets

When changesets are added to this PR, you'll see the packages that this PR includes changesets for and the associated semver types

Click here to learn what changesets are, and how to add one.

Click here if you're a maintainer who wants to add a changeset to this PR

@ggiraldez ggiraldez marked this pull request as ready for review September 20, 2024 19:33
@ggiraldez ggiraldez requested a review from a team as a code owner September 20, 2024 19:33
@ggiraldez ggiraldez mentioned this pull request Sep 20, 2024
34 tasks
@@ -75,4 +75,8 @@ impl SpannedLanguage {
.flat_map(|section| &section.topics)
.flat_map(|topic| &topic.items);
}

fn built_ins(&self) -> impl Iterator<Item = &SpannedBuiltIn> {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: I wonder if this is still needed? self.built_ins is already a public iterable, and the single callsite can consume it the same way.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's not needed, I wanted to keep uniform wrt. items(). But I agree it doesn't make much sense. I'll remove it.

Comment on lines 163 to 164
pub return_type: Option<String>,
pub parameters: Vec<String>,
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit suggestion: ordering:

Suggested change
pub return_type: Option<String>,
pub parameters: Vec<String>,
pub parameters: Vec<String>,
pub return_type: Option<String>,

#[derive_spanned_type(Clone, Debug, ParseInputTokens, WriteOutputTokens)]
pub struct BuiltInType {
pub name: String,
pub fields: Vec<BuiltInField>,
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder if we plan on adding functions to these types here?

Suggested change
pub fields: Vec<BuiltInField>,
pub fields: Vec<BuiltInField>,
pub functions: Vec<BuiltInFunction>,

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm... I was declaring them as fields with appropriate function types, as that's how they are generated in the built-in source file (since types are declared as structs). But for defining them in the language I like your idea better.

#[derive(Clone, Debug, Deserialize, Eq, PartialEq, Serialize)]
#[derive_spanned_type(Clone, Debug, ParseInputTokens, WriteOutputTokens)]
pub struct BuiltInField {
pub def: String,
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit suggestion: using the full name for clarity. there are many "definitions" used through out the codebase:

Suggested change
pub def: String,
pub definition: String,

4 │ modifier noReentrancy() {
│ ──────┬─────
│ ╰─────── def: 3
5 │ require(!locked, "No reentrancy");
Copy link
Collaborator

@OmarTawfik OmarTawfik Oct 2, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

now that we have built-ins, what do you think of using success and failure file suffixes to indicate the status of binding (instead of parsing), as that is what we are really checking here.

i.e. If a file has syntax errors, but we still bind everything else, it is a success. But for a test like this, it will change from 0.4.11-failure to 0.5.0-success because of the unresolved require..

We can still include the parse errors for debuggability though.

Thoughts?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It shouldn't be the case that parsing fails and all bindings are resolved. At least I think that should never happen. If that could happen, I'd suspect that the test is not well written (as in, why does it contain syntax that's not important for testing the binding). I like the idea of tagging tests successful only when bindings are resolved. What do you think about using the success suffix when both parsing and bindings succeed?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

parsing can still fail when we are explicitly testing binding in the precense of syntax errors, or when we are testing a syntax that is not available in all versions.
Binding here should run regardless and produce results. Right?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I'm referring to the meaning of success/failure mark only. There would be no change in how the tests are executed.

│ ╰─────── def: 3
5 │ require(!locked, "No reentrancy");
│ ───┬─── ───┬──
│ ╰───────────── ambiguous: built-in, built-in
Copy link
Collaborator

@OmarTawfik OmarTawfik Oct 2, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IIUC, we are returning multiple definitions in this case (overloads) .. should we print both instead of ambiguous here? it would be a "sucessful" result still..

Suggested change
│ ╰───────────── ambiguous: built-in, built-in
│ ╰───────────── ref: built-in, built-in

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed on both: remove the ambiguous since we're moving away complexity away from the bindings and multiple definitions should be expected more; and it should still be considered successful.

│ ╰───────────────────── def: 3
│ │
│ ╰──── def: 4
5 │ require(_addr != address(0), "Not valid address");
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I suggest removing the require() from both of these tests, since they are testing modifiers, and require() is just adding noise for the earlier versions ..
We can add a separate test for it to still cover it..

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point. I think I have the same issue in several tests.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On second thought, these are testing something relevant: that the body of the modifier can access the arguments passed in (or the state variables in the simple case). I replaced require() by assert() which exists in all supported versions.


use semver::Version;

#[allow(unused_variables)]
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Instead of using built_ins.rs.jinja2, I suggest generating separate files for each version:

  • slang_solidity/src/generated/bindings/builtins/0.4.11.sol
  • slang_solidity/src/generated/bindings/builtins/0.5.0.sol
  • etc...

And include it here via include_str()!:

#[allow(unused_variables)]
pub fn get_contents(version: &Version) -> &'static str {
    if *version < Version::new(0, 5, 0) {
        include_str("./builtins/0.4.11")
    } else {
      // etc...
    }
}

This allows syntax highlighting, diffing the versions, and comparing changes during code review.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sounds good, I like it.

#[allow(unused_variables)]
pub fn get_contents(version: &Version) -> &'static str {
if *version < Version::new(0, 5, 0) {
r####"contract $$ {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

suggestion instead of using $$, as this will be user visible.. would that work for our internal bindings?

Suggested change
r####"contract $$ {
r####"interface $BuiltIn$Functions {

Copy link
Collaborator Author

@ggiraldez ggiraldez Oct 3, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, we can use any name here. I had already changed the name in the complete built-ins branch. The important thing is keeping it in sync with the rules file where it's linked.

Using an interface wouldn't work, since the rules would not bind the declared state variables as globals, because state vars are not allowed in interfaces (even though they will parse successfully). We could make a special case in the rules file if needed though.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

because state vars are not allowed in interfaces
We could make a special case in the rules file

That makes sense. Let's stick to the language semantics, to make sure they are understood by users.
In that case, contract $BuiltIns$ sounds good to me (PascalCase).

Comment on lines 36 to 45
;; All built-in symbols are defined in an internal contract named '$$'
;; so we need to construct an equivalent import path to reach them.
;; We should have access to both type members (eg. defined enums & structs)
;; as well as functions and state variables (see special case below), hence
;; why we're introducing a path through `@typeof`.
node built_ins
attr (built_ins) push_symbol = BUILT_INS_FILE_PATH

node built_in_library
attr (built_in_library) push_symbol = "$$"
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: I suggest adding markers so that they stay in sync. And we can add that everywhere else its name is used/defined.

Suggested change
;; All built-in symbols are defined in an internal contract named '$$'
;; so we need to construct an equivalent import path to reach them.
;; We should have access to both type members (eg. defined enums & structs)
;; as well as functions and state variables (see special case below), hence
;; why we're introducing a path through `@typeof`.
node built_ins
attr (built_ins) push_symbol = BUILT_INS_FILE_PATH
node built_in_library
attr (built_in_library) push_symbol = "$$"
;; All built-in symbols are defined in an internal contract (see __SLANG_BUILT_INS_CONTRACT_NAME__)
;; so we need to construct an equivalent import path to reach them.
;; We should have access to both type members (eg. defined enums & structs)
;; as well as functions and state variables (see special case below), hence
;; why we're introducing a path through `@typeof`.
node built_ins
attr (built_ins) push_symbol = BUILT_INS_FILE_PATH
node built_in_library
attr (built_in_library) push_symbol = "$$" ;; __SLANG_BUILT_INS_CONTRACT_NAME__ (keep in sync)

node built_in_member
attr (built_in_member) push_symbol = "."

edge @source_unit.lexical_scope -> built_in_member
Copy link
Collaborator

@OmarTawfik OmarTawfik Oct 3, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think I'm missing something here:

  1. Why would we use push symbols (., @typeof, etc..) here? the users can never write something like $BuiltIns$.XXX anyways. We are already exporting the ContractMembers below to the parent file scope.
  2. What here prevents users from referencing the $BuiltIns$ contract directly in their code? and if they can't, what is the significance of using dollar signs in the first place? why can't we just use contract BuiltIns and struct Address for example?
  3. How is precedence calculated, in case the user defines their own require() method or Address type for example?

Happy to follow up offline to understand better. Not a blocker of course.

Copy link
Collaborator Author

@ggiraldez ggiraldez Oct 3, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  1. Since built-ins are inside the special $BuiltIns$ contract, we need a way to "promote" them to the global namespace. That's why . and @typeof are pushed here, in connection to the built-ins path. This is as if the user had referenced global with the $BuiltIns$. prefix.

    Having said that, I think we can find a better way to do this, by directly linking
    the $BuiltIns$ members to the global namespace from the built-ins side.

  2. For now, nothing prevents users from referencing $BuiltIns$. I have some code in another branch that renames all identifiers in the built-ins file by replacing $ for %. That way it will be impossible for users to explicitly reference the built-ins contract. Similarly with built-in types such as address.

  3. That's a good point. For types it shouldn't be a problem because of the renaming of built-in identifiers. But we currently don't have any way to prefer a user's definition over the built-in. We could resolve it via ranking, but since we've discussed about removing that feature, we may need to fallback to a nano pass later.

@@ -349,6 +378,12 @@ inherit .enclosing_def
)]
]] {
edge @contract.lexical_scope -> @member.def

;; Special case: for the built-ins file, export state variables in the
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This will export all members (functions, structs, etc...), not just state variables, right?

Copy link
Collaborator Author

@ggiraldez ggiraldez Oct 3, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This was needed because normally contract state variables are not accessible externally, and we need that to be able to emulate globals (eg. the global tx is connected in the stack graph as $BuiltIns$.tx). Functions and structs are exposed in a different scope type_members for this reason.

But, as I said in the previous reply, I think there's a better way that would be less confusing.

@@ -79,16 +103,16 @@ struct RuntimeModel {
}

impl RuntimeModel {
pub fn from_language(language: &Rc<Language>) -> Result<Self> {
pub fn from_language(language: &LanguageModel) -> Result<Self> {
Copy link
Collaborator

@OmarTawfik OmarTawfik Oct 3, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm a bit confused about the newly added model, that just unwraps to the existing model. Additionally, built-ins are only used/generated in a single runtime (solidity rust) and ignored everywhere else (stubs, testlang, other solidity outputs).

Instead of the changes in the generator crate here, WDYT of just rendering/writing the built ins in the solidity crate directly? as this is all very solidity specific, and cannot be applied to any other runtime. It can also make doing this easier, since the versions/sources can be generated/written to disk directly.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This was an attempt at remaining generic for any language defined in Slang, so that all of them can potentially define built-ins. But since we've decided to take that apart and have the user explicitly add the built-ins all this can be re-worked.

I'll rework the built-ins generation code.

type CursorID = usize;
pub struct DefinitionHandle(GraphHandle);

pub const BUILT_INS_FILE_PATH: &str = "@@built-ins@@";
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder if it is possible to supply this value to Bindings::create(), instead of statically defining it, so that the Compilation API can make sure to use a name that doesn't conflict with anything the user might use?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We could do that, but wouldn't it make the API awkward to use? Like, why would the user need to decide how to name an internal file? Hmm, I'm thinking the underlying issue here is whether we want to have the "built-ins" as a concept that the Bindings API needs to know about, or is it something that's completely handled outside.

If we want to extract that concept, then having a way to configure the binding rules execution (as in setting a couple of graph global variables) would be enough. I'm not opposed to that idea, but the counterpart is that now users will need to implement this handling themselves.

WDYT?

_ = self.add_file_internal(file, tree_cursor);
self.built_ins_file = Some(file);
}

pub fn add_file(&mut self, file_path: &str, tree_cursor: Cursor<KT>) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Given that BUILT_INS_FILE_PATH is known at this point, I wonder why the additional overload add_built_ins()? Can't add_file() just check if file_path == BUILT_INS_FILE_PATH and do the right thing?

It eliminates the error of misusing the original function to add built ins by mistake.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was thinking the opposite thing: add_file should check that the user is not trying to add a file as the built-ins, or resolve that internally in some other way so that there's no possibility of clashing. But as I said in the previous comment, I guess it depends on who should handle the built-ins concept.

pub use definition::SolidityDefinition;

pub fn render_built_ins(built_ins: &[BuiltIn]) -> String {
let mut lines: Vec<String> = Vec::new();
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

suggestion if it is more ergonomic for you: write!() and writeln!() can be used directly in strings, instead of creating a Vec<String> and joining it at the end.
Example in crates/codegen/spec/src/generators/grammar_ebnf.rs.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I also suggest moving this to a mod bindings to maintain the structure.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice! I didn't know you could write! or writeln! directly to a string. I'll change this.


pub type Bindings = metaslang_bindings::Bindings<KindTypes>;
pub type Definition<'a> = metaslang_bindings::Definition<'a, KindTypes>;
pub type Reference<'a> = metaslang_bindings::Reference<'a, KindTypes>;

pub fn create_with_resolver(
version: Version,
parser: &Parser,
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

not sure I understand the change here, given that Bindings::add_built_ins() (and Bindings::add_file()) exist in the public API. The Bindings owner is responsible for loading/parsing the built ins file, and passing its cursor, similar to everything else.
Thoughts?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, this was before we decided to move loading the built-ins responsibility away from the Bindings API. I was going to make that change in another PR, but may as well do it here, along with the other issues we've discussed so far.

Copy link
Collaborator

@OmarTawfik OmarTawfik left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great work!
Left a few questions/suggestions.

Instead of a single built-ins file, the user can now add any number of system
files and the API will ensure there's no collision with user provided files.

For Solidity, any system files are linked to the ROOT_NODE as a special name
"@@built-ins@@" that cannot collide with user files (always prefixed by
"user:").
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants