Add support to resolve bindings to built-in functions, types and variables #1101

ggiraldez · 2024-09-20T03:47:03Z

This PR adds the infrastructure to the Bindings implementation to be able to:

- define built-in functions, types and global variables, optionally specifying version ranges for them
- generate an internal built-ins file that the implementation can consume when creating a Bindings object for a specific language version
- resolve references to built-ins in Solidity by adding rules to the graph builder file that connect the parsed built-ins file and make all of its definitions available to all other ingested source units

So far, this contains just a few built-ins to test the infrastructure and verify that all functions, types and variables can be resolved.
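To illustrate the version-range idea, here is a minimal sketch of how built-ins could be filtered for a given language version. The `BuiltIn` struct, its `enabled` field and the tuple-based `Version` alias are assumptions made for this example; they are not the types introduced by this PR, which uses `semver::Version`.

```rust
/// Hypothetical version triple (major, minor, patch), standing in for `semver::Version`.
type Version = (u32, u32, u32);

/// Hypothetical built-in entry. `enabled.0` is the first version where the
/// built-in exists (inclusive) and `enabled.1` is the first version where it
/// no longer exists (exclusive). `None` means unbounded on that side.
struct BuiltIn {
    definition: &'static str,
    enabled: (Option<Version>, Option<Version>),
}

impl BuiltIn {
    /// Returns true when `version` falls inside the enabled range.
    fn is_enabled(&self, version: Version) -> bool {
        let (from, till) = self.enabled;
        from.map_or(true, |v| version >= v) && till.map_or(true, |v| version < v)
    }
}

/// Collects the definitions that apply to `version`, in declaration order.
fn definitions_for(built_ins: &[BuiltIn], version: Version) -> Vec<&'static str> {
    built_ins
        .iter()
        .filter(|built_in| built_in.is_enabled(version))
        .map(|built_in| built_in.definition)
        .collect()
}
```

The filtered definitions would then be rendered into the internal built-ins file for that version.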

changeset-bot · 2024-09-20T03:47:06Z

⚠️ No Changeset found

Latest commit: d782d33

Merging this PR will not cause a version bump for any packages. If these changes should not result in a new version, you're good to go. If these changes should result in a version bump, you need to add a changeset.

This PR includes no changesets

When changesets are added to this PR, you'll see the packages that this PR includes changesets for and the associated semver types

OmarTawfik · 2024-10-02T22:48:50Z

crates/codegen/language/definition/src/compiler/analysis/mod.rs

@@ -75,4 +75,8 @@ impl SpannedLanguage {
            .flat_map(|section| &section.topics)
            .flat_map(|topic| &topic.items);
    }
+
+    fn built_ins(&self) -> impl Iterator<Item = &SpannedBuiltIn> {


nit: I wonder if this is still needed? self.built_ins is already a public iterable, and the single callsite can consume it the same way.

It's not needed, I wanted to keep uniform wrt. items(). But I agree it doesn't make much sense. I'll remove it.

OmarTawfik · 2024-10-02T22:57:16Z

crates/codegen/language/definition/src/model/manifest.rs

+    pub return_type: Option<String>,
+    pub parameters: Vec<String>,


nit suggestion: ordering:

Suggested change

-    pub return_type: Option<String>,
-    pub parameters: Vec<String>,
+    pub parameters: Vec<String>,
+    pub return_type: Option<String>,

OmarTawfik · 2024-10-02T23:02:12Z

crates/codegen/language/definition/src/model/manifest.rs

+#[derive_spanned_type(Clone, Debug, ParseInputTokens, WriteOutputTokens)]
+pub struct BuiltInType {
+    pub name: String,
+    pub fields: Vec<BuiltInField>,


I wonder if we plan on adding functions to these types here?

Suggested change

-    pub fields: Vec<BuiltInField>,
+    pub fields: Vec<BuiltInField>,
+    pub functions: Vec<BuiltInFunction>,

Hmm... I was declaring them as fields with appropriate function types, as that's how they are generated in the built-in source file (since types are declared as structs). But for defining them in the language I like your idea better.

OmarTawfik · 2024-10-02T23:03:16Z

crates/codegen/language/definition/src/model/manifest.rs

+#[derive(Clone, Debug, Deserialize, Eq, PartialEq, Serialize)]
+#[derive_spanned_type(Clone, Debug, ParseInputTokens, WriteOutputTokens)]
+pub struct BuiltInField {
+    pub def: String,


nit suggestion: use the full name for clarity. There are many "definitions" used throughout the codebase:

Suggested change

-    pub def: String,
+    pub definition: String,

OmarTawfik · 2024-10-02T23:51:36Z

crates/solidity/testing/snapshots/bindings_output/modifiers/simple/generated/0.5.0-success.txt

+  4 │     modifier noReentrancy() {
+    │              ──────┬─────  
+    │                    ╰─────── def: 3
+  5 │         require(!locked, "No reentrancy");


Now that we have built-ins, what do you think of using the success and failure file suffixes to indicate the status of binding (instead of parsing), as that is what we are really checking here?

i.e. if a file has syntax errors, but we still bind everything else, it is a success. But for a test like this, it will change from 0.4.11-failure to 0.5.0-success because of the unresolved require.

We can still include the parse errors for debuggability though.

Thoughts?

It shouldn't be the case that parsing fails and all bindings are resolved. At least I think that should never happen. If that could happen, I'd suspect that the test is not well written (as in, why does it contain syntax that's not important for testing the binding). I like the idea of tagging tests successful only when bindings are resolved. What do you think about using the success suffix when both parsing and bindings succeed?

Parsing can still fail when we are explicitly testing binding in the presence of syntax errors, or when we are testing syntax that is not available in all versions.
Binding here should run regardless and produce results. Right?

Yes, I'm referring to the meaning of success/failure mark only. There would be no change in how the tests are executed.
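The agreed meaning of the suffix can be sketched as below: a snapshot is tagged as a success only when parsing succeeded and every reference was resolved. The function names and the counters passed in are assumptions for illustration; the actual test harness may compute this differently.

```rust
/// Outcome recorded in a snapshot file name such as `0.5.0-success.txt`.
#[derive(Debug, PartialEq)]
enum Status {
    Success,
    Failure,
}

/// A snapshot succeeds only when there are no parse errors
/// and no unresolved references.
fn snapshot_status(parse_errors: usize, unresolved_references: usize) -> Status {
    if parse_errors == 0 && unresolved_references == 0 {
        Status::Success
    } else {
        Status::Failure
    }
}

/// Builds the snapshot file name for a version and status.
fn snapshot_file_name(version: &str, status: &Status) -> String {
    let suffix = match status {
        Status::Success => "success",
        Status::Failure => "failure",
    };
    format!("{version}-{suffix}.txt")
}
```

Binding still runs in both cases; only the label attached to the output changes.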

OmarTawfik · 2024-10-02T23:52:06Z

crates/solidity/testing/snapshots/bindings_output/modifiers/simple/generated/0.5.0-success.txt

+    │                    ╰─────── def: 3
+  5 │         require(!locked, "No reentrancy");
+    │         ───┬───  ───┬──  
+    │            ╰───────────── ambiguous: built-in, built-in


IIUC, we are returning multiple definitions in this case (overloads). Should we print both instead of ambiguous here? It would still be a "successful" result.

Suggested change

-    │            ╰───────────── ambiguous: built-in, built-in
+    │            ╰───────────── ref: built-in, built-in

Agreed on both: remove the ambiguous marker, since we're moving complexity away from the bindings and multiple definitions should be expected more often; and it should still be considered successful.
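A rough sketch of the snapshot label after this change: every definition a reference resolves to is listed after `ref:`, with no special marker for multiple results. The `Target` enum and the `unresolved` label are assumptions for this example and may not match the snapshot renderer.

```rust
/// Hypothetical model of what a reference can resolve to in a snapshot.
enum Target {
    /// A user definition, identified by its `def: N` number in the snapshot.
    User(usize),
    /// A definition coming from the built-ins file.
    BuiltIn,
}

/// Renders the label for a reference, listing every definition it resolves to.
fn render_reference(targets: &[Target]) -> String {
    if targets.is_empty() {
        // Assumed label for references without any definition.
        return "unresolved".to_owned();
    }
    let labels: Vec<String> = targets
        .iter()
        .map(|target| match target {
            Target::User(id) => id.to_string(),
            Target::BuiltIn => "built-in".to_owned(),
        })
        .collect();
    format!("ref: {}", labels.join(", "))
}
```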

OmarTawfik · 2024-10-02T23:54:53Z

...s/solidity/testing/snapshots/bindings_output/modifiers/with_args/generated/0.5.0-success.txt

+    │                    ╰───────────────────── def: 3
+    │                                     │    
+    │                                     ╰──── def: 4
+  5 │         require(_addr != address(0), "Not valid address");


I suggest removing the require() from both of these tests, since they are testing modifiers, and require() is just adding noise for the earlier versions.
We can add a separate test to still cover it.

Good point. I think I have the same issue in several tests.

On second thought, these are testing something relevant: that the body of the modifier can access the arguments passed in (or the state variables in the simple case). I replaced require() by assert() which exists in all supported versions.

OmarTawfik · 2024-10-03T00:07:08Z

crates/solidity/outputs/cargo/slang_solidity/src/generated/bindings/generated/built_ins.rs

+
+use semver::Version;
+
+#[allow(unused_variables)]


Instead of using built_ins.rs.jinja2, I suggest generating separate files for each version:

slang_solidity/src/generated/bindings/builtins/0.4.11.sol

slang_solidity/src/generated/bindings/builtins/0.5.0.sol

etc...

And include it here via include_str!():

    #[allow(unused_variables)]
    pub fn get_contents(version: &Version) -> &'static str {
        if *version < Version::new(0, 5, 0) {
            include_str!("./builtins/0.4.11.sol")
        } else {
            // etc...
        }
    }

This allows syntax highlighting, diffing the versions, and comparing changes during code review.

Sounds good, I like it.
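As a sketch of the selection logic behind per-version files: pick the newest file whose version is not greater than the requested one. The `BUILT_INS_VERSIONS` table and the tuple `Version` alias are assumptions for illustration; the real code would compare `semver::Version` values and embed each file's contents with `include_str!`.

```rust
/// Hypothetical version triple (major, minor, patch), standing in for `semver::Version`.
type Version = (u32, u32, u32);

/// Hypothetical list of versions that have their own generated file, in ascending order.
const BUILT_INS_VERSIONS: &[Version] = &[(0, 4, 11), (0, 5, 0)];

/// Returns the name of the built-ins file to use for `version`, or `None`
/// when `version` is older than every generated file.
fn built_ins_file_name(version: Version) -> Option<String> {
    BUILT_INS_VERSIONS
        .iter()
        .rev() // newest first
        .find(|candidate| **candidate <= version)
        .map(|(major, minor, patch)| format!("{major}.{minor}.{patch}.sol"))
}
```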

OmarTawfik · 2024-10-03T00:08:45Z

crates/solidity/outputs/cargo/slang_solidity/src/generated/bindings/generated/built_ins.rs

+#[allow(unused_variables)]
+pub fn get_contents(version: &Version) -> &'static str {
+    if *version < Version::new(0, 5, 0) {
+        r####"contract $$ {


Suggestion: avoid $$, as this will be user-visible. Would the following work for our internal bindings?

Suggested change

-        r####"contract $$ {
+        r####"interface $BuiltIn$Functions {

Yes, we can use any name here. I had already changed the name in the complete built-ins branch. The important thing is keeping it in sync with the rules file where it's linked.

Using an interface wouldn't work, since the rules would not bind the declared state variables as globals, because state vars are not allowed in interfaces (even though they will parse successfully). We could make a special case in the rules file if needed though.

> because state vars are not allowed in interfaces

> We could make a special case in the rules file

That makes sense. Let's stick to the language semantics, to make sure they are understood by users.
In that case, contract $BuiltIns$ sounds good to me (PascalCase).

OmarTawfik · 2024-10-03T12:22:09Z

crates/solidity/inputs/language/bindings/rules.msgb

+    ;; All built-in symbols are defined in an internal contract named '$$'
+    ;; so we need to construct an equivalent import path to reach them.
+    ;; We should have access to both type members (eg. defined enums & structs)
+    ;; as well as functions and state variables (see special case below), hence
+    ;; why we're introducing a path through `@typeof`.
+    node built_ins
+    attr (built_ins) push_symbol = BUILT_INS_FILE_PATH
+
+    node built_in_library
+    attr (built_in_library) push_symbol = "$$"


nit: I suggest adding markers so that they stay in sync. And we can add that everywhere else its name is used/defined.

Suggested change

-    ;; All built-in symbols are defined in an internal contract named '$$'
+    ;; All built-in symbols are defined in an internal contract (see __SLANG_BUILT_INS_CONTRACT_NAME__)
     ;; so we need to construct an equivalent import path to reach them.
     ;; We should have access to both type members (eg. defined enums & structs)
     ;; as well as functions and state variables (see special case below), hence
     ;; why we're introducing a path through `@typeof`.
     node built_ins
     attr (built_ins) push_symbol = BUILT_INS_FILE_PATH

     node built_in_library
-    attr (built_in_library) push_symbol = "$$"
+    attr (built_in_library) push_symbol = "$$" ;; __SLANG_BUILT_INS_CONTRACT_NAME__ (keep in sync)

OmarTawfik · 2024-10-03T12:31:43Z

crates/solidity/inputs/language/bindings/rules.msgb

+    node built_in_member
+    attr (built_in_member) push_symbol = "."
+
+    edge @source_unit.lexical_scope -> built_in_member


I think I'm missing something here:

Why would we use push symbols (., @typeof, etc..) here? the users can never write something like $BuiltIns$.XXX anyways. We are already exporting the ContractMembers below to the parent file scope.

What here prevents users from referencing the $BuiltIns$ contract directly in their code? and if they can't, what is the significance of using dollar signs in the first place? why can't we just use contract BuiltIns and struct Address for example?

How is precedence calculated, in case the user defines their own require() method or Address type for example?

Happy to follow up offline to understand better. Not a blocker of course.

Since built-ins live inside the special $BuiltIns$ contract, we need a way to "promote" them to the global namespace. That's why . and @typeof are pushed here, in connection with the built-ins path. It is as if the user had referenced the global with the $BuiltIns$. prefix.

Having said that, I think we can find a better way to do this, by directly linking the $BuiltIns$ members to the global namespace from the built-ins side.

For now, nothing prevents users from referencing $BuiltIns$. I have some code in another branch that renames all identifiers in the built-ins file by replacing $ with %. That way it will be impossible for users to explicitly reference the built-ins contract. Similarly with built-in types such as address.

That's a good point. For types it shouldn't be a problem because of the renaming of built-in identifiers. But we currently don't have any way to prefer a user's definition over the built-in. We could resolve it via ranking, but since we've discussed removing that feature, we may need to fall back to a nano pass later.

OmarTawfik · 2024-10-03T12:32:49Z

crates/solidity/inputs/language/bindings/rules.msgb

@@ -349,6 +378,12 @@ inherit .enclosing_def
    )]
 ]] {
  edge @contract.lexical_scope -> @member.def
+
+  ;; Special case: for the built-ins file, export state variables in the


This will export all members (functions, structs, etc...), not just state variables, right?

This was needed because normally contract state variables are not accessible externally, and we need that to be able to emulate globals (eg. the global tx is connected in the stack graph as $BuiltIns$.tx). Functions and structs are exposed in a different scope type_members for this reason.

But, as I said in the previous reply, I think there's a better way that would be less confusing.

OmarTawfik · 2024-10-03T13:00:27Z

crates/codegen/runtime/generator/src/lib.rs

@@ -79,16 +103,16 @@ struct RuntimeModel {
 }

 impl RuntimeModel {
-    pub fn from_language(language: &Rc<Language>) -> Result<Self> {
+    pub fn from_language(language: &LanguageModel) -> Result<Self> {


I'm a bit confused about the newly added model, that just unwraps to the existing model. Additionally, built-ins are only used/generated in a single runtime (solidity rust) and ignored everywhere else (stubs, testlang, other solidity outputs).

Instead of the changes in the generator crate here, WDYT of just rendering/writing the built ins in the solidity crate directly? as this is all very solidity specific, and cannot be applied to any other runtime. It can also make doing this easier, since the versions/sources can be generated/written to disk directly.

This was an attempt at remaining generic for any language defined in Slang, so that all of them can potentially define built-ins. But since we've decided to take that apart and have the user explicitly add the built-ins all this can be re-worked.

I'll rework the built-ins generation code.

OmarTawfik · 2024-10-03T13:19:55Z

crates/metaslang/bindings/src/lib.rs

 type CursorID = usize;
 pub struct DefinitionHandle(GraphHandle);

+pub const BUILT_INS_FILE_PATH: &str = "@@built-ins@@";


I wonder if it is possible to supply this value to Bindings::create(), instead of statically defining it, so that the Compilation API can make sure to use a name that doesn't conflict with anything the user might use?

We could do that, but wouldn't it make the API awkward to use? Like, why would the user need to decide how to name an internal file? Hmm, I'm thinking the underlying issue here is whether we want to have the "built-ins" as a concept that the Bindings API needs to know about, or is it something that's completely handled outside.

If we want to extract that concept, then having a way to configure the binding rules execution (as in setting a couple of graph global variables) would be enough. I'm not opposed to that idea, but the counterpart is that now users will need to implement this handling themselves.

WDYT?

OmarTawfik · 2024-10-03T13:21:50Z

crates/metaslang/bindings/src/lib.rs

+        _ = self.add_file_internal(file, tree_cursor);
+        self.built_ins_file = Some(file);
+    }
+
    pub fn add_file(&mut self, file_path: &str, tree_cursor: Cursor<KT>) {


Given that BUILT_INS_FILE_PATH is known at this point, I wonder why the additional overload add_built_ins()? Can't add_file() just check if file_path == BUILT_INS_FILE_PATH and do the right thing?

It eliminates the error of misusing the original function to add built ins by mistake.

I was thinking the opposite thing: add_file should check that the user is not trying to add a file as the built-ins, or resolve that internally in some other way so that there's no possibility of clashing. But as I said in the previous comment, I guess it depends on who should handle the built-ins concept.

OmarTawfik · 2024-10-03T13:27:55Z

crates/solidity/inputs/language/src/lib.rs

 pub use definition::SolidityDefinition;
+
+pub fn render_built_ins(built_ins: &[BuiltIn]) -> String {
+    let mut lines: Vec<String> = Vec::new();


Suggestion, if it is more ergonomic for you: write!() and writeln!() can write directly into a String, instead of creating a Vec<String> and joining it at the end.
Example in crates/codegen/spec/src/generators/grammar_ebnf.rs.

I also suggest moving this to a mod bindings to maintain the structure.

Nice! I didn't know you could write! or writeln! directly to a string. I'll change this.
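For reference, a minimal sketch of that pattern using the standard `std::fmt::Write` trait, which `String` implements. The `render_contract` function and its arguments are made up for this example and are not the PR's `render_built_ins`.

```rust
use std::fmt::Write;

/// Renders a contract with the given member declarations, one per line.
fn render_contract(name: &str, members: &[&str]) -> String {
    let mut buffer = String::new();
    // Writing into a `String` does not fail, so unwrapping the result is fine here.
    writeln!(buffer, "contract {name} {{").unwrap();
    for member in members {
        writeln!(buffer, "  {member}").unwrap();
    }
    writeln!(buffer, "}}").unwrap();
    buffer
}
```

The `use std::fmt::Write;` import is what brings `write_fmt` into scope for `String`; without it the macros do not compile against a `String` target.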

OmarTawfik · 2024-10-03T13:35:12Z

crates/codegen/runtime/cargo/src/runtime/bindings/mod.rs


 pub type Bindings = metaslang_bindings::Bindings<KindTypes>;
 pub type Definition<'a> = metaslang_bindings::Definition<'a, KindTypes>;
 pub type Reference<'a> = metaslang_bindings::Reference<'a, KindTypes>;

 pub fn create_with_resolver(
-    version: Version,
+    parser: &Parser,


not sure I understand the change here, given that Bindings::add_built_ins() (and Bindings::add_file()) exist in the public API. The Bindings owner is responsible for loading/parsing the built ins file, and passing its cursor, similar to everything else.
Thoughts?

Yeah, this was before we decided to move loading the built-ins responsibility away from the Bindings API. I was going to make that change in another PR, but may as well do it here, along with the other issues we've discussed so far.

OmarTawfik

Great work!
Left a few questions/suggestions.

Instead of a single built-ins file, the user can now add any number of system files and the API will ensure there's no collision with user provided files. For Solidity, any system files are linked to the ROOT_NODE as a special name "@@built-ins@@" that cannot collide with user files (always prefixed by "user:").
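The collision guarantee can be sketched as a registry that namespaces keys by origin. Only the `user:` prefix comes from the description above; the `FileRegistry` type and the `system:` prefix are assumptions for this example, and the real `Bindings` API stores parsed trees rather than source strings.

```rust
use std::collections::BTreeMap;

/// Hypothetical file registry keyed by a prefixed path.
#[derive(Default)]
struct FileRegistry {
    sources: BTreeMap<String, String>,
}

impl FileRegistry {
    /// User files are always stored under the `user:` prefix.
    fn add_user_file(&mut self, path: &str, source: &str) {
        self.sources.insert(format!("user:{path}"), source.to_owned());
    }

    /// System files use a different (assumed) prefix, so they can never
    /// share a key with a user file, whatever the user names it.
    fn add_system_file(&mut self, path: &str, source: &str) {
        self.sources.insert(format!("system:{path}"), source.to_owned());
    }

    /// Lists the stored keys in sorted order.
    fn keys(&self) -> Vec<&str> {
        self.sources.keys().map(String::as_str).collect()
    }
}
```

With this scheme, a user file that happens to be named "@@built-ins@@" is stored separately from the system file of the same name.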

ggiraldez added 9 commits September 13, 2024 18:11

First cut at implementing support for bindings built-in identifiers

dd6e148

Move builtins content to the language definition and load during construction

8a7ef47

Move built-ins rendering specific code to the input language

9a12700

Make generated built-ins source dependant on the language version

bf9de78

Separate built-ins parsing test

ba6c84d

Allow filtering of fields in built-in type per version

81b9c95

Check built-ins in language definitions for version enablement consistency

fcc371e

Make built-in types, functions and variables visible

ebd47c4

Simplify built-in parameter and field definitions

486aebb

ggiraldez added 5 commits September 19, 2024 23:57

Output a built-in marker when found in ambiguous definitions

5939f3b

Rename builtin -> built-in (and similar identifiers)

bef4458

Refactor: split apart render_bindings for readability

c6275db

... and to silence clippy :)

Externalize built-ins file path from the graph building rules file

3757c60

Use a Vec<String> for built-in parameters definition

d6fe128

ggiraldez marked this pull request as ready for review September 20, 2024 19:33

ggiraldez requested a review from a team as a code owner September 20, 2024 19:33

ggiraldez mentioned this pull request Sep 20, 2024

Implement Solidity binding rules #1077

Open

34 tasks

Merge remote-tracking branch 'upstream/main' into builtin-bindings

23ecf01

OmarTawfik reviewed Oct 2, 2024

OmarTawfik reviewed Oct 3, 2024

Remove built_ins() method in SpannedLanguage

ea9cc13

OmarTawfik reviewed Oct 3, 2024

OmarTawfik requested changes Oct 3, 2024

ggiraldez added 13 commits October 3, 2024 16:45

Merge branch 'main' into builtin-bindings

eed92dd

Rework built-ins definitions

fbec218

Use writeln! to generate the built-ins contents

7e05529

Revamp built-ins generation code

904199a

Track built-ins source files with the CodegenFileSystem

5c8a842

Add marker reminders to keep built-ins contract name in sync

aaaab29

Change the output for ambiguous definitions on bindings snapshots

3ad2a37

Use assert in bindings snapshots, which is available in all versions

6dd6cb9

Change status/failure marker semantics for bindings output snapshots

35412c3

From now on, `success` means that both parsing was successful and all references were resolved.

Fix formatting

c38535a

Adding built-ins to bindings is no longer done by default

96c0c2b

The user must parse and add them explicitly.

Revert unnecessary change

0eafae0

Simplify built-ins import/export in the stack graph

f08836a

ggiraldez force-pushed the builtin-bindings branch from e51c286 to f08836a Compare October 7, 2024 19:38

ggiraldez added 2 commits October 7, 2024 19:05

Updated public_api.txt file

d782d33
