Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Allow custom asset types in the build system #1623

Open
wants to merge 5 commits into
base: main
Choose a base branch
from

Conversation

mkustermann
Copy link
Member

@mkustermann mkustermann commented Oct 2, 2024

This is the first PR that will increase the flexibility of the Dart Build system as well as makes it more layered. The main points being addressed in this PR are:

  • A bundling tool has to specify which asset types it supports. So we make supportedAssetTypes required everywhere.
    => Across various layers in the code base we make this required
    => We remove a baked-in [CodeAsset.type] fallback in various places (Making this explicit increases size of LOCs in this CL due to the many tests)

  • The core building infrastructure in pkg/native_assets_builder/lib/* no longer knows anything about CodeAsset or DataAsset. Instead it only knows about EncodedAsset which represents the type of an asset and it's json encoding.
    => The core still verifies certain protocol consistency (e.g. that a hook only emitted asset types that the config allows)
    => The higher levels pass the supportedAssetTypes.
    => The higher levels now also pass buildValidator / linkValidator that validate per-asset related things (e.g. the id of a data asset to start with the package name) for a given package.
    => The higher levels now also pass a applicationAssetValidator that validates consistency of assets across all packages (e.g. uniqueness of dylib filenames across all dependencies)
    => This centralizes the logic about per-asset-type information in the bundling tool (i.e. user of package:native_assets_builder).
    => Bundling tools now have to expand CodeAssets in dry-run to all architectures.

  • The bundling tool (e.g. flutter build) doesn't have to implement validation logic itself, instead it just import the "bundling logic" (which includes validation logic) for various asset types .

  • All the validation code now simply returns ValidationError which is a list of errors. If the list is empty then there were no errors.
    => This removes redundancy between a success boolean and the list of errors (it was successsfull iff there were no errors)
    => This simplifies code that aggregates errors from different sources.

  • Moves the fromJson / toJson towards be purely read/write a json schema and are not responsible for anything else.
    => The validation routines are responsible for validating semantics.

  • The hook writer API now changes to work on per-asset-type specific extension methods. For example what used to be output.addAsset(DataAsset(...)); is now output.dataAssets.add(DataAsset(...));

    The BuildOutput.encodedAsset getter is (with this PR) temporarily visible to hook writers, which could use it like this: output.encodedAssets.add(DataAsset(...).encode()); but a following refactoring that increases the flexibility of the protocol more will also get rid of the user-visible encodedAsset getter (if we choose so). So this is only temporarily exposed.

  • hook/link.dart scripts have to mark any input files it's output depends on in the LinkOutput.addDependency. If the set of inputs is the same but one input's file changed then a hook/link.dart

    • doesn't have to re-run if it just drops all assets or outputs them unchanged
    • has to re-run if it copies a input, changes it's contents, merge with other inputs, etc
      => A hook/link.dart should - just like a hook/build.dart add dependencies iff changing the input file contents should cause a re-run of the hook
      => Before this was somewhat hard-coded into the package:native_assets_builder.
  • We start separating the per-asset-type specific code: native_assets_cli/lib/src/code_assets/* & native_assets_cli/lib/src/data_assets/*.
    => More refactorings will come that will make the core native_assets_cli/lib/src/* no longer have any per-asset-type specific things in them
    => Eventually the lib/src/data_assets and lib/src/code_assets could become their own packages (if we choose to do so) - similar to how flutter-specific assets can live in other packages.

This is the first PR that will increase the flexibility of the Dart
Build system as well as makes it more layered. The main points being
addressed in this PR are:

* A bundling tool has to specify which asset types it supports. So we
  make `supportedAssetTypes` required everywhere.
  => Across various layers in the code base we make this required
  => We remove a baked-in `[CodeAsset.type]` fallback in various places
  (Making this explicit increases size of LOCs in this CL due to the
   many tests)

* The core building infrastructure in `pkg/native_assets_builder/lib/*`
  no longer knows anything about `CodeAsset` or `DataAsset`. Instead it
  only knows about `EncodedAsset` which represents the type of an asset
  and it's json encoding.
  => The core still verifies certain protocol consistency (e.g. that a
  hook only emitted asset types that the config allows)
  => The higher levels pass the `supportedAssetTypes`.
  => The higher levels now also pass `buildValidator` / `linkValidator`
  that validate per-asset related things (e.g. the id of a data asset to
  start with the package name) for a given package.
  => The higher levels now also pass a `applicationAssetValidator` that
  validates consistency of assets across all packages (e.g. uniqueness
  of dylib filenames across all dependencies)
  => This centralizes the logic about per-asset-type information in the
  bundling tool (i.e. user of `package:native_assets_builder`).
  => Bundling tools now have to expand `CodeAsset`s in dry-run to all
  architectures.

* The bundling tool (e.g. `flutter build`) doesn't have to implement
  validation logic itself, instead it will support certain asset types
  by importing the "bundling logic" for that asset types (which includes
  - for now - validation logic).

* All the validation code now simply returns `ValidationError` which is
  a list of errors. If the list is empty then there were no errors.
  => This removes redundancy between a `success` boolean and the list of errors
     (it was successsfull iff there were no errors)
  => This simplifies code that aggregates errors from different sources.

* Moves the `fromJson` / `toJson` towards be purely read/write a json schema
  and are not responsible for anything else.
  => The validation routines are responsible for validating semantics.

* The hook writer API now changes to work on per-asset-type specific
  extension methods.
  For example what used to be
  ```
    output.addAsset(DataAsset(...));
  ```
  is now
  ```
    output.dataAssets.add(DataAsset(...));
  ```

  The `BuildOutput.encodedAsset` getter is (with this PR) temporarily visible
  to hook writers, which could use it like this:
  ```
    output.encodedAssets.add(DataAsset(...).encode());
  ```
  but a following refactoring that increases the flexibility of the
  protocol more will also get rid of the user-visible `encodedAsset`
  getter (if we choose so). So this is only temporarily exposed.

* `hook/link.dart` scripts have to mark any input files it's output
  depends on in the `LinkOutput.addDependency`. If the set of inputs is
  the same but one input's file changed then a `hook/link.dart`
  - doesn't have to rerun if it just drops all assets or outputs them
    unchanged
  - has to re-run if it copies a input, changes it's contents, merge
    with other inputs, etc
  => A `hook/link.dart` should - just like a `hook/build.dart` add
    dependencies if the inputs are used to produce the output
  => Before this was somewhat hard-coded into the
  `package:native_assets_builder`.
Copy link

github-actions bot commented Oct 2, 2024

PR Health

Breaking changes ✔️
Package Change Current Version New Version Needed Version Looking good?
native_assets_cli Breaking 0.9.0-wip 0.9.0-wip 0.9.0-wip ✔️
Changelog Entry ✔️
Package Changed Files

Changes to files need to be accounted for in their respective changelogs.

API leaks ✔️

The following packages contain symbols visible in the public API, but not exported by the library. Export these symbols or remove them from your publicly visible API.

Package Leaked API symbols
License Headers ✔️
// Copyright (c) 2024, the Dart project authors. Please see the AUTHORS file
// for details. All rights reserved. Use of this source code is governed by a
// BSD-style license that can be found in the LICENSE file.
Files
no missing headers

All source files should start with a license header.

Unrelated files missing license headers
Files
pkgs/ffigen/example/libclang-example/generated_bindings.dart
pkgs/ffigen/example/shared_bindings/generate.dart
pkgs/ffigen/example/shared_bindings/lib/generated/a_gen.dart
pkgs/ffigen/example/shared_bindings/lib/generated/a_shared_b_gen.dart
pkgs/ffigen/example/shared_bindings/lib/generated/base_gen.dart
pkgs/ffigen/example/simple/generated_bindings.dart
pkgs/ffigen/lib/src/header_parser/clang_bindings/clang_bindings.dart
pkgs/ffigen/test/collision_tests/expected_bindings/_expected_decl_decl_collision_bindings.dart
pkgs/ffigen/test/collision_tests/expected_bindings/_expected_decl_symbol_address_collision_bindings.dart
pkgs/ffigen/test/collision_tests/expected_bindings/_expected_decl_type_name_collision_bindings.dart
pkgs/ffigen/test/collision_tests/expected_bindings/_expected_reserved_keyword_collision_bindings.dart
pkgs/ffigen/test/header_parser_tests/expected_bindings/_expected_comment_markup_bindings.dart
pkgs/ffigen/test/header_parser_tests/expected_bindings/_expected_dart_handle_bindings.dart
pkgs/ffigen/test/header_parser_tests/expected_bindings/_expected_enum_int_mimic_bindings.dart
pkgs/ffigen/test/header_parser_tests/expected_bindings/_expected_forward_decl_bindings.dart
pkgs/ffigen/test/header_parser_tests/expected_bindings/_expected_functions_bindings.dart
pkgs/ffigen/test/header_parser_tests/expected_bindings/_expected_imported_types_bindings.dart
pkgs/ffigen/test/header_parser_tests/expected_bindings/_expected_native_func_typedef_bindings.dart
pkgs/ffigen/test/header_parser_tests/expected_bindings/_expected_opaque_dependencies_bindings.dart
pkgs/ffigen/test/header_parser_tests/expected_bindings/_expected_packed_structs_bindings.dart
pkgs/ffigen/test/header_parser_tests/expected_bindings/_expected_regress_384_bindings.dart
pkgs/ffigen/test/header_parser_tests/expected_bindings/_expected_struct_fptr_fields_bindings.dart
pkgs/ffigen/test/header_parser_tests/expected_bindings/_expected_typedef_bindings.dart
pkgs/ffigen/test/header_parser_tests/expected_bindings/_expected_unions_bindings.dart
pkgs/ffigen/test/header_parser_tests/expected_bindings/_expected_varargs_bindings.dart
pkgs/ffigen/test/large_integration_tests/_expected_cjson_bindings.dart
pkgs/ffigen/test/large_integration_tests/_expected_libclang_bindings.dart
pkgs/ffigen/test/large_integration_tests/_expected_sqlite_bindings.dart
pkgs/ffigen/test/native_test/_expected_native_test_bindings.dart
pkgs/jni/lib/src/third_party/generated_bindings.dart
pkgs/jni/lib/src/third_party/global_env_extensions.dart
pkgs/jni/lib/src/third_party/jni_bindings_generated.dart
pkgs/jnigen/android_test_runner/lib/main.dart
pkgs/jnigen/example/in_app_java/lib/android_utils.dart
pkgs/jnigen/example/kotlin_plugin/example/lib/main.dart
pkgs/jnigen/example/kotlin_plugin/lib/kotlin_bindings.dart
pkgs/jnigen/example/kotlin_plugin/lib/kotlin_plugin.dart
pkgs/jnigen/example/pdfbox_plugin/lib/pdfbox_plugin.dart
pkgs/jnigen/example/pdfbox_plugin/lib/src/third_party/org/apache/pdfbox/pdmodel/PDDocument.dart
pkgs/jnigen/example/pdfbox_plugin/lib/src/third_party/org/apache/pdfbox/pdmodel/PDDocumentInformation.dart
pkgs/jnigen/example/pdfbox_plugin/lib/src/third_party/org/apache/pdfbox/pdmodel/_package.dart
pkgs/jnigen/example/pdfbox_plugin/lib/src/third_party/org/apache/pdfbox/text/PDFTextStripper.dart
pkgs/jnigen/example/pdfbox_plugin/lib/src/third_party/org/apache/pdfbox/text/_package.dart
pkgs/jnigen/lib/src/bindings/descriptor.dart
pkgs/jnigen/lib/src/elements/elements.g.dart
pkgs/jnigen/test/jackson_core_test/third_party/bindings/com/fasterxml/jackson/core/_package.dart
pkgs/jnigen/tool/command_runner.dart
pkgs/native_assets_builder/test_data/native_dynamic_linking/bin/native_dynamic_linking.dart
pkgs/swift2objc/lib/src/config.dart
pkgs/swift2objc/lib/src/generate_wrapper.dart
pkgs/swift2objc/lib/src/generator/_core/utils.dart
pkgs/swift2objc/lib/src/generator/generator.dart
pkgs/swift2objc/lib/src/generator/generators/class_generator.dart
pkgs/swift2objc/lib/src/parser/parsers/declaration_parsers/parse_initializer_declaration.dart
pkgs/swift2objc/lib/src/parser/parsers/declaration_parsers/parse_property_declaration.dart
pkgs/swift2objc/lib/src/transformer/_core/unique_namer.dart
pkgs/swift2objc/lib/src/transformer/_core/utils.dart
pkgs/swift2objc/lib/src/transformer/transformers/transform_property.dart
pkgs/swift2objc/test/unit/parse_initializer_param_test.dart
Package publish validation ✔️
Package Version Status
package:ffi 2.1.3 already published at pub.dev
package:ffigen 15.0.0-wip WIP (no publish necessary)
package:jni 0.12.0 already published at pub.dev
package:jnigen 0.12.0 already published at pub.dev
package:native_assets_cli 0.9.0-wip WIP (no publish necessary)
package:objective_c 2.1.0-wip WIP (no publish necessary)
package:swift2objc 0.0.1-wip WIP (no publish necessary)
package:swiftgen 0.0.1-wip WIP (no publish necessary)

Documentation at https://github.com/dart-lang/ecosystem/wiki/Publishing-automation.

@coveralls
Copy link

coveralls commented Oct 2, 2024

Coverage Status

coverage: 90.396% (-0.5%) from 90.865%
when pulling 01c97bc on custom-asset-types
into 0bf27dc on main.

@mkustermann
Copy link
Member Author

mkustermann commented Oct 2, 2024

Overall for users of these packages very little changes:

Hook writers

Before:

output.add(DataAsset(...));

now

output.dataAssets.add(DataAsset(...));
Bundling tools

Before:

    await build(
     ...
      supportedAssetTypes: [  // <-- this was optional and had auto-fallback to `[CodeAsset.type]`
        CodeAsset.type,
      ],
    );

now:

    await build(
     ...
      supportedAssetTypes: [  // <-- this is now required
        CodeAsset.type,
        DataAsset.type,
      ],
      buildValidator: (config, output) async => [
        ...await validateDataAssetBuildOutput(config, output),
        ...await validateCodeAssetBuildOutput(config, output),
      ],
      applicationAssetValidator: (assets) async => [
        ...await validateCodeAssetsInApplication(assets),
      ],
    );

=> This change mainly reflects that per-asset-type information is moved to the bundling tool which can use per-asset-type helpers to perform validation.

(see cl/387940 on how the roll into Dart SDK would look like)

Copy link
Collaborator

@dcharkes dcharkes left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Related issue:

High level LGTM. This PR is doing roughly what I expected to do with ExtensibleAsset and applies it also to data assets and code assets instead.

Also, thanks for the great PR description! ❤️

  • We start separating the per-asset-type specific code: native_assets_cli/lib/src/code_assets/* & native_assets_cli/lib/src/data_assets/*.
    => More refactorings will come that will make the core native_assets_cli/lib/src/* no longer have any per-asset-type specific things in them
    => Eventually the lib/src/data_assets and lib/src/code_assets could become their own packages (if we choose to do so) - similar to how flutter-specific assets can live in other packages.

One thing I noticed while reviewing the code is that we no longer run the validation for data assets and code assets inside the hook. This makes it harder to debug failures for users, because they cannot run the debugger on the hook with a config file. They'd have to run dartdev/flutter_tools and that doesn't enable them to step through their code.

Making the protocol not know about data assets and code assets should also then add a validation callback to build( and link(. And the validate methods for data assets and code assets should be made public in the non-internal.dart.

For adding assets and looping over assets, extension methods suffice, but for validation it does not.

Can we do better to support asset type specific validation? (Dependency injection? 😅 That would be super ugly. But that would also allow for us to register decode method for every asset type, fixing our problem with not having virtual statics. But dependency injection is too ugly.)

We need a solution for custom asset type validation inside hooks, and then we could use that as well the custom validation of code and data assets inside thebuild( and link( methods used by hook writers. We don't need a solution in this PR. But if we don't add a solution in this PR. I think the build( and link( methods should call the validators for code and data assets.

Hook writers

Before:

output.add(DataAsset(...));

now

output.dataAssets.add(DataAsset(...));

This is only true if we keep the validation for code and data assets in build( and link(. Otherwise users are going to have to call validateDataAssetBuildOutput and friends manually from the bottom of their build( method.

hook/link.dart scripts have to mark any input files it's output depends on in the LinkOutput.addDependency. If the set of inputs is the same but one input's file changed then a hook/link.dart

  • doesn't have to re-run if it just drops all assets or outputs them unchanged
  • has to re-run if it copies a input, changes it's contents, merge with other inputs, etc
    => A hook/link.dart should - just like a hook/build.dart add dependencies iff changing the input file contents should cause a re-run of the hook
    => Before this was somewhat hard-coded into the package:native_assets_builder.

This seems to be unrelated to this PR? Can it be moved to a separate PR? (And then we can also discuss pros/cons on that PR.)

const DeepCollectionEquality().equals(encoding, other.encoding);
}

const String _typeKey = 'type';
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

static const inside class?

pkgs/native_assets_cli/lib/src/code_assets/code_asset.dart Outdated Show resolved Hide resolved
pkgs/native_assets_cli/lib/src/code_assets/code_asset.dart Outdated Show resolved Hide resolved
pkgs/native_assets_cli/lib/src/code_assets/code_asset.dart Outdated Show resolved Hide resolved
pkgs/native_assets_cli/lib/src/api/build.dart Show resolved Hide resolved
@dcharkes
Copy link
Collaborator

dcharkes commented Oct 3, 2024

I think I mentioned this before and you were opposed to this: Should we have assetId (optional String?) and path (optional Uri) in EncodedAsset so that the typical errors with those can be caught in a generic way also for all asset types later to be added? (Should it be an error to emit both a data asset and a code asset with the same asset id? I think technically speaking it could work because there's going to be two different bundling mechanisms. But it might be rather confusing.)

@mkustermann
Copy link
Member Author

mkustermann commented Oct 3, 2024

I think I mentioned this before and you were opposed to this:

Correct.

Should we have assetId (optional String?) and path (optional Uri) in EncodedAsset so that the typical errors with those can be caught in a generic way also for all asset types later to be added? (Should it be an error to emit both a data asset and a code asset with the same asset id? I think technically speaking it could work because there's going to be two different bundling mechanisms. But it might be rather confusing.)

Most asset types have their own id namespace, some asset types may share an id namespace with other asset types, some assets will have no id concept at all.

Some asset types may have a file, some may have no file, some may have several files.

The reality is that different asset types are just that - different. Only type tells us which one it is, but each type may have different fields, some may be present in multiple asset types, but they represent different things and have different meaning (e.g. a CodeAsset having a file may want us to verify such a file is marked read-execute and are valid MachO/ELF/PE format, a DataAsset may also have a file that we want to verify it exists and is read-only, ...).

The verification about id namespaces is something a bundling tool will do, it decides whether code assets go into same directory as data asset (and cause name collision) or not, ...

@mkustermann
Copy link
Member Author

Firstly, thanks for all the comments, @dcharkes !

I've addressed some comments, resolved others with explanation and left a few open for next review round. PTAL

One thing I noticed while reviewing the code is that we no longer run the validation for data assets and code assets inside the hook. This makes it harder to debug failures for users, because they cannot run the debugger on the hook with a config file. They'd have to run dartdev/flutter_tools and that doesn't enable them to step through their code.

Making the protocol not know about data assets and code assets should also then add a validation callback to build( and link(. And the validate methods for data assets and code assets should be made public in the non-internal.dart.

For adding assets and looping over assets, extension methods suffice, but for validation it does not.

Can we do better to support asset type specific validation? (Dependency injection? 😅 That would be super ugly. But that would also allow for us to register decode method for every asset type, fixing our problem with not having virtual statics. But dependency injection is too ugly.)

We need a solution for custom asset type validation inside hooks, and then we could use that as well the custom validation of code and data assets inside thebuild( and link( methods used by hook writers. We don't need a solution in this PR. But if we don't add a solution in this PR. I think the build( and link( methods should call the validators for code and data assets.

I've filed #1628 with a suggestion how we can do this automatically without regression, but I'm not actually convinced we should do this.

hook/link.dart scripts have to mark any input files it's output depends on in the LinkOutput.addDependency. If the set of inputs is the same but one input's file changed then a hook/link.dart

doesn't have to re-run if it just drops all assets or outputs them unchanged
has to re-run if it copies a input, changes it's contents, merge with other inputs, etc
=> A hook/link.dart should - just like a hook/build.dart add dependencies iff changing the input file contents should cause a re-run of the hook
=> Before this was somewhat hard-coded into the package:native_assets_builder.

This seems to be unrelated to this PR? Can it be moved to a separate PR? (And then we can also discuss pros/cons on that PR.)

This is not unrelated to this PR because pkgs/native_assets_builder/lib/src/* no longer know anything about the specific asset types (by design). Which means they don't know a DataAsset exists or that it has a DataAsset.file. Which means it will no longer automatically add those as dependencies and use it to decide when to re-run linker hooks.

Now I had actually a version where I dependency-injected this from the upper layers (bundling tool): A fileDependencyExtractor closure that the lowest layer would use - but removed this again:

After thinking about this more I strongly disliked this approach. The natural way for any hook we invoke is for the hook to tell - via dependencies - the things it actually depends on (i.e. if those things change, it should rerun). This is how it works for hook/build.dart and this is how it should also work for hook/link.dart (IMHO). Didn't think this is very controversial (instead it's actually quite natural). The package:native_assets_builder is still responsible to re-run the linker hook if e.g. the number of inputs to the linker changes or the asset-ids of inputs change (or anything else in the json encoding of an asset).

@dcharkes
Copy link
Collaborator

dcharkes commented Oct 4, 2024

This is not unrelated to this PR because pkgs/native_assets_builder/lib/src/* no longer know anything about the specific asset types (by design). Which means they don't know a DataAsset exists or that it has a DataAsset.file. Which means it will no longer automatically add those as dependencies and use it to decide when to re-run linker hooks.

Ah right, because the system doesn't know anymore about assets having files.

I think this is rather a footgun for users, having to add the files as dependencies. You actually filed this and I addressed this last month:

This would become compounded if/when we're also going to model metadata as assets:

Assets having a file (or multiple files) on disk should be part of the system imho. One way to do so would be to recognize a file and files field and mandate these are absolute file paths/uris (mind you Windows on serialization/deserialization).

It would be asymmetric if the hooks do do caching correctly on the asset metadata (the json serialization of the object) but not on the files the asset is a metadata on. (The idea of merging the current metadata into assets signifies this difference, moving arbitrary json from the BuildOutput/BuildConfig to the contents of an asset file.)

@mkustermann
Copy link
Member Author

mkustermann commented Oct 4, 2024

I think this is rather a footgun for users, having to add the files as dependencies.

One has to be careful to manage dependencies, but this is also true for build hook writers, or native_toolchain_c, ... It's the nature of things.

You actually filed this and I addressed this last month: #1412

This would be automatically handled, because the json encoding of a DataAsset would change (the file uri of asset id id2 changed from b.txt to a.txt), therefore the link_config.json will change contents, which should re-run the linker automatically. It's unrelated to this topic.

This would become compounded if/when we're also going to model metadata as assets:

#1251

Not sure about this:

  • If the metadata has file paths in them and the file paths changes, then the metadata json changes and any dependencies should re-build automatically.
  • If any dependent packages use metadata of a dependency
    • and "consume"/"use" the file paths in the metadadata, they should report this as a dependency and will re-run if file content changes
    • and not "consume"/"use" the file paths in the metadata, then they don't report is as dependency, will not re-run if file content changes, which is ok because, well, it didn't "consume"/"use" the file content - it doesn't depend on it

@dcharkes
Copy link
Collaborator

dcharkes commented Oct 4, 2024

Right, so now we give asset-designers the flexibility to provide automatic caching or not be either writing a file or putting everything in the JSON encoding. I could totally make a NewCodeAsset that base64 encodes the contents of a dll in the json (and make the build output, link config, link output etc huge). And then it would automatically do caching.

The users of custom asset types now all of a sudden need to know implementation details of such asset types. I believe custom asset types should be mostly a concern of the introducer of the custom asset type (which is likely an embedder, or a dev who also writes a linker for such custom asset type).

Custom asset types having a file could be encapsulated, not visible by the users of the custom asset type (maybe it simply has a Future<Object> get myAssetContents.

I think correct caching should be the responsibility of the custom-asset-type introducer (together with the native_assets_builder).

Focusing on an optimization whether something is used/consumed creates the same question for not reading fields of the json encoding of an asset (and the current json-based implementation of the metadata feature flowing from one build hook to the next). If you want to focus on enabling users to have faster execution of their build based on whats used/consumed, then we should start recording what fields of the json encoding are read. It's very smelly to treat file contents differently than the json metadata.

After thinking about this more I strongly disliked this approach. The natural way for any hook we invoke is for the hook to tell - via dependencies - the things it actually depends on (i.e. if those things change, it should rerun). This is how it works for hook/build.dart and this is how it should also work for hook/link.dart (IMHO). Didn't think this is very controversial (instead it's actually quite natural). The package:native_assets_builder is still responsible to re-run the linker hook if e.g. the number of inputs to the linker changes or the asset-ids of inputs change (or anything else in the json encoding of an asset).

The dependencies that we report in build hooks are files that are not being passed in by the native assets builder, they live somewhere in the package root. The files that the native assets builder knows about influence caching (for example the dart sources).

@mkustermann
Copy link
Member Author

I think correct caching should be the responsibility of the custom-asset-type introducer (together with the native_assets_builder).

That can be easily done by making this modification:

diff --git a/pkgs/native_assets_cli/lib/src/code_assets/code_asset.dart b/pkgs/native_assets_cli/lib/src/code_assets/code_asset.dart
index 88576821..0c8acbc0 100644
--- a/pkgs/native_assets_cli/lib/src/code_assets/code_asset.dart
+++ b/pkgs/native_assets_cli/lib/src/code_assets/code_asset.dart
@@ -196,8 +196,13 @@ extension CodeAssetsBuildOutput on BuildOutput {
 }
 
 extension type BuildOutputCodeAssets(BuildOutput _output) {
-  void add(CodeAsset asset, {String? linkInPackage}) =>
-      _output.addEncodedAsset(asset.encode(), linkInPackage: linkInPackage);
+  void add(CodeAsset asset, {String? linkInPackage}) {
+    final file = asset.file;
+    if (file != null && asset.linkingMode is ...) {
+      _output.addDependency(file);
+    }
+    _output.addEncodedAsset(asset.encode(), linkInPackage: linkInPackage);
+  }
 
   void addAll(Iterable<CodeAsset> assets, {String? linkInPackage}) {
     for (final asset in assets) {
diff --git a/pkgs/native_assets_cli/lib/src/data_assets/data_asset.dart b/pkgs/native_assets_cli/lib/src/data_assets/data_asset.dart
index 98864398..e4721161 100644
--- a/pkgs/native_assets_cli/lib/src/data_assets/data_asset.dart
+++ b/pkgs/native_assets_cli/lib/src/data_assets/data_asset.dart
@@ -98,8 +98,10 @@ extension DataAssetsBuildOutput on BuildOutput {
 }
 
 extension type BuildOutputDataAssets(BuildOutput _output) {
-  void add(DataAsset asset, {String? linkInPackage}) =>
-      _output.addEncodedAsset(asset.encode(), linkInPackage: linkInPackage);
+  void add(DataAsset asset, {String? linkInPackage}) {
+    _output.addDependency(asset.file);
+    _output.addEncodedAsset(asset.encode(), linkInPackage: linkInPackage);
+  }
 
   void addAll(Iterable<DataAsset> assets, {String? linkInPackage}) {
     for (final asset in assets) {

In fact I had that at some point, but removed it again:

Firstly some tests were failing because they didn't expect those extra dependencies being reported.
Secondly it's an over-approximation: e.g. A linker that will consume all code assets by just adding them to the output doesn't need to re-run if the file contents change.
Thirdly it's a little troublesome with the dry-run, where files may not be required to exist

I'm open to making the above change in a follow-up CL if we want to do this over-approximation. But it's basically a choice of the per-asset-type code, not of the core build system.

Focusing on an optimization whether something is used/consumed creates the same question for not reading fields of the json encoding of an asset (and the current json-based implementation of the metadata feature flowing from one build hook to the next). If you want to focus on enabling users to have faster execution of their build based on whats used/consumed, then we should start recording what fields of the json encoding are read. It's very smelly to treat file contents differently than the json metadata.

I think this is not smelly at all. It's just a limitation in the dependency tracking system in the APIs & CLI protocol. Because of this limitation we over-approximate it and re-build more than possibly needed.

It's rooted in the fact that the existing dependency tracking machinery is based on files. One could expand this by adding a more fine-grained dependency tracking that allows expressing this:

output.addDependency(<file-that-was-consumed>);
output.addMetadataDependency(<description-of-what-subpart-of-metadata-was-consumed>);

but this seems like unnecessary complexity at this point.

@dcharkes
Copy link
Collaborator

dcharkes commented Oct 4, 2024

Now I had actually a version where I dependency-injected this from the upper layers (bundling tool): A fileDependencyExtractor closure that the lowest layer would use - but removed this again:

FWIW. I just realized that this approach doens't work. Custom asset types might be introduced by packages that consume such asset types in linkers and then output a data asset in the linker.

That can be easily done by making this modification:

Wait, I don't understand this, you don't want to add file of the asset you're adding as a dependency, but the file of the asset you're reading as a dependency. So, it should be rather a modification of linkConfig.codeAssets.all and linkConfig.dataAssets.all. Not the add function. Or am I missing something?

The only challenge there is that the config object doesn't know about the output currently. (We could make the config impl have a field that takes the output impl to allow for that.)

Adding all deps on accessing all seems a better solution (with the over approximation) to me. It's N custom-asset-type-introducers and M hook writers where I expect M to be larger than N.

I'm open to making the above change in a follow-up CL if we want to do this over-approximation. But it's basically a choice of the per-asset-type code, not of the core build system.

So all the existing link hooks will be broken in the meantime?

Also, the LinkConfig.encodedAssets needs a doc comment that the files should be added to dependencies if used (ditto for linkConfig.codeAssets and linkConfig.dataAssets if you don't add the .all dependency tracking in this PR.)

@dcharkes
Copy link
Collaborator

dcharkes commented Oct 4, 2024

So all the existing link hooks will be broken in the meantime?

Also, all link hooks where the SDK uses the new implementation (not tracking dependencies in native assets builder), and hooks use the old implementation (not tracking the dependencies manually in the hook) will be broken. Should we retract every individual package version of native_assets_cli to attend users?

This can all be avoided by baking in knowledge files/file into the protocol. IMHO this does not go against the goal of removing knowledge about specific asset types from the protocol / build system. It's simply saying that the protocol / build system has a built in facility for reporting files.

@mkustermann
Copy link
Member Author

Wait, I don't understand this, you don't want to add file of the asset you're adding as a dependency, but the file of the asset you're reading as a dependency. So, it should be rather a modification of linkConfig.codeAssets.all and linkConfig.dataAssets.all. Not the add function. Or am I missing something?

The only challenge there is that the config object doesn't know about the output currently. (We could make the config impl have a field that takes the output impl to allow for that.)

Sorry, yes, absolutely correct!
It needs to be in the consuming not producing end.

So all the existing link hooks will be broken in the meantime?

This sounds like the linking support is widely used and would cause huge breakage for users? In reality one can barely make linking scripts today because we don't have record-uses information that can be safely consumed yet. The format of the record use is being developed at this moment - nobody should depend on it atm.
Are we aware of any 3rd parties using linking support?

So I don't think this would cause any real issues in pratice. And even if it did break anyone: It's an experimental feature that we're trying to get towards 1.0. Now is the time to do breaking changes.

The set of changes performed in these refactorings are breaking - towards getting the system to 1.0. The PR as it's now will require hook writers to be precise about dependencies, I don't see that as a problem. It's backwards-compatible to later on over-approximate those dependencies by making the code & data asset specific APIs automatically add them (if we have evidence users have trouble managing dependencies precisely) - which I'm not entirely convinced we should do.

Copy link
Collaborator

@dcharkes dcharkes left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm concerned about some of these changes, and have voiced my concerns.

I'd strongly prefer baking in knowledge about files into the json encoding for all the reasons mentioned above.

Please add a breaking change to pkgs/native_assets_cli/lib/src/model/build_output_CHANGELOG.md about link outputs now needing to contain their file dependencies from LinkConfig.encodedAssets (We don't have a link_config_CHANGELOG maybe rename to hook_config_CHANGELOG.) And add that we have no mitigation for people using older native_assets_cli's and link hooks because we expect no users. (Or alternatively as mitigation that we retract all older native_asstes_cli package versions...)

Also please add big screaming capslock (or not big scream capslock) documentation that people need to track their own dependencies in LinkConfig.codeAssets and LinkConfig.dataAssets.

@mkustermann
Copy link
Member Author

I'm concerned about some of these changes, and have voiced my concerns.
I'd strongly prefer baking in knowledge about files into the json encoding for all the reasons mentioned above.

Have added two concerns I'm aware of to the agenda of the upcoming interop meeting:

  • whether hook/link.dart should declare all files they consume explicitly (just like hook/build.dart scripts) or do so only for files that aren't in the json
  • whether verification should only be done in the bundling tool or also in the hooks

I can wait until merging the CL until we have discussed it in the next interop meeting.

…ment to `LinkConfig.{codeAssets,dataAssets}.all`
@mkustermann
Copy link
Member Author

Please add a breaking change to pkgs/native_assets_cli/lib/src/model/build_output_CHANGELOG.md about link outputs now needing to contain their file dependencies from LinkConfig.encodedAssets (We don't have a link_config_CHANGELOG maybe rename to hook_config_CHANGELOG.) And add that we have no mitigation for people using older native_assets_cli's and link hooks because we expect no users. (Or alternatively as mitigation that we retract all older native_asstes_cli package versions...)

Done

Also please add big screaming capslock (or not big scream capslock) documentation that people need to track their own dependencies in LinkConfig.codeAssets and LinkConfig.dataAssets.

Done

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants