AlchemiscaleClient async+bulk for results, other methods; add request, response compression for large objects #150

Merged: dotsdl merged 50 commits into main from user-async-batch on Jun 29, 2023

dotsdl (Member) commented Jun 23, 2023

This PR adds async/await methods to the AlchemiscaleBaseClient and uses them in the user-facing AlchemiscaleClient.

It also establishes the pattern for /bulk endpoints on API services, which are not strictly RESTful but do allow for much greater performance when requesting operations on many ScopedKeys in a single call.

This PR adds performance improvements using the above to:

  • AlchemiscaleClient.get_tasks_status
  • AlchemiscaleClient.set_tasks_status
  • AlchemiscaleClient.get_transformation_results
  • AlchemiscaleClient.get_transformation_failures
  • AlchemiscaleClient.get_task_results
  • AlchemiscaleClient.get_task_failures
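
As a rough illustration of the async + bulk pattern described above, here is a minimal sketch using only the standard library. It is not the PR's implementation: `fake_bulk_get_status`, `FAKE_STATUSES`, and the string task keys are hypothetical stand-ins for a real call to a /bulk endpoint with ScopedKeys.

```python
import asyncio
from typing import Dict, List, Optional

# Hypothetical in-memory stand-in for the server-side state store.
FAKE_STATUSES: Dict[str, str] = {f"Task-{i}": "waiting" for i in range(2500)}


async def fake_bulk_get_status(task_keys: List[str]) -> List[Optional[str]]:
    """Stand-in for one request to a /bulk endpoint: many keys in, many statuses out."""
    await asyncio.sleep(0)  # yield control, as a real network call would
    return [FAKE_STATUSES.get(key) for key in task_keys]


async def get_tasks_status_async(
    task_keys: List[str], batch_size: int = 1000
) -> List[Optional[str]]:
    # Split the keys into batches; each batch becomes one bulk request.
    batches = [
        task_keys[i : i + batch_size] for i in range(0, len(task_keys), batch_size)
    ]
    # gather() runs the requests concurrently and preserves batch order.
    results = await asyncio.gather(*(fake_bulk_get_status(b) for b in batches))
    return [status for batch in results for status in batch]


def get_tasks_status(task_keys: List[str], batch_size: int = 1000) -> List[Optional[str]]:
    """Synchronous wrapper, so callers do not need their own event loop."""
    return asyncio.run(get_tasks_status_async(task_keys, batch_size))


statuses = get_tasks_status(list(FAKE_STATUSES) + ["Task-missing"])
```

The result list lines up positionally with the input keys, with `None` for keys the fake store does not know about.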

This PR also adds gzip compression for large requests and responses between the AlchemiscaleBaseClient and the API services. In the AlchemiscaleClient, compression is applied by default to:

  • AlchemiscaleClient.create_network
  • AlchemiscaleClient.get_network
  • AlchemiscaleClient.get_transformation
  • AlchemiscaleClient.get_chemicalsystem
  • AlchemiscaleClient.get_transformation_results
  • AlchemiscaleClient.get_transformation_failures
  • AlchemiscaleClient.get_task_results
  • AlchemiscaleClient.get_task_failures
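
To make the compression step concrete, the sketch below gzips a JSON payload and restores it with the standard library. The helper names and the example `network` dictionary are assumptions for illustration only; the PR's actual serialization of AlchemicalNetwork objects is not shown in this conversation.

```python
import gzip
import json


def compress_payload(obj) -> bytes:
    """Serialize to JSON, then gzip the UTF-8 bytes."""
    return gzip.compress(json.dumps(obj).encode("utf-8"))


def decompress_payload(data: bytes):
    """Inverse of compress_payload."""
    return json.loads(gzip.decompress(data).decode("utf-8"))


# Hypothetical large, repetitive payload standing in for a serialized network.
network = {
    "name": "example",
    "edges": [{"stateA": "ligand-1", "stateB": "ligand-2"}] * 5000,
}

raw = json.dumps(network).encode("utf-8")
body = compress_payload(network)

# Standard HTTP header a sender would attach so the receiver knows to decompress.
headers = {"Content-Type": "application/json", "Content-Encoding": "gzip"}
```

Repetitive JSON like this compresses well, which is why the benefit is largest for big objects and negligible for tiny ones.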

@dotsdl dotsdl linked an issue Jun 23, 2023 that may be closed by this pull request
dotsdl changed the title from "AlchemiscaleClient async+bulk for results, other methods" to "[WIP] AlchemiscaleClient async+bulk for results, other methods" on Jun 23, 2023
codecov-commenter commented Jun 24, 2023

Codecov Report

Patch coverage: 76.20% and project coverage change: -1.38 ⚠️

Comparison is base (f761d36) 83.67% compared to head (f70a6c9) 82.30%.

Additional details and impacted files
@@            Coverage Diff             @@
##             main     #150      +/-   ##
==========================================
- Coverage   83.67%   82.30%   -1.38%     
==========================================
  Files          21       21              
  Lines        2426     2656     +230     
==========================================
+ Hits         2030     2186     +156     
- Misses        396      470      +74     
Impacted Files                        Coverage Δ
alchemiscale/compute/service.py       81.62% <ø> (ø)
alchemiscale/interface/api.py         41.78% <25.00%> (-1.87%) ⬇️
alchemiscale/base/client.py           77.91% <71.73%> (-9.70%) ⬇️
alchemiscale/interface/client.py      92.27% <85.05%> (-3.71%) ⬇️
alchemiscale/base/api.py              86.48% <94.73%> (+1.38%) ⬆️
alchemiscale/compute/api.py           66.30% <100.00%> (+1.13%) ⬆️
alchemiscale/storage/statestore.py    94.12% <100.00%> (-0.06%) ⬇️


…alls

Now by default compress retrievals of AlchemicalNetwork, Transformation,
    and ChemicalSystem.

Also compress retrieval of ProtocolDAGResults.
dotsdl changed the title from "[WIP] AlchemiscaleClient async+bulk for results, other methods" to "[WIP] AlchemiscaleClient async+bulk for results, other methods; add request, response compression for large objects" on Jun 27, 2023
dotsdl (Member, Author) commented Jun 27, 2023

@hmacdope almost done with this one! Could I get a review from you when you get the chance?

@dotsdl dotsdl requested a review from hmacdope June 27, 2023 02:12
Also, added `rich`-based progress bar to result retrieval.
hmacdope (Collaborator) left a comment

Great work! Few queries, see comments. :)

alchemiscale/interface/api.py (outdated)
    token: TokenData = Depends(get_token_data_depends),
) -> List[Union[str, None]]:
    status = TaskStatusEnum(status)
    if status not in (
hmacdope (Collaborator):

Should this be if status in ...? I could be missing something, but I thought we didn't want to mutate the state of waiting, invalid, or deleted tasks. The same goes for the HTTPException below, which seems to suggest that the status can be changed from a terminal state such as invalid or deleted.

dotsdl (Member, Author):

Here status isn't the current status of the Tasks we want to set; it's the desired status. waiting, invalid, and deleted are all set-able by the user, at least under most conditions (e.g. going from 'complete' to 'waiting' isn't allowed by the underlying Neo4jStore method).

hmacdope (Collaborator):

Ah I understand, sorry about that.
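
The distinction in this thread (the check is on the requested status, not the task's current one) can be sketched as follows. This is an illustration under assumptions: the `TaskStatus` enum and the `USER_SETTABLE` tuple are hypothetical, with the three settable values taken from the comment above; the real tuple in api.py is truncated in the diff and is not reproduced here.

```python
from enum import Enum


class TaskStatus(Enum):
    """Hypothetical status enum for illustration."""

    waiting = "waiting"
    running = "running"
    complete = "complete"
    error = "error"
    invalid = "invalid"
    deleted = "deleted"


# Assumption: the statuses a user may request, per the reply above.
USER_SETTABLE = (TaskStatus.waiting, TaskStatus.invalid, TaskStatus.deleted)


def validate_requested_status(status: str) -> TaskStatus:
    """Reject a *requested* status that users may not set.

    Whether the transition from the task's current status is legal is a
    separate check, left to the state store in this sketch.
    """
    requested = TaskStatus(status)  # raises ValueError for unknown names
    if requested not in USER_SETTABLE:
        raise ValueError(f"Cannot set status to '{requested.value}'")
    return requested
```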

        except HTTPException:
            tasks_updated.append(None)
        else:
            tasks_updated.extend(n4js.set_task_status([task_sk], status))
hmacdope (Collaborator):

This still sets statuses one at a time, unlike /bulk/tasks/status/get.

dotsdl (Member, Author):

Correct; we can optimize this further, but I think I'm running out of time on this one. Since we have more complex queries to deal with for status setting, I'd like to make that a future PR.

hmacdope (Collaborator):

Perfect! Raise an issue, and I'm happy to move on.

dotsdl (Member, Author):

I may have managed to get this one in. 😁
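
The one-at-a-time pattern discussed in this thread, where a failed item yields `None` so the output stays aligned with the input, might look roughly like this. The names `set_statuses_one_by_one`, `fake_set_one`, and `ALLOWED` are hypothetical, and `PermissionError` stands in for the HTTPException raised in the diff fragment.

```python
from typing import Callable, List, Optional


def set_statuses_one_by_one(
    task_keys: List[str],
    status: str,
    set_one: Callable[[str, str], str],
) -> List[Optional[str]]:
    """Apply set_one to each key, recording None where a key is rejected."""
    updated: List[Optional[str]] = []
    for key in task_keys:
        try:
            result = set_one(key, status)
        except PermissionError:
            # Keep a placeholder so results line up with the input order.
            updated.append(None)
        else:
            updated.append(result)
    return updated


# Hypothetical backend: only some keys fall inside the caller's scope.
ALLOWED = {"task-a", "task-c"}


def fake_set_one(key: str, status: str) -> str:
    if key not in ALLOWED:
        raise PermissionError(key)
    return key


result = set_statuses_one_by_one(["task-a", "task-b", "task-c"], "waiting", fake_set_one)
```

Each call here is a separate round trip to the backend, which is the cost the reviewer is pointing at; a true bulk version would pass all permitted keys to the store in one call.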


     def get_tasks_status(
-        self, tasks: Union[ScopedKey, List[ScopedKey]]
+        self, tasks: List[ScopedKey], batch_size=1000
hmacdope (Collaborator), Jun 27, 2023:

Love the batching ❤️

dotsdl (Member, Author):

🙏

alchemiscale/storage/statestore.py (outdated)
alchemiscale/tests/integration/interface/test_api.py (outdated)
alchemiscale/tests/integration/interface/test_api.py (outdated)
@@ -201,13 +329,18 @@ def _query_resource(self, resource, params=None):

     @_retry
     @_use_token
-    def _get_resource(self, resource, params=None):
+    def _get_resource(self, resource, params=None, compress=False):
hmacdope (Collaborator):

Did we want to compress by default here and below? Up to you. It seemed set to default to True on a lot of the interface/client.py methods.

dotsdl (Member, Author):

My thinking here is to keep compression on these private methods opt-in. Not all post and get calls will benefit from compression, especially for tiny requests/responses, so making it something we enable as the default on specific user-facing methods made the most sense to me.
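
The opt-in design described in this reply can be sketched as a private helper that defaults to `compress=False` and user-facing methods that choose their own default. Everything here is hypothetical: `SketchClient`, the resource paths, and the header handling are illustrative only, and no network request is made.

```python
from typing import Dict


class SketchClient:
    """Hypothetical client; it only builds request descriptions, no network I/O."""

    def _get_resource(self, resource: str, compress: bool = False) -> Dict[str, object]:
        # Private helper: compression is off unless the caller asks for it.
        headers = {"Accept": "application/json"}
        if compress:
            headers["Accept-Encoding"] = "gzip"
        return {"resource": resource, "headers": headers}

    def get_network(self, key: str, compress: bool = True) -> Dict[str, object]:
        # Large object: the user-facing method opts in by default.
        return self._get_resource(f"/networks/{key}", compress=compress)

    def get_task_status(self, key: str) -> Dict[str, object]:
        # Tiny response: keep the private default (no compression).
        return self._get_resource(f"/tasks/{key}/status")


client = SketchClient()
large = client.get_network("net-1")
small = client.get_task_status("task-1")
```

Keeping the default off in the private layer means a new call site pays for compression only when its author decides the payload is large enough to benefit.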


        return resp.json()

    @staticmethod
hmacdope (Collaborator):

Very cool.

dotsdl (Member, Author):

I think Python 3.12 will have an itertools.batched we can just switch to.

Also made set_tasks_status work as async/batch, same as get_tasks_status
dotsdl changed the title from "[WIP] AlchemiscaleClient async+bulk for results, other methods; add request, response compression for large objects" to "AlchemiscaleClient async+bulk for results, other methods; add request, response compression for large objects" on Jun 27, 2023
@dotsdl dotsdl linked an issue Jun 27, 2023 that may be closed by this pull request
@dotsdl dotsdl merged commit dab275b into main Jun 29, 2023
3 checks passed
@dotsdl dotsdl deleted the user-async-batch branch June 29, 2023 07:02