Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Create Semsimian object with concerned predicates only. #631

Merged
merged 44 commits into from
Aug 18, 2023

Conversation

hrshdhgd
Copy link
Collaborator

@hrshdhgd hrshdhgd commented Aug 7, 2023

  • Create semsimian object only with concerned predicates
  • Pydantic version < 2
  • Semsimian anchored at 0.2.0
  • Add in-memory caching for better efficiency
  • termset-similarity command returns TermSetPairwiseSimilarity object

@hrshdhgd hrshdhgd requested a review from cmungall August 7, 2023 15:57
@hrshdhgd hrshdhgd marked this pull request as ready for review August 7, 2023 15:57
@hrshdhgd hrshdhgd marked this pull request as draft August 7, 2023 16:09
@hrshdhgd hrshdhgd removed the request for review from cmungall August 7, 2023 16:09
pyproject.toml Outdated Show resolved Hide resolved
Copy link
Collaborator

@cmungall cmungall left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

pinned too tightly

tox.ini Show resolved Hide resolved
Copy link
Collaborator

@cmungall cmungall left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think the logic is quite right. Consider the case where the client does:

sim1 = adapter.pairwise_similarity(x,y, predicates=[IS_A])
sim2 = adapter.pairwise_similarity(x,y, predicates=[IS_A, PART_OF])
<some kind of comparison of the two...>

I believe that sem2 will give the same result. It will silently use the same cached filtered closure, and the user will be none the wiser.

If we do thing that the standard use case of semsimian is that you will never want to switch predicate sets in one session (not ideal but acceptable in the short term), then it is vital that the above code will fail fast on the second call and inform the user that the cached object is already frozen on the IS_A-only closure.

However, I think a more standard solution is that you have a cache of semsimian objects keyed by the tuple of the ordered list of predicates (ordered to avoid unnecessary re-computation as [IS_A, PART_OF] is the same as [PART_OF, IS_A]).

e.g. one of the object attributes would be

    semsimian_cache: Dict[Tuple, Semsimian] = None

Then create_pairwise_similarity_output_object would turn predicates to a tuple tuple(sorted(predicates)) and use that as a key.

There is a danger here that the client could exhaust memory by iterating through all combos of 100 predicates in uberon... but I think that is acceptable for now (later on we could have a system that did more active cache management).

@codecov-commenter
Copy link

codecov-commenter commented Aug 7, 2023

Codecov Report

Patch coverage: 82.35% and project coverage change: -0.04% ⚠️

Comparison is base (4a306aa) 77.08% compared to head (ed804dd) 77.04%.
Report is 1 commits behind head on main.

❗ Your organization is not using the GitHub App Integration. As a result you may experience degraded service beginning May 15th. Please install the Github App Integration for your organization. Read more.

Additional details and impacted files
@@            Coverage Diff             @@
##             main     #631      +/-   ##
==========================================
- Coverage   77.08%   77.04%   -0.04%     
==========================================
  Files         245      247       +2     
  Lines       28415    28540     +125     
==========================================
+ Hits        21903    21990      +87     
- Misses       6512     6550      +38     
Files Changed Coverage Δ
...lementations/semsimian/semsimian_implementation.py 85.88% <70.00%> (-9.44%) ⬇️
tests/test_implementations/test_pronto.py 90.93% <100.00%> (-0.86%) ⬇️
...t_implementations/test_semsimian_implementation.py 87.80% <100.00%> (+2.29%) ⬆️

... and 15 files with indirect coverage changes

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@caufieldjh
Copy link
Collaborator

Because changes in this branch are necessary for Monarch use cases and will not be impacted by Windows-specific errors, I'm going to consider this state sufficient for an RC release and will open a fresh PR for the Windows issue.

@caufieldjh caufieldjh merged commit 529cdb8 into main Aug 18, 2023
5 checks passed
@caufieldjh caufieldjh deleted the semsim-obj-create branch August 18, 2023 19:13
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants