-
Notifications
You must be signed in to change notification settings - Fork 51
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
ENCD-5439-optimize-metadata-endpoint #3462
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
PR approved with the understanding that code will be extended to refactor Annotation and PublicationData examples with tests.
src/encoded/reports/metadata.py
Outdated
if specified_type == 'Annotation': | ||
return _get_annotation_metadata(context, request) | ||
elif specified_type == 'PublicationData': | ||
return _get_publicationdata_metadata(context, request) | ||
else: | ||
return _get_metadata(context, request) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are there tests for these?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Still working on tests. Just PRed to give you a quick look.
src/encoded/tests/test_metadata.py
Outdated
'&files.status!=archived&files.biological_replicates=2' | ||
) | ||
mr = MetadataReport(dummy_request) | ||
assert mr._validate_request() is None |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a "passed" validation? It's not very clear
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Updated to return True. This validation actually redundant now with the @allowed_types decorator. But you could add some other validation logic.
with pytest.raises(HTTPBadRequest): | ||
mr._validate_request() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is clearly a failed _validate_request.
0423544
to
f6d22be
Compare
This comment has been minimized.
This comment has been minimized.
eacf33e
to
f6d22be
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don’t have a lot of feedback to give, but had a couple of thoughts, whatever they’re worth. But I’ll approve and leave the more detailed look to Phil.
('field', 'files.s3_uri') | ||
] | ||
for param in mr._get_field_params(): | ||
assert param in expected_field_params |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I’m not sure how useful this test is, but it does seem more useful than the one it replaces, which compared two static lists to see we copied and pasted them correctly.
content_disposition='attachment;filename="%s"' % 'metadata.tsv' | ||
) | ||
|
||
|
||
@view_config(route_name='batch_download', request_method=('GET', 'POST')) | ||
def batch_download(context, request): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
batdh_download and metadata_tsv do a lot of similar things, but not the same things. But I had wondered if they could share some code, but it does seem kind of difficult to do, when the shared code might be a line or two here or there.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we need to hoist batch_download endpoints over next. The file filtering stuff that was fixed for metadata will still be broken here for example.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It looks fine to me. My comments are cosmetic none are required changes.
src/encoded/batch_download.py
Outdated
('No File Available', ['file.no_file_available']), | ||
('Restricted', ['files.restricted']) | ||
]) | ||
|
||
_tsv_mapping_publicationdata = OrderedDict([ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I assume you will move this to constants.py in the future..?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes.
@@ -0,0 +1,136 @@ | |||
from collections import OrderedDict |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since this is new and speed is the goal, is OrderedDict needed in this file and other files? Beginning in in Python 3.6, dict has ordering guaranteed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's a good point. I'm going to leave it explicit for now, but should test for correctness and speed after removing these.
for value in simple_path_ids(experiment, path): | ||
if str(value) not in cell_value: | ||
cell_value.append(str(value)) | ||
if last and cell_value: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It may be better to write len(last) == 0. While this is correct, it does require a little thinking/pausing especially to developer that work both front and back end. E.g, in Javascript, for last = []. if (last) == true. In Python, its False.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this is pretty standard.
|
||
def make_file_cell(paths, file_): | ||
# Quick return if one level deep. | ||
if len(paths) == 1 and '.' not in paths[0]: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You may want parenthesis after '.' - if len(paths) == 1 and ('.' not in paths[0])
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Think that adds unnecessary noise here.
# Quick return if one level deep. | ||
if len(paths) == 1 and '.' not in paths[0]: | ||
value = file_.get(paths[0], '') | ||
if isinstance(value, list): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe:
return value if not isinstance(value, list) else return ', '.join([str(v) for v in value])
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not sure two returns on a line is valid.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Typo. I meant:
return value if not isinstance(value, list) else ', '.join([str(v) for v in value])
|
||
|
||
def file_matches_file_params(file_, positive_file_param_list): | ||
# Expects file_param_list where 'files.' has been |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pardon me if I got American and British English confused. 'files.' is the British way. The American way is "file."; with the double quotes since the word has multiple characters.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is specifically referring to file properties in experiments which are all under files.
. So like files.accession
and files.status
.
file_prop_value = list(simple_path_ids(file_, k)) | ||
else: | ||
file_prop_value = file_.get(k) | ||
if not file_prop_value: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe-
if not file_prop_value or str(file_prop_value) not in v:
return False
if isinstance(file_prop_value, list):
return any([str(x) in v for x in file_prop_value])
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This would short circuit the list check.
|
||
|
||
def group_audits_by_files_and_type(audits): | ||
grouped_file_audits = defaultdict(lambda: defaultdict(list)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good idea. to use defaultdict.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A lot faster to parse the audits once per experiment rather than crawl through them for every file in every experiment.
self.param_list['@id'].extend( | ||
self.request.json.get('elements', []) | ||
) | ||
except ValueError: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do you need to comment why ValueError is eaten rather than be allowed to be thrown?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is lifted over from old code. I left it in because I was concerned the ValueError is coming from trying to access .json
on request
when it doesn't exist. Could probably clean this up.
for column, fields in self.experiment_column_to_fields_mapping.items() | ||
} | ||
|
||
def _get_file_data(self, file_): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I assume "file_" as in "you cannot use 'file' so using 'file_'" ..?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yep.
7bd6560
33fe000
to
afe53bb
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I mostly used this as an educational exercise, and it looks really good — way more organized and understandable.
50fa5a1
to
5f4a85a
Compare
This should be ready as long as BDD tests pass. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This branch, updated to the latest, passed BDD locally for me, so I’m going to pass it here.
No description provided.