Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ENCD-5439-optimize-metadata-endpoint #3462

Merged
merged 72 commits into from
Aug 7, 2020
Merged

Conversation

keenangraham
Copy link
Contributor

No description provided.

@keenangraham keenangraham requested a review from hitz July 31, 2020 22:46
hitz
hitz previously approved these changes Aug 1, 2020
Copy link
Member

@hitz hitz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

PR approved with the understanding that code will be extended to refactor Annotation and PublicationData examples with tests.

Comment on lines 354 to 552
if specified_type == 'Annotation':
return _get_annotation_metadata(context, request)
elif specified_type == 'PublicationData':
return _get_publicationdata_metadata(context, request)
else:
return _get_metadata(context, request)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are there tests for these?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Still working on tests. Just PRed to give you a quick look.

'&files.status!=archived&files.biological_replicates=2'
)
mr = MetadataReport(dummy_request)
assert mr._validate_request() is None
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a "passed" validation? It's not very clear

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated to return True. This validation actually redundant now with the @allowed_types decorator. But you could add some other validation logic.

Comment on lines +955 to +1066
with pytest.raises(HTTPBadRequest):
mr._validate_request()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is clearly a failed _validate_request.

@keenangraham keenangraham force-pushed the ENCD-5439-optimize-metadata-endpoint branch 6 times, most recently from 0423544 to f6d22be Compare August 5, 2020 19:53
@keenangraham

This comment has been minimized.

@keenangraham keenangraham force-pushed the ENCD-5439-optimize-metadata-endpoint branch 2 times, most recently from eacf33e to f6d22be Compare August 5, 2020 20:45
forresttanaka
forresttanaka previously approved these changes Aug 5, 2020
Copy link
Contributor

@forresttanaka forresttanaka left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don’t have a lot of feedback to give, but had a couple of thoughts, whatever they’re worth. But I’ll approve and leave the more detailed look to Phil.

('field', 'files.s3_uri')
]
for param in mr._get_field_params():
assert param in expected_field_params
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I’m not sure how useful this test is, but it does seem more useful than the one it replaces, which compared two static lists to see we copied and pasted them correctly.

content_disposition='attachment;filename="%s"' % 'metadata.tsv'
)


@view_config(route_name='batch_download', request_method=('GET', 'POST'))
def batch_download(context, request):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

batdh_download and metadata_tsv do a lot of similar things, but not the same things. But I had wondered if they could share some code, but it does seem kind of difficult to do, when the shared code might be a line or two here or there.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we need to hoist batch_download endpoints over next. The file filtering stuff that was fixed for metadata will still be broken here for example.

zoldello
zoldello previously approved these changes Aug 5, 2020
Copy link
Contributor

@zoldello zoldello left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It looks fine to me. My comments are cosmetic none are required changes.

('No File Available', ['file.no_file_available']),
('Restricted', ['files.restricted'])
])

_tsv_mapping_publicationdata = OrderedDict([
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I assume you will move this to constants.py in the future..?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes.

@@ -0,0 +1,136 @@
from collections import OrderedDict
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since this is new and speed is the goal, is OrderedDict needed in this file and other files? Beginning in in Python 3.6, dict has ordering guaranteed.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's a good point. I'm going to leave it explicit for now, but should test for correctness and speed after removing these.

for value in simple_path_ids(experiment, path):
if str(value) not in cell_value:
cell_value.append(str(value))
if last and cell_value:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It may be better to write len(last) == 0. While this is correct, it does require a little thinking/pausing especially to developer that work both front and back end. E.g, in Javascript, for last = []. if (last) == true. In Python, its False.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is pretty standard.


def make_file_cell(paths, file_):
# Quick return if one level deep.
if len(paths) == 1 and '.' not in paths[0]:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You may want parenthesis after '.' - if len(paths) == 1 and ('.' not in paths[0])

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Think that adds unnecessary noise here.

# Quick return if one level deep.
if len(paths) == 1 and '.' not in paths[0]:
value = file_.get(paths[0], '')
if isinstance(value, list):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe:
return value if not isinstance(value, list) else return ', '.join([str(v) for v in value])

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure two returns on a line is valid.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Typo. I meant:
return value if not isinstance(value, list) else ', '.join([str(v) for v in value])



def file_matches_file_params(file_, positive_file_param_list):
# Expects file_param_list where 'files.' has been
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pardon me if I got American and British English confused. 'files.' is the British way. The American way is "file."; with the double quotes since the word has multiple characters.

Copy link
Contributor Author

@keenangraham keenangraham Aug 5, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is specifically referring to file properties in experiments which are all under files.. So like files.accession and files.status.

file_prop_value = list(simple_path_ids(file_, k))
else:
file_prop_value = file_.get(k)
if not file_prop_value:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe-

    if not file_prop_value or str(file_prop_value) not in v:
        return False
    if isinstance(file_prop_value, list):
        return any([str(x) in v for x in file_prop_value])

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This would short circuit the list check.



def group_audits_by_files_and_type(audits):
grouped_file_audits = defaultdict(lambda: defaultdict(list))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good idea. to use defaultdict.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A lot faster to parse the audits once per experiment rather than crawl through them for every file in every experiment.

self.param_list['@id'].extend(
self.request.json.get('elements', [])
)
except ValueError:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you need to comment why ValueError is eaten rather than be allowed to be thrown?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is lifted over from old code. I left it in because I was concerned the ValueError is coming from trying to access .json on request when it doesn't exist. Could probably clean this up.

for column, fields in self.experiment_column_to_fields_mapping.items()
}

def _get_file_data(self, file_):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I assume "file_" as in "you cannot use 'file' so using 'file_'" ..?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep.

forresttanaka
forresttanaka previously approved these changes Aug 6, 2020
Copy link
Contributor

@forresttanaka forresttanaka left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I mostly used this as an educational exercise, and it looks really good — way more organized and understandable.

@keenangraham keenangraham force-pushed the ENCD-5439-optimize-metadata-endpoint branch from 50fa5a1 to 5f4a85a Compare August 6, 2020 22:51
@keenangraham
Copy link
Contributor Author

This should be ready as long as BDD tests pass.

Copy link
Contributor

@forresttanaka forresttanaka left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This branch, updated to the latest, passed BDD locally for me, so I’m going to pass it here.

@gabdank gabdank merged commit a9c2066 into dev Aug 7, 2020
@keenangraham keenangraham deleted the ENCD-5439-optimize-metadata-endpoint branch August 7, 2020 00:48
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants