ENCD-5439-optimize-metadata-endpoint #3462

keenangraham · 2020-07-31T22:46:22Z

No description provided.

hitz

PR approved with the understanding that code will be extended to refactor Annotation and PublicationData examples with tests.

hitz · 2020-08-01T00:18:08Z

src/encoded/reports/metadata.py

+    if specified_type == 'Annotation':
+        return _get_annotation_metadata(context, request)
+    elif specified_type == 'PublicationData':
+        return _get_publicationdata_metadata(context, request)
+    else:
+        return _get_metadata(context, request)


Are there tests for these?

Still working on tests. Just PRed to give you a quick look.
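
For reference, a minimal sketch of what a test for this dispatch could look like. The three handlers here are hypothetical stand-ins (the real ones take a Pyramid context and request), and `dispatch_metadata` is a made-up name, so this only illustrates the branching, not the actual views.

```python
# Hypothetical stand-ins for the three handlers named in the diff.
def _get_annotation_metadata(context, request):
    return 'annotation'


def _get_publicationdata_metadata(context, request):
    return 'publicationdata'


def _get_metadata(context, request):
    return 'default'


def dispatch_metadata(specified_type, context=None, request=None):
    # Same branching as the diff above; specified_type is passed in
    # directly here instead of being read from the request.
    if specified_type == 'Annotation':
        return _get_annotation_metadata(context, request)
    elif specified_type == 'PublicationData':
        return _get_publicationdata_metadata(context, request)
    else:
        return _get_metadata(context, request)


def test_dispatch_metadata():
    assert dispatch_metadata('Annotation') == 'annotation'
    assert dispatch_metadata('PublicationData') == 'publicationdata'
    # Anything else, including a missing type, falls through to the default.
    assert dispatch_metadata('Experiment') == 'default'
    assert dispatch_metadata(None) == 'default'
```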

hitz · 2020-08-01T00:22:10Z

src/encoded/tests/test_metadata.py

+        '&files.status!=archived&files.biological_replicates=2'
+    )
+    mr = MetadataReport(dummy_request)
+    assert mr._validate_request() is None


This is a "passed" validation? It's not very clear

Updated to return True. This validation is actually redundant now with the @allowed_types decorator, but you could add some other validation logic.
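
A minimal sketch of the difference being discussed: a validator that returns True on success reads more clearly in a test than one that returns None. `BadRequest`, `ALLOWED_TYPES`, and `validate_request` are hypothetical stand-ins (the real code raises Pyramid's HTTPBadRequest and is a method on MetadataReport).

```python
class BadRequest(Exception):
    """Hypothetical stand-in for pyramid.httpexceptions.HTTPBadRequest."""


# Hypothetical allow-list, for illustration only.
ALLOWED_TYPES = {'Experiment', 'Annotation', 'PublicationData'}


def validate_request(params):
    # params is a list of (key, value) tuples parsed from the query string.
    types = [value for key, value in params if key == 'type']
    if len(types) != 1:
        raise BadRequest('URL requires exactly one "type" parameter.')
    if types[0] not in ALLOWED_TYPES:
        raise BadRequest('{} is not an allowed type.'.format(types[0]))
    # Explicit success value, so a test can assert on it directly.
    return True
```

With this shape the passing case becomes `assert validate_request(params)`, and the failing case still uses pytest.raises.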

hitz · 2020-08-01T00:22:37Z

src/encoded/tests/test_metadata.py

+    with pytest.raises(HTTPBadRequest):
+        mr._validate_request()


This is clearly a failed _validate_request.

src/encoded/reports/metadata.py

src/encoded/tests/features/conftest.py

forresttanaka

I don’t have a lot of feedback to give, but had a couple of thoughts, whatever they’re worth. But I’ll approve and leave the more detailed look to Phil.

forresttanaka · 2020-08-05T21:21:48Z

src/encoded/tests/test_metadata.py

+        ('field', 'files.s3_uri')
+    ]
+    for param in mr._get_field_params():
+        assert param in expected_field_params


I’m not sure how useful this test is, but it does seem more useful than the one it replaces, which compared two static lists to see we copied and pasted them correctly.
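
One thing to note about a loop of this shape: it checks that every returned param is expected, but it also passes when params are missing (including when nothing is returned at all). A sketch of a stricter comparison, using hypothetical helper names and only a made-up subset of fields:

```python
def check_params_loosely(actual, expected):
    # Mirrors the loop in the test: only checks that actual is a subset of expected.
    for param in actual:
        assert param in expected
    return True


def check_params_strictly(actual, expected):
    # Order-insensitive comparison that also catches missing or duplicated params.
    assert sorted(actual) == sorted(expected)
    return True


expected = [('field', 'files.href'), ('field', 'files.s3_uri')]

# An empty result passes the loose check even though both params are missing.
assert check_params_loosely([], expected)
```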

forresttanaka · 2020-08-05T21:31:03Z

src/encoded/batch_download.py

-        content_disposition='attachment;filename="%s"' % 'metadata.tsv'
-    )
-
-
 @view_config(route_name='batch_download', request_method=('GET', 'POST'))
 def batch_download(context, request):


batch_download and metadata_tsv do a lot of similar things, but not the same things. I had wondered if they could share some code, but that seems difficult when the shared code might be only a line or two here or there.

I think we need to hoist batch_download endpoints over next. The file filtering stuff that was fixed for metadata will still be broken here for example.

zoldello

It looks fine to me. My comments are cosmetic; none are required changes.

zoldello · 2020-08-05T21:30:17Z

src/encoded/batch_download.py

-    ('No File Available', ['file.no_file_available']),
-    ('Restricted', ['files.restricted'])
-])
-
 _tsv_mapping_publicationdata = OrderedDict([


I assume you will move this to constants.py in the future..?

zoldello · 2020-08-05T21:33:36Z

src/encoded/reports/constants.py

@@ -0,0 +1,136 @@
+from collections import OrderedDict


Since this is new and speed is the goal, is OrderedDict needed in this file and other files? Beginning in Python 3.7, dict insertion order is guaranteed (in 3.6 it was a CPython implementation detail).

That's a good point. I'm going to leave it explicit for now, but should test for correctness and speed after removing these.
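
For context on the trade-off: a plain dict preserves insertion order as a language guarantee from Python 3.7 on, so column order in a mapping would survive the switch. One behavioral difference to keep in mind when testing is that OrderedDict equality is order-sensitive while dict equality is not. A small sketch with made-up column entries:

```python
from collections import OrderedDict

# Made-up column-to-fields pairs, for illustration only.
columns = [('File accession', ['files.title']), ('File format', ['files.file_type'])]

plain = dict(columns)
ordered = OrderedDict(columns)

# Both iterate in insertion order.
assert list(plain) == list(ordered) == ['File accession', 'File format']

# Equality differs: OrderedDict compares order, dict does not.
assert dict(reversed(columns)) == plain
assert OrderedDict(reversed(columns)) != ordered
```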

zoldello · 2020-08-05T21:45:30Z

src/encoded/reports/metadata.py

+        for value in simple_path_ids(experiment, path):
+            if str(value) not in cell_value:
+                cell_value.append(str(value))
+        if last and cell_value:


It may be better to write an explicit check such as len(last) > 0. While the current form is correct, it does require a little thinking/pausing, especially for developers who work on both front end and back end. E.g., in JavaScript, with last = [], if (last) is true. In Python, it's False.

I think this is pretty standard.
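
For anyone switching between the two languages: an empty list is falsy in Python (an empty array is truthy in JavaScript), so `if last and cell_value:` only runs when both lists are non-empty. A quick illustration with made-up values:

```python
last = []
cell_value = ['a']

# An empty list is falsy, so the implicit and explicit checks agree.
assert bool(last) is False
assert bool(last) == (len(last) > 0)
assert not (last and cell_value)

# Once both lists have items, the combined condition is truthy.
last = ['x']
assert bool(last and cell_value) is True
```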

zoldello · 2020-08-05T21:46:49Z

src/encoded/reports/metadata.py

+
+def make_file_cell(paths, file_):
+    # Quick return if one level deep.
+    if len(paths) == 1 and '.' not in paths[0]:


You may want parentheses around the second condition: if len(paths) == 1 and ('.' not in paths[0])

Think that adds unnecessary noise here.
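
For what it's worth, comparison operators such as `not in` bind more tightly than `and` in Python, so the two spellings parse to the same expression. A small check using the standard `ast` module:

```python
import ast

a = "len(paths) == 1 and '.' not in paths[0]"
b = "len(paths) == 1 and ('.' not in paths[0])"

# Parentheses leave the parsed expression tree unchanged.
assert ast.dump(ast.parse(a)) == ast.dump(ast.parse(b))
```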

zoldello · 2020-08-05T21:48:19Z

src/encoded/reports/metadata.py

+    # Quick return if one level deep.
+    if len(paths) == 1 and '.' not in paths[0]:
+        value = file_.get(paths[0], '')
+        if isinstance(value, list):


Maybe:
return value if not isinstance(value, list) else return ', '.join([str(v) for v in value])

Not sure two returns on a line is valid.

Typo. I meant:
return value if not isinstance(value, list) else ', '.join([str(v) for v in value])
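
A sketch of how the quick-return branch might read with the corrected one-liner. Only the first few lines of make_file_cell appear in the diff, so `make_file_cell_quick` is a hypothetical name, everything past the isinstance check is an assumption, and the nested path handling is left out.

```python
def make_file_cell_quick(paths, file_):
    # Hypothetical helper covering only the one-level-deep case from the diff.
    if len(paths) == 1 and '.' not in paths[0]:
        value = file_.get(paths[0], '')
        # Conditional expression: a single return, no second `return` keyword.
        return value if not isinstance(value, list) else ', '.join(str(v) for v in value)
    raise NotImplementedError('nested paths are not handled in this sketch')
```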

zoldello · 2020-08-05T21:50:26Z

src/encoded/reports/metadata.py

+
+
+def file_matches_file_params(file_, positive_file_param_list):
+    # Expects file_param_list where 'files.' has been


Pardon me if I got American and British English confused. 'files.' is the British way. The American way is "file."; with the double quotes since the word has multiple characters.

This is specifically referring to file properties in experiments, which are all under the files. prefix, e.g. files.accession and files.status.

zoldello · 2020-08-05T21:53:09Z

src/encoded/reports/metadata.py

+            file_prop_value = list(simple_path_ids(file_, k))
+        else:
+            file_prop_value = file_.get(k)
+        if not file_prop_value:


Maybe-

if not file_prop_value or str(file_prop_value) not in v:
    return False
if isinstance(file_prop_value, list):
    return any([str(x) in v for x in file_prop_value])

This would short circuit the list check.
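
A hedged sketch of one way the checks could be ordered so that list values are handled before the scalar comparison (comparing `str(file_prop_value)` first would stringify a whole list and return False before any isinstance check is reached). The signature follows the diff; the body is an assumption since only a few lines are shown, `positive_file_param_list` is assumed to be a dict of property name to accepted string values, and nested paths via simple_path_ids are omitted.

```python
def file_matches_file_params(file_, positive_file_param_list):
    # Assumed shape: {property name without the 'files.' prefix: set of
    # accepted string values}.
    for k, v in positive_file_param_list.items():
        file_prop_value = file_.get(k)
        if not file_prop_value:
            return False
        if isinstance(file_prop_value, list):
            # A list matches if any of its items is accepted.
            if not any(str(x) in v for x in file_prop_value):
                return False
        elif str(file_prop_value) not in v:
            return False
    return True
```

Using a generator inside any() (rather than a list comprehension) lets the check stop at the first matching item.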

zoldello · 2020-08-05T21:56:10Z

src/encoded/reports/metadata.py

+
+
+def group_audits_by_files_and_type(audits):
+    grouped_file_audits = defaultdict(lambda: defaultdict(list))


Good idea to use defaultdict.

A lot faster to parse the audits once per experiment rather than crawl through them for every file in every experiment.
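
A minimal sketch of the single-pass grouping idea. The audit shape below (a dict of audit level to a list of audits, each with `path` and `category`) and the accession are assumptions for illustration, as is the function body; only the first line of the real function appears in the diff.

```python
from collections import defaultdict


def group_audits_by_files_and_type(audits):
    # Outer key: file path; inner key: audit level; value: list of categories.
    grouped_file_audits = defaultdict(lambda: defaultdict(list))
    for audit_type, audit_list in audits.items():
        for audit in audit_list:
            path = audit.get('path')
            if path:
                grouped_file_audits[path][audit_type].append(audit.get('category'))
    return grouped_file_audits


# Made-up audits for a made-up file path.
audits = {
    'WARNING': [{'path': '/files/ENCFF000AAA/', 'category': 'low read depth'}],
    'ERROR': [{'path': '/files/ENCFF000AAA/', 'category': 'missing controlled_by'}],
}
grouped = group_audits_by_files_and_type(audits)
```

Each file row then becomes a dictionary lookup on the file's path instead of a scan over every audit in the experiment.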

zoldello · 2020-08-05T21:59:36Z

src/encoded/reports/metadata.py

+            self.param_list['@id'].extend(
+                self.request.json.get('elements', [])
+            )
+        except ValueError:


Do you need a comment explaining why ValueError is swallowed rather than allowed to propagate?

This is lifted over from old code. I left it in because I was concerned the ValueError is coming from trying to access .json on request when it doesn't exist. Could probably clean this up.
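
A sketch of the pattern with the reason spelled out in a comment. As far as I know, `request.json` in Pyramid (via WebOb) parses the body on access, so a request with an empty or non-JSON body raises a ValueError subclass at that point. `FakeRequest` and `get_element_ids` below are hypothetical stand-ins so the example runs without Pyramid.

```python
import json


class FakeRequest:
    # Hypothetical stand-in mimicking a request whose .json property
    # parses the raw body on access.
    def __init__(self, body):
        self.body = body

    @property
    def json(self):
        return json.loads(self.body)


def get_element_ids(request):
    ids = []
    try:
        ids.extend(request.json.get('elements', []))
    except ValueError:
        # No JSON body (e.g. a plain GET): json.loads raises JSONDecodeError,
        # a ValueError subclass. Treat that as "no elements posted".
        pass
    return ids
```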

zoldello · 2020-08-05T22:03:23Z

src/encoded/reports/metadata.py

+            for column, fields in self.experiment_column_to_fields_mapping.items()
+        }
+
+    def _get_file_data(self, file_):


I assume "file_" as in "you cannot use 'file' so using 'file_'" ..?

forresttanaka

I mostly used this as an educational exercise, and it looks really good — way more organized and understandable.

keenangraham · 2020-08-06T23:45:17Z

This should be ready as long as BDD tests pass.

forresttanaka

This branch, updated to the latest, passed BDD locally for me, so I’m going to pass it here.

keenangraham requested a review from hitz July 31, 2020 22:46

hitz previously approved these changes Aug 1, 2020

keenangraham dismissed hitz’s stale review via fc8a470 August 1, 2020 00:45

keenangraham force-pushed the ENCD-5439-optimize-metadata-endpoint branch 6 times, most recently from 0423544 to f6d22be Compare August 5, 2020 19:53

keenangraham requested review from forresttanaka and zoldello August 5, 2020 20:30

keenangraham commented Aug 5, 2020

src/encoded/reports/metadata.py (resolved)

keenangraham commented Aug 5, 2020

src/encoded/tests/features/conftest.py (outdated, resolved)

keenangraham force-pushed the ENCD-5439-optimize-metadata-endpoint branch 2 times, most recently from eacf33e to f6d22be Compare August 5, 2020 20:45

forresttanaka previously approved these changes Aug 5, 2020

zoldello previously approved these changes Aug 5, 2020

keenangraham dismissed stale reviews from zoldello and forresttanaka via 7bd6560 August 5, 2020 23:13

keenangraham force-pushed the ENCD-5439-optimize-metadata-endpoint branch from 33fe000 to afe53bb Compare August 6, 2020 20:06

keenangraham requested a review from forresttanaka August 6, 2020 20:08

forresttanaka previously approved these changes Aug 6, 2020

keenangraham dismissed forresttanaka’s stale review via ea74dd4 August 6, 2020 21:24

keenangraham added 6 commits August 6, 2020 15:51

Add test for all metadata

e339809

Group audits by file

e4e2e5e

Add back other audits

5440ff4

Remove prints

def0eb0

Fix name

4b0b846

Stream results

6e04f58

keenangraham added 23 commits August 6, 2020 15:51

Add AnnotationMetadataReport

0371fbd

Remove unused

594ce5f

Mark indexing tests

b96cfae

Add BatchedSearchGenerator

15b787d

Add make_batches_from_batch_params and test

50dde28

Add batch test

ef428b2

Add build request and batched_values and tests

9dcaf3c

Yield results

72849b0

Add PublicationDataMetadataReport

5bf9844

Replace takes arg

988d511

Filter file params from experiment querystring

a56bcee

Update

642c895

Remove old endopoint

eb53df1

Use @id in URL instead of dataset

c2c4c85

Update tests

3ec71ec

Add tests

b95227d

Update

e17af3e

Update and add tests

4e30ef7

Add test

fe075ba

Update tests

800d272

Put quotes around metadata URL

77b2dae

Update tests with quotes

00c1b60

Update expected_metadata

5f4a85a

keenangraham force-pushed the ENCD-5439-optimize-metadata-endpoint branch from 50fa5a1 to 5f4a85a Compare August 6, 2020 22:51

Remove unused

3e985e2

keenangraham requested a review from forresttanaka August 6, 2020 23:45

forresttanaka approved these changes Aug 7, 2020

gabdank merged commit a9c2066 into dev Aug 7, 2020

keenangraham deleted the ENCD-5439-optimize-metadata-endpoint branch August 7, 2020 00:48


