ENCD-5439-optimize-metadata-endpoint #3462

Merged: 72 commits, Aug 7, 2020
Commits
e339809
Add test for all metadata
keenangraham Jul 25, 2020
e4e2e5e
Group audits by file
keenangraham Jul 25, 2020
5440ff4
Add back other audits
keenangraham Jul 25, 2020
def0eb0
Remove prints
keenangraham Jul 25, 2020
4b0b846
Fix name
keenangraham Jul 25, 2020
6e04f58
Stream results
keenangraham Jul 28, 2020
f4b89de
Split metadata into class
keenangraham Jul 29, 2020
8b3fb82
Use quick return and drop internal_audits
keenangraham Jul 29, 2020
8378f43
Add index_of column
keenangraham Jul 29, 2020
1063769
Move metadata tests to separate file
keenangraham Jul 29, 2020
887dba5
Move metadata mappings to constants
keenangraham Jul 29, 2020
c791d23
Block endpoint to allowed types
keenangraham Jul 29, 2020
5f32b68
Update decorator
keenangraham Jul 29, 2020
412b416
Pass in right order
keenangraham Jul 29, 2020
babf308
Add routing for annotation and publication data metadata
keenangraham Jul 29, 2020
8c38b6f
Fix name
keenangraham Jul 29, 2020
f23a57a
Return default metadata
keenangraham Jul 29, 2020
b3838b0
Remove unused
keenangraham Jul 29, 2020
a4f4217
Remove unused
keenangraham Jul 29, 2020
90cf178
Update variable
keenangraham Jul 29, 2020
c4de425
Remove import
keenangraham Jul 30, 2020
58c45ec
Add test for allowed_types decorator
keenangraham Jul 30, 2020
494aace
Update tests
keenangraham Jul 30, 2020
ceeb41c
Fix file matches file param
keenangraham Jul 31, 2020
3124f8c
Make file filter work with nested paths
keenangraham Jul 31, 2020
483e2bb
Strip files. once at start
keenangraham Jul 31, 2020
6155b2a
Replace with nothing
keenangraham Jul 31, 2020
51f946d
Ignore negation in filtering
keenangraham Jul 31, 2020
f59cd57
Add comment
keenangraham Jul 31, 2020
d2ecd64
Call positive_file_param_list
keenangraham Jul 31, 2020
653caac
Add test
keenangraham Jul 31, 2020
610f3f2
Update test
keenangraham Jul 31, 2020
e81412f
Add MetadataReport tests
keenangraham Jul 31, 2020
efd8651
Add more tests
keenangraham Jul 31, 2020
022c298
Add more tests
keenangraham Jul 31, 2020
7c266d1
Make validate return more clear
keenangraham Aug 1, 2020
7d2ef4e
Add tests for build_new_request and get_search_results
keenangraham Aug 3, 2020
22a2573
Test should_not_report_file
keenangraham Aug 3, 2020
a44b030
Test get_experiment_data
keenangraham Aug 3, 2020
7f962c3
Add tests for get experiment, file, audit data
keenangraham Aug 3, 2020
c2fca96
Update metadata
keenangraham Aug 3, 2020
fd0c01a
Use index_workbook before testapp
keenangraham Aug 3, 2020
7f2cb09
Try running in nonindexing
keenangraham Aug 4, 2020
23d3895
Add sorted_row test
keenangraham Aug 4, 2020
fc74959
Add test for report generator and annotation/publication_data views
keenangraham Aug 4, 2020
603b2b1
Assert status
keenangraham Aug 4, 2020
a472f93
Put annotation and publication_data in test harness
keenangraham Aug 5, 2020
af2d877
Make columns overridable
keenangraham Aug 5, 2020
0371fbd
Add AnnotationMetadataReport
keenangraham Aug 5, 2020
594ce5f
Remove unused
keenangraham Aug 5, 2020
b96cfae
Mark indexing tests
keenangraham Aug 5, 2020
15b787d
Add BatchedSearchGenerator
keenangraham Aug 5, 2020
50dde28
Add make_batches_from_batch_params and test
keenangraham Aug 5, 2020
ef428b2
Add batch test
keenangraham Aug 5, 2020
9dcaf3c
Add build request and batched_values and tests
keenangraham Aug 6, 2020
72849b0
Yield results
keenangraham Aug 6, 2020
5bf9844
Add PublicationDataMetadataReport
keenangraham Aug 6, 2020
988d511
Replace takes arg
keenangraham Aug 6, 2020
a56bcee
Filter file params from experiment querystring
keenangraham Aug 6, 2020
642c895
Update
keenangraham Aug 6, 2020
eb53df1
Remove old endopoint
keenangraham Aug 6, 2020
c2c4c85
Use @id in URL instead of dataset
keenangraham Aug 6, 2020
3ec71ec
Update tests
keenangraham Aug 6, 2020
b95227d
Add tests
keenangraham Aug 6, 2020
e17af3e
Update
keenangraham Aug 6, 2020
4e30ef7
Update and add tests
keenangraham Aug 6, 2020
fe075ba
Add test
keenangraham Aug 6, 2020
800d272
Update tests
keenangraham Aug 6, 2020
77b2dae
Put quotes around metadata URL
keenangraham Aug 6, 2020
00c1b60
Update tests with quotes
keenangraham Aug 6, 2020
5f4a85a
Update expected_metadata
keenangraham Aug 6, 2020
3e985e2
Remove unused
keenangraham Aug 6, 2020
1 change: 1 addition & 0 deletions src/encoded/__init__.py
@@ -224,6 +224,7 @@ def main(global_config, **local_config):
    config.include('.types')
    config.include('.root')
    config.include('.batch_download')
    config.include('.reports.metadata')
    config.include('.visualization')

    if 'elasticsearch.server' in config.registry.settings:
495 changes: 5 additions & 490 deletions src/encoded/batch_download.py

Large diffs are not rendered by default.

Empty file added src/encoded/reports/__init__.py
Empty file.
166 changes: 166 additions & 0 deletions src/encoded/reports/constants.py
@@ -0,0 +1,166 @@
from collections import OrderedDict
Contributor:

Since this is new and speed is the goal, is OrderedDict needed in this file and other files? Beginning in Python 3.7, dict insertion order is guaranteed by the language (in CPython 3.6 it was an implementation detail).

Contributor Author:

That's a good point. I'm going to leave it explicit for now, but should test for correctness and speed after removing these.
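
The correctness and speed check mentioned above could be sketched roughly as follows. This is only an illustration: the three-entry `pairs` list is a small excerpt of the mapping in this file, and `time_build` is a hypothetical helper, not something from this PR. It assumes Python 3.7 or later.

```python
import timeit
from collections import OrderedDict

# Small excerpt of the column mapping, used only for illustration.
pairs = [
    ('File accession', ['files.title']),
    ('File format', ['files.file_type']),
    ('Output type', ['files.output_type']),
]

plain = dict(pairs)
ordered = OrderedDict(pairs)

# On Python 3.7+ both containers iterate in insertion order.
assert list(plain) == list(ordered)

# One behavioral difference to keep in mind when swapping them:
# OrderedDict equality is order-sensitive, plain dict equality is not.
reversed_pairs = list(reversed(pairs))
assert dict(reversed_pairs) == plain
assert OrderedDict(reversed_pairs) != ordered


def time_build(factory, items, number=10000):
    """Hypothetical helper: seconds spent building the mapping `number` times."""
    return timeit.timeit(lambda: factory(items), number=number)
```

Calling `time_build(dict, pairs)` and `time_build(OrderedDict, pairs)` on the target interpreter gives a rough comparison; no particular result is assumed here.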



METADATA_ALLOWED_TYPES = [
    'Experiment',
    'Annotation',
    'FunctionalCharacterizationExperiment',
    'PublicationData',
]


METADATA_COLUMN_TO_FIELDS_MAPPING = OrderedDict(
    [
        ('File accession', ['files.title']),
        ('File format', ['files.file_type']),
        ('File type', ['files.file_format']),
        ('File format type', ['files.file_format_type']),
        ('Output type', ['files.output_type']),
        ('File assembly', ['files.assembly']),
        ('Experiment accession', ['accession']),
        ('Assay', ['assay_title']),
        ('Biosample term id', ['biosample_ontology.term_id']),
        ('Biosample term name', ['biosample_ontology.term_name']),
        ('Biosample type', ['biosample_ontology.classification']),
        ('Biosample organism', ['replicates.library.biosample.organism.scientific_name']),
        ('Biosample treatments', ['replicates.library.biosample.treatments.treatment_term_name']),
        (
            'Biosample treatments amount',
            [
                'replicates.library.biosample.treatments.amount',
                'replicates.library.biosample.treatments.amount_units'
            ]
        ),
        (
            'Biosample treatments duration',
            [
                'replicates.library.biosample.treatments.duration',
                'replicates.library.biosample.treatments.duration_units'
            ]
        ),
        ('Biosample genetic modifications methods', ['replicates.library.biosample.applied_modifications.method']),
        ('Biosample genetic modifications categories', ['replicates.library.biosample.applied_modifications.category']),
        ('Biosample genetic modifications targets', ['replicates.library.biosample.applied_modifications.modified_site_by_target_id']),
        ('Biosample genetic modifications gene targets', ['replicates.library.biosample.applied_modifications.modified_site_by_gene_id']),
        (
            'Biosample genetic modifications site coordinates', [
                'replicates.library.biosample.applied_modifications.modified_site_by_coordinates.assembly',
                'replicates.library.biosample.applied_modifications.modified_site_by_coordinates.chromosome',
                'replicates.library.biosample.applied_modifications.modified_site_by_coordinates.start',
                'replicates.library.biosample.applied_modifications.modified_site_by_coordinates.end'
            ]
        ),
        ('Biosample genetic modifications zygosity', ['replicates.library.biosample.applied_modifications.zygosity']),
        ('Experiment target', ['target.name']),
        ('Library made from', ['replicates.library.nucleic_acid_term_name']),
        ('Library depleted in', ['replicates.library.depleted_in_term_name']),
        ('Library extraction method', ['replicates.library.extraction_method']),
        ('Library lysis method', ['replicates.library.lysis_method']),
        ('Library crosslinking method', ['replicates.library.crosslinking_method']),
        ('Library strand specific', ['replicates.library.strand_specificity']),
        ('Experiment date released', ['date_released']),
        ('Project', ['award.project']),
        (
            'RBNS protein concentration', [
                'files.replicate.rbns_protein_concentration',
                'files.replicate.rbns_protein_concentration_units'
            ]
        ),
        ('Library fragmentation method', ['files.replicate.library.fragmentation_method']),
        ('Library size range', ['files.replicate.library.size_range']),
        ('Biological replicate(s)', ['files.biological_replicates']),
        ('Technical replicate(s)', ['files.technical_replicates']),
        ('Read length', ['files.read_length']),
        ('Mapped read length', ['files.mapped_read_length']),
        ('Run type', ['files.run_type']),
        ('Paired end', ['files.paired_end']),
        ('Paired with', ['files.paired_with']),
        ('Index of', ['files.index_of']),
        ('Derived from', ['files.derived_from']),
        ('Size', ['files.file_size']),
        ('Lab', ['files.lab.title']),
        ('md5sum', ['files.md5sum']),
        ('dbxrefs', ['files.dbxrefs']),
        ('File download URL', ['files.href']),
        ('Genome annotation', ['files.genome_annotation']),
        ('Platform', ['files.platform.title']),
        ('Controlled by', ['files.controlled_by']),
        ('File Status', ['files.status']),
        ('No File Available', ['files.no_file_available']),
        ('Restricted', ['files.restricted']),
        ('s3_uri', ['files.s3_uri']),
    ]
)


METADATA_AUDIT_TO_AUDIT_COLUMN_MAPPING = [
    ('WARNING', 'Audit WARNING'),
    ('NOT_COMPLIANT', 'Audit NOT_COMPLIANT'),
    ('ERROR', 'Audit ERROR'),
]


ANNOTATION_METADATA_COLUMN_TO_FIELDS_MAPPING = OrderedDict(
    [
        ('File accession', ['files.title']),
        ('File format', ['files.file_type']),
        ('Output type', ['files.output_type']),
        ('Assay term name', ['files.assay_term_name']),
        ('Dataset accession', ['accession']),
        ('Annotation type', ['annotation_type']),
        ('Software used', ['software_used.software.title']),
        ('Encyclopedia Version', ['encyclopedia_version']),
        ('Biosample term id', ['biosample_ontology.term_id']),
        ('Biosample term name', ['biosample_ontology.term_name']),
        ('Biosample type', ['biosample_ontology.classification']),
        ('Life stage', ['relevant_life_stage']),
        ('Age', ['relevant_timepoint']),
        ('Age units', ['relevant_timepoint_units']),
        ('Organism', ['organism.scientific_name']),
        ('Targets', ['targets.name']),
        ('Dataset date released', ['date_released']),
        ('Project', ['award.project']),
        ('Lab', ['files.lab.title']),
        ('md5sum', ['files.md5sum']),
        ('dbxrefs', ['files.dbxrefs']),
        ('File download URL', ['files.href']),
        ('Assembly', ['files.assembly']),
        ('Controlled by', ['files.controlled_by']),
        ('File Status', ['files.status']),
        ('Derived from', ['files.derived_from']),
        ('S3 URL', ['files.cloud_metadata.url']),
        ('Size', ['files.file_size']),
        ('No File Available', ['files.no_file_available']),
        ('Restricted', ['files.restricted'])
    ]
)


PUBLICATION_DATA_METADATA_COLUMN_TO_FIELDS_MAPPING = OrderedDict(
    [
        ('File accession', ['files.title']),
        ('File dataset', ['files.dataset']),
        ('File type', ['files.file_format']),
        ('File format', ['files.file_type']),
        ('File output type', ['files.output_type']),
        ('Assay term name', ['files.assay_term_name']),
        ('Biosample term id', ['files.biosample_ontology.term_id']),
        ('Biosample term name', ['files.biosample_ontology.term_name']),
        ('Biosample type', ['files.biosample_ontology.classification']),
        ('File target', ['files.target.label']),
        ('Dataset accession', ['accession']),
        ('Dataset date released', ['date_released']),
        ('Project', ['award.project']),
        ('Lab', ['files.lab.title']),
        ('md5sum', ['files.md5sum']),
        ('dbxrefs', ['files.dbxrefs']),
        ('File download URL', ['files.href']),
        ('Assembly', ['files.assembly']),
        ('File status', ['files.status']),
        ('Derived from', ['files.derived_from']),
        ('S3 URL', ['files.cloud_metadata.url']),
        ('Size', ['files.file_size']),
        ('No File Available', ['files.no_file_available']),
        ('Restricted', ['files.restricted'])
    ]
)
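
For readers unfamiliar with these column-to-fields mappings, here is a minimal sketch of how one could be turned into a header and a row of a tabular report. It is not the code from this PR: `get_values` and `build_row` are hypothetical helpers, the joining rules (", " between values, " " between fields) are assumptions, and the experiment and file documents are made-up placeholders. Stripping the `files.` prefix before looking up file fields mirrors the "Strip files. once at start" commit only loosely.

```python
from collections import OrderedDict

# Three entries copied from the mapping above, for illustration only.
COLUMN_TO_FIELDS = OrderedDict(
    [
        ('File accession', ['files.title']),
        ('Experiment accession', ['accession']),
        (
            'Biosample treatments amount',
            [
                'replicates.library.biosample.treatments.amount',
                'replicates.library.biosample.treatments.amount_units'
            ]
        ),
    ]
)


def get_values(document, path):
    """Hypothetical helper: collect every value found at a dotted path,
    flattening any lists encountered along the way."""
    nodes = [document]
    for key in path.split('.'):
        found = []
        for node in nodes:
            value = node.get(key) if isinstance(node, dict) else None
            if value is None:
                continue
            if isinstance(value, list):
                found.extend(value)
            else:
                found.append(value)
        nodes = found
    return nodes


def build_row(experiment, file_, mapping):
    """Hypothetical helper: one cell per column, in mapping order.
    Fields prefixed with 'files.' are read from the file document."""
    row = []
    for fields in mapping.values():
        parts = []
        for field in fields:
            if field.startswith('files.'):
                values = get_values(file_, field[len('files.'):])
            else:
                values = get_values(experiment, field)
            parts.append(', '.join(str(v) for v in values))
        row.append(' '.join(p for p in parts if p))
    return row


# Placeholder documents, not real ENCODE records.
experiment = {
    'accession': 'EXPERIMENT_1',
    'replicates': [
        {'library': {'biosample': {'treatments': [{'amount': 5, 'amount_units': 'nM'}]}}}
    ],
}
file_ = {'title': 'FILE_1'}

header = list(COLUMN_TO_FIELDS)
row = build_row(experiment, file_, COLUMN_TO_FIELDS)
```

With these placeholders, `header` lists the three column names in mapping order and `row` is `['FILE_1', 'EXPERIMENT_1', '5 nM']`. A field missing from a document yields an empty cell rather than an error.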