ENCD-5439-optimize-metadata-endpoint #3462
Merged
72 commits by keenangraham:

e339809 Add test for all metadata
e4e2e5e Group audits by file
5440ff4 Add back other audits
def0eb0 Remove prints
4b0b846 Fix name
6e04f58 Stream results
f4b89de Split metadata into class
8b3fb82 Use quick return and drop internal_audits
8378f43 Add index_of column
1063769 Move metadata tests to separate file
887dba5 Move metadata mappings to constants
c791d23 Block endpoint to allowed types
5f32b68 Update decorator
412b416 Pass in right order
babf308 Add routing for annotation and publication data metadata
8c38b6f Fix name
f23a57a Return default metadata
b3838b0 Remove unused
a4f4217 Remove unused
90cf178 Update variable
c4de425 Remove import
58c45ec Add test for allowed_types decorator
494aace Update tests
ceeb41c Fix file matches file param
3124f8c Make file filter work with nested paths
483e2bb Strip files. once at start
6155b2a Replace with nothing
51f946d Ignore negation in filtering
f59cd57 Add comment
d2ecd64 Call positive_file_param_list
653caac Add test
610f3f2 Update test
e81412f Add MetadataReport tests
efd8651 Add more tests
022c298 Add more tests
7c266d1 Make validate return more clear
7d2ef4e Add tests for build_new_request and get_search_results
22a2573 Test should_not_report_file
a44b030 Test get_experiment_data
7f962c3 Add tests for get experiment, file, audit data
c2fca96 Update metadata
fd0c01a Use index_workbook before testapp
7f2cb09 Try running in nonindexing
23d3895 Add sorted_row test
fc74959 Add test for report generator and annotation/publication_data views
603b2b1 Assert status
a472f93 Put annotation and publication_data in test harness
af2d877 Make columns overridable
0371fbd Add AnnotationMetadataReport
594ce5f Remove unused
b96cfae Mark indexing tests
15b787d Add BatchedSearchGenerator
50dde28 Add make_batches_from_batch_params and test
ef428b2 Add batch test
9dcaf3c Add build request and batched_values and tests
72849b0 Yield results
5bf9844 Add PublicationDataMetadataReport
988d511 Replace takes arg
a56bcee Filter file params from experiment querystring
642c895 Update
eb53df1 Remove old endopoint
c2c4c85 Use @id in URL instead of dataset
3ec71ec Update tests
b95227d Add tests
e17af3e Update
4e30ef7 Update and add tests
fe075ba Add test
800d272 Update tests
77b2dae Put quotes around metadata URL
00c1b60 Update tests with quotes
5f4a85a Update expected_metadata
3e985e2 Remove unused
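Several commits in this PR ("Fix file matches file param", "Make file filter work with nested paths", "Strip files. once at start", "Ignore negation in filtering") describe filtering a dataset's files by `files.*` querystring parameters. The sketch below illustrates that idea under stated assumptions: parameters arrive as a standard URL query string, negated filters are written `field!=value`, and files are plain dicts. The names `positive_file_params`, `get_nested_value` and `file_matches_params` are hypothetical and are not the PR's actual functions.

```python
from collections import defaultdict
from urllib.parse import parse_qsl

FILES_PREFIX = 'files.'  # hypothetical constant for this sketch


def positive_file_params(query_string):
    """Map each file field (prefix stripped) to its allowed values.

    Negated filters such as 'files.file_type!=bam' are skipped.
    """
    params = defaultdict(list)
    for key, value in parse_qsl(query_string):
        # Skip non-file params and negations (parse_qsl leaves the '!' on the key).
        if not key.startswith(FILES_PREFIX) or key.endswith('!'):
            continue
        # Strip the 'files.' prefix once, up front.
        params[key[len(FILES_PREFIX):]].append(value)
    return dict(params)


def get_nested_value(item, path):
    """Follow a dotted path such as 'replicate.library.size_range' through dicts."""
    value = item
    for part in path.split('.'):
        if not isinstance(value, dict):
            return None
        value = value.get(part)
    return value


def file_matches_params(file_, file_params):
    """Return True if the file satisfies every positive file filter."""
    for path, allowed in file_params.items():
        value = get_nested_value(file_, path)
        values = value if isinstance(value, list) else [value]
        if not any(str(v) in allowed for v in values):
            return False
    return True


query = (
    'type=Experiment&files.file_type=bam&files.file_type!=fastq'
    '&files.replicate.library.size_range=200-500'
)
params = positive_file_params(query)
print(params)
```

With no positive file filters the loop in `file_matches_params` never runs, so every file matches; how the real endpoint treats negated filters is not shown here.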
@@ -0,0 +1,166 @@

```python
from collections import OrderedDict


# Dataset types for which the metadata endpoint is enabled.
METADATA_ALLOWED_TYPES = [
    'Experiment',
    'Annotation',
    'FunctionalCharacterizationExperiment',
    'PublicationData',
]


# Experiment metadata: output column header -> embedded field path(s) used to
# fill it, in column order. Paths starting with 'files.' refer to the
# dataset's files. Note that 'File format' is read from file_type and
# 'File type' from file_format.
METADATA_COLUMN_TO_FIELDS_MAPPING = OrderedDict(
    [
        ('File accession', ['files.title']),
        ('File format', ['files.file_type']),
        ('File type', ['files.file_format']),
        ('File format type', ['files.file_format_type']),
        ('Output type', ['files.output_type']),
        ('File assembly', ['files.assembly']),
        ('Experiment accession', ['accession']),
        ('Assay', ['assay_title']),
        ('Biosample term id', ['biosample_ontology.term_id']),
        ('Biosample term name', ['biosample_ontology.term_name']),
        ('Biosample type', ['biosample_ontology.classification']),
        ('Biosample organism', ['replicates.library.biosample.organism.scientific_name']),
        ('Biosample treatments', ['replicates.library.biosample.treatments.treatment_term_name']),
        (
            'Biosample treatments amount',
            [
                'replicates.library.biosample.treatments.amount',
                'replicates.library.biosample.treatments.amount_units'
            ]
        ),
        (
            'Biosample treatments duration',
            [
                'replicates.library.biosample.treatments.duration',
                'replicates.library.biosample.treatments.duration_units'
            ]
        ),
        ('Biosample genetic modifications methods', ['replicates.library.biosample.applied_modifications.method']),
        ('Biosample genetic modifications categories', ['replicates.library.biosample.applied_modifications.category']),
        ('Biosample genetic modifications targets', ['replicates.library.biosample.applied_modifications.modified_site_by_target_id']),
        ('Biosample genetic modifications gene targets', ['replicates.library.biosample.applied_modifications.modified_site_by_gene_id']),
        (
            'Biosample genetic modifications site coordinates', [
                'replicates.library.biosample.applied_modifications.modified_site_by_coordinates.assembly',
                'replicates.library.biosample.applied_modifications.modified_site_by_coordinates.chromosome',
                'replicates.library.biosample.applied_modifications.modified_site_by_coordinates.start',
                'replicates.library.biosample.applied_modifications.modified_site_by_coordinates.end'
            ]
        ),
        ('Biosample genetic modifications zygosity', ['replicates.library.biosample.applied_modifications.zygosity']),
        ('Experiment target', ['target.name']),
        ('Library made from', ['replicates.library.nucleic_acid_term_name']),
        ('Library depleted in', ['replicates.library.depleted_in_term_name']),
        ('Library extraction method', ['replicates.library.extraction_method']),
        ('Library lysis method', ['replicates.library.lysis_method']),
        ('Library crosslinking method', ['replicates.library.crosslinking_method']),
        ('Library strand specific', ['replicates.library.strand_specificity']),
        ('Experiment date released', ['date_released']),
        ('Project', ['award.project']),
        (
            'RBNS protein concentration', [
                'files.replicate.rbns_protein_concentration',
                'files.replicate.rbns_protein_concentration_units'
            ]
        ),
        ('Library fragmentation method', ['files.replicate.library.fragmentation_method']),
        ('Library size range', ['files.replicate.library.size_range']),
        ('Biological replicate(s)', ['files.biological_replicates']),
        ('Technical replicate(s)', ['files.technical_replicates']),
        ('Read length', ['files.read_length']),
        ('Mapped read length', ['files.mapped_read_length']),
        ('Run type', ['files.run_type']),
        ('Paired end', ['files.paired_end']),
        ('Paired with', ['files.paired_with']),
        ('Index of', ['files.index_of']),
        ('Derived from', ['files.derived_from']),
        ('Size', ['files.file_size']),
        ('Lab', ['files.lab.title']),
        ('md5sum', ['files.md5sum']),
        ('dbxrefs', ['files.dbxrefs']),
        ('File download URL', ['files.href']),
        ('Genome annotation', ['files.genome_annotation']),
        ('Platform', ['files.platform.title']),
        ('Controlled by', ['files.controlled_by']),
        ('File Status', ['files.status']),
        ('No File Available', ['files.no_file_available']),
        ('Restricted', ['files.restricted']),
        ('s3_uri', ['files.s3_uri']),
    ]
)


# Audit level -> header of the column that lists audits at that level.
METADATA_AUDIT_TO_AUDIT_COLUMN_MAPPING = [
    ('WARNING', 'Audit WARNING'),
    ('NOT_COMPLIANT', 'Audit NOT_COMPLIANT'),
    ('ERROR', 'Audit ERROR'),
]


# Column mapping for Annotation datasets.
ANNOTATION_METADATA_COLUMN_TO_FIELDS_MAPPING = OrderedDict(
    [
        ('File accession', ['files.title']),
        ('File format', ['files.file_type']),
        ('Output type', ['files.output_type']),
        ('Assay term name', ['files.assay_term_name']),
        ('Dataset accession', ['accession']),
        ('Annotation type', ['annotation_type']),
        ('Software used', ['software_used.software.title']),
        ('Encyclopedia Version', ['encyclopedia_version']),
        ('Biosample term id', ['biosample_ontology.term_id']),
        ('Biosample term name', ['biosample_ontology.term_name']),
        ('Biosample type', ['biosample_ontology.classification']),
        ('Life stage', ['relevant_life_stage']),
        ('Age', ['relevant_timepoint']),
        ('Age units', ['relevant_timepoint_units']),
        ('Organism', ['organism.scientific_name']),
        ('Targets', ['targets.name']),
        ('Dataset date released', ['date_released']),
        ('Project', ['award.project']),
        ('Lab', ['files.lab.title']),
        ('md5sum', ['files.md5sum']),
        ('dbxrefs', ['files.dbxrefs']),
        ('File download URL', ['files.href']),
        ('Assembly', ['files.assembly']),
        ('Controlled by', ['files.controlled_by']),
        ('File Status', ['files.status']),
        ('Derived from', ['files.derived_from']),
        ('S3 URL', ['files.cloud_metadata.url']),
        ('Size', ['files.file_size']),
        ('No File Available', ['files.no_file_available']),
        ('Restricted', ['files.restricted'])
    ]
)


# Column mapping for PublicationData datasets.
PUBLICATION_DATA_METADATA_COLUMN_TO_FIELDS_MAPPING = OrderedDict(
    [
        ('File accession', ['files.title']),
        ('File dataset', ['files.dataset']),
        ('File type', ['files.file_format']),
        ('File format', ['files.file_type']),
        ('File output type', ['files.output_type']),
        ('Assay term name', ['files.assay_term_name']),
        ('Biosample term id', ['files.biosample_ontology.term_id']),
        ('Biosample term name', ['files.biosample_ontology.term_name']),
        ('Biosample type', ['files.biosample_ontology.classification']),
        ('File target', ['files.target.label']),
        ('Dataset accession', ['accession']),
        ('Dataset date released', ['date_released']),
        ('Project', ['award.project']),
        ('Lab', ['files.lab.title']),
        ('md5sum', ['files.md5sum']),
        ('dbxrefs', ['files.dbxrefs']),
        ('File download URL', ['files.href']),
        ('Assembly', ['files.assembly']),
        ('File status', ['files.status']),
        ('Derived from', ['files.derived_from']),
        ('S3 URL', ['files.cloud_metadata.url']),
        ('Size', ['files.file_size']),
        ('No File Available', ['files.no_file_available']),
        ('Restricted', ['files.restricted'])
    ]
)
```
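To illustrate how a column-to-fields mapping such as `METADATA_COLUMN_TO_FIELDS_MAPPING` could be turned into TSV rows, here is a minimal sketch. It assumes datasets and files are nested dicts and lists, that `files.*` paths are resolved against the current file while all other paths are resolved against the dataset, and that multi-field columns pair their values positionally (an amount with its units, for example). The helpers `collect` and `make_row` and the sample data are hypothetical and are not taken from the PR.

```python
from collections import OrderedDict

# Excerpt of METADATA_COLUMN_TO_FIELDS_MAPPING, repeated so the sketch runs on its own.
COLUMNS = OrderedDict([
    ('File accession', ['files.title']),
    ('Experiment accession', ['accession']),
    ('Biosample treatments amount', [
        'replicates.library.biosample.treatments.amount',
        'replicates.library.biosample.treatments.amount_units',
    ]),
])


def collect(value, parts):
    """Yield every non-None leaf found at a dotted path, fanning out over lists."""
    if isinstance(value, list):
        for item in value:
            yield from collect(item, parts)
    elif not parts:
        if value is not None:
            yield value
    elif isinstance(value, dict):
        yield from collect(value.get(parts[0]), parts[1:])


def make_row(dataset, file_, columns):
    """Build one row (a list of strings) for a single file of a dataset."""
    row = []
    for fields in columns.values():
        per_field = []
        for field in fields:
            # Assumed rule: 'files.*' paths read from the file, others from the dataset.
            if field.startswith('files.'):
                source, path = file_, field[len('files.'):]
            else:
                source, path = dataset, field
            per_field.append([str(v) for v in collect(source, path.split('.'))])
        # Pair values positionally across fields (e.g. amount with units), then
        # drop duplicates while keeping first-seen order.
        combined = [' '.join(group) for group in zip(*per_field)]
        row.append(', '.join(dict.fromkeys(combined)))
    return row


# Made-up sample data: two replicates with the same treatment.
dataset = {
    'accession': 'EXP1',
    'replicates': [
        {'library': {'biosample': {'treatments': [{'amount': 10, 'amount_units': 'nM'}]}}},
        {'library': {'biosample': {'treatments': [{'amount': 10, 'amount_units': 'nM'}]}}},
    ],
}
file_ = {'title': 'FILE1'}
print('\t'.join(make_row(dataset, file_, COLUMNS)))
```

Because `zip` stops at the shortest list, a multi-field column where one field is missing comes out empty in this sketch; the real report may handle that case differently.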
Review comment: Since this file is new and speed is the goal, is OrderedDict needed here and in the other files? Since Python 3.7, dict preserves insertion order as a language guarantee (in 3.6 this was a CPython implementation detail).
Reply: That's a good point. I'm going to leave it explicit for now, but we should test for correctness and speed after removing these.
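The correctness side of the OrderedDict question can be checked directly with the standard library. The sketch below shows that on Python 3.7+ both types iterate in insertion order, but equality between two OrderedDict objects is order-sensitive while plain dict equality is not, which matters if any test compares column mappings. The timing loop is only a starting point for the speed comparison; its numbers depend on the machine and Python version.

```python
import timeit
from collections import OrderedDict

columns = [('File accession', ['files.title']), ('File format', ['files.file_type'])]

plain = dict(columns)
ordered = OrderedDict(columns)

# On Python 3.7+ both keep insertion order when iterated.
assert list(plain) == list(ordered) == ['File accession', 'File format']

reversed_plain = dict(reversed(columns))
reversed_ordered = OrderedDict(reversed(columns))

# Plain dict equality ignores order; OrderedDict-to-OrderedDict equality does not.
assert plain == reversed_plain
assert ordered != reversed_ordered
# Comparing an OrderedDict with a plain dict is order-insensitive.
assert ordered == reversed_plain

# Rough iteration timing on a mapping of similar size to the column mappings.
big = [(str(i), [str(i)]) for i in range(50)]
for name, factory in (('dict', dict), ('OrderedDict', OrderedDict)):
    mapping = factory(big)
    seconds = timeit.timeit(lambda: [key for key in mapping], number=10000)
    print(f'{name}: {seconds:.4f}s')
```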