Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add sintax #577

Merged
merged 43 commits into from
May 15, 2023
Merged

Add sintax #577

merged 43 commits into from
May 15, 2023

Conversation

jtangrot
Copy link
Contributor

@jtangrot jtangrot commented May 3, 2023

This PR adds the SINTAX algorithm as an alternative for taxonomy assignment, and addresses issue #471.
If --sintax_ref_taxonomy is given, SINTAX is used for taxonomical classification in addition to DADA2. The SINTAX taxonomy is used instead of any other classification in downstream steps (qiime, sbdiexport). I also added the option --skip_dada_taxonomy, to skip running DADA2 to get taxonomies.

PR checklist

  • This comment contains a description of changes (with reason).
  • If you've fixed a bug or added code that should be tested, add tests!
  • If you've added a new tool - have you followed the pipeline conventions in the contribution docs- [ ] If necessary, also make a PR on the nf-core/ampliseq branch on the nf-core/test-datasets repository.
  • Make sure your code lints (nf-core lint).
  • Ensure the test suite passes (nextflow run . -profile test,docker --outdir <OUTDIR>).
  • Usage Documentation in docs/usage.md is updated.
  • Output Documentation in docs/output.md is updated.
  • CHANGELOG.md is updated.
  • README.md is updated (including new tool citations and authors/contributors).

@github-actions
Copy link

github-actions bot commented May 3, 2023

nf-core lint overall result: Passed ✅ ⚠️

Posted for pipeline commit e103718

+| ✅ 157 tests passed       |+
#| ❔   1 tests were ignored |#
!| ❗   2 tests had warnings |!

❗ Test warnings:

  • readme - README did not have a Nextflow minimum version badge.
  • schema_lint - Parameter input is not defined in the correct subschema (input_output_options)

❔ Tests ignored:

✅ Tests passed:

Run details

  • nf-core/tools version 2.8
  • Run at 2023-05-11 19:23:16

Copy link
Collaborator

@d4straub d4straub left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me (except that I feel not confident with my review of ampliseq.nf because it seems to convoluted now).
Docs need a bit of polishing before it can be merged, imho.

docs/output.md Outdated Show resolved Hide resolved
docs/output.md Show resolved Hide resolved
docs/output.md Show resolved Hide resolved
modules/local/format_taxonomy_sintax.nf Outdated Show resolved Hide resolved
workflows/ampliseq.nf Outdated Show resolved Hide resolved
workflows/ampliseq.nf Show resolved Hide resolved
@d4straub
Copy link
Collaborator

d4straub commented May 4, 2023

Because I solved the merge conflict: please pull your branch before you continue working!

@d4straub
Copy link
Collaborator

d4straub commented May 5, 2023

I broke the test_sintax, sorry. The solution should be (predicted above) to change 'biocontainers/biocontainers:v1.2.0_cv1' to 'docker.io/biocontainers/biocontainers:v1.2.0_cv1' in format_taxonomy_sintax.nf. Because I broke it I should also fix it I think. I hope I am not too intrusive here.

@d4straub
Copy link
Collaborator

d4straub commented May 8, 2023

I think test_pplace fails because ampliseq.nf contains still FASTA_NEWICK_EPANG_GAPPA.out.grafted_phylogeny which should be replaced by PPLACE_TAXONOMY_WF (but currently doesnt export the grafted_phylogeny).

Copy link
Collaborator

@d4straub d4straub left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think that looks great!
Moving dada2 & sintax taxonomy classification to own subworkflows simplifies ampliseq.nf significantly, I think. But my recommendation to put phylogenetic placement taxonomical assignment into its own subworkflow was a very bad idea, sorry, because that is already a subworkflow.

subworkflows/local/dada2_taxonomy_wf.nf Outdated Show resolved Hide resolved
subworkflows/local/pplace_taxonomy_wf.nf Outdated Show resolved Hide resolved
Copy link
Collaborator

@d4straub d4straub left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great simplification of the DADA2 taxonomic workflow, thanks! Thats so much easier to read now I think.
I tested a little (not everything) and all I saw was fine, except two tiny points:

  • the sintax taxonomy table had NA while other (DADA2, QIIME2) are blank when no info is available
  • Processes SINTAX_TAXONOMY_WF:* and QIIME2_TAXONOMY:QIIME2_CLASSIFY are shown even when not executed, while DADA2_TAXONOMY_WF is not shown when not executed

The first point might be good to address, the second point doesnt matter I think.

On an unrelated note, I recognized that NFCORE_AMPLISEQ:AMPLISEQ:QIIME2_DIVERSITY:QIIME2_TREE is not listed and therefore no QIIME2_DIVERSITY:* is executing with -profile test. That still works with 2.5.0 but seems already broken in dev, so seems unrelated to your PR. I might have broken that in the phylogenetic palcement PR, because I fiddled with the tree info. Will need to investigate at another time.

@jtangrot
Copy link
Contributor Author

Great simplification of the DADA2 taxonomic workflow, thanks! Thats so much easier to read now I think.

Thanks for the pointers!

I tested a little (not everything) and all I saw was fine, except two tiny points:

* the sintax taxonomy table had NA while other (DADA2, QIIME2) are blank when no info is available

True, fixed it.

* Processes SINTAX_TAXONOMY_WF:* and QIIME2_TAXONOMY:QIIME2_CLASSIFY are shown even when not executed, while DADA2_TAXONOMY_WF is not shown when not executed

This I think is because there is a check if DADA2_TAXONOMY_WF should run, while there is no test for e.g. SINTAX_TAXONOMY_WF:*, instead they are fed an empty input channel and therefore not run. I added a test for sintax, but left the qiime things as they are. There are a lot of qiime modules that are run or not depending on their input channels.

@d4straub d4straub mentioned this pull request May 10, 2023
9 tasks
@jtangrot jtangrot merged commit 1e1d001 into nf-core:dev May 15, 2023
@jtangrot
Copy link
Contributor Author

Thanks!

@jtangrot jtangrot deleted the add_sintax branch May 15, 2023 05:42
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants