Ran pre-commit run --all-files by hand.

oist · May 21, 2024 · dbe4eb8
1 parent ca8f726
commit dbe4eb8

Showing 4 changed files with 107 additions and 155 deletions.
diff --git a/README.md b/README.md
@@ -19,124 +19,120 @@
 
 ## Introduction
 
-**nf-core/pairgenomealign** is a bioinformatics pipeline that aligns a single or set of query genomes in csv format with a target genome to make a pairwise representation in dotplots. 
+**nf-core/pairgenomealign** is a bioinformatics pipeline that aligns a single or set of query genomes in csv format with a target genome to make a pairwise representation in dotplots.
 
-This pipeline usually takes in as an input a sample sheet in csv format which contain this set of queries or single query and align it pairwise with atarget genome in fasta or fa.gz format to make a dotplots representation of the paired alignment or alignments in case of multiple queries. 
+This pipeline usually takes in as an input a sample sheet in csv format which contain this set of queries or single query and align it pairwise with atarget genome in fasta or fa.gz format to make a dotplots representation of the paired alignment or alignments in case of multiple queries.
 
 <img src= "assets/tube_map.svg">
 
 ## Outputs
 
-For each _query_ genome, this pipeline will align it to the _target_genome, post-process the alignments and produce dot plots visualisations at different steps of the workflow.  Each file contains a name suffix that indicates in which order they were created.
+For each _query_ genome, this pipeline will align it to the _target_ genome, post-process the alignments and produce dot plots visualisations at different steps of the workflow. Each file contains a name suffix that indicates in which order they were created.
 
- - `.train` is the alignment parameters computed by `last-train` (optional)
- - `m2m_aln` is the _**many-to-many**_ alignment between _target_ and _query_ genomes. (optional through the `--m2m` option)
- - `m2m_plot` (optional)
- - `m2o_aln` is the _**many-to-one**_ alignment regions of the _target_ genome are matched at most once by the _query_ genome.
- - `m2o_plot` (optional)
- - `o2o_aln` is the _**one-to-one**_ alignment between the _target_ and _query_ genomes.
- - `o2o_plot` (optional)
- - `o2m_aln` is the _**one-to-many**_ alignment between the _target_ and _query_ genomes (optional).
- - `o2m_plot` (optional)
+- `.train` is the alignment parameters computed by `last-train` (optional)
+- `m2m_aln` is the _**many-to-many**_ alignment between _target_ and _query_ genomes. (optional through the `--m2m` option)
+- `m2m_plot` (optional)
+- `m2o_aln` is the _**many-to-one**_ alignment regions of the _target_ genome are matched at most once by the _query_ genome.
+- `m2o_plot` (optional)
+- `o2o_aln` is the _**one-to-one**_ alignment between the _target_ and _query_ genomes.
+- `o2o_plot` (optional)
+- `o2m_aln` is the _**one-to-many**_ alignment between the _target_ and _query_ genomes (optional).
+- `o2m_plot` (optional)
 
 ## Mandatory parameters
 
- * `--target`: path or URL to one genome file in FASTA format.  It will be indexed.
+- `--target`: path or URL to one genome file in FASTA format. It will be indexed.
 
- * `--input`: path to a sample sheet in comma-separated format with one header line`sample, fasta`, and one row per genome (ID and path or URL to FASTA file).
-
-   — or —
-
-   `--query`: path or URL to one genome file in FASTA format.
+- `--input`: path to a sample sheet in comma-separated format with one header line`sample, fasta`, and one row per genome (ID and path or URL to FASTA file).
 
+  — or —
 
+  `--query`: path or URL to one genome file in FASTA format.
 
 ## Options
 
- * `--seed` selects the name of the [LAST seed][]  The default (`YASS`) searches for “_long-and-weak similarities_” that “_allow for mismatches but not gaps_”.  Among alternatives, there are `NEAR` for “_short-and-strong (near-identical) similarities_ … _with many gaps (insertions and deletions)_”, `MAM8` to find _“weak
-   similarities with high sensitivity, but low speed and high memory usage”_
-   or `RY128` that “_reduces run time and memory use, by only seeking seeds at
-   ~1/128 of positions in each sequence_”, which is useful when the purpose of
-   running this pipeline is only to generate whole-genome dotplots, or when
-   sensitivity for tiny fragments may be unnecessary or undesirable.  Setting
-   the seed to `PSEUDO` triggers protein-to-DNA alignment mode (experimental). 
-
- * `--lastal_args` defaults to `-C2` and is applied to both
-   the calls to `last-train` and `lastal`, like in the [LAST cookbook][]
-   and the [last-genome-alignments][] tutorial.
-
- * `--lastal_extr_args` (default: `-D1e9`) is only passed to `lastal` and
-   can be used for arguments that are not recognised by `last-train`.
-
- * `--lastal_params`: path to a file containing alignment parameters
-   computed by [`last-train`][] or a [scoring matrix][].  If this option
-   is not used, the pipeline will run `last-train` for each query.
-
- * `--m2m`: (default: false) Compute and output the many-to-many alignment.
-   This adds time and can comsume considerable amount of space; use only
-   if you need that data.
-
- * `--o2m`: (default: false) Also compute the _**one-to-many**_ alignments
-   and dotplots.  This is sometimes useful when troubleshooting the
-   preparation of diploid assemblies.
-
- * `--one_to_one_only`: do not copy the other alignments to the results
-   folder, thus saving disk space.
-
- * By default, `last-split` runs with `-m1e-5` to omit alignments with
-   mismap probability > 10<sup>−5</sup>, but this can be overriden with
-   the `--last_split_mismap` option.
-
- * `--last_split_args` defaults to empty value and is not very useful at the
-   moment, but is kept for backwards compatibility.  It can be used to pass
-   options to `last-split`.  Note that if you used `--m2m false` (which is
-   the default), the split parameters have to be passed in
-   `--lastal_extra_args` and have different names (see _split options_ in the
-   [lastal documentation][]).
-
- * The dotplots can be modified by overriding defaults and passing new
-   arguments via the `--dotplot_options` argument.  Defaults and available
-   options can be seen on the manual page of the [`last-dotplot`][] program.
-   By default in this pipeline, the sequences of the _query_ genome are
-   sorted and oriented by their alignment to the _target_ genome
-   (`--sort2=3 --strands2=1`). For readability, their names are written
-   horizontally (`--rot2=h`).
-
- * Use `--skip_dotplot_m2m`, `--skip_dotplot_m2o`, `--skip_dotplot_o2o`
-   `--skip_dotplot_o2m` to skip the production of the dot plots that can be
-   computationally expensive and visually uninformative on large genomes with
-   shared repeats.  File suffixes (see above) will not change.
-
- * By default the LAST index is named `target` and the ouput files are named
-   from the query IDs.  Use the `--targetName` option to provide a name
-   that will be used for the LAST index and that will be prefixed to the
-   query IDs with a `___` separator.
-
-
-  [`lastal`]:       https://gitlab.com/mcfrith/last/-/blob/main/doc/lastal.rst
-  [`last-dotplot`]: https://gitlab.com/mcfrith/last/-/blob/main/doc/last-dotplot.rst
-  [LAST seed]:      https://gitlab.com/mcfrith/last/-/blob/main/doc/last-seeds.rst
-  [LAST cookbook]:  https://gitlab.com/mcfrith/last/-/blob/main/doc/last-cookbook.rst
-  [`last-train`]:   https://gitlab.com/mcfrith/last/-/blob/main/doc/last-train.rst
-  [LAST tuning]:    https://gitlab.com/mcfrith/last/-/blob/main/doc/last-tuning.rst
-  [scoring matrix]: https://gitlab.com/mcfrith/last/-/blob/main/doc/last-matrices.rst
-  [lastal documentation]: https://gitlab.com/mcfrith/last/-/blob/main/doc/lastal.rst
-  [last-genome-alignments]: https://github.com/mcfrith/last-genome-alignments
+- `--seed` selects the name of the [LAST seed][] The default (`YASS`) searches for “_long-and-weak similarities_” that “_allow for mismatches but not gaps_”. Among alternatives, there are `NEAR` for “_short-and-strong (near-identical) similarities_ … _with many gaps (insertions and deletions)_”, `MAM8` to find _“weak
+  similarities with high sensitivity, but low speed and high memory usage”_
+  or `RY128` that “_reduces run time and memory use, by only seeking seeds at
+  ~1/128 of positions in each sequence_”, which is useful when the purpose of
+  running this pipeline is only to generate whole-genome dotplots, or when
+  sensitivity for tiny fragments may be unnecessary or undesirable. Setting
+  the seed to `PSEUDO` triggers protein-to-DNA alignment mode (experimental).
+
+- `--lastal_args` defaults to `-C2` and is applied to both
+  the calls to `last-train` and `lastal`, like in the [LAST cookbook][]
+  and the [last-genome-alignments][] tutorial.
+
+- `--lastal_extr_args` (default: `-D1e9`) is only passed to `lastal` and
+  can be used for arguments that are not recognised by `last-train`.
+
+- `--lastal_params`: path to a file containing alignment parameters
+  computed by [`last-train`][] or a [scoring matrix][]. If this option
+  is not used, the pipeline will run `last-train` for each query.
+
+- `--m2m`: (default: false) Compute and output the many-to-many alignment.
+  This adds time and can comsume considerable amount of space; use only
+  if you need that data.
+
+- `--o2m`: (default: false) Also compute the _**one-to-many**_ alignments
+  and dotplots. This is sometimes useful when troubleshooting the
+  preparation of diploid assemblies.
+
+- `--one_to_one_only`: do not copy the other alignments to the results
+  folder, thus saving disk space.
+
+- By default, `last-split` runs with `-m1e-5` to omit alignments with
+  mismap probability > 10<sup>−5</sup>, but this can be overriden with
+  the `--last_split_mismap` option.
+
+- `--last_split_args` defaults to empty value and is not very useful at the
+  moment, but is kept for backwards compatibility. It can be used to pass
+  options to `last-split`. Note that if you used `--m2m false` (which is
+  the default), the split parameters have to be passed in
+  `--lastal_extra_args` and have different names (see _split options_ in the
+  [lastal documentation][]).
+
+- The dotplots can be modified by overriding defaults and passing new
+  arguments via the `--dotplot_options` argument. Defaults and available
+  options can be seen on the manual page of the [`last-dotplot`][] program.
+  By default in this pipeline, the sequences of the _query_ genome are
+  sorted and oriented by their alignment to the _target_ genome
+  (`--sort2=3 --strands2=1`). For readability, their names are written
+  horizontally (`--rot2=h`).
+
+- Use `--skip_dotplot_m2m`, `--skip_dotplot_m2o`, `--skip_dotplot_o2o`
+  `--skip_dotplot_o2m` to skip the production of the dot plots that can be
+  computationally expensive and visually uninformative on large genomes with
+  shared repeats. File suffixes (see above) will not change.
+
+- By default the LAST index is named `target` and the ouput files are named
+  from the query IDs. Use the `--targetName` option to provide a name
+  that will be used for the LAST index and that will be prefixed to the
+  query IDs with a `___` separator.
+
+[`lastal`]: https://gitlab.com/mcfrith/last/-/blob/main/doc/lastal.rst
+[`last-dotplot`]: https://gitlab.com/mcfrith/last/-/blob/main/doc/last-dotplot.rst
+[LAST seed]: https://gitlab.com/mcfrith/last/-/blob/main/doc/last-seeds.rst
+[LAST cookbook]: https://gitlab.com/mcfrith/last/-/blob/main/doc/last-cookbook.rst
+[`last-train`]: https://gitlab.com/mcfrith/last/-/blob/main/doc/last-train.rst
+[LAST tuning]: https://gitlab.com/mcfrith/last/-/blob/main/doc/last-tuning.rst
+[scoring matrix]: https://gitlab.com/mcfrith/last/-/blob/main/doc/last-matrices.rst
+[lastal documentation]: https://gitlab.com/mcfrith/last/-/blob/main/doc/lastal.rst
+[last-genome-alignments]: https://github.com/mcfrith/last-genome-alignments
 
 ## Fixed arguments (taken from the [LAST cookbook][] and the [LAST tuning][] manual)
 
- * The `lastdb` step soft-masks simple repeats by default, (`-c -R01`).It indexes both strands (`-S2`), which increases speed at the expense of memory usage.
+- The `lastdb` step soft-masks simple repeats by default, (`-c -R01`).It indexes both strands (`-S2`), which increases speed at the expense of memory usage.
 
- * The `last-train` commands runs with `--revsym` as the DNA strands play equivalent roles in the studied genomes, unless the `--read_align` option is selected.
+- The `last-train` commands runs with `--revsym` as the DNA strands play equivalent roles in the studied genomes, unless the `--read_align` option is selected.
 
- * `last-split` runs with `-fMAF+` to make it show per-base mismap probabilities, except in read alignment mode (see below).
+- `last-split` runs with `-fMAF+` to make it show per-base mismap probabilities, except in read alignment mode (see below).
 
 ## Usage
 
 > [!NOTE]
 > If you are new to Nextflow and nf-core, please refer to [this page](https://nf-co.re/docs/usage/installation) on how to set-up Nextflow. Make sure to [test your setup](https://nf-co.re/docs/usage/introduction#how-to-run-a-pipeline) with `-profile test` before running the workflow on actual data.
 
-
 First, prepare a samplesheet with your input data that looks as follows:
 
 `samplesheet.csv`:
@@ -145,12 +141,11 @@ First, prepare a samplesheet with your input data that looks as follows:
 sample,fasta
 Query_1,AEG588A1_S1_L002_R1_001.fasta
 ```
-Each row represents a fasta file, this can also contain multiple rows to accomodate multiple query genomes in fasta format.
 
+Each row represents a fasta file, this can also contain multiple rows to accomodate multiple query genomes in fasta format.
 
 Now, you can run the pipeline using:
 
-
 ```bash
 nextflow run nf-core/pairgenomealign \
    -profile <docker/singularity/.../institute> \
@@ -188,7 +183,7 @@ For further information or help, don't hesitate to get in touch on the [Slack `#
 
 If you use this pipeline, please cite:
 
-Extreme genome scrambling in marine planktonic Oikopleura dioica cryptic species_. Charles Plessy, Michael J. Mansfield, Aleksandra Bliznina, Aki Masunaga, Charlotte West, Yongkai Tan, Andrew W. Liu, Jan Grašič, María Sara del Río Pisula, Gaspar Sánchez-Serna, Marc Fabrega-Torrus, Alfonso Ferrández-Roldán, Vittoria Roncalli, Pavla Navratilova, Eric M. Thompson, Takeshi Onuma, Hiroki Nishida, Cristian Cañestro, Nicholas M. Luscombe.  Genome Res. 2024. 34: 426-440; doi:[10.1101/2023.05.09.539028](https://doi.org/10.1101/gr.278295.123). PubMed ID: [38621828](https://pubmed.ncbi.nlm.nih.gov/38621828/)
+Extreme genome scrambling in marine planktonic Oikopleura dioica cryptic species. Charles Plessy, Michael J. Mansfield, Aleksandra Bliznina, Aki Masunaga, Charlotte West, Yongkai Tan, Andrew W. Liu, Jan Grašič, María Sara del Río Pisula, Gaspar Sánchez-Serna, Marc Fabrega-Torrus, Alfonso Ferrández-Roldán, Vittoria Roncalli, Pavla Navratilova, Eric M. Thompson, Takeshi Onuma, Hiroki Nishida, Cristian Cañestro, Nicholas M. Luscombe. Genome Res. 2024. 34: 426-440; doi:[10.1101/2023.05.09.539028](https://doi.org/10.1101/gr.278295.123). PubMed ID: [38621828](https://pubmed.ncbi.nlm.nih.gov/38621828/)
 
 [OIST research news article](https://www.oist.jp/news-center/news/2024/4/25/oikopleura-who-species-identity-crisis-genome-community)
 

diff --git a/docs/output.md b/docs/output.md
@@ -6,15 +6,13 @@ This document describes the output produced by the pipeline. Most of the plots a
 
 The directories listed below will be created in the results directory after the pipeline has finished. All paths are relative to the top-level results directory.
 
-
 ## Pipeline overview
 
 The pipeline is built using [Nextflow](https://www.nextflow.io/) and processes data using the following steps:
 
 - [MultiQC](#multiqc) - Aggregate report describing results and QC from the whole pipeline
 - [Pipeline information](#pipeline-information) - Report metrics generated during the workflow execution
 
-
 ### MultiQC
 
 <details markdown="1">

diff --git a/modules.json b/modules.json
@@ -8,65 +8,47 @@
                     "assemblyscan": {
                         "branch": "master",
                         "git_sha": "3f5420aa22e00bd030a2556dfdffc9e164ec0ec5",
-                        "installed_by": [
-                            "modules"
-                        ]
+                        "installed_by": ["modules"]
                     },
                     "gfastats": {
                         "branch": "master",
                         "git_sha": "3f5420aa22e00bd030a2556dfdffc9e164ec0ec5",
-                        "installed_by": [
-                            "modules"
-                        ]
+                        "installed_by": ["modules"]
                     },
                     "last/dotplot": {
                         "branch": "master",
                         "git_sha": "3f5420aa22e00bd030a2556dfdffc9e164ec0ec5",
-                        "installed_by": [
-                            "modules"
-                        ]
+                        "installed_by": ["modules"]
                     },
                     "last/lastal": {
                         "branch": "master",
                         "git_sha": "3f5420aa22e00bd030a2556dfdffc9e164ec0ec5",
-                        "installed_by": [
-                            "modules"
-                        ]
+                        "installed_by": ["modules"]
                     },
                     "last/lastdb": {
                         "branch": "master",
                         "git_sha": "3f5420aa22e00bd030a2556dfdffc9e164ec0ec5",
-                        "installed_by": [
-                            "modules"
-                        ]
+                        "installed_by": ["modules"]
                     },
                     "last/mafswap": {
                         "branch": "master",
                         "git_sha": "3f5420aa22e00bd030a2556dfdffc9e164ec0ec5",
-                        "installed_by": [
-                            "modules"
-                        ]
+                        "installed_by": ["modules"]
                     },
                     "last/split": {
                         "branch": "master",
                         "git_sha": "3f5420aa22e00bd030a2556dfdffc9e164ec0ec5",
-                        "installed_by": [
-                            "modules"
-                        ]
+                        "installed_by": ["modules"]
                     },
                     "last/train": {
                         "branch": "master",
                         "git_sha": "3f5420aa22e00bd030a2556dfdffc9e164ec0ec5",
-                        "installed_by": [
-                            "modules"
-                        ]
+                        "installed_by": ["modules"]
                     },
                     "multiqc": {
                         "branch": "master",
                         "git_sha": "b7ebe95761cd389603f9cc0e0dc384c0f663815a",
-                        "installed_by": [
-                            "modules"
-                        ]
+                        "installed_by": ["modules"]
                     }
                 }
             },
@@ -75,26 +57,20 @@
                     "utils_nextflow_pipeline": {
                         "branch": "master",
                         "git_sha": "5caf7640a9ef1d18d765d55339be751bb0969dfa",
-                        "installed_by": [
-                            "subworkflows"
-                        ]
+                        "installed_by": ["subworkflows"]
                     },
                     "utils_nfcore_pipeline": {
                         "branch": "master",
                         "git_sha": "92de218a329bfc9a9033116eb5f65fd270e72ba3",
-                        "installed_by": [
-                            "subworkflows"
-                        ]
+                        "installed_by": ["subworkflows"]
                     },
                     "utils_nfvalidation_plugin": {
                         "branch": "master",
                         "git_sha": "5caf7640a9ef1d18d765d55339be751bb0969dfa",
-                        "installed_by": [
-                            "subworkflows"
-                        ]
+                        "installed_by": ["subworkflows"]
                     }
                 }
             }
         }
     }
-}
+}
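
The `modules.json` hunks above only collapse each one-element `installed_by` array onto a single line, so the parsed data should be unchanged. A minimal sketch of how a formatter-only change like this could be checked, assuming both versions of the file are available as text; the helper name `same_json` is hypothetical and the fragment is modelled on the `multiqc` entry from the diff:

```python
import json


def same_json(before: str, after: str) -> bool:
    """Return True when two JSON texts parse to equal values.

    Whitespace and line breaks are ignored by the parser, so a pure
    reformatting change compares equal.
    """
    return json.loads(before) == json.loads(after)


# Fragment modelled on one entry of modules.json before the commit.
before = """
{
    "multiqc": {
        "branch": "master",
        "git_sha": "b7ebe95761cd389603f9cc0e0dc384c0f663815a",
        "installed_by": [
            "modules"
        ]
    }
}
"""

# The same entry after the commit, with the array on one line.
after = """
{
    "multiqc": {
        "branch": "master",
        "git_sha": "b7ebe95761cd389603f9cc0e0dc384c0f663815a",
        "installed_by": ["modules"]
    }
}
"""

if __name__ == "__main__":
    print(same_json(before, after))
```

The comparison is on parsed values rather than text, so it reports a difference only when a key, value, or array element actually changed.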