Overview

The HuBMAP Consortium sc-atac-seq pipeline analyzes scATAC-seq data sets and is composed of ArchR and chromVAR. Source code can be found at https://github.com/hubmapconsortium/sc-atac-seq-pipeline

The pipeline performs quantification using a specified aligner; HuBMAP has standardized on BWA with the GRCh38 reference genome. ArchR divides the genome into non-overlapping bins of a user-specified size (we use 500 bp), produces FastQC analysis of the input fastq files, and produces a binary cell-by-bin matrix denoting whether reads in each cell were aligned to each bin.
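The binning idea can be illustrated with a small sketch. This is not ArchR's or the pipeline's implementation; the function name `cell_by_bin`, the input tuple layout, and the toy coordinates are assumptions made for illustration only.

```python
from collections import defaultdict

BIN_SIZE = 500  # bin width, matching the value quoted above


def cell_by_bin(alignments, bin_size=BIN_SIZE):
    """Return {barcode: set of (chrom, bin_index)} for aligned reads.

    `alignments` is an iterable of (barcode, chrom, position) tuples with
    0-based positions (a hypothetical layout). A (chrom, bin_index) pair in
    a cell's set corresponds to a 1 in the binary cell-by-bin matrix;
    absence corresponds to a 0.
    """
    matrix = defaultdict(set)
    for barcode, chrom, position in alignments:
        # Integer division maps each position to exactly one
        # non-overlapping bin.
        matrix[barcode].add((chrom, position // bin_size))
    return dict(matrix)


reads = [
    ("AAAC", "chr1", 120),
    ("AAAC", "chr1", 480),   # same bin as the read above, still a single 1
    ("AAAC", "chr1", 500),   # first position of the next bin
    ("TTTG", "chr2", 1499),
]
result = cell_by_bin(reads)
```

Storing only the occupied bins per cell keeps the sketch small; a real cell-by-bin matrix at this resolution is very sparse, so a sparse representation is the natural choice.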

The ArchR secondary analysis pipeline filters bins based on TSS enrichment and fragment number, performs LSI dimensionality reduction, and selects peaks from all available bins. The chromVAR tool performs motif analysis, assigns motifs to transcription factors, and computes differential enrichment of transcription factors across cells in the data set.

Requirements

Running the pipeline requires a CWL workflow execution engine. We recommend cwltool, the reference implementation, which is written in Python. It can be installed in a sufficiently recent Python environment with pip install cwltool, after which the pipeline can be invoked as:

cwltool sc_atac_seq_prep_process_analyze.cwl sc_atac_seq_prep_process_analyze.json

To build the Docker images, run

build_docker_containers

from the sc-atac-seq directory. The build could take up to an hour.

Supplementary Data

The HuBMAP sc-atac-seq pipeline uses the Genome Reference Consortium human genome, build 38 (GRCh38). A BWA-generated set of index files is required for the reference genome. Using an alternate reference or index is not currently supported without rebuilding the sc-atac-seq Docker container, though one can build an alternate container by modifying the Dockerfile.

Inputs

Required

sequence_directory
A directory for the pipeline to search for fastq or fastq.gz files. The pipeline only works on paired-end reads and expects, for historical reasons, the paired-end read files to be named <some_name>*_R1*.fastq and <some_name>*_R3*.fastq. If a file containing barcodes, <some_name>*_R2*.fastq, is found, the barcodes will be read and added to the read IDs in the paired-end fastq files.
input_reference_genome
A fasta file of the GRCh38 reference genome.
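The naming convention for sequence_directory can be checked before a run with a short sketch like the one below. It is not the pipeline's own file discovery code; the function `classify_fastq`, the role labels, and the example file names are hypothetical.

```python
from fnmatch import fnmatch

# Name fragment -> role, following the convention described above.
# The role labels themselves are illustrative, not pipeline terms.
ROLES = {"_R1": "read1", "_R2": "barcodes", "_R3": "read2"}


def classify_fastq(filenames):
    """Group file names by role using the _R1/_R2/_R3 naming convention."""
    groups = {role: [] for role in ROLES.values()}
    for name in sorted(filenames):
        for tag, role in ROLES.items():
            # Accept both plain and gzip-compressed fastq files.
            if fnmatch(name, f"*{tag}*.fastq") or fnmatch(name, f"*{tag}*.fastq.gz"):
                groups[role].append(name)
                break
    return groups


files = [
    "sample_S1_L001_R1_001.fastq.gz",
    "sample_S1_L001_R2_001.fastq.gz",
    "sample_S1_L001_R3_001.fastq.gz",
    "notes.txt",
]
groups = classify_fastq(files)
```

An empty "read1" or "read2" list would indicate that the directory does not hold a complete paired-end set under this convention.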

Optional

reference_genome_index
A .gz file containing the BWA-generated index of the GRCh38 reference genome, i.e. the ".bwt", ".sa", ".ann", ".pac", and ".amb" files produced by BWA indexing. If this file is provided, the pipeline does not have to generate the index itself, which saves some time.
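A job file for these inputs might be assembled as in the sketch below. The `class`/`path` objects are the standard CWL way to pass files and directories, but the input types used here (a single Directory and plain Files) are assumptions; the pipeline's .cwl file is the authority on the actual types. The helper `build_job` and all paths are hypothetical.

```python
import json


def build_job(sequence_directory, reference_fasta, index_archive=None):
    """Build a CWL job object for the inputs described above.

    Assumes sequence_directory is a single Directory and the other
    inputs are Files; check the workflow's .cwl for the real types.
    """
    job = {
        "sequence_directory": {"class": "Directory", "path": sequence_directory},
        "input_reference_genome": {"class": "File", "path": reference_fasta},
    }
    # The index is optional, so only include it when one is supplied.
    if index_archive is not None:
        job["reference_genome_index"] = {"class": "File", "path": index_archive}
    return job


# Placeholder paths for illustration.
job = build_job("/data/fastq", "/data/GRCh38.fa")
job_text = json.dumps(job, indent=2)
```

The resulting text can be written to a .json file and passed to cwltool as the second argument.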

Outputs

Bins.csv
A CSV file providing sequence name and bin information
cellBarcodes.CSV
A CSV file with barcode ID and barcode
cellByBin_summary.csv
A CSV file with barcode ID and bin number
cellClusterAssignment.csv
A CSV file with barcode ID and cluster number
GenesRanges.csv
A CSV file providing sequence, gene name and gene location information
cellByGene.mtx
A file with the cell by gene matrix in Matrix Market format
cellGenes.csv
A CSV file with gene ID and gene name
peaksAllCells.csv
A CSV file with sequence name and peak start and end
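As an example of consuming these outputs, the sketch below counts cells per cluster from a file shaped like cellClusterAssignment.csv. The exact layout is an assumption: a header row followed by rows whose first column is the barcode ID and whose second is the cluster number. The function `cluster_sizes` and the header names in the toy data are hypothetical.

```python
import csv
import io
from collections import Counter


def cluster_sizes(handle):
    """Count cells per cluster from an open cellClusterAssignment.csv-like file."""
    reader = csv.reader(handle)
    next(reader, None)  # skip the assumed header row
    # Assumed layout: column 0 is the barcode ID, column 1 the cluster number.
    return Counter(row[1] for row in reader if len(row) >= 2)


# Toy stand-in for the real file; the header names are made up.
sample = io.StringIO("barcode_id,cluster\n1,0\n2,0\n3,1\n")
sizes = cluster_sizes(sample)
```

With a real output file, the same function can be called on `open("cellClusterAssignment.csv", newline="")`, after confirming the column order against the file's header.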
