
haiyueliu/SLAM-Drop-seq

 
 


Processing SLAM-Drop-seq data for the estimation of gene-specific, time-dependent RNA kinetic rates.

Getting Started

DropseqTools is required; see the Dropseq Tools and Drop-seq GitHub pages.

You also need to download the scripts used in this pipeline to your local machine.

Processing data

1. Generate annotations from gtf file

Generate the annotations used by DropseqTools by converting the gtf file annotation into non-overlapping exon and intron bed files.

sh prepare_annotation_files.sh 
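To illustrate what "non-overlapping" means here, the following is a minimal sketch of one common way to collapse overlapping exon intervals and derive the intronic gaps between them. It is not taken from prepare_annotation_files.sh; the function names `merge_intervals` and `introns_between` are hypothetical, and coordinates are assumed to be half-open as in the BED format.

```python
from typing import List, Tuple

Interval = Tuple[int, int]  # half-open [start, end), as in BED (assumption)


def merge_intervals(intervals: List[Interval]) -> List[Interval]:
    """Collapse overlapping or touching intervals into non-overlapping ones."""
    merged: List[Interval] = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            # Overlaps (or touches) the previous interval: extend it.
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


def introns_between(exons: List[Interval], gene_start: int, gene_end: int) -> List[Interval]:
    """Return the parts of the gene span not covered by any merged exon."""
    introns: List[Interval] = []
    cursor = gene_start
    for start, end in merge_intervals(exons):
        if start > cursor:
            introns.append((cursor, start))
        cursor = max(cursor, end)
    if cursor < gene_end:
        introns.append((cursor, gene_end))
    return introns


# Hypothetical exons of one gene, pooled across its transcripts.
exons = [(100, 200), (150, 300), (400, 500)]
print(merge_intervals(exons))
print(introns_between(exons, 100, 500))
```

A real annotation step would also have to handle strands and genes that overlap each other, which this sketch ignores.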

2. Modify the config yaml file

## Change the directories and parameters in the config.yaml file to your own settings.
## Move to the working folder and copy the config.yaml, samples.tsv and Snakefiles there.
## Create a subfolder 'output': the path for all the output files generated by this pipeline.

3. Process data

The data processing pipeline includes the main Drop-seq pipeline, followed by annotating the unique feature (gene) each read maps to, annotating the splice status, annotating mismatches, merging reads from the same UMI, and distinguishing true mismatches from sequencing errors.

snakemake --cores N -s Snakefile_data_processing
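As a rough illustration of the last two steps (merging reads from the same UMI and separating true mismatches from sequencing errors), the sketch below groups reads by cell barcode, UMI and gene, and keeps a mismatch only if more than half of the reads covering that position support it. The majority rule, the `min_fraction` parameter, the read dictionary layout and the function name `consensus_mismatches` are all assumptions made for this example; the pipeline's actual criteria may differ.

```python
from collections import Counter, defaultdict
from typing import Dict, Iterable, Set, Tuple

MoleculeKey = Tuple[str, str, str]  # (cell barcode, UMI, gene)


def consensus_mismatches(reads: Iterable[dict], min_fraction: float = 0.5) -> Dict[MoleculeKey, Set[int]]:
    """Merge reads per molecule and keep mismatches supported by most covering reads.

    Each read is a dict with the hypothetical keys 'cell', 'umi', 'gene',
    'covered' (set of reference positions) and 'mismatches' (set of positions).
    """
    coverage = defaultdict(Counter)  # molecule -> position -> reads covering it
    support = defaultdict(Counter)   # molecule -> position -> reads with a mismatch
    for read in reads:
        key = (read["cell"], read["umi"], read["gene"])
        coverage[key].update(read["covered"])
        support[key].update(read["mismatches"])

    result: Dict[MoleculeKey, Set[int]] = {}
    for key, cov in coverage.items():
        result[key] = {
            pos
            for pos, n_support in support[key].items()
            if cov[pos] > 0 and n_support / cov[pos] > min_fraction
        }
    return result


# Three reads from one molecule: position 2 is mismatched in every covering
# read, position 3 in only one of three, so only position 2 is kept.
reads = [
    {"cell": "c1", "umi": "u1", "gene": "g1", "covered": {1, 2, 3}, "mismatches": {2}},
    {"cell": "c1", "umi": "u1", "gene": "g1", "covered": {2, 3, 4}, "mismatches": {2, 3}},
    {"cell": "c1", "umi": "u1", "gene": "g1", "covered": {3}, "mismatches": set()},
]
print(consensus_mismatches(reads))
```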

4. Identify background conversions and calculate 4sU incorporation rate

This R script only needs to be run once using the no-4sU and 24h-4sU samples. Once you have identified the background conversion rate and 4sU incorporation rate in your experiment, you can omit this step.

Background conversion rate

The background conversions are identified from control samples (no 4sU). Positions with high-confidence conversions are removed from all samples when quantifying the mismatches for each molecule.

4sU incorporation rate

The 4sU incorporation rate is estimated from the long-term 4sU-labeled samples (24 hours of 4sU), in which all unspliced molecules are assumed to have been newly synthesized since the onset of the labeling experiment.

/path_to_Rscript/Rscript background_conversions_incoporation_rate.R
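The idea behind the two estimates can be sketched as follows. This is a simplified illustration in Python, not a port of the R script: it assumes each molecule is summarized by its conversion positions and the number of T positions it covers, and it uses a pooled ratio as the incorporation rate. The names `remove_background` and `pooled_incorporation_rate` are hypothetical.

```python
from typing import Iterable, List, Set, Tuple


def remove_background(conversion_positions: Iterable[int], background: Set[int]) -> List[int]:
    """Drop conversions at positions flagged as background in the no-4sU control."""
    return [pos for pos in conversion_positions if pos not in background]


def pooled_incorporation_rate(molecules: Iterable[Tuple[int, int]]) -> float:
    """Pooled estimate: total conversions divided by total covered T positions.

    `molecules` holds (n_conversions, n_T_covered) pairs for the unspliced
    molecules of the 24h-4sU sample, all assumed to be newly synthesized.
    """
    total_conversions = 0
    total_t = 0
    for n_conversions, n_t in molecules:
        total_conversions += n_conversions
        total_t += n_t
    if total_t == 0:
        raise ValueError("no covered T positions to estimate from")
    return total_conversions / total_t


# Hypothetical numbers: position 17 was called as background in the control.
print(remove_background([5, 17, 42], background={17}))
print(pooled_incorporation_rate([(2, 40), (1, 60)]))
```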

5. Generate gene expression count matrices

The newly synthesized molecules and the pre-existing ones are distinguished using Bayesian inference.

Finally, gene expression count matrices are generated for four types of RNA molecules (labeled mature, unlabeled mature, labeled precursor and unlabeled precursor).

snakemake --cores N -s Snakefile_quantification
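One way to picture the Bayesian step is a two-component binomial model: a molecule covering n T positions with k conversions is either new (each T converts with the 4sU incorporation rate) or pre-existing (each T converts only at the background rate). The sketch below computes the posterior probability of "new" under that model. The binomial likelihood, the `prior_new` parameter and the function names are assumptions for illustration and may not match what Snakefile_quantification actually runs.

```python
from math import comb


def binom_pmf(k: int, n: int, p: float) -> float:
    """Probability of k successes in n independent trials with success rate p."""
    return comb(n, k) * p**k * (1.0 - p) ** (n - k)


def posterior_new(k: int, n: int, p_new: float, p_bg: float, prior_new: float = 0.5) -> float:
    """Posterior probability that a molecule is newly synthesized.

    k: observed conversions, n: covered T positions,
    p_new: 4sU incorporation rate, p_bg: background conversion rate,
    prior_new: prior fraction of new molecules (assumed, not estimated here).
    """
    if not 0.0 <= p_bg < p_new < 1.0:
        raise ValueError("expected 0 <= p_bg < p_new < 1")
    if not 0 <= k <= n:
        raise ValueError("expected 0 <= k <= n")
    weight_new = prior_new * binom_pmf(k, n, p_new)
    weight_old = (1.0 - prior_new) * binom_pmf(k, n, p_bg)
    return weight_new / (weight_new + weight_old)


# Hypothetical rates: more conversions push the posterior towards "new".
for k in (0, 1, 3):
    print(k, posterior_new(k, n=20, p_new=0.05, p_bg=0.001))
```

Combining such a labeled/unlabeled call with the splice status of each molecule would give the four categories counted in the matrices.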

Downstream analysis

1. Sort unsynchronized cells along cell cycle time using Revelio (Schwabe et al., 2020).

2. Estimate time-dependent RNA transcription and degradation rates using Eskrate.

