Different number of SNPs from same data when ParSNP run on different computers #99

Open
SarahNadeau opened this issue Dec 14, 2021 · 6 comments

Comments

@SarahNadeau

We are having a problem that ParSNP returns different numbers of SNPs when run on the same input but on different computers.

Here is the script we're running. It pulls genomes from NCBI and runs ParSNP. When my colleague runs it on his Linux computer, ParSNP yields a 425-character SNP alignment. When I run it in an Ubuntu-based Docker container on my computer, ParSNP yields a 450-character SNP alignment.

We are both using ParSNP version 1.5.6. When we run the script multiple times on the same computer, the results are consistent, so the differences seem to be introduced by running on different computers.

Do you know what might be causing these differences?

@bkille
Contributor

bkille commented Dec 15, 2021

Hmmm. Thanks for pointing this out. I'm not immediately aware of what could be causing this. I'm assuming both versions came from Conda?

@SarahNadeau
Author

Thanks for the speedy reply! Actually, both versions were built from source. I also see the same discrepancy when I build from source with the same Dockerfile in a GitHub Actions workflow versus on my computer.

@SarahNadeau
Author

SarahNadeau commented Dec 15, 2021

In case it helps, here's an example of the different SNP alignments from ParSNP built using the same Dockerfile on my laptop vs. in the GitHub Actions workflow, along with the ParSNP alignment logs from each. The Dockerfile is here.

SNP alignments
Aligner logs

@bkille
Contributor

bkille commented Dec 16, 2021

Hmm that is strange. The link you posted earlier to the script you ran no longer seems to work. If you can pass that to me, I can run Parsnp from the conda version and see which of your outputs it agrees with.

@SarahNadeau
Author

Sorry about that, I renamed the script in the meantime. Here's the new link and I've also attached it as a zip archive.
https://github.com/SarahNadeau/bioinf-containers/blob/master/parsnp/tests/scripts/run_positive_control.sh
run_positive_control.sh.zip

@SarahNadeau
Author

SarahNadeau commented Feb 22, 2022

Hi @bkille, I just wanted to follow up: I am having the same issue with ParSNP when it is installed via conda in a Docker image. I've tried to make the problem easier for you to reproduce via Docker.

Dockerfile:

FROM ubuntu:xenial as app

ARG PARSNP_VER=1.5.6

# Update package index, install packages
RUN apt-get update && apt-get install -y wget

# Install miniconda
RUN wget \
    https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh \
    && mkdir /root/.conda \
    && bash Miniconda3-latest-Linux-x86_64.sh -b \
    && rm -f Miniconda3-latest-Linux-x86_64.sh
ENV PATH="/root/miniconda3/bin:${PATH}"

# Install bioconda
RUN conda config --add channels defaults \
    && conda config --add channels bioconda \
    && conda config --add channels conda-forge

# Install parsnp in conda environment
RUN conda create -y -n parsnp python=3.7 parsnp=$PARSNP_VER
ENV PATH="/root/miniconda3/envs/parsnp/bin:${PATH}"

WORKDIR /data

FROM app as test

# Download some test data
RUN wget \
        https://ftp.ncbi.nlm.nih.gov/genomes/all/GCA/000/703/365/GCA_000703365.1_Ec2011C-3609/GCA_000703365.1_Ec2011C-3609_genomic.fna.gz \
        https://ftp.ncbi.nlm.nih.gov/genomes/all/GCA/016/766/575/GCA_016766575.1_PDT000040717.5/GCA_016766575.1_PDT000040717.5_genomic.fna.gz \
        https://ftp.ncbi.nlm.nih.gov/genomes/all/GCA/003/018/935/GCA_003018935.1_ASM301893v1/GCA_003018935.1_ASM301893v1_genomic.fna.gz \
        https://ftp.ncbi.nlm.nih.gov/genomes/all/GCA/012/830/055/GCA_012830055.1_PDT000040719.3/GCA_012830055.1_PDT000040719.3_genomic.fna.gz \
        https://ftp.ncbi.nlm.nih.gov/genomes/all/GCA/012/829/335/GCA_012829335.1_PDT000040724.3/GCA_012829335.1_PDT000040724.3_genomic.fna.gz \
        https://ftp.ncbi.nlm.nih.gov/genomes/all/GCA/003/018/775/GCA_003018775.1_ASM301877v1/GCA_003018775.1_ASM301877v1_genomic.fna.gz \
        https://ftp.ncbi.nlm.nih.gov/genomes/all/GCA/012/829/275/GCA_012829275.1_PDT000040726.3/GCA_012829275.1_PDT000040726.3_genomic.fna.gz \
        https://ftp.ncbi.nlm.nih.gov/genomes/all/GCA/016/766/555/GCA_016766555.1_PDT000040728.5/GCA_016766555.1_PDT000040728.5_genomic.fna.gz \
        https://ftp.ncbi.nlm.nih.gov/genomes/all/GCA/012/829/195/GCA_012829195.1_PDT000040729.3/GCA_012829195.1_PDT000040729.3_genomic.fna.gz \
        https://ftp.ncbi.nlm.nih.gov/genomes/all/GCA/012/829/295/GCA_012829295.1_PDT000040727.3/GCA_012829295.1_PDT000040727.3_genomic.fna.gz && \
    gunzip ./*.gz

# Run ParSNP on test data
RUN parsnp \
    -d /data \
    -o /data/parsnp_results \
    --use-fasttree \
    -v \
    -c \
    -r /data/GCA_000703365.1_Ec2011C-3609_genomic.fna

# Extract SNP alignment
RUN harvesttools -i /data/parsnp_results/parsnp.ggr -S /data/parsnp_results/snp_alignment.txt

# Generate XMFA file checksum
RUN sha256sum /data/parsnp_results/parsnp.xmfa > /data/parsnp_results/parsnp.xmfa.checksum

# Print test data results
RUN cat /data/parsnp_results/snp_alignment.txt
RUN cat /data/parsnp_results/parsnp.xmfa.checksum

To reproduce the issue, make sure Docker is installed and running, save the Dockerfile, and run the following from the directory that contains it:

docker build --progress=plain -f ./Dockerfile .

That should install ParSNP via conda, download some test assemblies, run ParSNP on them, and print the SNP alignment and the SHA-256 checksum of the XMFA file.

When I do this on my laptop (macOS Monterey) I get
Alignment length: 429
SHA-256 checksum: 974f614d3e78b8964552b5830ff9d275e720f05e40ca969b2d94c64caf213df3

When I build the same Dockerfile using GitHub Actions I get
Alignment length: 411
SHA-256 checksum: b5e8ced8ca4837e7ffe067b1dcd1bd00c2ce5f95d0918701dc88222d449e8bd4

Clearing the caches and rebuilding the image on each system yields the same checksum as before for that system, so each result is reproducible within a system but differs between the two.
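
For anyone who wants to compare outputs from two machines without reading the build log, here is a minimal Python sketch that reports the checksum, the number of sequences, and the alignment length(s) of a file. It is not part of ParSNP or harvesttools. It assumes the SNP alignment written by `harvesttools -S` is in multi-FASTA format, and the function name `summarize_alignment` is hypothetical, chosen only for illustration.

```python
import hashlib


def sha256_of(path):
    """Return the hex SHA-256 digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def summarize_alignment(path):
    """Summarize a multi-FASTA alignment (assumed format, see note above).

    Returns the file checksum, the number of records, and the sorted set
    of distinct sequence lengths (a single value for a valid alignment).
    """
    lengths = {}
    name = None
    with open(path) as handle:
        for line in handle:
            line = line.strip()
            if not line:
                continue
            if line.startswith(">"):
                # Header line: start a new record.
                name = line[1:]
                lengths[name] = 0
            elif name is not None:
                # Sequence line: may be wrapped over several lines.
                lengths[name] += len(line)
    return {
        "sha256": sha256_of(path),
        "n_sequences": len(lengths),
        "lengths": sorted(set(lengths.values())),
    }
```

Calling `summarize_alignment("snp_alignment.txt")` on the copy of the file from each machine and comparing the two dictionaries shows whether the files differ only in length or also in the number of sequences.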
