Skip to content

CmdLine API Documentation

Daniel León-Perinán edited this page Jul 10, 2024 · 4 revisions

Resaving (st-resave)

Resave (compressed) textfiles to one of the layouts described above (and optionally --normalize) using

st-resave \
     -i '/Puck_180528_20.tar/BeadLocationsForR.csv,/Puck_180528_20.tar/MappedDGEForR.csv,Puck_180528_20.n5' \
     -a '/path/celltypes.csv' \
     -a ...
     [-c '/path/directory.n5'] \
     [--normalize]

If the N5-container exists, new datasets will be added (example above:Puck_180528_20.n5), otherwise a new N5-container will be created. Each input consists of a locations.csv file, a reads.csv file, and a user-defined dataset name. The csv files can optionally be inside (zip/tar/tar.gz) files. It is tested on the slide-seq data linked above, which can be used as a blueprint for how to save one's own data for import.

Optionally, one or more annotations (e.g., cell types) can be imported as part of the resaving step (e.g. from celltypes.csv) with the -a flag. Please note that missing barcodes in celltypes.csv will be excluded from the dataset. This way you can filter locations with bad expression values.

Optionally, the datasets can be directly log-normalized before resaving. The locations file should contain a header for barcode (id), xcoord and ycoord, followed by the entries:

barcodes,xcoord,ycoord
TCACGTAGAAACC,3091.01234567901,2471.88888888889
TCTCCTAGTTCGG,4375.91791044776,1577.52985074627
...

The reads file should contain all barcodes (id) as the header after a Row column that holds the gene identifier. It should have as many columns as there are sequenced locations (ids from above):

Row,TCACGTAGAAACC,TCTCCTAGTTCGG, ...
0610005C13Rik,0,0, ...
0610007P14Rik,0,0, ...
...

Note: if there is a mismatch between number of sequenced locations defined in the locations.csv (rows) with the locations in reads.csv (columns), the resave will stop.

The optional annotation files should contain all barcodes (id) and celltype id (integer numbers) as a header:

barcodes,celltype
TCACGTAGAAACC,28
TCTCCTAGTTCGG,1
ACCGTCTGAATTC,40
...

Adding annotations (st-add-annotations)

You can also add CSV annotations (e.g., celltypes) to an existing dataset (within or outside some N5-container):

st-add-annotations \
     -i '/path/input.n5' \
     -a '/path/celltypes.csv' \
     [-l 'label']

The annotations are stored in the dataset within the intended group as label if the -l option is given, otherwise the label is taken from the file name (in the above case, celltypes). Note that this command does not act upon missing barcodes, but only warns about them.

Normalization (st-normalize)

You can run the normalization also independently after resaving if desired. The tool can resave datasets within or outside of an N5-container:

st-normalize \
     -i '/path/input1.n5,/path/input2.n5' \
     [-o '/path/output1.n5,/path/output1.n5'] \
     [-c '/path/container.n5'] \

The only parameter you have to provide is the comma separated list of input datasets -i which are assumed to reside in an N5-container if additionally the -c option is given. You can optionally define a comma separated list of output paths -o (otherwise it'll append '-normed' to the dataset names).

Adding gene variability (st-add-entropy)

You can compute gene expression variability metrics (e.g., standard deviation) before running pairwise alignment.

st-add-entropy \
     -i='/path/directory.n5/Puck_180528_20.h5ad' \
     [-gl='stdev'] \
     [-m='stdev'] \
     [--numThreads=8]

These values will be stored as per-gene annotations in the dataset within the intended group as stdev if the -gl option is given, otherwise the label is taken from the method (in the above case, stdev).

Iteractive viewing application (st-explorer)

Run the interactive viewer as follows

st-explorer \
     -i '/path/directory.n5' \
     [-d 'Puck_180528_20.h5ad,Puck_180528_22.h5ad']

It allows you to browse the data in realtime for all genes and datasets. If data is registered it will automatically use the transformations that are stored in the metadata to properly overlay individual datasets. The optional switch -d allows you to select a subset of datasets if -i is an N5-container, and using -c allows to preset the BigDataViewer intensity range.

Render images and view or save as TIFF (st-render)

In order to render images of spatial sequencing datasets (can be saved as TIFF or displayed on screen using ImageJ) please run

st-render \
    -i='/path/directory.n5' \
    -g='Calm2,Hpca,Ptgds' \
    [-s=0.1] \
    [-o='/path/exportdir'] \
    [-d='Puck_180528_20,Puck_180528_22'] \
    [-bmax=0.5] \
    [-bmin=0] \
    [--ffGauss=2.0] \
    [--ffMean=2.5] \
    [--ffMedian=5.0] \
    [--ffSingleSpot=1.5] \
    [--rendering=Gauss] \
    [-rf=1.5] \
    [-b=50] \
    [--ignoreTransforms]

If you only define the input path -i and one or more genes -g, the rendered image will be displayed as an ImageJ image. If the input is an N5-container, all datasets will be rendered as 3D image. When defining an output directory -o images will not be displayed, but saved as TIFF (stacks) into the directory with filenames corresponding to the gene name. The optional switch -d allows you to select a subset of datasets if -i is an N5-container (default: all datasets), -s scales the rendering (default: 0.05), --ffSingleSpot enables a single-spot filter (default: off), --ffMedian=5.0 applies median filtering in locations space (not on top of the rendered image) with a certain radius (default: off), -rf sets the smoothness factor for rendering of the sparse dataset, and -b sets the size of an extra black border around the location coordinates (default: 20). Finally, --ignoreTransforms lets you ignore all transforms associated with the datasets (e.g., alignment) when rendering.

View selected genes using BigDataViewer (st-bdv-view)

In order to interactively browse the 2D space of one or more datasets with BigDataViewer you can

st-bdv-view \
     -i '/path/directory.n5' \
     -g Calm2,Hpca \
     -d 'Puck_180528_20.n5' \
     [-a 'celltype'] \
     [-ar=0.75] \
     [-bmax=0.5] \
     [-bmin=0] \
     [--rendering=Gauss] \
     [-rf 2.0] \
     [--ffGauss=2.0] \
     [--ffMean=2.5] \
     [--ffMedian=5.0] \
     [--ffSingleSpot=1.5]

Alternatively, you can browse the 3D space of a dataset with:

st-bdv-view3d \
     -i '/path/directory.n5' \
     -g Calm2,Hpca \
     [-d 'Puck_180528_20.n5,Puck_180528_22.n5'] \
     [-z 5.0] \
     [-a 'celltype'] \
     [-ar=0.75] \
     [-bmax=0.5] \
     [-bmin=0] \
     [--rendering=Gauss] \
     [-rf 2.0] \
     [--ffGauss=2.0] \
     [--ffMean=2.5] \
     [--ffMedian=5.0] \
     [--ffSingleSpot=1.5]

Dataset(s) from the selected input -i (single dataset or N5-container) will be interactively rendered for one or more selected genes -g (multiple genes will be overlaid into different colors). The switch -a will overlay for example celltype annotations. By default all datasets will be displayed, but they can be limited (or ordered) using -d. You can define the distance between slices with -z (as a factor of median spacing between sequenced locations), -c allows to preset the BigDataViewer intensity range and parameters -f, -m, -rf are explained above (4).

Alignment of 2D slices

The alignment of 2D slices of a 3D volume is a two-step process. At first, using st-align-pairs slices will be aligned pairwise (e.g. 1st vs 2nd, 1st vs 3rd, and so on ...) using the Scale Invariant Feature Transform (SIFT) on a set of genes. These pairwise alignments can optionally be viewed and confirmed using st-align-pairs-view. Finally, a globally optimal model for each slide will computed using st-align-global, which supports a refinement using Iterative Closest Point (ICP) matching. Note: the alignment process inherently requires multiple datasets and additional metadata to be stored. Therefore, the following commands can only be used with an N5-container.

Pairwise alignment (st-align-pairs)

The pairwise alignment uses SIFT to align pairs of 2d slices. Important note: the order of the datasets as they are passed into the program is crucial as it determines which slices are next to each other. If not specified, they are used in the order as stored in the JSON file inside the N5-container. The 2d alignment can be called as follows, the resulting transformations and corresponding points are automatically stored in the N5:

st-align-pairs \
    -i='/path/directory.n5' \
    [-d='Puck_180528_20.n5,Puck_180528_22.n5'] \
    [-r=2] \
    [-g='Calm2,Hpca'] \
    [-n=100] \
    [-sk=10] \
    [-s=0.05] \
    [-rf=4.0] \
    [--overwrite] \
    [-e=250.0] \
    [--minNumInliersGene=5] \
    [--minNumInliers=30] \
    [-bmax=0.5] \
    [-bmin=0] \
    [--rendering=Gauss] \
    [--ffGauss=2.0] \
    [--ffMean=2.5] \
    [--ffMedian=5.0] \
    [--ffSingleSpot=1.5] \
    [--hidePairwiseRendering]

Datasets from the selected N5-container -i will be aligned in pairs. Datasets and their ordering can be optionally defined using -d, otherwise all datasets will be used in the order as defined in the N5-container. The comparison range (±slices to be aligned) can be defined using -r, by default it is set to 2. Genes to be used can be specified manually using -g, or a specified number of genes -n with the highest standard deviation in the expression signal will be used. By default, 100 genes will be automatically selected.

The images used for alignment are rendered as in the viewing programs above. The scaling of the images can be changed using -s (default: 0.05 or 5%), and the smoothness factor can be changed using -rf (default: 4.0). If a registration was run before and transformations are already stored, the application will quit. To compute anyways, previous results can be overwritten using --overwrite.

The alignment itself has more paramters that can be adjusted. The maximal error (default 250.0) for the RANSAC matching in SIFT can be adjusted using -e, the minimally required number of RANSAC inliers per tested gene can be changed using --minNumInliersGene (default: 5), and the minimal number of inliers over all genes can be adjusted using --minNumInliers (default: 30).

The results of the alignment will be shown by default using a gene (selected automatically or defined via --renderingGene, which can be deactivated using --hidePairwiseRendering.

Manually adding matches (st-align-pairs-add)

To manually add point matches between two slices (e.g., computed from an external program), you can use the following command:

st-align-pairs-add \
     -c '/path/directory.n5' \
     -d 'Puck_180528_20.n5,Puck_180528_22.n5' \
     -m '/path/matches.csv' \
     [--overwrite] \
     [--hidePairwiseRendering] \
     [--renderingGene Calm2] \
     [-s 0.05] \
     [-rf 4.0] \
     [-l 1.0]

The two datasets -d must be present in the selected N5-container -c. A CSV file -m with the point matches must be provided, which should contain rows of the following shape:

gene,x1,y1,x2,y2

The entries represent the gene name and the x/y coordinates of the point matches in the two datasets; note that the order of the datasets is given by the order in the -d argument.

The results of the alignment will be shown by default using a gene (selected automatically or defined via --renderingGene, which can be deactivated using --hidePairwiseRendering.

The images used for alignment are rendered as in the viewing programs above. The scaling of the images can be changed using -s (default: 0.05 or 5%), and the smoothness factor can be changed using -rf (default: 4.0). If point matches for a particular pair are already present, the application will quit. To store them anyways, use --overwrite.

Interactive alignment (st-align-interactive)

To interactively align two datasets using a Graphical User Interface (GUI), you can use the following command:

st-align-interactive \
     -i='/path/directory.n5' \
     [-d1='Puck_180528_20.n5'] \
     [-d2='Puck_180528_22.n5'] \
     [-n=10] \
     [-sk=10] \
     [-s=0.05] \
     [-rf=1.0] \
     [-bmax=0.5] \
     [-bmin=1] \
     [--ffGauss=2.0] \
     [--ffMean=2.5] \
     [--ffMedian=5.0] \
     [--ffSingleSpot=1.5] \
     [--rendering=Gauss]

Two datasets -d1 and -d2 in the selected N5-container -i will be aligned in pairs, interactively. A specified number of genes -n with the highest standard deviation in the expression signal will be used. Later, more genes can be selected interactively.

The images used for alignment are rendered as in the viewing programs above. The scaling of the images can be changed using -s (default: 0.05 or 5%), and the smoothness factor can be changed using -rf. The rest of parameters can also be configured using the GUI.

Alignment can be performed manually (with the mouse) or automatically with SIFT (with optional ICP refinement). During automatic alignment, all selected genes are used for feature detection and matching. Alignment with SIFT (and ICP) itself have more parameters that can be adjusted in the GUI.

The results of alignment are automatically shown for the selected gene, and the transformation matrix can be stored to the container.

View pairwise alignment (st-align-pairs-view)

This command allows to manually inspect pairwise alignments between slices and to test out the effect of different transformation models (from fully rigid to fully affine). It uses all identified corresponding points to compute the respective transformation that minimizes the distance between all points.

st-align-pairs-view \
     -c '/path/directory.n5' \
     -g Calm2 \
     [-d 'Puck_180528_20.n5,Puck_180528_22.n5'] \
     [-s 0.05] \
     [-rf 4.0] \
     [-l 1.0] \

Pairs of datasets -d from the selected N5-container -c will be visualized for a gene of choice defined by -g. If -d is omitted, all pairs will be displayed. The images are rendered as explained above. The scaling of the images can be changed using -s (default: 0.05 or 5%), and the smoothness factor can be changed using -rf (default: 4.0). Importantly, -l allows to set the lambda of the 2D interpolated transformation model(s) (affine/rigid). Specifically, lambda defines the degree of rigidity, fully affine is 0.0, fully rigid is 1.0 (default: 1.0 - rigid). A sensible choice might be 0.1.

Global optimization and ICP refinement (st-align-global)

The global optimization step minimizes the distance between all corresponding points across all pairs of slices (at least two) and includes an optional refinement step using the iterative closest point (ICP) algorithm.

st-align-global \
    -c='/path/directory.n5' \
    [-d='Puck_180528_20.n5,Puck_180528_22.n5'] \
    [-l=0.1] \
    [--ignoreQuality] \
    [--maxAllowedError=300.0] \
    [--minIterations=500] \
    [--maxIterations=3000] \
    [--relativeThreshold=3.0] \
    [--absoluteThreshold=160.0] \
    [--skipICP] \
    [--icpIterations=100] \
    [--icpErrorFraction=1.0] \
    [--maxAllowedErrorICP=140.0] \
    [--minIterationsICP=500] \
    [--maxIterationsICP=500] \
    [--skipDisplayResults] \
    [-rf=4.0] \
    [-g='Calm2']

By default, all datasets of the specified N5-container -c will be optimized, a subset of datasets can be selected using -d. -l allows to set the lambda of the 2D interpolated transformation model(s) that will be used for each slice. Lambda defines the degree of rigidity, fully affine is 0.0, fully rigid is 1.0 (default: 0.1 - 10% rigid, 90% affine).

Prior to computing the final optimum, we try to identify if there are pairs of slices that contain wrong correspondences. To do this, we test for global consistency of the alignment and potentially remove pairs that differ significantly from the consensus of all the other pairs. There are a few parameters to adjust this process. --ignoreQuality ignores the amount of RANSAC inlier ratio as a way to measure their quality, otherwise it is used determine which pairwise connections to remove during global optimization (default: false). --relativeThreshold sets the relative threshold for dropping pairwise connections, i.e. if the pairwise error is n-times higher than the average error (default: 3.0). --absoluteThreshold defines the absolute error threshold for dropping pairwise connections. The errors of the pairwise matching process provide a reasonable number, the global error shouldn't be much higher than the pairwise errors, althought it is expected to be higher since it is a more constraint problem (default: 160.0 for slideseq).

--maxAllowedError specifies the maximally allowed error during global optimization (default: 300.0 for slideseq). The optimization will run until the maximum number of iterations --maxIterations if the error remains above --maxAllowedError. --minIterations sets the minimum number of iterations that will be performed. Note: These parameters usually do not need to change.

--skipICP skips the more compute intense ICP refinement step. If sufficent numbers of correspondences are found in the pairwise matching (e.g. >300), this can be advisable. --icpIterations defines the maximum number of ICP iterations for each pair of slides (default: 100). --icpErrorFraction describes the distance at which sequenced locations will be assigned as correspondences in ICP, relative to median distance between all locations (default: 1.0). --maxAllowedErrorICP is the maximum error allowed during ICP runs (after each model fit) - here also consult the results of pairwise matching to identify a reasonable number (default: 140.0 for slideseq).

The global optimization after ICP will run until the maximum number of iterations --maxIterationsICP if the error remains above --maxAllowedErrorICP. --minIterationsICP sets the minimum number of iterations that will be performed. Note: These parameters usually do not need to change.

The results are displayed by default. The smoothness factor can be changed using -rf (default: 4.0), the gene can be selected using -g (default: Calm2).