Handle SSR differently depending on the structure of the SSR marker #15

thokall · 2024-06-24T10:39:47Z

Scoring SSR markers from NGS data has to be done differently depending on the structure of the data. In essence, three different situations are seen in real-world data.

1. The amplified and sequenced marker is shorter than the read length, which means that each read of a pair individually holds all the necessary information.
2. The amplified and sequenced marker is longer than the individual read length, but the overlap between paired reads can be used to create merged reads that can be handled like the first type.
3. The amplified and sequenced marker is longer than the combined length of the paired reads, so the reads cannot be merged.
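
The three cases above can be told apart from the expected amplicon length and the read length. The following is a minimal sketch of that decision, not part of the existing code base: the function name `classify_marker`, the case labels, and the `min_overlap` default of 10 bases are all assumptions made for illustration.

```python
def classify_marker(amplicon_len: int, read_len: int, min_overlap: int = 10) -> str:
    """Classify a marker into one of the three cases (hypothetical helper).

    amplicon_len: expected length of the amplified marker, in bases
    read_len:     length of a single read, in bases
    min_overlap:  assumed minimum overlap needed to merge a read pair
    """
    if amplicon_len <= read_len:
        # Case 1: each read of the pair spans the whole marker.
        return "single_read"
    if amplicon_len <= 2 * read_len - min_overlap:
        # Case 2: the pair overlaps enough to be merged into one read.
        return "mergeable"
    # Case 3: the pair cannot be merged.
    return "unmergeable"


if __name__ == "__main__":
    for length in (120, 250, 320):
        print(length, classify_marker(length, read_len=150))
```

In practice the amplicon length varies with the allele, so a marker close to one of the thresholds may fall into different cases for different alleles.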

The first two cases can be analyzed by simply merging the paired reads and then estimating read length on the merged data. This is the preferred layout when designing new markers, as it can be analyzed in most cases. The only real corner case is when the overlap used for merging contains nothing but the repetitive region, which makes it impossible to merge the reads correctly.
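
As a rough illustration of the merged-read approach and its corner case, the sketch below tallies merged read lengths and flags a marker layout whose overlap lies entirely inside the repeat. It assumes the reads have already been merged by an external tool, and that the repeat coordinates within the amplicon are known; `merged_length_counts` and `overlap_inside_repeat` are hypothetical names, and coordinates are 0-based with an exclusive end.

```python
from collections import Counter


def merged_length_counts(merged_reads):
    """Count how many merged reads have each length."""
    return Counter(len(seq) for seq in merged_reads)


def overlap_inside_repeat(amplicon_len, read_len, repeat_start, repeat_end):
    """Return True if the read-pair overlap lies entirely within the repeat.

    The forward read is assumed to cover [0, read_len) of the amplicon and
    the reverse read [amplicon_len - read_len, amplicon_len).
    """
    overlap_start = max(0, amplicon_len - read_len)
    overlap_end = min(read_len, amplicon_len)
    if overlap_start >= overlap_end:
        return False  # the reads do not overlap at all
    return repeat_start <= overlap_start and overlap_end <= repeat_end


if __name__ == "__main__":
    print(merged_length_counts(["ACGT", "ACGTAC", "ACGT"]))
    print(overlap_inside_repeat(280, 150, repeat_start=100, repeat_end=200))
```

The length with the highest count is a natural first candidate for an allele, but how to call alleles from the counts is left open here.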

The third case is more challenging, as it will only be possible to determine the repeat length if the entire repeat region is contained within one of the paired reads.
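
One way to test whether a single read contains the whole repeat is to require flanking sequence on both sides of the longest run of the motif. The sketch below is an assumption-laden illustration, not the project's method: the function name `repeat_units_if_contained`, the perfect-repeat regular expression, and the `min_flank` default of 5 bases are all hypothetical choices.

```python
import re


def repeat_units_if_contained(read, motif, min_flank=5):
    """Return the repeat unit count if the repeat is fully inside the read.

    The longest uninterrupted run of `motif` is located, and the read must
    extend at least `min_flank` bases beyond it on both sides. Returns None
    when no run is found or when the run touches (or nearly touches) a read end.
    """
    pattern = re.compile(f"(?:{re.escape(motif)})+")
    runs = list(pattern.finditer(read))
    if not runs:
        return None
    longest = max(runs, key=lambda m: m.end() - m.start())
    left_flank = longest.start()
    right_flank = len(read) - longest.end()
    if left_flank < min_flank or right_flank < min_flank:
        return None  # the repeat may continue beyond the read
    return (longest.end() - longest.start()) // len(motif)


if __name__ == "__main__":
    read = "GGTTAGGT" + "AC" * 6 + "TTGGATCC"
    print(repeat_units_if_contained(read, "AC"))
```

Real SSR loci often contain interrupted or compound repeats, which this perfect-repeat pattern would split into separate runs.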
