Handle SSR differently depending on the structure of the SSR marker #15

Open
thokall opened this issue Jun 24, 2024 · 0 comments
thokall commented Jun 24, 2024

Scoring SSR markers from NGS data has to be done differently depending on the structure of the data. There are in essence three different possibilities that are seen in real-world data.

  1. The amplified and sequenced marker is shorter than the read length, which means that each read of a pair individually holds all the necessary information.
  2. The amplified and sequenced marker is longer than the individual read length, but the overlap between paired reads can be used to create merged reads that can be handled as the first type.
  3. The amplified and sequenced marker is longer than the combined length of the paired reads, so the reads cannot be merged.
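As a rough illustration of the three cases, the sketch below classifies a marker from its expected amplicon length and the read length. The function name `classify_marker` and the `min_overlap` threshold (the smallest overlap assumed to be needed for merging) are hypothetical and not part of this project.

```python
def classify_marker(amplicon_len, read_len, min_overlap=10):
    """Return 1, 2 or 3 for the three marker structures listed above.

    min_overlap is an assumed value; the real requirement depends on
    the read-merging tool and its settings.
    """
    if amplicon_len <= read_len:
        # Case 1: each read on its own spans the whole marker.
        return 1
    if amplicon_len <= 2 * read_len - min_overlap:
        # Case 2: the paired reads overlap enough to be merged.
        return 2
    # Case 3: the paired reads cannot be merged.
    return 3


print(classify_marker(120, 150))
print(classify_marker(250, 150))
print(classify_marker(300, 150))
```

Since allele lengths vary between samples, a marker close to either boundary may fall into different cases for different alleles.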

The first two can be analyzed by simply merging the read data and then estimating read length on the merged data. This is the preferred option when designing new markers, as it can be analysed in most cases. The only real corner case is when the overlap used for merging the reads contains only the repetitive region, which makes it impossible to merge the reads correctly.
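The corner case can be checked at the design stage if the approximate position of the repeat within the amplicon is known. The sketch below is a minimal version of such a check; the function name and the coordinate convention (0-based, end-exclusive positions on the amplicon) are assumptions made for illustration.

```python
def overlap_is_only_repeat(amplicon_len, read_len, repeat_start, repeat_end):
    """Return True if the overlap between the paired reads lies entirely
    inside the repeat region, so the reads cannot be merged reliably.

    Coordinates are 0-based and end-exclusive on the amplicon (assumed).
    """
    # The forward read covers [0, read_len) and the reverse read covers
    # [amplicon_len - read_len, amplicon_len), so they overlap here:
    overlap_start = max(0, amplicon_len - read_len)
    overlap_end = min(read_len, amplicon_len)
    if overlap_start >= overlap_end:
        # No overlap at all: this is the non-mergeable third case.
        return False
    return repeat_start <= overlap_start and overlap_end <= repeat_end


# 280 bp amplicon with 150 bp reads gives an overlap of [130, 150).
print(overlap_is_only_repeat(280, 150, 120, 160))
print(overlap_is_only_repeat(280, 150, 140, 180))
```

In the second call the overlap starts in unique flanking sequence before the repeat, which gives the merger something to anchor on.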

The third case is more challenging, as it will only be possible to determine the repeat length if the whole repeat region is contained within one of the paired reads.
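One way to test whether a single read contains the whole repeat region is to require flanking sequence on both sides of the repeat run. The sketch below assumes a perfect (uninterrupted) repeat of a known motif; the function name, `min_flank` and `min_units` are hypothetical parameters chosen for illustration.

```python
import re


def spanned_repeat_count(read, motif, min_flank=5, min_units=2):
    """Return the number of motif units in the longest repeat run if the
    run is flanked on both sides within the read, otherwise None.

    Assumes a perfect repeat; interrupted repeats are not handled.
    """
    pattern = re.compile(f"(?:{re.escape(motif)}){{{min_units},}}")
    runs = list(pattern.finditer(read))
    if not runs:
        return None
    longest = max(runs, key=lambda m: m.end() - m.start())
    # Without flank on both sides the run may continue beyond the read.
    if longest.start() < min_flank or len(read) - longest.end() < min_flank:
        return None
    return (longest.end() - longest.start()) // len(motif)


full = "ACGTACGT" + "CA" * 6 + "TTGACCGT"
truncated = "CA" * 6 + "TTGACCGT"
print(spanned_repeat_count(full, "CA"))       # 6
print(spanned_repeat_count(truncated, "CA"))  # None
```

Reads that return `None` cannot be used to score the marker, since the repeat may extend past the end of the read.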
