umi_tools extract: UMI on 5' read 1 and 3' read 2 #623

Open
lzt5269 opened this issue Jan 15, 2024 · 2 comments

Comments


lzt5269 commented Jan 15, 2024

Hi,

I'm working on paired-end data. Read 1 has a 10-base UMI at the beginning, and read 2 has a 10-base UMI at the end that is the reverse complement of the UMI on read 1. How should I extract the UMIs and remove them from both reads?

Thanks.

lzt5269 changed the title from "umi_tools extract: UMI on 5" to "umi_tools extract: UMI on 5' read 1 and 3' read 2" on Jan 15, 2024
@TomSmithCGAT
Member

Hi @lzt5269,

Sorry for the slow reply on this one. This is outside the expected functionality of UMI-tools, but I think you can achieve it with the command below. It uses regex pattern matching, which is slower than the default string method for simple UMI extractions but allows more flexibility. Here, we specify that the UMI of read 1 is the first 10 characters (bases) of the read (--bc-pattern='(?P<umi_1>.{10}).*'). For read 2, we give a pattern that doesn't include any UMI group, just a group to discard, which is the last 10 bases (--bc-pattern2='.*(?P<discard_1>.{10})').

umi_tools extract \
  --extract-method=regex \
  --bc-pattern='(?P<umi_1>.{10}).*' \
  --bc-pattern2='.*(?P<discard_1>.{10})' \
  -L test.log \
  --read2-in=<PATH TO READ2 FILE> \
  --stdin=<PATH TO READ1 FILE> \
  --read2-out=<PATH TO READ2 OUTFILE> |
  gzip > <PATH TO READ1 OUTFILE>

I recommend manually checking that the above gives you the expected output for the first read pair.
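One way to sanity-check the two patterns before running the full extraction is to apply them to a single read pair by hand. The sketch below is only an illustration: it uses Python's standard `re` module rather than UMI-tools itself, the helper names `split_read1` and `split_read2` are hypothetical, and the toy sequences are made up. It assumes, as described above, that the `umi_1` group is removed from read 1 and the `discard_1` group is removed from read 2.

```python
import re

# The same patterns passed to --bc-pattern and --bc-pattern2 above.
BC_PATTERN = r"(?P<umi_1>.{10}).*"
BC_PATTERN2 = r".*(?P<discard_1>.{10})"


def split_read1(seq):
    """Return (umi, remaining_sequence) for read 1, or None if too short."""
    match = re.match(BC_PATTERN, seq)
    if match is None:
        return None
    return match.group("umi_1"), seq[match.end("umi_1"):]


def split_read2(seq):
    """Return (remaining_sequence, discarded_tail) for read 2, or None if too short."""
    match = re.match(BC_PATTERN2, seq)
    if match is None:
        return None
    return seq[:match.start("discard_1")], match.group("discard_1")


# Made-up read pair: the last 10 bases of read 2 are the reverse
# complement of the first 10 bases of read 1.
read1 = "ACGTACGTAC" + "TTTTGGGGCCCC"
read2 = "GGGGCCCCAAAA" + "GTACGTACGT"

umi, trimmed1 = split_read1(read1)
trimmed2, discarded = split_read2(read2)

print("UMI:           ", umi)
print("read 1 trimmed:", trimmed1)
print("read 2 trimmed:", trimmed2)
print("read 2 discard:", discarded)
```

Comparing these values with the first records of the two output FASTQ files should show whether the trimming lands where you expect.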

Of course, the ideal solution would be to use the two UMIs to correct any sequencing errors in them and obtain a consensus UMI sequence. However, I expect there would be little benefit for the effort required.
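For anyone who does want to try the consensus idea, a minimal sketch is shown here. This is not something UMI-tools provides; it is a hypothetical pre-processing step. The function names and the rule of keeping the base with the higher quality character at a disagreeing position are assumptions chosen for illustration, and the quality strings are assumed to be standard FASTQ Phred+33 encoding.

```python
# Translation table for complementing DNA bases (N stays N).
_COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def consensus_umi(umi1, qual1, umi2_tail, qual2_tail):
    """Combine the read 1 UMI with the reverse-complemented read 2 tail.

    Where the two copies disagree, keep the base whose quality character
    is higher (assumed rule; ties go to read 1).
    """
    umi2 = reverse_complement(umi2_tail)
    qual2 = qual2_tail[::-1]  # qualities are reversed along with the bases
    if not (len(umi1) == len(qual1) == len(umi2) == len(qual2)):
        raise ValueError("UMI and quality strings must all have the same length")

    consensus = []
    for base1, q1, base2, q2 in zip(umi1, qual1, umi2, qual2):
        if base1 == base2 or ord(q1) >= ord(q2):
            consensus.append(base1)
        else:
            consensus.append(base2)
    return "".join(consensus)


# Made-up example: the last base of the read 1 UMI is a low-quality
# miscall ('#'), while the read 2 copy is high quality throughout ('I').
fixed = consensus_umi("ACGTACGTAA", "IIIIIIIII#", "GTACGTACGT", "IIIIIIIIII")
print(fixed)
```

The resulting consensus could then replace the first 10 bases of read 1 before running the extraction, although that adds a full extra pass over both FASTQ files.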

@TomSmithCGAT
Member

@lzt5269 - Did the above work?
