umi_tools extract: UMI on 5' read 1 and 3' read 2 #623

lzt5269 · 2024-01-15T17:05:39Z

Hi,

I'm working on paired-end data. Read 1 has a 10-base UMI at the beginning, and read 2 has a 10-base UMI at the end that is the reverse complement of the UMI on read 1. How should I extract the UMI and remove it from both reads?

Thanks.

TomSmithCGAT · 2024-02-19T10:13:00Z

Hi @lzt5269,

Sorry for the slow reply on this one. This is outside the expected functionality of UMI-tools, but I think you can achieve it with the command below. It uses regex pattern matching, which is slower than the default method for simple UMI extractions but allows more flexibility. Here, we specify that the UMI of read 1 is the first 10 characters (bases) (--bc-pattern='(?P<umi_1>.{10}).*'). For read 2, we give a pattern that contains no UMI group, only a group to discard, which matches the last 10 bases (--bc-pattern2='.*(?P<discard_1>.{10})').

umi_tools extract \
  --extract-method=regex \
  --bc-pattern='(?P<umi_1>.{10}).*' \
  --bc-pattern2='.*(?P<discard_1>.{10})' \
  -L test.log \
  --read2-in=<PATH TO READ2 FILE> \
  --stdin=<PATH TO READ1 FILE> \
  --read2-out=<PATH TO READ2 OUTFILE> |
  gzip > <PATH TO READ1 OUTFILE>
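
To make it clearer what the two patterns capture, here is a minimal Python sketch that applies the same expressions to a made-up read pair. It does not call UMI-tools and only approximates its behaviour: it uses the standard library `re` module, the helper names `split_read1` and `trim_read2` are hypothetical, and the sequences are invented for illustration.

```python
import re

# Patterns copied from the command above.
READ1_PATTERN = re.compile(r"(?P<umi_1>.{10}).*")
READ2_PATTERN = re.compile(r".*(?P<discard_1>.{10})")


def split_read1(seq):
    """Return (umi, remaining_sequence) for read 1, or None if there is no match."""
    match = READ1_PATTERN.fullmatch(seq)
    if match is None:
        return None
    # The named group umi_1 is the first 10 bases; the rest is the trimmed read.
    return match.group("umi_1"), seq[match.end("umi_1"):]


def trim_read2(seq):
    """Return read 2 without its last 10 bases, or None if there is no match."""
    match = READ2_PATTERN.fullmatch(seq)
    if match is None:
        return None
    # The named group discard_1 is the last 10 bases; everything before it is kept.
    return seq[:match.start("discard_1")]


# Invented example: read 2 ends with the reverse complement of the read 1 UMI.
read1 = "ACGTACGTAC" + "TTTTGGGGCCCC"
read2 = "AAAACCCCGGGG" + "GTACGTACGT"

print(split_read1(read1))
print(trim_read2(read2))
```

Reads shorter than 10 bases do not match either pattern in this sketch, so the helpers return None for them.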

I recommend manually checking that the above gives the expected output for the first read pair.
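
If you would rather script that check, something along these lines might help. It is only a sketch under several assumptions: that the extracted UMI is appended to the first whitespace-delimited field of the read name with the default `_` separator, that the UMI is added to the names of both reads, and that the first input pair was not filtered out, so the first output records correspond to the first input records. The function names `first_record` and `check_pair` are hypothetical.

```python
import gzip

_COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq):
    """Reverse complement of a DNA sequence made of A, C, G, T and N."""
    return seq.translate(_COMPLEMENT)[::-1]


def first_record(path):
    """Return (header, sequence) of the first record in a plain or gzipped FASTQ file."""
    opener = gzip.open if str(path).endswith(".gz") else open
    with opener(path, "rt") as handle:
        header = handle.readline().rstrip("\n")
        sequence = handle.readline().rstrip("\n")
    return header, sequence


def check_pair(rec1_in, rec2_in, rec1_out, rec2_out, umi_len=10):
    """Compare one read pair before and after extraction.

    Each argument is a (header, sequence) tuple. Returns a dict of named checks.
    """
    umi = rec1_in[1][:umi_len]
    # Assumption: the UMI is appended to the read identifier as "_<UMI>".
    id1_out = rec1_out[0].split()[0]
    id2_out = rec2_out[0].split()[0]
    return {
        "umi_in_read1_name": id1_out.endswith("_" + umi),
        "umi_in_read2_name": id2_out.endswith("_" + umi),
        "read1_trimmed": rec1_out[1] == rec1_in[1][umi_len:],
        "read2_trimmed": rec2_out[1] == rec2_in[1][:-umi_len],
        "read2_tail_is_revcomp": reverse_complement(rec2_in[1][-umi_len:]) == umi,
    }


# Usage with real files (paths are placeholders):
# result = check_pair(first_record("r1.fastq.gz"), first_record("r2.fastq.gz"),
#                     first_record("r1.out.fastq.gz"), first_record("r2.out.fastq.gz"))
# print(result)
```

A False for `read2_tail_is_revcomp` does not necessarily mean the extraction failed; it can also reflect a sequencing error in one of the two UMI copies.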

Of course, the ideal solution would be to use the two UMIs to correct any sequencing errors in them and obtain a consensus UMI sequence. However, I expect the benefit would be small relative to the effort required.
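
For illustration only, one possible way to build such a consensus is sketched below. This is not a UMI-tools feature, and the approach is an assumption made for this example: where the read 1 UMI and the reverse complement of the read 2 UMI disagree, keep the base with the higher Phred+33 quality. The function name `consensus_umi` and the example sequences are hypothetical.

```python
_COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq):
    """Reverse complement of a DNA sequence made of A, C, G, T and N."""
    return seq.translate(_COMPLEMENT)[::-1]


def consensus_umi(umi1, qual1, umi2_tail, qual2_tail):
    """Merge the read 1 UMI with the reverse-complemented UMI from the end of read 2.

    umi1, qual1: UMI bases and quality string from the start of read 1.
    umi2_tail, qual2_tail: last bases of read 2 and their quality string,
    in the orientation in which they were sequenced.
    """
    # Bring the read 2 copy into the same orientation as the read 1 copy.
    umi2 = reverse_complement(umi2_tail)
    qual2 = qual2_tail[::-1]
    if not (len(umi1) == len(qual1) == len(umi2) == len(qual2)):
        raise ValueError("UMI and quality strings must all have the same length")

    bases = []
    for base1, q1, base2, q2 in zip(umi1, qual1, umi2, qual2):
        if base1 == base2:
            bases.append(base1)
        elif ord(q1) >= ord(q2):
            # Quality characters sort in the same order as the quality scores.
            bases.append(base1)
        else:
            bases.append(base2)
    return "".join(bases)


# Invented example: read 1 has a low-quality error at the sixth UMI base.
print(consensus_umi("ACGTAAGTAC", "IIIII#IIII", "GTACGTACGT", "IIIIIIIIII"))
```

Ties in quality are resolved in favour of read 1 here, which is an arbitrary choice.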

TomSmithCGAT · 2024-03-04T14:05:38Z

@lzt5269 - Did the above work?

lzt5269 changed the title ~~umi_tools extract: UMI on 5~~ umi_tools extract: UMI on 5' read 1 and 3' read 2 Jan 15, 2024

TomSmithCGAT closed this as completed Mar 4, 2024

TomSmithCGAT reopened this Mar 4, 2024
