SEQanswers

Old 07-30-2015, 07:01 AM   #1
donquijotes
Junior Member
 
Location: Michigan

Join Date: Jul 2015
Posts: 7
Help with UMI (unique molecular identifiers) data processing

I've been browsing different papers and trying to figure out the best way to analyze data with UMIs.
So far I have used GATK for analysis a couple of times, but other than that I have mostly worked on alternative splicing analysis, so I'm rather new to CNV calling with UMIs.
What I would like to do is use the following design:

adaptor-UMI-DNAlibraryINSERT-UMI-adaptor

The UMIs will be 5 random bases on each side.

I get the idea behind UMIs, but what I haven't found yet is software to do the analysis. I've seen a few tools that find UMIs and put them in the header of the FASTQ record, but then what? How do you bin the reads and get rid of the true PCR duplicates? Does Picard have a function for it? If I have to write my own code then I'm out of luck lol.
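As a rough illustration of the "put the UMI in the header" step for the design above, here is a minimal Python sketch. It assumes paired-end reads where each mate starts with its 5-base UMI; the function name `extract_umis`, the tuple layout, and the `_UMI` name suffix are all made up for this example and are not taken from any particular tool.

```python
def extract_umis(read1, read2, umi_len=5):
    """Move the leading UMI of each mate into the read name.

    read1 and read2 are (name, sequence, quality) tuples for one read pair.
    Assumption: each mate begins with umi_len random bases.
    Returns the two trimmed records, or None if a mate is too short.
    """
    name1, seq1, qual1 = read1
    name2, seq2, qual2 = read2
    if len(seq1) <= umi_len or len(seq2) <= umi_len:
        return None  # no room for a UMI plus any insert sequence

    # Concatenate both UMIs so the pair carries one combined identifier.
    umi = seq1[:umi_len] + seq2[:umi_len]

    def tag(name):
        # Keep only the first whitespace-delimited token, then append the UMI,
        # so the UMI survives tools that drop the FASTQ comment field.
        return name.split()[0] + "_" + umi

    return (
        (tag(name1), seq1[umi_len:], qual1[umi_len:]),
        (tag(name2), seq2[umi_len:], qual2[umi_len:]),
    )
```

Both mates get the same combined UMI in their names, so after alignment the pair can be grouped by mapping position plus UMI.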

I know Agilent supports UMIs with their HaloPlex HS kits and their SureCall software, which is mostly (from what I've heard) a nice GATK GUI.

Any help and guidance would be much appreciated. Newbies have the right to learn too, right?

Thank you in advance
Old 07-30-2015, 05:24 PM   #2
nucacidhunter
Senior Member
 
Location: Iran

Join Date: Jan 2013
Posts: 1,080

The product described on the web page below uses molecular indexing, and the sequences are given in the product manual.
http://www.biooscientific.com/Next-G...x-qRNA-Seq-Kit

They describe the analysis steps in a link on the page below:
http://www.biooscientific.com/Next-G...x-qRNA-Seq-Kit
Old 09-29-2015, 06:40 AM   #3
charlescoldroom
Junior Member
 
Location: Leuven, Belgium

Join Date: Apr 2012
Posts: 8

I am also interested in how to handle UMIs and remove duplicate reads based on them.

I am using modified primers to generate amplicon pools.

Which tools are there to find UMIs and put them in the header of the FASTQ record? How could I then process the reads?

I have tried looking around, but I could not find a good step-by-step explanation. Even papers just mention that they do the analysis without explaining how.

Thanks!
Old 10-08-2015, 06:08 PM   #4
danwiththeplan
Member
 
Location: Auckland

Join Date: Sep 2011
Posts: 72
Molecular indexing

Hi, you could try the script mentioned here:

http://www.biooscientific.com/Portal...A-Analysis.pdf

It's currently not working for me, but I'm in communication with the maintainer so I'll repost if I get everything working.
Old 10-08-2015, 06:15 PM   #5
luc
Senior Member
 
Location: US

Join Date: Dec 2010
Posts: 308

A very simple approach would be a general deduplication of the reads with BBTools (I have not used it for this purpose, but it should be better than our in-house script), which will likely require considerable memory. Then you should trim the 5 random bases.
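To make the order of operations concrete (deduplicate first while the UMI is still part of the read, then trim), here is a minimal Python sketch. It is not BBTools and not anyone's in-house script; the function name `dedup_then_trim` and the tuple layout are invented for illustration, and it assumes the UMI sits at the start of each read.

```python
def dedup_then_trim(reads, umi_len=5):
    """Drop exact duplicate reads, then trim the leading UMI.

    reads is an iterable of (name, sequence, quality) tuples.
    Assumption: the first umi_len bases of every read are the UMI.
    """
    seen = set()  # every distinct sequence is held in memory
    kept = []
    for name, seq, qual in reads:
        if seq in seen:
            continue  # same UMI and same insert: treat as a PCR duplicate
        seen.add(seq)
        # Trim only after the duplicate check, so the UMI takes part in it.
        kept.append((name, seq[umi_len:], qual[umi_len:]))
    return kept
```

Because the set stores every distinct read sequence, memory use grows with the library, which is the same trade-off as any exact-match deduplication. A single sequencing error in either the UMI or the insert makes two copies look different, so this only removes perfect duplicates.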
Old 10-12-2015, 06:48 AM   #6
charlescoldroom
Junior Member
 
Location: Leuven, Belgium

Join Date: Apr 2012
Posts: 8

Thanks guys, I will check out the suggestions!
Old 11-23-2015, 02:21 AM   #7
IanSudbery
Junior Member
 
Location: Boston, MA

Join Date: May 2011
Posts: 1

I know this post is a few months old now, but you might like to try our UMI-tools package, which offers a range of different algorithms for deduplicating UMI sequences.

https://github.com/CGATOxford/UMI-tools
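
For readers wondering what deduplicating on UMIs means after alignment, the simplest idea is to treat reads with the same mapping position and the same UMI as copies of one molecule. The Python sketch below shows only that exact-match idea; it is not the UMI-tools code, it does not handle sequencing errors in the UMI, and the dictionary field names are hypothetical.

```python
def dedup_by_position_and_umi(alignments):
    """Keep one alignment per (chrom, pos, strand, UMI) group.

    alignments is an iterable of dicts with the hypothetical keys
    'name', 'chrom', 'pos', 'strand', 'umi' and 'mapq'.
    Within each group the alignment with the highest mapq is kept.
    """
    best = {}
    for aln in alignments:
        key = (aln["chrom"], aln["pos"], aln["strand"], aln["umi"])
        # Replace the stored alignment only if this one maps more confidently.
        if key not in best or aln["mapq"] > best[key]["mapq"]:
            best[key] = aln
    return list(best.values())
```

Reads at the same position with different UMIs are kept as separate molecules, which is the whole point of adding UMIs compared with position-only duplicate marking.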
Old 11-23-2015, 01:26 PM   #8
danwiththeplan
Member
 
Location: Auckland

Join Date: Sep 2011
Posts: 72

Quote:
Originally Posted by IanSudbery
I know this post is a few months old now, but you might like to try our UMI-tools package, which offers a range of different algorithms for deduplicating UMI sequences.

https://github.com/CGATOxford/UMI-tools

Hi, thanks for this contribution.

I'm reading the code, and this is what it looks like to me, but am I correct in saying that this script correctly deduplicates splice-aware mappings, i.e. that reads that jump across splice boundaries are handled correctly?
Old 03-14-2016, 03:22 AM   #9
sudders
Member
 
Location: Sheffield, UK

Join Date: Dec 2011
Posts: 32

Quote:
Originally Posted by danwiththeplan
Hi, thanks for this contribution.

I'm reading the code, and this is what it looks like to me, but am I correct in saying that this script correctly deduplicates splice-aware mappings, i.e. that reads that jump across splice boundaries are handled correctly?

You've probably worked this out already, but yes, it handles splice-aware mappings.
Old 05-10-2017, 01:16 PM   #10
medalofhonour
Member
 
Location: Brighton

Join Date: Jul 2011
Posts: 18

This group recently published a paper with a pipeline for analyzing UMI datasets. The software can be found here:

https://github.com/mikessh/mageri
Old 10-30-2017, 08:41 AM   #11
cement_head
Senior Member
 
Location: Oxford, Ohio

Join Date: Mar 2012
Posts: 188

If you are using CLC Genomics Workbench:

https://www.qiagenbioinformatics.com...cular-indexing
Old 11-07-2017, 01:40 AM   #12
Strandlife
Member
 
Location: All over the world

Join Date: May 2013
Posts: 59

You should try Strand NGS for UMI protocols.
Strand NGS is the only software to provide comprehensive, end-to-end support for multiple Unique Molecular Identifier protocols.

A few features include:

1. Protocol diversity. Strand NGS supports data analysis from the following UMI protocols:
i. Qiagen GeneRead®
ii. Archer VariantPlex®
iii. Rubicon Thruplex®
iv. Bioo Scientific NextFlex®
v. A robust interface to specify custom UMIs

2. End-to-end or point-to-point. Users can go from reads to variants, can start at aligned BAMs containing the BC tag, or start/end at any reasonable point in the alignment/analysis workflow.

3. Workflow diversity. Strand NGS supports UMI protocols in DNA-, RNA- and small RNA-Seq workflows

4. Somatic- and UMI-ready visualizations. The genome browser visualizes consensus read lists. Each read contains UMI-related metadata, such as family size, UMI and mate UMI. A filter allows the easy exclusion of wild-type reads. This is useful at high sequencing depths and low allele frequencies, typical of data from somatic/tumor samples.

You can get a 20-day free trial by registering here with your organization email address:
http://www.strand-ngs.com/signup/freetrial
Old 12-01-2017, 11:59 PM   #13
chen@haplox.com
Member
 
Location: Shenzhen, China

Join Date: Aug 2015
Posts: 15

You can use fastp to preprocess UMIs from FASTQ files.
__________________
OpenGene(Libraries and tools for NGS data analysis),AfterQC(Fastq Filtering and QC)
FusionDirect.jl( Detect gene fusion), SeqMaker.jl(Next Generation Sequencing simulation)