  • Help with UMI (unique molecular identifiers) data processing

    I've been browsing papers and other publications, trying to figure out the best way to analyze data with UMIs.
    So far I have used GATK for analysis a couple of times, but otherwise I have mostly worked on alternative splicing analysis, so I'm rather new to CNV calling with UMIs.
    What I would like to do is use the following design.

    adaptor-UMI-DNAlibraryINSERT-UMI-adaptor

    The UMIs will be 5 random bases on each side.
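A minimal sketch of the "find the UMIs and put them in the header" step for this design. It assumes paired-end sequencing in which read 1 starts with the first 5-base UMI and read 2 starts with the second one; that layout, the function name `extract_umis`, and the `_UMI` naming convention are illustrative assumptions, not something stated in the thread.

```python
from typing import Optional, Tuple

UMI_LEN = 5  # five random bases on each side, as in the design above

Read = Tuple[str, str]  # (sequence, quality string)


def extract_umis(name: str, read1: Read, read2: Read,
                 umi_len: int = UMI_LEN) -> Optional[Tuple[str, Read, Read]]:
    """Clip the leading UMI from each mate and append both UMIs to the read name.

    Assumes (hypothetically) that each mate begins with its UMI.
    Returns None when a read is too short or a UMI contains an N.
    """
    seq1, qual1 = read1
    seq2, qual2 = read2
    if len(seq1) <= umi_len or len(seq2) <= umi_len:
        return None
    umi = seq1[:umi_len] + seq2[:umi_len]
    if "N" in umi:
        return None  # ambiguous UMI bases cannot be grouped reliably
    new_name = f"{name}_{umi}"
    trimmed1 = (seq1[umi_len:], qual1[umi_len:])
    trimmed2 = (seq2[umi_len:], qual2[umi_len:])
    return new_name, trimmed1, trimmed2
```

Carrying the UMI in the read name means it survives alignment, so it is still available when the reads are grouped afterwards.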

    I get the whole UMI thinking and analysis, but what I haven't found yet is the software to do such analysis. I've seen a few tools that find UMIs and put them in the header of each FASTQ record, but then what? How do you bin the reads and get rid of the true PCR duplicates? Does Picard have a function for it? If I have to write my own code then I'm out of luck lol.
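One common way to bin reads is to treat reads that share both a mapping position and a UMI as copies of one original molecule, and keep a single representative per group. The sketch below illustrates that idea on already-aligned reads. The `AlignedRead` record and the convention that the UMI follows the last underscore in the read name are assumptions made for the example; dedicated tools also handle sequencing errors in the UMI, soft clipping, and paired-end positions, which this does not.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, NamedTuple, Tuple


class AlignedRead(NamedTuple):
    name: str    # read name with the UMI appended after the last underscore
    chrom: str
    pos: int     # leftmost mapped position
    strand: str  # "+" or "-"
    mapq: int    # mapping quality


def dedup_by_umi(reads: Iterable[AlignedRead]) -> List[AlignedRead]:
    """Keep one read per (chromosome, position, strand, UMI) family."""
    families: Dict[Tuple[str, int, str, str], List[AlignedRead]] = defaultdict(list)
    for read in reads:
        umi = read.name.rsplit("_", 1)[-1]
        families[(read.chrom, read.pos, read.strand, umi)].append(read)
    # The representative is the read with the highest mapping quality;
    # on ties, the first read seen in the family is kept.
    return [max(family, key=lambda r: r.mapq) for family in families.values()]
```

Two reads at the same position with different UMIs are kept as separate molecules, which is the information a position-only duplicate filter throws away.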

    I know Agilent supports UMIs with their HaloPlex HS kits and their SureCall software, which (from what I've heard) is mostly a nice GUI for GATK.

    Any help and guidance would be much appreciated. Newbies have the right to learn too, right?

    Thank you in advance

  • #2
    The product described on the web page below uses molecular indexing, and the index sequences are given in the product manual.
    http://www.biooscientific.com/Next-G...x-qRNA-Seq-Kit

    The analysis steps are described in a link on the page below:
    http://www.biooscientific.com/Next-G...x-qRNA-Seq-Kit


    • #3
      I am also interested in how to handle UMIs and remove duplicated reads based on them.

      I am using modified primers to generate amplicon pools.

      Which tools can find UMIs and put them in the header of each FASTQ record? How could I then process the reads?

      I have tried looking around, but I could not find a good step-by-step explanation; even papers just mention that they do the analysis without explaining how.

      Thanks!


      • #4
        Molecular indexing

        Hi, you could try the script mentioned here:



        It's currently not working for me, but I'm in communication with the maintainer so I'll repost if I get everything working.


        • #5
          A very simple approach would be to do a general deduplication of the reads with BBTools (I have not used it for this purpose, but it should be better than our in-house script), which will likely require considerable memory. Then you should trim the 5 random bases.
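The order of the two steps matters: while the UMI is still attached, two reads with identical sequence share both the UMI and the insert, so exact-sequence deduplication has to happen before trimming. A minimal single-end sketch of that idea follows. It is an illustration only and does not reproduce how BBTools works; the function name and the `(name, sequence, quality)` record layout are assumptions.

```python
from typing import Iterable, List, Tuple

Record = Tuple[str, str, str]  # (name, sequence, quality string)


def dedup_then_trim(records: Iterable[Record], umi_len: int = 5) -> List[Record]:
    """Drop exact sequence duplicates, then trim the leading UMI bases.

    Every distinct sequence is held in memory, which is why this kind of
    deduplication can need a lot of RAM on large datasets.
    """
    seen = set()
    kept: List[Record] = []
    for name, seq, qual in records:
        if seq in seen:
            continue  # same UMI and same insert: treated as a PCR duplicate
        seen.add(seq)
        kept.append((name, seq[umi_len:], qual[umi_len:]))
    return kept
```

For paired-end data the same idea would key on both mates together. A single sequencing error in either the UMI or the insert makes two copies look distinct, so this approach tends to under-collapse duplicates.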


          • #6
            Thanks guys, I will check out the suggestions!


            • #7
              I know this post is a few months old now, but you might like to try our UMI-tools package, which offers a range of different algorithms for deduplicating UMI sequences.

              https://github.com/CGATOxford/UMI-tools
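To illustrate why more than one algorithm is useful: sequencing errors create UMIs that differ by one base from a much more abundant "parent" UMI, and an error-aware method merges them instead of counting them as separate molecules. The sketch below is a simplified illustration loosely modelled on the "directional" idea in the UMI-tools documentation (merge a UMI into a neighbour one mismatch away when the neighbour's count is at least twice its own, minus one). It is not the package's code, and the function names are hypothetical.

```python
from collections import deque
from typing import Dict, List


def hamming(a: str, b: str) -> int:
    """Number of mismatching positions between two equal-length UMIs."""
    return sum(x != y for x, y in zip(a, b))


def cluster_umis(counts: Dict[str, int], max_dist: int = 1) -> List[List[str]]:
    """Group the UMIs seen at one mapping position into error-aware clusters.

    counts maps each UMI to its read count. The first UMI in each
    returned cluster is the most abundant one (the representative).
    """
    order = sorted(counts, key=lambda u: (-counts[u], u))
    assigned = set()
    clusters: List[List[str]] = []
    for seed in order:
        if seed in assigned:
            continue
        assigned.add(seed)
        cluster = [seed]
        queue = deque([seed])
        while queue:
            parent = queue.popleft()
            for umi in order:
                if umi in assigned:
                    continue
                close = hamming(parent, umi) <= max_dist
                # Only absorb UMIs rare enough to be plausible errors of parent.
                if close and counts[parent] >= 2 * counts[umi] - 1:
                    assigned.add(umi)
                    cluster.append(umi)
                    queue.append(umi)
        clusters.append(cluster)
    return clusters
```

With counts such as `{"AAAAA": 100, "AAAAT": 3}` the rare UMI is absorbed into the abundant one, whereas two similarly abundant neighbours stay separate because neither looks like an error copy of the other.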


              • #8
                Originally posted by IanSudbery
                I know this post is a few months old now, but you might like to try our UMI-tools package, which offers a range of different algorithms for deduplicating UMI sequences.

                https://github.com/CGATOxford/UMI-tools

                Hi, thanks for this contribution.

                I'm reading the code, and this is what it looks like to me, but am I correct in saying that this script would correctly deduplicate splice-aware mappings? That is, are reads that jump across splice boundaries handled correctly?


                • #9
                  Originally posted by danwiththeplan
                  Hi, thanks for this contribution.

                  I'm reading the code, and this is what it looks like to me, but am I correct in saying that this script would correctly deduplicate splice-aware mappings? That is, are reads that jump across splice boundaries handled correctly?

                  You've probably worked this out already, but yes, it handles splice-aware mappings.


                  • #10
                    This group recently published a paper with a pipeline for analyzing UMI datasets. The software can be found here:

                    MAGERI - Assemble, align and call variants for targeted genome re-sequencing with unique molecular identifiers - mikessh/mageri


                    • #11
                      If you are using CLC Genomics Workbench:


                      • #12
                        You should try Strand NGS for UMI protocols.
                        Strand NGS is the only software to provide comprehensive, end-to-end support for multiple Unique Molecular Identifier protocols.

                        A few features include:

                        1. Protocol diversity. Strand NGS supports data analysis from the following UMI protocols:
                        i. Qiagen GeneRead®
                        ii. Archer VariantPlex®
                        iii. Rubicon Thruplex®
                        iv. Bioo Scientific NextFlex®
                        v. A robust interface to specify custom UMIs

                        2. End-to-end or point-to-point. Users can go from reads to variants, can start at aligned BAMs containing the BC tag, or start/end at any reasonable point in the alignment/analysis workflow.

                        3. Workflow diversity. Strand NGS supports UMI protocols in DNA-, RNA- and small RNA-Seq workflows.

                        4. Somatic- and UMI-ready visualizations. The genome browser visualizes consensus read lists. Each read contains UMI-related metadata, such as family size, UMI and mate UMI. A filter allows the easy exclusion of wild-type reads. This is useful at high sequencing depths and low allele frequencies, typical of data from somatic/tumor samples.

                        You can get a 20-day free trial by registering here with your organization email ID:
                        Strand NGS is a next-generation sequencing data analysis tool. It supports DNA-Seq, RNA-Seq, ChIP-Seq, Methyl-Seq, MeDIP-Seq, small RNA-Seq, pathway analysis, and downstream analysis.


                        • #13
                          You can use fastp to preprocess UMIs in FASTQ files.
                          OpenGene(Libraries and tools for NGS data analysis),AfterQC(Fastq Filtering and QC)
                          FusionDirect.jl( Detect gene fusion), SeqMaker.jl(Next Generation Sequencing simulation)
