
  • Best aligner for detecting deletions

    I am looking for a specific ~20bp deletion in Illumina 2x150 data. Coverage is over 10,000x. I have at least one sample where it is present. I was previously able to capture 30bp deletions with BWA, so 20bp should not be too big, but BWA is not catching it this time for whatever reason. I then tried Bowtie 2, which was able to detect it. I am following both with realignment using GATK.

    Since the only two aligners I tried (BWA and Bowtie 2) gave very different results, I now suspect others may differ even more. I could just start testing them and see what happens, but I thought I should consult here first. Is there an optimal aligner for deletions of that size?

  • #2
    If you are looking for a specific deletion, put both alleles in your reference fasta. All aligners will be able to assign the read to the correct sequence if it is explicitly in the reference file.
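A minimal sketch of this suggestion, assuming the region of interest is available as a plain string and the deletion coordinates are known. The function name, the record names, and the 0-based `del_start` convention are illustrative assumptions, not part of any aligner's tooling.

```python
def make_two_allele_reference(name, sequence, del_start, del_length, line_width=60):
    """Return FASTA text with the intact allele and the allele carrying the deletion.

    del_start is 0-based (an assumption of this sketch); record names are hypothetical.
    """
    if del_start < 0 or del_length <= 0 or del_start + del_length > len(sequence):
        raise ValueError("deletion must lie inside the sequence")

    # Build the deleted allele by cutting del_length bases out of the intact one.
    deleted = sequence[:del_start] + sequence[del_start + del_length:]
    records = [(f"{name}_ref", sequence), (f"{name}_del{del_length}", deleted)]

    lines = []
    for header, seq in records:
        lines.append(f">{header}")
        # Wrap sequence lines at line_width characters.
        lines.extend(seq[i:i + line_width] for i in range(0, len(seq), line_width))
    return "\n".join(lines) + "\n"
```

Reads that do not overlap the deleted region match both entries equally well, so only reads spanning the breakpoint distinguish the two alleles.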



    • #3
      Originally posted by swbarnes2
      If you are looking for a specific deletion, put both alleles in your reference fasta. All aligners will be able to assign the read to the correct sequence if it is explicitly in the reference file.
      That's a great idea. It would definitely work this one time, but I would also like to see if I can do something going forward for all deletions. I am wondering how many other deletions I am missing that I am not aware of.



      • #4
        Dear @id0,

        You may try the Subread aligner. The publicly available version (http://subread.sourceforge.net) can detect up to 16bp indels. But our in-house 1.4 version can detect >200bp indels.

        Please have a look at this post on how to use the 1.4 version to detect long indels:

        Discussion of next-gen sequencing related bioinformatics: resources, algorithms, open source efforts, etc


        Let me know if you run into any problems.

        Best Wishes,
        Wei



        • #5
          @id0: please also try novoalign and bwa-mem. They can usually identify longer indels from single-end reads. With bwa-backtrack (the original algorithm), you have to use paired-end data. Bwa-backtrack cannot find gaps longer than a few bp given single-end reads. This is its algorithmic limitation. See also my reply in the thread shi provided.
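Whichever aligner is used, one way to compare the results is to count how many aligned reads carry a gap of the expected size. The sketch below assumes plain-text SAM input and reads the CIGAR string from the sixth column; the function names and the `min_len` default are illustrative assumptions.

```python
import re

# One CIGAR operation: a length followed by an operation code.
CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")


def deletions_in_cigar(cigar):
    """Return the lengths of all D (deletion) operations in a CIGAR string."""
    return [int(n) for n, op in CIGAR_OP.findall(cigar) if op == "D"]


def count_reads_with_deletion(sam_lines, min_len=20):
    """Count aligned records with at least one deletion of min_len bp or more."""
    count = 0
    for line in sam_lines:
        if line.startswith("@"):  # skip header lines
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 6 or fields[5] == "*":  # unmapped or malformed record
            continue
        if any(d >= min_len for d in deletions_in_cigar(fields[5])):
            count += 1
    return count
```

Note that this only counts reads where the aligner opened a gap; a read that is soft-clipped at the breakpoint instead will not be counted.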



          • #6
            @id0

            I also suggest you play around with GCAT on bioplanet.com, as they have variant calling benchmarks for various software tools. What you might find interesting in particular is a plot showing indel distributions from various pipelines. For example, the following shows the indel distribution on a 30x coverage exome from Illumina for a few different Bowtie2, BWA, and GATK pipelines:

            View Full GCAT Variant Calls Report: Bowtie+Gatk_HC (illumina-100bp-pe-exome-30x)


            In this case, pipelines with GATK's HaplotypeCaller found large deletions. The graph is easier to view and interact with on the real website.
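A similar indel length distribution can be tallied from one's own variant calls. This is a rough sketch, assuming uncompressed VCF text with REF in the fourth column and ALT in the fifth; the function name is hypothetical, and it is not how GCAT itself computes its plots.

```python
from collections import Counter

BASES = set("ACGTN")


def indel_length_distribution(vcf_lines):
    """Tally signed indel lengths from VCF lines (negative values are deletions)."""
    counts = Counter()
    for line in vcf_lines:
        if line.startswith("#") or not line.strip():  # skip headers and blanks
            continue
        fields = line.rstrip("\n").split("\t")
        ref, alts = fields[3], fields[4]
        for alt in alts.split(","):
            # Skip symbolic, breakend, or missing alleles such as <DEL>, * or .
            if not alt or not set(alt.upper()) <= BASES:
                continue
            diff = len(alt) - len(ref)
            if diff != 0:  # substitutions of equal length are not indels
                counts[diff] += 1
    return counts
```

Comparing the counts at the most negative lengths across pipelines gives a quick view of which ones report longer deletions.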



            • #7
              GCAT looks good.
              Last edited by abi; 05-21-2013, 09:34 AM.



              • #8
                @abi

                There is actually a field describing the exact command line used (although it depends on the uploader to fill it in). For example, you should see this in the top right of the report:



                I also see you are at VTech! Me too
                Last edited by oiiio; 05-21-2013, 09:28 AM.



                • #9
                  Yes, I am in CS @ VT. How about you?



                  • #10
                    I think Stampy should also be mentioned here:

                    High-volume sequencing of DNA and RNA is now within reach of any research laboratory and is quickly becoming established as a key research tool. In many workflows, each of the short sequences ("reads") resulting from a sequencing run are first "mapped" (aligned) to a reference sequence to infer the …
