  • Detecting insertions with BWA/GATK

    Dear all,

    I recently started to evaluate 150bp paired-end MiSeq reads for both research and diagnostic purposes. So far I have used a pipeline consisting of BWA for the alignment, GATK for SNP/INDEL calling and ANNOVAR for annotation of the detected variants.

    A colleague of mine used a different analysis tool for diagnostics and reported a 46bp insertion in one of our target genes (BRCA1). She also confirmed it by gel electrophoresis, where two bands are clearly visible. However, I cannot detect this insertion with my pipeline: GATK does not call it, and I cannot see it in IGV either. So I suspect that my alignment is incorrect.

    As I only used the default settings for BWA, I thought I could tweak some parameters. I tried playing around with the bwa aln -e and -o options, but so far without success. I also tried bowtie for the alignment, but could not detect the insertion with it either.

    What would be the best BWA parameters for detecting insertions of this length? Is there a limit on the length of detectable insertions? Or should I use a different alignment tool altogether?

    I would be really thankful for any remarks and comments!

    Martin

  • #2
    I think that 46bp might be a bit large to detect with only a BWA/GATK pipeline. You can try increasing your -e parameter a lot in BWA, however.

    With 150bp PE reads, I would suggest running your BAM file through an SV caller that takes split-read information into account, like Pindel, DELLY, PRISM, or CREST.
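As a rough illustration of the split-read idea mentioned above, the sketch below parses SAM CIGAR strings and flags reads that carry either a long insertion (`I`) or a long soft clip (`S`), the two signatures a read overlapping a 46bp insertion would typically show. The function names and the 20bp threshold are assumptions made for this example; they are not taken from Pindel, DELLY, PRISM, or CREST, which use far more elaborate models.

```python
import re

# One CIGAR operation: a length followed by an operation code (SAM format).
CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")


def parse_cigar(cigar):
    """Split a CIGAR string such as '60M46I44M' into (length, op) pairs."""
    if cigar == "*":  # SAM uses '*' when no alignment is available
        return []
    ops = [(int(n), op) for n, op in CIGAR_OP.findall(cigar)]
    # Rebuild the string to reject input containing anything unexpected.
    if "".join(f"{n}{op}" for n, op in ops) != cigar:
        raise ValueError(f"malformed CIGAR: {cigar!r}")
    return ops


def insertion_evidence(cigar, min_len=20):
    """Classify a read as 'insertion', 'soft_clip', or None.

    min_len is a hypothetical threshold chosen for illustration only.
    """
    ops = parse_cigar(cigar)
    if any(op == "I" and n >= min_len for n, op in ops):
        return "insertion"
    if any(op == "S" and n >= min_len for n, op in ops):
        return "soft_clip"
    return None


# Made-up CIGAR strings for three 150bp reads.
for cigar in ["150M", "60M46I44M", "95M55S"]:
    print(cigar, insertion_evidence(cigar))
```

If the aligner never produces long `I` operations or soft clips around the site, a check like this will find nothing, which would point back at the alignment step rather than the caller.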

    • #3
      Are you using plain BWA or BWA-SW, which is designed for longer reads (>100bp)?

      • #4
        I think it will be very challenging for any short-read pipeline to find that kind of indel. What you might expect is that your pipeline tells you there is a discrepancy between the sample and the reference around that position, and you would then have to do a bit of detective work to find out exactly what it is.
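One way to start that kind of detective work is to look for reference positions where several reads are soft-clipped at the same place. The following is a minimal sketch under stated assumptions: reads are given as (POS, CIGAR) pairs taken from SAM records, and the helper names and the three-read threshold are hypothetical choices for illustration.

```python
import re
from collections import Counter

CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")
REF_CONSUMING = set("MDN=X")  # operations that advance along the reference


def clip_breakpoints(pos, cigar):
    """Return 1-based reference positions where a read is soft-clipped.

    pos is the SAM POS field (leftmost aligned base, 1-based).
    """
    ops = [(int(n), op) for n, op in CIGAR_OP.findall(cigar)]
    ops = [(n, op) for n, op in ops if op != "H"]  # ignore hard clips
    points = []
    if ops and ops[0][1] == "S":
        # Leading clip: the alignment starts at pos, so the break is there.
        points.append(pos)
    if len(ops) > 1 and ops[-1][1] == "S":
        # Trailing clip: the break is the first base after the aligned part.
        ref_len = sum(n for n, op in ops if op in REF_CONSUMING)
        points.append(pos + ref_len)
    return points


def clip_hotspots(reads, min_reads=3):
    """Positions supported by at least min_reads clipped reads (assumed cutoff)."""
    counts = Counter()
    for pos, cigar in reads:
        counts.update(clip_breakpoints(pos, cigar))
    return sorted(p for p, c in counts.items() if c >= min_reads)


# Made-up example records, not real data.
reads = [
    (1000, "100M50S"),
    (1020, "80M70S"),
    (1050, "50M100S"),
    (1100, "40S110M"),
    (900, "150M"),
]
print(clip_hotspots(reads))
```

A cluster of clipped reads only says that something differs from the reference at that position; working out whether it is an insertion, a deletion, or something else still requires looking at the clipped sequence itself.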

        • #5
          Thank you all for the comments.

          @ aaronh
          I haven't tried the bwa-sw algorithm yet, since it supposedly works best with reads longer than 200bp (http://bio-bwa.sourceforge.net/). Nevertheless, I’ll try bwa-sw next.

          @ swbarnes2
          I would also expect some kind of variation around the insertion, at least at its edges. Yet my approach reported nothing.

          @ cwhelan
          I also tried increasing the -e parameter up to 50. However, this apparently causes an immense increase in memory usage, up to the limit of my 24GB of RAM, and Linux eventually kills the process.

          I will look into the other tools you mentioned and hopefully solve this issue.

          I guess I’ll have to modify my analysis a lot to get good and consistent results for this kind of read length, especially since we are about to use the new 500bp kit, which increases the read length to 250bp.

          Martin

          • #6
            Hi Martin, you may have solved your problems by now, but have you tried the latest bwa version (0.7.3), which includes the bwa mem algorithm?

            For us, it seems to pick up large (>10bp) deletions better than bwa aln did.

            BW

            Chris
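For anyone wanting to try this, here is a small sketch of how a paired-end `bwa mem` call could be assembled from Python. The file names are placeholders, the reference is assumed to have been indexed with `bwa index` beforehand, and only the `-t` (threads) option is set; everything else is left at the defaults.

```python
import shlex


def bwa_mem_command(reference, reads1, reads2, threads=4):
    """Build an argument list for a paired-end `bwa mem` run.

    bwa mem writes SAM to standard output, so the caller decides
    where the alignments are stored.
    """
    return ["bwa", "mem", "-t", str(threads), reference, reads1, reads2]


# Placeholder file names, assumed for illustration.
cmd = bwa_mem_command("ref.fa", "sample_R1.fastq.gz", "sample_R2.fastq.gz")
print(shlex.join(cmd))
```

To actually run it, the list can be passed to `subprocess.run(cmd, stdout=sam_file, check=True)` with `sam_file` opened for writing; the sketch only builds and prints the command so that it can be inspected first.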

            • #7
              I would also recommend BWA-MEM for large indels. You can see the GCAT benchmarks for BWA-MEM and other aligners on 150bp paired-end samples containing large indels here: http://www.bioplanet.com/gcat/report...-indel/bwa_mem

              Note the very high sensitivity and alignment accuracy.

              • #8
                Thank you very much. I was not aware of this other BWA algorithm. I will give it a try and let you know if I am able to pick up the larger INDELs with my pipeline.

                Martin
