
  • #16
    Thanks, Brian. This is where I am showing my ignorance, I am sure, but how did the reads become so short? Looking at what I pulled out of the sam file, they are full-length (300bp) reads for the first few matches, but then they become those little buggers as well.



    • #17
      BWA-mem produces 'chimeric alignments'. This is actually a really neat feature in some cases, and a big pain in other cases - in my opinion, it should be disabled by default.

      If you look at the sam lines you posted, most of them have a bitflag (the second column) of 2048 or more, meaning the 0x800 "supplementary alignment" bit is set. That indicates they are chimeric. BWA-mem appears to do multiple local alignments on reads, such that if there is a really good match for the first 20% somewhere, that will be presented as a single line in the sam file, and if there is a really good match for the middle 40%, that will be displayed as a different line, etc. So a single read could generate a huge number of lines in the sam file. The goal is to correctly map reads that are chimeric (such as reads from a cancer sample with two chromosomes randomly fused together). But apparently, it does not work well in extreme-GC genomes; most mappers are designed for human and mouse genomes, which have moderate GC content (roughly 40%), as they constitute the majority of genetic research. But since I work at a place that deals strictly with microbial, plant, and fungal genomes, BBMap (which was originally designed for human data) is now developed for and tested on a much wider array of organisms than most.
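      A minimal Python sketch of the flag check described above, assuming tab-separated SAM text lines. The names `SUPPLEMENTARY`, `sam_flag` and `is_supplementary` are hypothetical helpers, and the example lines are made up, not the poster's data. Testing the bit with `&` is more robust than comparing against 2048, although 0x800 is the highest bit the SAM spec defines, so the two agree in practice.

```python
SUPPLEMENTARY = 0x800  # 2048: "supplementary alignment" bit in the SAM FLAG field


def sam_flag(sam_line: str) -> int:
    """Return the FLAG (second tab-separated column) of a SAM alignment line."""
    return int(sam_line.split("\t")[1])


def is_supplementary(flag: int) -> bool:
    """True if the 0x800 bit is set, i.e. this line is one piece of a split (chimeric) alignment."""
    return (flag & SUPPLEMENTARY) != 0


# Hypothetical lines (QNAME, FLAG, RNAME only); 2064 = 2048 + 16 (supplementary, reverse strand).
for line in ["read1\t0\tcontig1", "read1\t2048\tcontig7", "read1\t2064\tcontig3"]:
    print(sam_flag(line), is_supplementary(sam_flag(line)))
```

      This prints `0 False`, `2048 True` and `2064 True`: all three lines belong to the same read, but only the first is the primary record.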

      BWA's chimeric alignments are local and hard-clipped. For example, this cigar string from the second line you posted - "221H79M" - means that the first 221 bases were hard-clipped (left out of that record entirely) and only the last 79 bases are included in the alignment. Of course, this will wreak havoc with something like fastqc, where all reads are weighted equally regardless of length. Rather than a length filter (which will unnecessarily exclude reads that had been adapter- or quality-trimmed), I think you should simply use samtools to filter out alignments with the supplementary (2048) flag set.
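      A minimal Python sketch of both points, assuming plain-text SAM input. It tallies a CIGAR string by operation to show that "221H79M" came from a 300-base read with only 79 bases kept in the record, and it drops alignment lines whose FLAG has the 2048 bit set. The function names are hypothetical; on real files the equivalent is `samtools view -F 2048`, which excludes records with that flag bit.

```python
import re
from typing import Dict, Iterable, Iterator

# One CIGAR operation: a length followed by an op letter from the SAM spec.
CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")


def cigar_totals(cigar: str) -> Dict[str, int]:
    """Sum the lengths of each CIGAR operation, e.g. '221H79M' -> {'H': 221, 'M': 79}."""
    totals: Dict[str, int] = {}
    for length, op in CIGAR_OP.findall(cigar):
        totals[op] = totals.get(op, 0) + int(length)
    return totals


def read_length_from_cigar(cigar: str) -> int:
    """Original read length: query-consuming ops (M, I, S, =, X) plus hard clips (H)."""
    totals = cigar_totals(cigar)
    return sum(totals.get(op, 0) for op in "MIS=XH")


def drop_supplementary(sam_lines: Iterable[str]) -> Iterator[str]:
    """Yield header lines and every alignment line whose FLAG lacks the 0x800 (2048) bit."""
    for line in sam_lines:
        if line.startswith("@"):
            yield line  # keep header lines unchanged
            continue
        flag = int(line.split("\t")[1])
        if not flag & 0x800:  # parsed as: not (flag & 0x800)
            yield line


print(cigar_totals("221H79M"))            # {'H': 221, 'M': 79}
print(read_length_from_cigar("221H79M"))  # 300
```

      Dropping the supplementary lines leaves each read's primary alignment in place, so no read is lost from the file the way it would be with a length filter.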
      Last edited by Brian Bushnell; 08-02-2014, 12:00 PM.



      • #18
        Originally posted by Genomics101
        Thanks very much, GenoMax. Indeed, it is MiSeq data, but I never had this problem with MiSeq before (that was with 250bp PE reads; these are 300s). Can you tell me more about the particular pathology with MiSeq? Is this a problem with library construction? And, goodness, what is an adapter lawn?
        Nucacidhunter has a nice description in this thread: http://seqanswers.com/forums/showthread.php?t=43071
