  • Why do reads in unmapped.bam still align to the reference genome?

    Hi all,
    Fewer than 10% of my reads ended up in unmapped.bam after running TopHat (Arabidopsis RNA-seq).
    I then took some of them and BLASTed them on NCBI. I expected those reads to align to some other species; however, every read I tried still matched Arabidopsis mRNA. This confused me! Does anyone have a clue why?
    THANKS!

  • #2
    Were they paired-end reads, and did you BLAST just one end of the pair? Did the BLAST results always show the reads aligning end-to-end, or only partially? There are many possible reasons this happens; the most common is that the reads weren't adapter trimmed.
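To test the adapter hypothesis directly, one can count how many unmapped reads still contain the start of an adapter. The following is a minimal sketch, assuming the unmapped reads have first been exported to FASTQ (for example with `samtools fastq unmapped.bam > unmapped.fq`) and assuming the commonly used Illumina TruSeq adapter prefix `AGATCGGAAGAGC`; the function and script names are hypothetical. It only detects the full 13-nt prefix, so reads carrying a shorter adapter fragment at their 3' end are missed and the count is a lower bound.

```python
import sys
from typing import Iterable, Tuple

# Assumption: the widely used Illumina TruSeq adapter prefix.
# Replace it with the adapter sequence from your own library kit.
ADAPTER_PREFIX = "AGATCGGAAGAGC"


def count_adapter_reads(fastq_lines: Iterable[str],
                        adapter: str = ADAPTER_PREFIX) -> Tuple[int, int]:
    """Return (reads containing `adapter`, total reads) for 4-line FASTQ records."""
    with_adapter = 0
    total = 0
    for i, line in enumerate(fastq_lines):
        if i % 4 == 1:  # line 2 of every 4-line record is the sequence
            total += 1
            if adapter in line.strip().upper():
                with_adapter += 1
    return with_adapter, total


if __name__ == "__main__" and len(sys.argv) > 1:
    # Hypothetical usage: python count_adapters.py unmapped.fq
    with open(sys.argv[1]) as fh:
        hits, n = count_adapter_reads(fh)
    print(f"{hits}/{n} reads contain {ADAPTER_PREFIX}")
```

A high fraction would point to incomplete trimming; a near-zero fraction makes adapters an unlikely explanation.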



    • #3
      Perhaps the reads had too many mismatches in the seed region, or failed whatever cutoff parameters were set for the TopHat alignment.



      • #4
        Originally posted by mastal
        Perhaps the reads had too many mismatches in the seed region, or whatever cutoff parameters were set for the Tophat alignment.
        Hi mastal,
        I agree that some of them just contained too many mismatches, but is it possible that these reads contain SNPs?
        Here is the command I used:
        tophat -p 16 -G genes.gtf -o SP1_thout genome SP1_R1.fq SP1_R2.fq;



        • #5
          Originally posted by dpryan
          Were they paired-end reads, and did you BLAST just one end of the pair? Did the BLAST results always show the reads aligning end-to-end, or only partially? There are many possible reasons this happens; the most common is that the reads weren't adapter trimmed.
          Hi dpryan, thanks for your reply.
          The reads I used were all trimmed. Do you mean that if only one read of a pair mapped, the whole pair would be put into unmapped.bam?
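What happened to the mates of unmapped reads can be checked from the SAM FLAG field rather than guessed. The following is a minimal sketch, assuming the records are read as text (for example from `samtools view unmapped.bam`) and assuming your TopHat version fills in the mate bits for reads it writes to unmapped.bam; the function name is hypothetical. It relies only on the standard SAM flag bits 0x1 (read is paired) and 0x8 (mate is unmapped), and it makes no claim about how TopHat itself decides where to put each read of a pair.

```python
from collections import Counter
from typing import Iterable

FLAG_PAIRED = 0x1         # template has multiple segments (paired-end)
FLAG_MATE_UNMAPPED = 0x8  # next segment in the template is unmapped


def tally_mate_status(sam_lines: Iterable[str]) -> Counter:
    """Count unmapped records by whether they are unpaired, or paired with a mapped/unmapped mate."""
    counts: Counter = Counter()
    for line in sam_lines:
        if not line.strip() or line.startswith("@"):
            continue  # skip blank lines and SAM header lines
        flag = int(line.split("\t")[1])  # FLAG is the 2nd mandatory SAM column
        if not flag & FLAG_PAIRED:
            counts["single"] += 1
        elif flag & FLAG_MATE_UNMAPPED:
            counts["mate_unmapped"] += 1
        else:
            counts["mate_mapped"] += 1
    return counts
```

A large `mate_mapped` count would mean many reads sit in unmapped.bam even though their mate aligned, i.e. the pair was not discarded as a whole.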



          • #6
            Originally posted by SpreeFu
            Hi mastal,
            I agree that some of them just contained too many mismatches, but is it possible that these reads contain SNPs?
            Here is the command I used:
            tophat -p 16 -G genes.gtf -o SP1_thout genome SP1_R1.fq SP1_R2.fq;
            Sure, your sample could have SNPs that are different from the reference genome, and these would count as mismatches.

            But, for the unmapped reads that you checked with blast, what were the blast alignment stats like? For example: read length, alignment length, %identity, number of mismatches, number of gaps. Would you expect tophat/bowtie to align something with similar stats?
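The stats asked about here correspond to columns of BLAST's tabular output (`-outfmt 6`: qseqid, sseqid, pident, length, mismatch, gapopen, qstart, qend, sstart, send, evalue, bitscore), which the NCBI web BLAST can also export as a hit table. The following is a minimal sketch that flags hits a short-read aligner would plausibly have accepted. The read length and the rule (full-length, ungapped, at most 2 mismatches) are assumptions for illustration, not TopHat's or Bowtie's actual defaults, and the function names are hypothetical.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass
class Hit:
    read: str
    subject: str
    pident: float
    length: int
    mismatch: int
    gapopen: int
    qstart: int
    qend: int


def parse_outfmt6(lines: Iterable[str]) -> List[Hit]:
    """Parse the first 8 columns of BLAST tabular (-outfmt 6) output."""
    hits = []
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank and comment lines
        f = line.rstrip("\n").split("\t")
        hits.append(Hit(f[0], f[1], float(f[2]), int(f[3]),
                        int(f[4]), int(f[5]), int(f[6]), int(f[7])))
    return hits


def looks_alignable(hit: Hit, read_len: int, max_mismatch: int = 2) -> bool:
    """Hypothetical rule: read covered end-to-end, no gaps, few mismatches."""
    full_length = hit.qstart == 1 and hit.qend == read_len
    return full_length and hit.gapopen == 0 and hit.mismatch <= max_mismatch
```

Reads that pass such a check but still land in unmapped.bam are the interesting ones; reads that fail it (partial hits, gaps, many mismatches) are expected to be rejected by a stricter short-read aligner.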



            • #7
              Originally posted by mastal
              Sure, your sample could have SNPs that are different from the reference genome, and these would count as mismatches.

              But, for the unmapped reads that you checked with blast, what were the blast alignment stats like? For example: read length, alignment length, %identity, number of mismatches, number of gaps. Would you expect tophat/bowtie to align something with similar stats?
              OK, I get what you mean, THANKS!



              • #8
                I looked into some of my unmapped.bam files, and some of the reads align over their whole length with 100% identity to the genome. This looks to me like a bug in TopHat, but I would like to see it confirmed by someone else. My reads are single-end, so it is not due to issues with the mate read aligning.
