  • Why do reads in unmapped.bam still align to the reference genome?

    Hi all,
    TopHat put fewer than 10% of my reads into unmapped.bam (Arabidopsis RNA-seq).
    I then took some of them and ran BLAST at NCBI. I expected those reads to align to some other species; however, every read I tried still matched Arabidopsis mRNA. This confused me! Does anyone have a clue why?
    THANKS!

  • #2
    Were they paired-end reads, and did you BLAST just one end of the pair? Did the BLAST results not always have the reads mapping from end to end? There are a lot of possible reasons for this, the most common being that the reads weren't adapter trimmed.



    • #3
      Perhaps the reads had too many mismatches in the seed region, or exceeded whatever cutoff parameters were set for the TopHat alignment.



      • #4
        Originally posted by mastal
        Perhaps the reads had too many mismatches in the seed region, or exceeded whatever cutoff parameters were set for the TopHat alignment.
        Hi mastal,
        I agree that some of them just contained too many mismatches, but is it possible that SNPs exist in these reads?
        here is the command I used:
        tophat -p 16 -G genes.gtf -o SP1_thout genome SP1_R1.fq SP1_R2.fq;



        • #5
          Originally posted by dpryan
          Were they paired-end reads, and did you BLAST just one end of the pair? Did the BLAST results not always have the reads mapping from end to end? There are a lot of possible reasons for this, the most common being that the reads weren't adapter trimmed.
          Hi dpryan, thanks for your reply.
          The reads I used were all trimmed. Do you mean that if only one read of a pair maps, then both reads of the pair are put into unmapped.bam?
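One way to check the pairing question directly is to look at the SAM FLAG field of the records in unmapped.bam. The sketch below only decodes the flag bits defined in the SAM specification; the function names are made up for illustration, and it makes no claim about which reads TopHat actually writes to unmapped.bam.

```python
# SAM FLAG bits, as defined in the SAM format specification.
PAIRED = 0x1          # read is part of a pair
READ_UNMAPPED = 0x4   # this read is unmapped
MATE_UNMAPPED = 0x8   # the mate of this read is unmapped
FIRST_IN_PAIR = 0x40
SECOND_IN_PAIR = 0x80


def describe_flag(flag):
    """Return a dict describing the pairing/mapping bits of a SAM flag."""
    paired = bool(flag & PAIRED)
    return {
        "paired": paired,
        "read_unmapped": bool(flag & READ_UNMAPPED),
        # The mate bits are only meaningful when the paired bit is set.
        "mate_unmapped": paired and bool(flag & MATE_UNMAPPED),
        "first_in_pair": paired and bool(flag & FIRST_IN_PAIR),
        "second_in_pair": paired and bool(flag & SECOND_IN_PAIR),
    }


def mate_was_mapped(flag):
    """True for an unmapped read from a pair whose mate is flagged as mapped."""
    info = describe_flag(flag)
    return info["paired"] and info["read_unmapped"] and not info["mate_unmapped"]


if __name__ == "__main__":
    # 69 = 1 + 4 + 64: paired, read unmapped, first in pair, mate mapped.
    # 77 = 1 + 4 + 8 + 64: paired, both read and mate unmapped.
    # 4: unmapped single-end read.
    for flag in (69, 77, 4):
        print(flag, describe_flag(flag), mate_was_mapped(flag))
```

The flag is the second column of the text produced by `samtools view unmapped.bam`, so tallying those values shows how many unmapped reads have a mate that did map.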



          • #6
            Originally posted by SpreeFu
            Hi mastal,
            I agree that some of them just contained too many mismatches, but is it possible that SNPs exist in these reads?
            here is the command I used:
            tophat -p 16 -G genes.gtf -o SP1_thout genome SP1_R1.fq SP1_R2.fq;
            Sure, your sample could have SNPs that are different from the reference genome, and these would count as mismatches.

            But, for the unmapped reads that you checked with blast, what were the blast alignment stats like? For example: read length, alignment length, %identity, number of mismatches, number of gaps. Would you expect tophat/bowtie to align something with similar stats?
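As a rough way to collect those stats, here is a minimal sketch that reads BLAST+ tabular output (`-outfmt 6`, default 12 columns) and reports alignment length, %identity, mismatches, gap openings and how much of the read is covered. The read length has to be supplied separately, since the default tabular columns do not include it. The function names and the `max_mismatches` / `max_gap_opens` thresholds are illustrative placeholders, not TopHat's actual settings.

```python
from collections import namedtuple

# The 12 default columns of BLAST+ tabular output (-outfmt 6).
FIELDS = ("qseqid sseqid pident length mismatch gapopen "
          "qstart qend sstart send evalue bitscore").split()
Hit = namedtuple("Hit", FIELDS)


def parse_hit(line):
    """Parse one tab-separated -outfmt 6 line into a Hit."""
    parts = line.rstrip("\n").split("\t")
    if len(parts) < 12:
        raise ValueError("expected 12 tab-separated columns")
    return Hit(
        parts[0], parts[1], float(parts[2]),
        int(parts[3]), int(parts[4]), int(parts[5]),
        int(parts[6]), int(parts[7]), int(parts[8]), int(parts[9]),
        float(parts[10]), float(parts[11]),
    )


def summarise(hit, read_length, max_mismatches=2, max_gap_opens=0):
    """Summarise a hit relative to the full read length.

    The thresholds are hypothetical; set them to whatever limits
    your aligner run actually used.
    """
    aligned_query_bases = abs(hit.qend - hit.qstart) + 1
    end_to_end = aligned_query_bases == read_length
    return {
        "read": hit.qseqid,
        "alignment_length": hit.length,
        "pident": hit.pident,
        "mismatches": hit.mismatch,
        "gap_opens": hit.gapopen,
        "query_coverage": aligned_query_bases / read_length,
        "end_to_end": end_to_end,
        "within_thresholds": (end_to_end
                              and hit.mismatch <= max_mismatches
                              and hit.gapopen <= max_gap_opens),
    }


if __name__ == "__main__":
    # Made-up example lines, not real BLAST results.
    full = "read1\tsubjA\t100.000\t100\t0\t0\t1\t100\t501\t600\t1e-45\t185"
    partial = "read2\tsubjB\t95.000\t60\t3\t0\t41\t100\t10\t69\t1e-20\t95"
    print(summarise(parse_hit(full), read_length=100))
    print(summarise(parse_hit(partial), read_length=100))
```

Keep in mind that a hit against mRNA is a hit against a spliced sequence: a read that matches an mRNA end to end may still span an exon junction when placed on the genome, so a clean BLAST hit does not by itself mean the genomic alignment is trivial.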



            • #7
              Originally posted by mastal
              Sure, your sample could have SNPs that are different from the reference genome, and these would count as mismatches.

              But, for the unmapped reads that you checked with blast, what were the blast alignment stats like? For example: read length, alignment length, %identity, number of mismatches, number of gaps. Would you expect tophat/bowtie to align something with similar stats?
              OK, I get what you mean, THANKS!



              • #8
                I looked into some of my unmapped.bam files, and some of the reads align over their whole length with 100% identity to the genome. This looks to me like a bug in TopHat, but I would like someone else to confirm it. My reads are single-end, so the cause cannot be a problem with aligning the mate.
