  • Tophat output contains "unmapped"??

    Hello, there,

    I recently examined the output of tophat (tophat22.0.8) after converting it to a SAM file, and found the following results:

    grep "HWI-ST514:143982632:C37PRACXX:5:1101:12852:4786" accepted_hits.ns.sam
    HWI-ST514:143982632:C37PRACXX:5:1101:12852:4786 89 A_ref-1.0_Cont33 766585 50 92M * 0 0 CTTGTATTGAGTACGATCTCTCCACCTCTCCGGTTCGCAATACAGCTTTGAGAAAGAACTTATTACCCTCTCTACTATATAATTAAATTGTA DDDDEDDDDDDDDDDDBDCDDDAACDDFHHJJJJJJJJIIHHJJJJJJJJJJJJJJJIJJIHFC:JJHGIIJJIJJJJJJJJJJJJJHHHFF MD:Z:92 XG:i:0 NH:i:1 NM:i:0 XM:i:0 XN:i:0 XO:i:0 AS:i:0 YT:Z:UU
    HWI-ST514:143982632:C37PRACXX:5:1101:12852:4786 137 A_ref-1.0_Cont33 766582 50 100M * 0 0 TTCCTTGTATTGAGTACGATCTCTCCACCTCTCCGGTTCGCAATACAGCTTTGAGAAAGAACTTATTACCCTCTCTACTATATAATTAAATTGTACTTTG CCCFFFFFHHHHHJGIIJJJJJJJJJJJJIJIIJJJFGIIJJJJJJJJIJHIJJJJJJJJHHHHHFFFFFFFDEEEEDEDEDEFEEEECCEDCCDEEFED MD:Z:100 XG:i:0 NH:i:1 NM:i:0 XM:i:0 XN:i:0 XO:i:0 AS:i:0 YT:Z:UU

    The reference here is a set of genome scaffolds. My question is about the SAM flags: 89 means read paired, mate unmapped, read reverse strand, first in pair; 137 means read paired, mate unmapped, second in pair.
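
    The flag breakdown above can be reproduced by testing each bit of the FLAG field. This is a minimal sketch: the bit values and meanings follow the SAM specification, while the names `FLAG_BITS` and `decode_flag` are hypothetical helpers for illustration and are not part of Tophat or samtools.

```python
# SAM FLAG bits and their meanings, per the SAM specification.
FLAG_BITS = [
    (0x1, "read paired"),
    (0x2, "read mapped in proper pair"),
    (0x4, "read unmapped"),
    (0x8, "mate unmapped"),
    (0x10, "read reverse strand"),
    (0x20, "mate reverse strand"),
    (0x40, "first in pair"),
    (0x80, "second in pair"),
    (0x100, "not primary alignment"),
    (0x200, "read fails quality checks"),
    (0x400, "PCR or optical duplicate"),
    (0x800, "supplementary alignment"),
]


def decode_flag(flag):
    """Return the names of all bits that are set in a SAM FLAG value."""
    return [name for bit, name in FLAG_BITS if flag & bit]


# The two flags from the records above.
for flag in (89, 137):
    print(flag, ", ".join(decode_flag(flag)))
```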

    These reads appear to be paired, and both are mapped. Then why do the SAM flags say their mates were not mapped? Is it because they did not map to the same scaffold?

    Could anyone explain this to me?

    Thanks

  • #2
    It is Tophat, not samtools, that sets the flags. So look at Tophat for answers to your question. I presume you mean version 2.0.8 and not 22.0.8; if so a newer version of Tophat may give better results.

    I haven't used Tophat in about a year, so treat the following with caution. For your specific question you ask "... Is it because they did not map to the same scaffold?", but to my eye it looks like they did map to the same scaffold.

    1st read to A_ref-1.0_Cont33 at base 766585
    2nd read to A_ref-1.0_Cont33 at base 766582

    Same scaffold, with start positions only 3 bp apart, so the two alignments overlap almost entirely. My guess is that this is why Tophat did not consider the two to have the mate mapped.
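
    How the two alignments sit relative to each other can be checked from the POS and CIGAR columns of the records quoted in the question. The sketch below assumes standard SAM conventions (1-based POS; M, D, N, = and X operations consume reference bases); `reference_span` and `overlap_length` are hypothetical helper names, not functions from any existing tool.

```python
import re

# CIGAR operations that consume reference bases, per the SAM specification.
REF_CONSUMING = set("MDN=X")


def reference_span(pos, cigar):
    """Return the 1-based inclusive (start, end) reference interval of an alignment."""
    length = sum(
        int(count)
        for count, op in re.findall(r"(\d+)([MIDNSHP=X])", cigar)
        if op in REF_CONSUMING
    )
    return pos, pos + length - 1


def overlap_length(a, b):
    """Return the number of reference bases shared by two inclusive intervals."""
    return max(0, min(a[1], b[1]) - max(a[0], b[0]) + 1)


# POS and CIGAR values taken from the two records in the question.
read1 = reference_span(766585, "92M")
read2 = reference_span(766582, "100M")

print("read 1 span:", read1)
print("read 2 span:", read2)
print("start offset:", abs(read1[0] - read2[0]))
print("shared bases:", overlap_length(read1, read2))
```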

    • #3
      I meant that they did not map to the same strand of the scaffold...

      • #4
        This appears to be a bug that was fixed in version 2.0.9, which was released over a year ago.

        • #5
          @dpryan, could you please explain a bit more? What is the correct information supposed to look like?

          • #6
            What do you mean "what is the correct information supposed to look like"? You're using an old version with a known bug (the one you're asking about). Just upgrade.

            • #7
              Well, upgrading and rerunning might take several days.

              So I wonder: for these two reads, would the only change in the new tophat 2.0.9 output be in the SAM flag fields?

              Thanks

              • #8
                Yeah, it should just fix the flags. In this example, the flags should become 83 and 163.
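
                To see which bits differ between the old and the expected flags, the two values can be XORed and the differing bits named. This is only a sketch: the bit meanings follow the SAM specification, and `FLAG_NAMES` and `changed_bits` are hypothetical names introduced for illustration.

```python
# Subset of SAM FLAG bits relevant to paired-end reads, per the SAM specification.
FLAG_NAMES = {
    0x1: "read paired",
    0x2: "proper pair",
    0x4: "read unmapped",
    0x8: "mate unmapped",
    0x10: "read reverse strand",
    0x20: "mate reverse strand",
    0x40: "first in pair",
    0x80: "second in pair",
}


def changed_bits(old, new):
    """Return (gained, lost) lists of bit names when a flag changes from old to new."""
    diff = old ^ new  # bits that differ between the two flags
    gained = [name for bit, name in FLAG_NAMES.items() if diff & bit and new & bit]
    lost = [name for bit, name in FLAG_NAMES.items() if diff & bit and old & bit]
    return gained, lost


# Flags reported by the old version versus the flags expected after the fix.
for old, new in [(89, 83), (137, 163)]:
    gained, lost = changed_bits(old, new)
    print(f"{old} -> {new}: gained {gained}, lost {lost}")
```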

                BTW, if you switch to STAR you'll get alignments vastly faster.

                • #9
                  Thank you very much for the explanations!
