  • a question about tophat sam file

    I am using TopHat to map paired-end reads to contigs and got some TopHat SAM files. In all of these SAM files, column 7 (RNEXT) is either "=" or "*", which means the next segment is either on the same contig or unmapped. Is this normal?
    My command is like this (I tried both tophat and tophat2):
    tophat --segment-length 15 --keep-tmp --min-coverage-intron 30 --library-type fr-firststrand -r 2850 --no-convert-bam -o mydocuments 454AllContigs fastq_end1 fastq_end2
    Is there something I missed in my command? How can I find the next segment from the alignments?

    Thanks
    Xu
    Last edited by zxybl; 05-21-2012, 05:44 AM.

  • #2
    Column 7 (RNEXT) shows "=" for paired-end reads whose mate is on the same reference, and "*" for single-end reads (or, equivalently, when one read of a pair was removed before mapping, either by you or by TopHat as a filtering step).
    A read is unmapped if a bitwise AND of its flag (the 2nd column of the SAM file) with 4 gives 4 back, meaning the "read unmapped" bit is set. That bit alone is the reliable way to tell whether a read is unmapped.
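
    The bitwise test described above can be sketched in Python. This is only an illustration: the function names are made up for the example, and it assumes a tab-delimited SAM alignment line with the flag in the 2nd column.

```python
READ_UNMAPPED = 0x4  # bit 4 of the SAM FLAG field: "read unmapped"


def is_unmapped(flag: int) -> bool:
    """Return True if the 'read unmapped' bit is set in a SAM flag."""
    return (flag & READ_UNMAPPED) == READ_UNMAPPED


def line_is_unmapped(sam_line: str) -> bool:
    """Apply the same test to one tab-delimited SAM alignment line."""
    fields = sam_line.rstrip("\n").split("\t")
    return is_unmapped(int(fields[1]))  # FLAG is the 2nd column
```

    For example, is_unmapped(73) is False because 73 = 64 + 8 + 1 does not include 4, so the read itself is mapped even though bit 8 (mate unmapped) is set.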


    • #3
      My purpose is to use these cDNA paired-end reads to double-check my genome scaffolds. What I want to know from this alignment is which contigs the two reads of a pair map to when they map to different contigs. But the SAM file only shows pairs mapped to the same contig ("="); all the others are "*". Whether "*" means unmapped or no information is not important to me. I am wondering whether TopHat can provide the next-segment information in column 7 (RNEXT).
      Last edited by zxybl; 05-23-2012, 05:57 AM.
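
      A minimal sketch of the check being asked for: scan a SAM file for alignments whose RNEXT names a different contig. It assumes tab-delimited SAM lines, and the function name cross_contig_pairs is hypothetical. Whether TopHat actually writes such records depends on how it was run, so the result may well be empty.

```python
def cross_contig_pairs(sam_lines):
    """Yield (read name, contig, mate contig) for reads whose mate is on another contig.

    Assumes standard tab-delimited SAM lines; header lines start with '@'.
    """
    for line in sam_lines:
        if line.startswith("@"):
            continue  # skip header lines
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 11:
            continue  # not a complete alignment record
        qname, flag, rname, rnext = fields[0], int(fields[1]), fields[2], fields[6]
        if flag & 0x4 or flag & 0x8:
            continue  # this read or its mate is unmapped
        # "=" means same contig, "*" means no information about the mate
        if rnext not in ("=", "*") and rnext != rname:
            yield qname, rname, rnext
```

      Usage would be along the lines of `with open("accepted_hits.sam") as fh: links = list(cross_contig_pairs(fh))`, where the file name is a placeholder for whatever SAM file was produced.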


      • #4
        Hi zxybl,
        It would be great if you could paste one or a few lines of your SAM file that seem to have issues.


        • #5
          HWI-EAS324_106750168:3:1:8674:5678 65 contig00001 5552 255 24M2I74M = 5615 163 TTCGGGGACCCAAATTTGAAAAAAAAATAGTGCTCTTCAAACTGGTTCCATAGGACCTGGGTGATTCAAGAAGCAAATCACCTGAAGAAGTAAGCAGGGT E<FCDDEGGDGGGHDEGHHFCHFFHHHBFGHHHHBHHHHHGHHHDHHHHHHHHHGHHHHHHHEHHHHHHHHHHHHHHHHEHHHHHHHHHHHHBHHHHHHH AS:i:-11 XN:i:0 XM:i:0 XO:i:1 XG:i:2 NM:i:2 MD:Z:98 YT:Z:UU NH:i:1 XS:A:-
          HWI-EAS324_106750168:3:5:9461:20574 65 contig00001 5552 255 24M2I74M = 5615 163 TTCGGGGACCCAAATTTGAAAAAAAAATAGTGCTCTTCAAACTGGTTCCATAGGACCTGGGTGATTCAAGAAGCAAATCACCTGAAGAAGTAAGCAGGGT >=E?EDEEEEEEHHBEECFFHHIFHGHBHGIHIFEHIIHIGIHIHIIIHIDIIIGHGIIIIHHIHIIIIIIIIEIIIIIIIIIIIIIIIIIIIIIIIIII AS:i:-11 XN:i:0 XM:i:0 XO:i:1 XG:i:2 NM:i:2 MD:Z:98 YT:Z:UU NH:i:1 XS:A:-
          HWI-EAS324_106750168:3:5:9472:20590 65 contig00001 5552 255 24M2I74M = 5615 163 TTCGGGGACCCAAATTTGAAAAAAAAATAGTGCTCTTCAAACTGGTTCCATAGGACCTGGGTGATTCAAGAAGCAAATCACCTGAAGAAGTAAGCAGGGT 8@AD@D>FCFAEAA80DD0DDDDDG<=:77@GD2GBGDGD?A;?>A2EABD@DD>EEBBEDGDBGGDGGD:?=@7<<DE<GG<GGGGDGEGGDBEGG@GG AS:i:-11 XN:i:0 XM:i:0 XO:i:1 XG:i:2 NM:i:2 MD:Z:98 YT:Z:UU NH:i:1 XS:A:-
          HWI-EAS324_106750168:3:9:5143:3861 65 contig00001 5552 255 25M2I73M = 5615 163 TTCGGGGACCCAAATTTGAAAAAAAAATAGTGCTCTTCAAACTGGTTCCATAGGACCTGGGTGATTCAAGAAGCAAATCACCTGAAGAAGTAAGCAGGGT D@GCDGEDGDEHGHDEGHFHEHECHGFBEFFEFD@FEIHFDIGDIIIEIIDGIIEIFIIIIIDGIIIIIIIIIIIIIIIGIIIIIIIIIIIIIIIIIIII AS:i:-11 XN:i:0 XM:i:0 XO:i:1 XG:i:2 NM:i:2 MD:Z:98 YT:Z:UU NH:i:1 XS:A:-
          HWI-EAS324_106750168:3:16:2480:4742 65 contig00001 5552 255 25M2I73M = 5615 163 TTCGGGGACCCAAATTTGAAAAAAAAATAGTGCTCTTCAAACTGGTTCCATAGGACCTGGGTGATTCAAGAAGCAAATCACCTGAAGAAGTAAGCAGGGT <F>CEEE@C=BDB>BFDHFCCDDD>GI<HHIHIDIGIIHHDGIIHGDDEE@GGGCIIGIIIIIGFIEIIHGGIGIIIHHHHFIIIGHIIIFGIIIIHHII AS:i:-11 XN:i:0 XM:i:0 XO:i:1 XG:i:2 NM:i:2 MD:Z:98 YT:Z:UU NH:i:1 XS:A:-
          HWI-EAS324_106750168:3:22:8741:20021 65 contig00001 5552 255 24M2I74M = 5615 163 TTCGGGGACCCAAATTTGAAAAAAAAATAGTGCTCTTCAAACTGGTTCCATAGGACCTGGGTGATTCAAGAAGCAAATCACCTGAAGAAGTAAGCAGGGT E>GGDDED@EEHGHEDECHHFHHHHHH>FDHFIGIHIIIHFIFIFDEIIIBIIEIIIIIIIIIIHHHIIIGGG<GIIGIEIIIIIIIIIIIIIIIHIIIH AS:i:-11 XN:i:0 XM:i:0 XO:i:1 XG:i:2 NM:i:2 MD:Z:98 YT:Z:UU NH:i:1 XS:A:-
          HWI-EAS324_106750168:3:5:18827:15083 73 contig00001 5587 255 100M * 0 0 CAAACTGGTTCCATAGGACCTGGGTGATTCAAGAAGCAAATCACCTGAAGAAGTAAGCAGGGTAGCGAAGATGTGAGTACTAGCCCTCCTATAAGGCTAT HHHHHGHHHGHHH@GHHHHHBGGGBGHHHHHHGDHHEHHHHHGGDHHHCEEHFCEHGHFHEE@GEBB>DB><C;AEC@EBBEBEBAA@CABC8B>,@;>; AS:i:-4 XN:i:0 XM:i:1 XO:i:0 XG:i:0 NM:i:1 MD:Z:98G1 YT:Z:UU NH:i:1 XS:A:-
          HWI-EAS324_106750168:3:17:2514:9275 73 contig00001 5598 255 100M * 0 0 CATAGGACCTGGGTGATTCAAGAAGCAAATCACCTGAAGAAGTAAGCAGGGTAGCGAAGATGTGAGTACTAGCCCTCCTATAAGGCTATATAGGGTGTCT IIIIIG>GGGGGGDGIIIIIIIIIIIIIIIIIIHIIIIIIHIIIIIHGIIIHIIIIFFIFIDHIHIEIFIIHBEHHHFIIIGGBEFIDI<IEEEEBEFFE AS:i:-5 XN:i:0 XM:i:1 XO:i:0 XG:i:0 NM:i:1 MD:Z:87G12 YT:Z:UU NH:i:1 XS:A:-


          • #6
            Okay, it seems to be as I expected (and mentioned above). You can go here and type in the flag (2nd column) to see what the value indicates.

            What is happening here? In all the reads you have shown above, the ones with "=" have flag 65, which means "read is paired" and "read is first in pair". All the ones with "*" have flag 73, which means "read is paired", "read is first in pair" AND "MATE UNMAPPED" (excuse the caps lock). So a read with "*" and flag 73 maps to the location shown, while its mate doesn't map anywhere. When the mate doesn't map anywhere, how do you expect its location to be indicated in this read's RNEXT column?
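
            The same decoding can be done by hand. The sketch below breaks a flag into its set bits using the bit meanings from the SAM specification; the table and function names are only illustrative.

```python
# Bit meanings of the SAM FLAG field, lowest bit first.
FLAG_BITS = {
    0x1: "read paired",
    0x2: "read mapped in proper pair",
    0x4: "read unmapped",
    0x8: "mate unmapped",
    0x10: "read reverse strand",
    0x20: "mate reverse strand",
    0x40: "first in pair",
    0x80: "second in pair",
    0x100: "not primary alignment",
    0x200: "read fails quality checks",
    0x400: "PCR or optical duplicate",
    0x800: "supplementary alignment",
}


def decode_flag(flag: int) -> list:
    """Return the descriptions of all bits set in a SAM flag."""
    return [name for bit, name in FLAG_BITS.items() if flag & bit]


print(decode_flag(65))  # ['read paired', 'first in pair']
print(decode_flag(73))  # ['read paired', 'mate unmapped', 'first in pair']
```

            The only difference between 65 and 73 is the value 8, the "mate unmapped" bit.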

            Did I get you right this time and does this convince you?


            • #7
              Thanks, I got it.

              I found what I wanted. I think I did something wrong when I counted the "=" and "*" values in column 7.

              Thank you very much.

              Xu
