  • A question about TopHat SAM files

    I am trying TopHat to map paired-end reads to the contigs and got several TopHat SAM files, but in all of these SAM files, column 7 (RNEXT) is either "=" or "*", which means the next segment is on the same contig or unmapped. Is this normal?
    My command looks like this (I tried both tophat and tophat2):
    tophat --segment-length 15 --keep-tmp --min-coverage-intron 30 --library-type fr-firststrand -r 2850 --no-convert-bam -o mydocuments 454AllContigs fastq_end1 fastq_end2
    Is there something I missed in my command? How can I find the next segment from the alignments?

    Thanks
    Xu
    Last edited by zxybl; 05-21-2012, 05:44 AM.

  • #2
    Column 7 (RNEXT) gives the reference of the mate: "=" means the mate is aligned to the same reference as this read, and "*" means no mate information is available, as for single-end reads (or, equivalently, for a read whose mate was removed before mapping, either by you or by TopHat as a filtering step).
    A read is unmapped if its FLAG (the 2nd column of the SAM file), ANDed bitwise with 4, gives 4, meaning the "segment unmapped" bit is set. That bit alone is the reliable way to tell whether a read is unmapped.
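The flag test described above can be sketched in a few lines of Python. This is a minimal sketch, not part of TopHat or samtools: the function names `is_unmapped` and `decode_flag` are hypothetical, and the bit labels are informal paraphrases of the FLAG bits defined in the SAM specification.

```python
from typing import List

# Informal labels for the SAM FLAG bits (paraphrased from the SAM specification).
SAM_FLAG_BITS = {
    0x1: "read paired",
    0x2: "read mapped in proper pair",
    0x4: "read unmapped",
    0x8: "mate unmapped",
    0x10: "read reverse strand",
    0x20: "mate reverse strand",
    0x40: "first in pair",
    0x80: "second in pair",
    0x100: "not primary alignment",
    0x200: "fails quality checks",
    0x400: "PCR or optical duplicate",
    0x800: "supplementary alignment",
}


def is_unmapped(flag: int) -> bool:
    """Hypothetical helper: True if the 'segment unmapped' bit (0x4) is set."""
    return (flag & 0x4) == 0x4


def decode_flag(flag: int) -> List[str]:
    """Hypothetical helper: labels of all bits set in a FLAG value, lowest bit first."""
    return [label for bit, label in SAM_FLAG_BITS.items() if flag & bit]


if __name__ == "__main__":
    # The two FLAG values that appear in the records posted in this thread.
    for value in (65, 73):
        print(value, is_unmapped(value), decode_flag(value))
```

Note that neither 65 nor 73 has the 0x4 bit set: in both cases the read itself is mapped, and only 73 says anything about the mate.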



    • #3
      My purpose is to use these cDNA paired-end reads to double-check my genome scaffolds, so what I want to know from this alignment is which contigs the mates mapped to when they mapped to different contigs. But the SAM file only shows pairs mapped to the same contig ("="); the others are all "*". Whether "*" means unmapped or no information is not important to me. I am wondering whether TopHat can provide the next-segment information in column 7 (RNEXT).
      Last edited by zxybl; 05-23-2012, 05:57 AM.
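For the scaffold check described above: when a mate is aligned to a different reference, RNEXT holds that reference's name instead of "=" or "*". Below is a minimal sketch, assuming a plain tab-separated SAM file, of how one might count pairs whose mates land on different contigs. The function name `cross_contig_pairs` is hypothetical, and whether TopHat reports such pairs at all depends on its version and pairing settings; this only shows how to read them if they are present.

```python
from collections import Counter
from typing import Iterable


# Hypothetical helper; assumes plain-text, tab-separated SAM records.
def cross_contig_pairs(sam_lines: Iterable[str]) -> Counter:
    """Count (contig, mate_contig) pairs for mates aligned to different references."""
    counts: Counter = Counter()
    for line in sam_lines:
        if not line.strip() or line.startswith("@"):
            continue  # skip blank lines and header lines
        fields = line.rstrip("\n").split("\t")
        flag = int(fields[1])
        rname, rnext = fields[2], fields[6]
        if not flag & 0x1:
            continue  # not a paired read
        if flag & (0x4 | 0x8):
            continue  # this read or its mate is unmapped
        if flag & (0x100 | 0x800):
            continue  # skip secondary and supplementary alignments
        if not flag & 0x40:
            continue  # count each pair once, from its first-in-pair record
        # "=" means the mate is on this reference; "*" means no mate information
        if rnext in ("=", "*", rname):
            continue
        # sort so (A, B) and (B, A) are tallied together
        counts[tuple(sorted((rname, rnext)))] += 1
    return counts
```

Usage, with a SAM path of your own: `with open(path) as fh: print(cross_contig_pairs(fh).most_common(10))`. Counting only first-in-pair primary records keeps each pair from being counted twice.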



      • #4
        Hi zxybl,
        It would be great if you could paste one or a few lines of your SAM file that seem to have the issue.



        • #5
          HWI-EAS324_106750168:3:1:8674:5678 65 contig00001 5552 255 24M2I74M = 5615 163 TTCGGGGACCCAAATTTGAAAAAAAAATAGTGCTCTTCAAACTGGTTCCATAGGACCTGGGTGATTCAAGAAGCAAATCACCTGAAGAAGTAAGCAGGGT E<FCDDEGGDGGGHDEGHHFCHFFHHHBFGHHHHBHHHHHGHHHDHHHHHHHHHGHHHHHHHEHHHHHHHHHHHHHHHHEHHHHHHHHHHHHBHHHHHHH AS:i:-11 XN:i:0 XM:i:0 XO:i:1 XG:i:2 NM:i:2 MD:Z:98 YT:Z:UU NH:i:1 XS:A:-
          HWI-EAS324_106750168:3:5:9461:20574 65 contig00001 5552 255 24M2I74M = 5615 163 TTCGGGGACCCAAATTTGAAAAAAAAATAGTGCTCTTCAAACTGGTTCCATAGGACCTGGGTGATTCAAGAAGCAAATCACCTGAAGAAGTAAGCAGGGT >=E?EDEEEEEEHHBEECFFHHIFHGHBHGIHIFEHIIHIGIHIHIIIHIDIIIGHGIIIIHHIHIIIIIIIIEIIIIIIIIIIIIIIIIIIIIIIIIII AS:i:-11 XN:i:0 XM:i:0 XO:i:1 XG:i:2 NM:i:2 MD:Z:98 YT:Z:UU NH:i:1 XS:A:-
          HWI-EAS324_106750168:3:5:9472:20590 65 contig00001 5552 255 24M2I74M = 5615 163 TTCGGGGACCCAAATTTGAAAAAAAAATAGTGCTCTTCAAACTGGTTCCATAGGACCTGGGTGATTCAAGAAGCAAATCACCTGAAGAAGTAAGCAGGGT 8@AD@D>FCFAEAA80DD0DDDDDG<=:77@GD2GBGDGD?A;?>A2EABD@DD>EEBBEDGDBGGDGGD:?=@7<<DE<GG<GGGGDGEGGDBEGG@GG AS:i:-11 XN:i:0 XM:i:0 XO:i:1 XG:i:2 NM:i:2 MD:Z:98 YT:Z:UU NH:i:1 XS:A:-
          HWI-EAS324_106750168:3:9:5143:3861 65 contig00001 5552 255 25M2I73M = 5615 163 TTCGGGGACCCAAATTTGAAAAAAAAATAGTGCTCTTCAAACTGGTTCCATAGGACCTGGGTGATTCAAGAAGCAAATCACCTGAAGAAGTAAGCAGGGT D@GCDGEDGDEHGHDEGHFHEHECHGFBEFFEFD@FEIHFDIGDIIIEIIDGIIEIFIIIIIDGIIIIIIIIIIIIIIIGIIIIIIIIIIIIIIIIIIII AS:i:-11 XN:i:0 XM:i:0 XO:i:1 XG:i:2 NM:i:2 MD:Z:98 YT:Z:UU NH:i:1 XS:A:-
          HWI-EAS324_106750168:3:16:2480:4742 65 contig00001 5552 255 25M2I73M = 5615 163 TTCGGGGACCCAAATTTGAAAAAAAAATAGTGCTCTTCAAACTGGTTCCATAGGACCTGGGTGATTCAAGAAGCAAATCACCTGAAGAAGTAAGCAGGGT <F>CEEE@C=BDB>BFDHFCCDDD>GI<HHIHIDIGIIHHDGIIHGDDEE@GGGCIIGIIIIIGFIEIIHGGIGIIIHHHHFIIIGHIIIFGIIIIHHII AS:i:-11 XN:i:0 XM:i:0 XO:i:1 XG:i:2 NM:i:2 MD:Z:98 YT:Z:UU NH:i:1 XS:A:-
          HWI-EAS324_106750168:3:22:8741:20021 65 contig00001 5552 255 24M2I74M = 5615 163 TTCGGGGACCCAAATTTGAAAAAAAAATAGTGCTCTTCAAACTGGTTCCATAGGACCTGGGTGATTCAAGAAGCAAATCACCTGAAGAAGTAAGCAGGGT E>GGDDED@EEHGHEDECHHFHHHHHH>FDHFIGIHIIIHFIFIFDEIIIBIIEIIIIIIIIIIHHHIIIGGG<GIIGIEIIIIIIIIIIIIIIIHIIIH AS:i:-11 XN:i:0 XM:i:0 XO:i:1 XG:i:2 NM:i:2 MD:Z:98 YT:Z:UU NH:i:1 XS:A:-
          HWI-EAS324_106750168:3:5:18827:15083 73 contig00001 5587 255 100M * 0 0 CAAACTGGTTCCATAGGACCTGGGTGATTCAAGAAGCAAATCACCTGAAGAAGTAAGCAGGGTAGCGAAGATGTGAGTACTAGCCCTCCTATAAGGCTAT HHHHHGHHHGHHH@GHHHHHBGGGBGHHHHHHGDHHEHHHHHGGDHHHCEEHFCEHGHFHEE@GEBB>DB><C;AEC@EBBEBEBAA@CABC8B>,@;>; AS:i:-4 XN:i:0 XM:i:1 XO:i:0 XG:i:0 NM:i:1 MD:Z:98G1 YT:Z:UU NH:i:1 XS:A:-
          HWI-EAS324_106750168:3:17:2514:9275 73 contig00001 5598 255 100M * 0 0 CATAGGACCTGGGTGATTCAAGAAGCAAATCACCTGAAGAAGTAAGCAGGGTAGCGAAGATGTGAGTACTAGCCCTCCTATAAGGCTATATAGGGTGTCT IIIIIG>GGGGGGDGIIIIIIIIIIIIIIIIIIHIIIIIIHIIIIIHGIIIHIIIIFFIFIDHIHIEIFIIHBEHHHFIIIGGBEFIDI<IEEEEBEFFE AS:i:-5 XN:i:0 XM:i:1 XO:i:0 XG:i:0 NM:i:1 MD:Z:87G12 YT:Z:UU NH:i:1 XS:A:-



          • #6
            Okay, it seems to be as I expected (and mentioned above). You can type the flag (2nd column) into a SAM flag decoder to see what the value indicates.

            What is happening here? In all the reads you have shown above, the ones with "=" have flag 65, which means "read is paired" and "read is first in pair". All the ones with "*" have flag 73, which is "read is paired", "read is first in pair" AND "MATE UNMAPPED" (excuse the caps lock). So a read with "*" and flag 73 is one whose pair has one mate mapped to the location shown and the other mate not mapped anywhere. When the mate doesn't map anywhere, how would you expect its location to be given in this read's RNEXT column?

            Did I get you right this time and does this convince you?
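The reasoning above can be checked over a whole SAM file by tallying column 7 against the FLAG: among mapped paired reads, "*" should line up with the mate-unmapped bit (0x8), as it does for flag 73. A minimal sketch, assuming plain tab-separated SAM; `tally_rnext` is a hypothetical helper name, not a TopHat or samtools function.

```python
from collections import Counter
from typing import Iterable


# Hypothetical helper; assumes plain-text, tab-separated SAM records.
def tally_rnext(sam_lines: Iterable[str]) -> Counter:
    """Tally mapped records by (FLAG, RNEXT category, mate-unmapped bit set?)."""
    tally: Counter = Counter()
    for line in sam_lines:
        if not line.strip() or line.startswith("@"):
            continue  # skip blank lines and header lines
        fields = line.rstrip("\n").split("\t")
        flag, rnext = int(fields[1]), fields[6]
        if flag & 0x4:
            continue  # the read itself is unmapped
        # anything other than "=" or "*" is the name of another reference
        category = rnext if rnext in ("=", "*") else "other reference"
        tally[(flag, category, bool(flag & 0x8))] += 1
    return tally
```

On the records posted in this thread, flag 65 falls under "=" with the mate-unmapped bit clear, and flag 73 under "*" with it set; a key such as `(73, "=", True)` would point to a counting or parsing mistake.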



            • #7
              Thanks, I got it.

              I found what I wanted. I think I made a mistake when I counted the "=" and "*" entries in column 7.

              Thank you very much.

              Xu
