  • Hello from Sunny

    Hi everyone

    I am new to the forum and have just started analyzing RNA-seq data. I am a little confused by the "accepted_hits.sam" file from TopHat. How should I interpret this file? What kinds of information does it contain? Thanks in advance.

  • #2
    Welcome! I assume you've read the TopHat manual?

    • #3
      Yes, I read the manual previously. I spent all day today figuring out the meaning of each column in the "accepted_hits.sam" file. First, I want to thank the members of this community: I read lots of posts and they were very helpful. I still have two questions.

      1) Based on the records in "accepted_hits.sam", how do I decide which read is the right read and which is the left read?
      HWI-EAS266_0005:1:1:1295:1179#0 113 chr1 554327 255 43M = 40043719 0 GGGAGTCCGAACTAGTCTCAGGCTTCAACATCGAATACGCCGC BBBBBBBBBBBBBBBBBB```````````````KKLHFEEFHO NM:i:0
      HWI-EAS266_0005:1:1:1561:15347#0 163 chr1 554418 255 43M = 554511 0 AATAAACACCCTCACCACTACAATCTTCCTAGGAACAACATAA bbbbbbbbbbbbabbbbbbbbbbbabbbbabbabbbbbbbb`_ NM:i:1
      HWI-EAS266_0005:1:1:1330:2415#0 163 chr1 554438 255 43M = 554556 0 CAATCTTCCTAGGAACAACATATGACGCACTCTCCCCTGAACC bbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbb\ba^ NM:i:2
      2) We did paired-end sequencing, and I provided both read files in the command (/tophat -r 70 -o <..> hg18 s1_1_sequence.txt s_1_2_sequence.txt). Why is ISIZE 0 rather than the second position minus the first position?
      Thanks,

      • #4
        1) See the SAM Manual, specifically the FLAG field (2nd column). You can use "samtools view -X <sam/bam>" to print the FLAG field as a string.

        2) See the SAM Manual, specifically the definition of ISIZE (outer coordinates).
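
For readers who want to see what the FLAG values in the records above encode, here is a minimal Python sketch that decodes them by hand. The bit meanings come from the SAM specification; the function names `decode_flag` and `mate_label` are made up for this illustration, and mapping "first in pair" to TopHat's "left" read assumes the first file on the command line holds the first mates.

```python
# Bit meanings of the SAM FLAG field (2nd column), per the SAM specification.
SAM_FLAG_BITS = [
    (0x1, "paired"),
    (0x2, "proper_pair"),
    (0x4, "unmapped"),
    (0x8, "mate_unmapped"),
    (0x10, "reverse_strand"),
    (0x20, "mate_reverse_strand"),
    (0x40, "first_in_pair"),
    (0x80, "second_in_pair"),
    (0x100, "secondary"),
    (0x200, "qc_fail"),
    (0x400, "duplicate"),
]


def decode_flag(flag):
    """Return the names of all bits set in a SAM FLAG value."""
    return [name for bit, name in SAM_FLAG_BITS if flag & bit]


def mate_label(flag):
    """Hypothetical helper: label a read as left or right mate.

    Assumes "left" means first in pair (0x40) and "right" means
    second in pair (0x80).
    """
    if flag & 0x40:
        return "left"
    if flag & 0x80:
        return "right"
    return "unknown"


if __name__ == "__main__":
    # FLAG values taken from the three records quoted in the question.
    for flag in (113, 163):
        print(flag, mate_label(flag), decode_flag(flag))
```

Under these assumptions, the record with FLAG 113 is a first-in-pair read on the reverse strand, and the two records with FLAG 163 are second-in-pair reads that belong to a proper pair.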

        • #5
          Thanks.

          I tried this "./samtools view -S -t -X -o exampl.flag accepted_hits.sam"
          I got the error:
          The tag [ID] required for [PG] not present.
          Segmentation fault.

          The header of "accepted_hits.sam":
          @HD VN:1.0 SO:sorted
          @PG TopHat VN:1.0.13
          Did I do something wrong when I ran TopHat?

          • #6
            You should report this to the TopHat user list. It looks like the "@PG" line should be:
            Code:
            @PG     ID:TopHat  VN:1.0.13
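
As a stopgap for files that already exist, the header line could be patched before passing the file to samtools. The following Python sketch is one possible way to do that, not an official TopHat or samtools fix. The function name `fix_pg_line` is hypothetical, and it assumes the header fields are tab-separated (as the SAM format requires) and that the bare program name is the first field lacking a two-letter `TAG:` prefix.

```python
def fix_pg_line(line):
    """Add the missing ID: tag to a SAM @PG header line.

    Lines that are not @PG headers, or that already carry an ID: field,
    are returned unchanged (minus the trailing newline).
    """
    fields = line.rstrip("\n").split("\t")
    if fields[0] != "@PG":
        return "\t".join(fields)
    if any(field.startswith("ID:") for field in fields[1:]):
        return "\t".join(fields)
    # Treat the first field without a "XX:" tag prefix as the program name.
    for i, field in enumerate(fields[1:], start=1):
        if len(field) < 3 or field[2] != ":":
            fields[i] = "ID:" + field
            break
    return "\t".join(fields)


if __name__ == "__main__":
    # The @PG line from the header shown earlier in the thread.
    print(fix_pg_line("@PG\tTopHat\tVN:1.0.13"))
```

Applied to every line of the header, this leaves the `@HD` line and all alignment records untouched and rewrites only the malformed `@PG` line.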

            • #7
              You are right.
              Thanks.

              • #8
                Originally posted by nilshomer
                1) See the SAM Manual, specifically the FLAG field (2nd column). You can use "samtools view -X <sam/bam>" to print the FLAG field into a string.

                2) See the SAM Manual, specifically the definition of ISIZE (outer coordinates).
                About question 2)
                This is still strange: by the definition of ISIZE, if the two reads in a pair are mapped to the same reference, ISIZE should be calculated, but here it is 0 in every case.

                Does anyone have an idea why?
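
To make the expectation concrete, here is a small Python sketch of the value one might expect under the "outer coordinates" reading of ISIZE mentioned above. It is only an illustration: the function name `expected_isize` is made up, and it assumes both mates align as 43M with no clipping or gaps, which the thread does not show for the mates.

```python
def expected_isize(pos, mate_pos, read_len, mate_len):
    """Signed span between the outer coordinates of two mates.

    Positions are 1-based leftmost alignment starts (SAM columns 4 and 8).
    The result is positive when this read is the leftmost of the pair
    and negative otherwise.
    """
    left_start = min(pos, mate_pos)
    right_end = max(pos + read_len, mate_pos + mate_len)
    span = right_end - left_start
    return span if pos <= mate_pos else -span


if __name__ == "__main__":
    # Second and third records from the question (both FLAG 163).
    print(expected_isize(554418, 554511, 43, 43))
    print(expected_isize(554438, 554556, 43, 43))
```

With these assumptions the two proper-pair records would be expected to carry a nonzero value rather than the 0 that TopHat reported.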

                • #9
                  Originally posted by nilshomer
                  You should report this to the TopHat user list. It looks like the "@PG" line should be:
                  Code:
                  @PG     ID:TopHat  VN:1.0.13
                  This has been fixed in TopHat v1.0.14.
