  • ct586
    Junior Member
    • Mar 2012
    • 7

    Some "wrong" XS:A in TopHat output for strand-specific paired-end RNA-Seq data

    Hi, all,

    When working with strand-specific paired-end RNA-Seq data, sequenced from the 5' end and mapped with --library-type fr-secondstrand, I got some strange combinations of flags and XS:A tags, listed below.

    H2208:7439:9018 pPr1 chr1 4887061 50 100M = 4887050 -111 XS:A:+ NH:i:1
    H2208:7439:9018 pPR2 chr1 4887050 50 100M = 4887061 111 XS:A:+ NH:i:1

    H16340:40651 pPR2 chr1 6284001 50 100M = 6284208 307 XS:A:- NH:i:1
    H16340:40651 pPr1 chr1 6284208 50 100M = 6284001 -307 XS:A:- NH:i:1
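
The flag strings in these records (pPr1, pPR2) are the text form of the SAM flag that older samtools versions print with `samtools view -X`. A minimal Python sketch for turning them back into numeric flags, assuming the usual one-character-per-bit encoding; `FLAG_CHARS` and `flag_from_string` are names made up for this illustration:

```python
# Characters used by the string form of SAM flags, mapped to their bit values.
FLAG_CHARS = {
    "p": 0x1,    # read is paired
    "P": 0x2,    # mapped in a proper pair
    "u": 0x4,    # read unmapped
    "U": 0x8,    # mate unmapped
    "r": 0x10,   # read aligned to the reverse strand
    "R": 0x20,   # mate aligned to the reverse strand
    "1": 0x40,   # first read of the pair
    "2": 0x80,   # second read of the pair
    "s": 0x100,  # secondary alignment
    "f": 0x200,  # failed quality checks
    "d": 0x400,  # PCR or optical duplicate
}


def flag_from_string(flag_string):
    """Combine the bits named by each character into one integer flag."""
    flag = 0
    for ch in flag_string:
        flag |= FLAG_CHARS[ch]
    return flag


print(flag_from_string("pPr1"))  # 83
print(flag_from_string("pPR2"))  # 163
```

So pPr1 (83) is read 1 aligned to the reverse strand, and pPR2 (163) is read 2 aligned to the forward strand with its mate on the reverse strand.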

    Given my library type, I would infer that all the reads listed here come from the negative strand.

    Why, in the first pair, are the reads assigned to the positive strand?

    Among paired reads with flag pPr1 (83), about 1 pair in every 400 looks like the first pair above.
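
A ratio like this can be tallied directly from SAM text. The sketch below is only an illustration: `xs_counts` is a hypothetical helper, and the sample records are the four alignments above rewritten as standard tab-separated SAM lines, with numeric flags and `*` standing in for the SEQ and QUAL columns that were omitted in the post.

```python
from collections import Counter


def xs_counts(sam_lines, wanted_flag=83):
    """Count XS:A values among alignments whose FLAG equals wanted_flag."""
    counts = Counter()
    for line in sam_lines:
        if line.startswith("@"):  # skip header lines
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 11 or int(fields[1]) != wanted_flag:
            continue
        # Optional tags start at the 12th column of a SAM record.
        for tag in fields[11:]:
            if tag.startswith("XS:A:"):
                counts[tag[-1]] += 1
                break
    return counts


# Adapted from the records in the post (SEQ and QUAL replaced by "*").
records = [
    "H2208:7439:9018\t83\tchr1\t4887061\t50\t100M\t=\t4887050\t-111\t*\t*\tXS:A:+\tNH:i:1",
    "H2208:7439:9018\t163\tchr1\t4887050\t50\t100M\t=\t4887061\t111\t*\t*\tXS:A:+\tNH:i:1",
    "H16340:40651\t163\tchr1\t6284001\t50\t100M\t=\t6284208\t307\t*\t*\tXS:A:-\tNH:i:1",
    "H16340:40651\t83\tchr1\t6284208\t50\t100M\t=\t6284001\t-307\t*\t*\tXS:A:-\tNH:i:1",
]

counts = xs_counts(records)
print(counts["+"], counts["-"])  # 1 1
```

On a real file, the lines could come from the output of `samtools view`, and the ratio of the two counts gives the fraction of flag-83 reads carrying each XS:A value.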

    I wonder whether TopHat has other considerations, whether my assumption is wrong, or whether this is a bug in TopHat (v2.0.4).

    Thank you!

    Tong Chen
  • kmcarr
    Senior Member
    • May 2008
    • 1181

    #2
    You are confusing the strand orientation of the read relative to the transcript, which is what the --library-type parameter controls for, with the orientation of the transcript relative to the genome, which is what the XS:A tag reports. In your example, the XS:A:+ of the first read pair reports that the transcript which gave rise to this fragment is transcribed from the forward (+) strand of the genome. Conversely, the transcript associated with the second pair is transcribed from the reverse (-) strand. This is the main reason for using strand-specific RNA-Seq protocols: to determine without ambiguity* which strand a transcript lies on.

    (*Ignoring that you can never entirely eliminate ambiguity because nothing is 100%.)

    • ct586
      Junior Member
      • Mar 2012
      • 7

      #3
      Thanks kmcarr. I understand your point.

      However, I think the first pair of reads should also come from a transcript on the negative strand, for the following three reasons.

      1. "H2208:7439:9018 pPr1" means the first read of the pair mapped to the reverse strand.

      2. The library type "fr-secondstrand" means the first read of a pair always maps to the strand that generated it.

      3. Therefore, the XS:A for "H2208:7439:9018 pPr1" should be '-'.

      Is this right?
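
The three steps above can be written down as a small Python sketch. It only encodes this reasoning, using the library-type definitions as described in the TopHat manual (fr-secondstrand: read 1 has the same sense as the transcript; fr-firststrand: the opposite). It is not TopHat's actual code, and `expected_xs` is a hypothetical name.

```python
def expected_xs(flag, library_type):
    """Transcript strand implied by a read's SAM flag and the library type."""
    is_reverse = bool(flag & 0x10)  # read aligned to the reverse strand
    is_read2 = bool(flag & 0x80)    # second read of the pair

    # Express the orientation as if it were read 1: in an "fr" library,
    # read 2 points the opposite way from read 1.
    read1_is_forward = is_reverse if is_read2 else not is_reverse

    if library_type == "fr-secondstrand":
        # Read 1 has the same sense as the transcript.
        return "+" if read1_is_forward else "-"
    if library_type == "fr-firststrand":
        # Read 1 is antisense to the transcript.
        return "-" if read1_is_forward else "+"
    raise ValueError("unsupported library type: " + library_type)


print(expected_xs(83, "fr-secondstrand"))   # -
print(expected_xs(163, "fr-secondstrand"))  # -
```

Under these assumptions both mates of a pPr1/pPR2 pair give '-' with fr-secondstrand, which is the expectation stated in point 3. The same function returns '+' for flag 83 when called with "fr-firststrand".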

      • kmcarr
        Senior Member
        • May 2008
        • 1181

        #4
        Originally posted by ct586
        Thanks kmcarr. I understand your point.

        However, I think the first pair of reads should also come from a transcript on the negative strand, for the following three reasons.

        1. "H2208:7439:9018 pPr1" means the first read of the pair mapped to the reverse strand.

        2. The library type "fr-secondstrand" means the first read of a pair always maps to the strand that generated it.
        What method was used to prepare your strand-specific library? If it was one of the newer TruSeq stranded kits, or any kit that uses dUTP marking of the second cDNA strand, you should use "--library-type fr-firststrand" for the TopHat alignment, not "fr-secondstrand".

        3. Therefore, the XS:A for "H2208:7439:9018 pPr1" should be '-'.

        Is this right?
        First, make sure you know which method was used for the strand-specific library preparation and that you use the correct "--library-type" parameter for TopHat (READ the TopHat manual). If the alignment has not been done properly, any discussion of strandedness is going to be completely confused and ultimately fruitless.

        • ct586
          Junior Member
          • Mar 2012
          • 7

          #5
          I am quite sure that my "--library-type" is "fr-secondstrand". That is why I made the inference above.

          I think others have run into this question too, as shown in this post: [http://onetipperday.blogspot.com/201...f-tophat.html]

          Besides, some already published data also contain this kind of inconsistent tag, for example the test data used by RSeQC: http://dldcc-web.brc.bcm.edu/lilab/l...Human_hg19.bam [Attention: a large BAM file]
