
  • Some "wrong" XS:A in TopHat output for strand-specific paired-end RNA-Seq data

    Hi, all,

    When dealing with strand-specific paired-end RNA-Seq data, sequenced from the 5' end and mapped with --library-type fr-secondstrand, I got some strange combinations of flags and XS:A labels, listed below.

    H2208:7439:9018 pPr1 chr1 4887061 50 100M = 4887050 -111 XS:A:+ NH:i:1
    H2208:7439:9018 pPR2 chr1 4887050 50 100M = 4887061 111 XS:A:+ NH:i:1

    H16340:40651 pPR2 chr1 6284001 50 100M = 6284208 307 XS:A:- NH:i:1
    H16340:40651 pPr1 chr1 6284208 50 100M = 6284001 -307 XS:A:- NH:i:1

    Given my library type, I would expect all the reads listed here to come from the negative strand.

    Why are the reads in the first pair assumed to come from the positive strand?

    Roughly 1 in every 400 read pairs with flag pPr1 (83) looks like the first pair.

    I wonder whether TopHat takes other considerations into account, whether my assumption is wrong, or whether this is a bug in TopHat (TopHat v2.0.4)?

    Thank you!

    Tong Chen
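
For readers unfamiliar with the letter-style flags in the records above, here is a minimal sketch of how they map to numeric SAM flags. It assumes the strings come from the legacy `samtools view -X` output, where each letter stands for one flag bit in the order `pPuUrR12sfd`; the function names are illustrative only.

```python
# Letters used by the legacy `samtools view -X` output, in bit order:
# p=paired, P=proper pair, u=unmapped, U=mate unmapped, r=reverse,
# R=mate reverse, 1=first in pair, 2=second in pair,
# s=secondary, f=QC fail, d=duplicate.
FLAG_LETTERS = "pPuUrR12sfd"


def flag_string_to_int(flag_string):
    """Convert a letter-style flag such as 'pPr1' to its numeric value."""
    value = 0
    for letter in flag_string:
        value |= 1 << FLAG_LETTERS.index(letter)
    return value


def int_to_flag_string(flag):
    """Convert a numeric flag back to the letter-style string."""
    return "".join(
        letter for bit, letter in enumerate(FLAG_LETTERS) if flag & (1 << bit)
    )


print(flag_string_to_int("pPr1"))  # 83
print(flag_string_to_int("pPR2"))  # 163
```

So pPr1 is read 1 of a proper pair aligned to the reverse strand, and pPR2 is read 2 of a proper pair aligned to the forward strand with its mate on the reverse strand.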

  • #2
    You are confusing the strand orientation of the read relative to the transcript, which is what the --library-type parameter controls, with the orientation of the transcript relative to the genome, which is what the XS:A tag reports. In your example the XS:A:+ of the first read pair reports that the transcript which gave rise to this fragment is transcribed from the forward (+) strand of the genome. Conversely, the transcript associated with the second pair is transcribed from the reverse (-) strand. This is the main reason for using strand-specific RNA-Seq protocols: to determine without ambiguity* which strand a transcript lies on.

    (*Ignoring that you can never entirely eliminate ambiguity because nothing is 100%.)



    • #3
      Thanks kmcarr. I understand your point.

      However, I think the first pair of reads should also be transcribed from the negative strand, for the following three reasons.

      1. "H2208:7439:9018 pPr1" means that the first read of the pair mapped to the reverse strand.

      2. The library type "fr-secondstrand" tells you that the first read of a pair always maps to the strand that generated it.

      3. Therefore, the XS:A for "H2208:7439:9018 pPr1" should be '-'.

      Is this right?
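
The three-step inference above can be written out as a short sketch. It assumes the convention described in the TopHat manual: under fr-secondstrand read 1 has the same orientation as the transcript and read 2 the opposite, while under fr-firststrand the roles are swapped. The function name and the record list are illustrative; the flags and XS values are the ones from the first post.

```python
READ_REVERSE = 0x10     # read aligned to the reverse strand
SECOND_IN_PAIR = 0x80   # read is read 2 of the pair


def expected_transcript_strand(flag, library_type):
    """Infer the transcript strand implied by a read's flag and library type."""
    if library_type not in ("fr-secondstrand", "fr-firststrand"):
        raise ValueError("unsupported library type: " + library_type)
    read_strand = "-" if flag & READ_REVERSE else "+"
    is_read2 = bool(flag & SECOND_IN_PAIR)
    # Under fr-secondstrand, read 1 is in the sense orientation and read 2 is
    # antisense; under fr-firststrand it is the other way round.
    antisense = is_read2 != (library_type == "fr-firststrand")
    if antisense:
        return "+" if read_strand == "-" else "-"
    return read_strand


# (read name, numeric flag, reported XS:A) from the first post
records = [
    ("H2208:7439:9018", 83, "+"),
    ("H2208:7439:9018", 163, "+"),
    ("H16340:40651", 163, "-"),
    ("H16340:40651", 83, "-"),
]

for name, flag, reported in records:
    expected = expected_transcript_strand(flag, "fr-secondstrand")
    status = "consistent" if expected == reported else "inconsistent"
    print(name, flag, "reported", reported, "expected", expected, status)
```

Under these assumptions all four reads imply a '-' transcript with fr-secondstrand, so only the first pair disagrees with its XS:A tag. Passing "fr-firststrand" instead would make the first pair agree and the second pair disagree.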



      • #4
        Originally posted by ct586
        Thanks kmcarr. I understand your point.

        However, I think the first pair of reads should also be transcribed from the negative strand, for the following three reasons.

        1. "H2208:7439:9018 pPr1" means that the first read of the pair mapped to the reverse strand.

        2. The library type "fr-secondstrand" tells you that the first read of a pair always maps to the strand that generated it.
        What method was used to prepare your strand-specific library? If it was one of the newer TruSeq stranded kits, or any kit which uses dUTP marking of the second cDNA strand, you should use "--library-type fr-firststrand" for the TopHat alignment, not "fr-secondstrand".

        3. Therefore, the XS:A for "H2208:7439:9018 pPr1" should be '-'.

        Is this right?
        First make sure you know what method was used for the strand-specific library preparation and that you use the correct "--library-type" parameter for TopHat (READ the TopHat manual). If the alignment has not been done properly, any discussion of strandedness is going to be completely confused and ultimately fruitless.



        • #5
          I am quite sure that my "--library-type" is "fr-secondstrand". That is why I made the inference above.

          I think others have run into this kind of question too, as shown in this post. [http://onetipperday.blogspot.com/201...f-tophat.html]

          Besides, some already-published data also contain this kind of inconsistent tag, for example the test data used by RSeQC. The link to the test data is http://dldcc-web.brc.bcm.edu/lilab/l...Human_hg19.bam. [Attention: a large BAM file]
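
To estimate how often such pairs occur in a given alignment, something like the following sketch could be run over SAM text. It is not TopHat's own logic; it assumes the same read 1 / read 2 convention for fr-secondstrand and fr-firststrand as the TopHat manual, and the helper names are hypothetical. The demo lines reuse the four records from the first post, with SEQ and QUAL set to `*` because the post does not show them.

```python
def transcript_strand(flag, library_type="fr-secondstrand"):
    """Transcript strand implied by a SAM flag under the given library type."""
    if library_type not in ("fr-secondstrand", "fr-firststrand"):
        raise ValueError("unsupported library type: " + library_type)
    read_strand = "-" if flag & 0x10 else "+"
    # Read 2 is antisense under fr-secondstrand; read 1 under fr-firststrand.
    antisense = bool(flag & 0x80) != (library_type == "fr-firststrand")
    if antisense:
        return "+" if read_strand == "-" else "-"
    return read_strand


def count_xs_mismatches(sam_lines, library_type="fr-secondstrand"):
    """Return (reads checked, reads whose XS:A disagrees with the flag)."""
    checked = mismatched = 0
    for line in sam_lines:
        if line.startswith("@"):          # skip header lines
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 12:              # no optional tags present
            continue
        flag = int(fields[1])
        if flag & 0x4:                    # skip unmapped reads
            continue
        xs = next((f[5:] for f in fields[11:] if f.startswith("XS:A:")), None)
        if xs not in ("+", "-"):
            continue
        checked += 1
        if xs != transcript_strand(flag, library_type):
            mismatched += 1
    return checked, mismatched


demo = [
    "H2208:7439:9018\t83\tchr1\t4887061\t50\t100M\t=\t4887050\t-111\t*\t*\tXS:A:+\tNH:i:1",
    "H2208:7439:9018\t163\tchr1\t4887050\t50\t100M\t=\t4887061\t111\t*\t*\tXS:A:+\tNH:i:1",
    "H16340:40651\t163\tchr1\t6284001\t50\t100M\t=\t6284208\t307\t*\t*\tXS:A:-\tNH:i:1",
    "H16340:40651\t83\tchr1\t6284208\t50\t100M\t=\t6284001\t-307\t*\t*\tXS:A:-\tNH:i:1",
]
print(count_xs_mismatches(demo))  # (4, 2)
```

On a real file the same function could be fed the lines of `samtools view` output, for example by iterating over `sys.stdin`.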
