  • Ovation RNA-seq strand specificity

    Hi all,

    Wondering if anyone has a better understanding of how the NuGEN Ovation strand specific RNA-seq kits (specifically the Ovation® Human FFPE RNA-Seq Multiplex System) work. I'm having a bit of difficulty understanding the correct Tophat library-type and htseq-count parameters to use.

    For TruSeq kits, library-type=fr-firststrand and stranded=reverse are the correct parameters. See:
    Bridged amplification & clustering followed by sequencing by synthesis. (Genome Analyzer / HiSeq / MiSeq)


    But when I inspect the GAPDH locus with data from the Ovation kit, I see quite the opposite:
    Code:
    Flags	Count
    177	3
    89	3
    329	4
    161	8
    81	8
    65	10
    355	13
    403	13
    353	15
    401	15
    137	16
    163	39
    83	39
    153	875
    145	6030
    97	6035
    73	7442
    147	16934
    99	16934
    So it seems that read 1 of the pair is mapping to the sense (+ for GAPDH) strand, and read 2 of the pair is mapping to the antisense (- for GAPDH) strand?
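The flag table above can be decoded mechanically. Below is a minimal sketch, assuming only the standard SAM flag bits (0x10 = read on the reverse strand, 0x40 = first in pair, 0x80 = second in pair); the function names `classify` and `tally` are made up for illustration, and the counts are copied from the table.

```python
from collections import Counter

# SAM flag bits, as defined in the SAM specification.
REVERSE = 0x10  # read is aligned to the reverse strand
READ1 = 0x40    # read is the first in the pair
READ2 = 0x80    # read is the second in the pair

# Flag -> count, copied from the GAPDH table in this post.
FLAG_COUNTS = {
    177: 3, 89: 3, 329: 4, 161: 8, 81: 8, 65: 10, 355: 13, 403: 13,
    353: 15, 401: 15, 137: 16, 163: 39, 83: 39, 153: 875, 145: 6030,
    97: 6035, 73: 7442, 147: 16934, 99: 16934,
}


def classify(flag):
    """Return a label such as 'read1+' or 'read2-' for a SAM flag."""
    if flag & READ1:
        mate = "read1"
    elif flag & READ2:
        mate = "read2"
    else:
        mate = "unpaired"
    strand = "-" if flag & REVERSE else "+"
    return mate + strand


def tally(flag_counts):
    """Sum the counts per (mate, strand) label."""
    totals = Counter()
    for flag, count in flag_counts.items():
        totals[classify(flag)] += count
    return totals


totals = tally(FLAG_COUNTS)
for label in sorted(totals):
    print(label, totals[label])
```

Summed this way, nearly all read 1 alignments fall on the + strand and nearly all read 2 alignments on the - strand, which is what the table suggests by eye.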

    Furthermore, from htseq-count, using stranded=reverse:
    Code:
    >tail -n5 test.reverse.counts 
    __no_feature	458267
    __ambiguous	473
    __too_low_aQual	0
    __not_aligned	0
    __alignment_not_unique	58918
    And using stranded=yes:
    Code:
    >tail -n5 test.yes.counts 
    __no_feature	195770
    __ambiguous	10645
    __too_low_aQual	0
    __not_aligned	0
    __alignment_not_unique	58918
    Based on this, I would guess that library-type=fr-secondstrand and stranded=yes would be the correct options for the Ovation RNA-seq kits, even though on the surface they appear to use the same dUTP approach. NuGEN tech support confirmed that read 1 is indeed on the same strand as the original RNA, but they seemed fairly unsure.
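The two htseq-count summaries can also be compared numerically. This is a rough sketch, assuming both runs were given the same alignment file, so that any drop in the special counters corresponds to reads that were assigned to a feature instead; `parse_summary` and `extra_assigned` are hypothetical helper names, and the numbers are the ones posted above.

```python
REVERSE_SUMMARY = """
__no_feature 458267
__ambiguous 473
__too_low_aQual 0
__not_aligned 0
__alignment_not_unique 58918
"""

YES_SUMMARY = """
__no_feature 195770
__ambiguous 10645
__too_low_aQual 0
__not_aligned 0
__alignment_not_unique 58918
"""


def parse_summary(text):
    """Parse 'name count' lines from the tail of an htseq-count output."""
    counts = {}
    for line in text.strip().splitlines():
        name, value = line.split()
        counts[name] = int(value)
    return counts


def extra_assigned(run_a, run_b):
    """Reads assigned to features in run_b but not in run_a.

    Assumes both runs processed the same alignments, so the totals are equal
    and the difference in unassigned reads equals the difference in assigned.
    """
    return sum(run_a.values()) - sum(run_b.values())


reverse_counts = parse_summary(REVERSE_SUMMARY)
yes_counts = parse_summary(YES_SUMMARY)
print(extra_assigned(reverse_counts, yes_counts))
```

A positive result means stranded=yes assigns more reads to features than stranded=reverse for this sample.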

    If anyone has used the strand specific Ovation kits and has some insight, it would be greatly appreciated!

  • #2
    Hi there,
    It may not be exactly the same, but I have used Encore Complete RNA-Seq from NuGEN. It also makes strand-specific libraries, using a nucleotide analog. However, I sequenced it from one side only (single-end reads), not paired-end. I picked library-type=fr-firststrand in TopHat, which they recommend for dUTP methods. Still, the TopHat documentation is not very clear to me. In the FAQ section they suggest checking the junction files:

    But the alignment comes from Bowtie, so if I want to check for sense (stranded=yes) or antisense (stranded=reverse) reads from my transcript, I ask htseq-count for it (again, I only use single-end reads).
    I found this post from Biostars with some pictures in it.


    • #3
      Hi Fanli, hi cascoamarillo,

      You may want to have a look at this Biostars thread.

      Cheers,

      Michael


      • #4
        @Michael.Ante:
        Yep, here's my RSeQC output:
        Code:
        This is PairEnd Data
        Fraction of reads failed to determine: 0.0087
        Fraction of reads explained by "1++,1--,2+-,2-+": 0.9716
        Fraction of reads explained by "1+-,1-+,2++,2--": 0.0197
        It is indeed the opposite of Illumina TruSeq data, but I don't understand why.
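For reference, the two fractions reported by RSeQC's infer_experiment.py can be translated into the usual TopHat and htseq-count settings. The sketch below is only a rule of thumb: `suggest_settings` is a hypothetical helper, and the 0.8 cutoff is an arbitrary assumption, not something RSeQC or TopHat prescribes.

```python
def suggest_settings(frac_same, frac_opposite, min_fraction=0.8):
    """Map infer_experiment fractions to strandedness settings.

    frac_same:     fraction explained by "1++,1--,2+-,2-+"
                   (or "++,--" for single-end data), i.e. read 1 is sense.
    frac_opposite: fraction explained by "1+-,1-+,2++,2--"
                   (or "+-,-+" for single-end data), i.e. read 1 is antisense.
    min_fraction:  assumed cutoff for calling a library stranded.
    """
    if frac_same >= min_fraction:
        return {"tophat_library_type": "fr-secondstrand", "htseq_stranded": "yes"}
    if frac_opposite >= min_fraction:
        return {"tophat_library_type": "fr-firststrand", "htseq_stranded": "reverse"}
    return {"tophat_library_type": "fr-unstranded", "htseq_stranded": "no"}


# Fractions from the paired-end output above.
print(suggest_settings(0.9716, 0.0197))
```

With the fractions posted here, this points to fr-secondstrand and stranded=yes, consistent with the htseq-count comparison earlier in the thread.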

        @cascoamarillo:
        Using htseq-count, do your reads map sense or antisense to the transcript? I would be wary of automatically assuming that all dUTP systems are the same...

        Also, depending on your mapping parameters, reads that don't map in the specified orientation (e.g. library-type=fr-firststrand) may simply be mapped to the genome instead.


        • #5
          The more I look into it, the less I understand, especially the TopHat documentation.
          The way I see it, the questions are whether I am producing strand-specific libraries or not, and what the orientation of my first (or second) read is. Whether it is the first strand or the second strand that is being sequenced is what makes me doubt, given the different protocols/kits on the market (that's my humble opinion).

          @Fanli,
          ..."I would be wary of automatically assuming that all dUTP systems are the same..." Couldn't agree more.
          Yes, I'm getting mostly sense reads:
          This is SingleEnd Data
          Fraction of reads explained by "++,--": 0.9609
          Fraction of reads explained by "+-,-+": 0.0391
          Fraction of reads explained by other combinations: 0.0000

          As the Biostars thread above points out, the XS tag seems to be useful only if you are using Cufflinks afterwards. For htseq-count, I would say it is not (if anyone can confirm this, I'd appreciate it).
          Reads that don't map in the specified orientation are still there. Of course they are mapped to the genome; htseq-count then takes the reference annotation and determines whether they are sense or antisense to the gene.
          One more thing: I made alignments with both library-type=fr-firststrand and fr-secondstrand from the same library (NuGEN, single-end). There was no difference in the read mapping output (except the XS tags).
          Last edited by cascoamarillo; 02-23-2015, 01:30 PM.


          • #6
            Read 1 (Forward read) is in the sense orientation for NuGEN stranded libraries.


            • #7
              My understanding is that, in the TruSeq stranded protocol, the orientation of the read relative to the original transcript depends only on the design of the Illumina Y-shaped adapters: if Illumina simply swapped the P5 and P7 positions in these Y-shaped adapters, the read orientation relative to the transcript would be swapped as well (and the mapping settings would be inverted). Therefore, the fact that both the Ovation and TruSeq methods use dUTP does not imply that the resulting reads must be in the same orientation. I think this explains the apparent contradiction.


              • #8
                The Universal RNA-Seq systems (including the Ovation Human FFPE RNA-Seq Multiplex System) from NuGEN achieve strand retention through degradation of the cDNA sense strand, along with degradation of directional adapters to select for a similar orientation of each cDNA relative to the original RNA molecule. This creates a single-stranded, directional library molecule that is further targeted for sequence-specific depletion of unwanted transcripts downstream. This unique approach differs slightly from the classic degradable nucleotide methods, and the strand does not need to be flipped for analysis. When using TopHat, you should set the --library-type parameter to fr-secondstrand.

                A mechanism that describes this process can be found in this video (the mechanism description starts at 3:00) –

                This video provides an overview of the InDA-C technology, used in NuGEN's RNA-Seq workflows to enable depletion of unwanted transcripts from RNA-Seq librarie...


                • #9
                  lsherlin@NuGEN - interesting. I ran HISAT2 on a fastq file generated with your RNA-seq kit for model organisms (mouse) 1-16 kit, and got exactly the same alignment results for the --rna-strandness FR and RF HISAT2 options (which are meant to be equivalent to TopHat2 --library-type fr-secondstrand and fr-firststrand respectively). They both looked like this:

                  71267767 reads; of these:
                  71267767 (100.00%) were paired; of these:
                  8417298 (11.81%) aligned concordantly 0 times
                  48149274 (67.56%) aligned concordantly exactly 1 time
                  14701195 (20.63%) aligned concordantly >1 times
                  ----
                  8417298 pairs aligned concordantly 0 times; of these:
                  487002 (5.79%) aligned discordantly 1 time
                  ----
                  7930296 pairs aligned 0 times concordantly or discordantly; of these:
                  15860592 mates make up the pairs; of these:
                  11734821 (73.99%) aligned 0 times
                  2512705 (15.84%) aligned exactly 1 time
                  1613066 (10.17%) aligned >1 times
                  91.77% overall alignment rate


                  Would you expect to see a difference in the alignment results between these two settings (assuming that one of them is not appropriate for that kit)?

                  Cheers,
                  Matt


                  • #10
                    @mattarno - I am not familiar with HISAT2, although I am guessing, similar to TopHat, the strandedness you put should not have any impact on the alignment. Reads should still align regardless, as you are showing, however once fed into downstream counting/fpkm tools the tags set in the alignment may be used for counting. For example, the HISAT2 manual says, "With this option being used, every read alignment will have an XS attribute tag: '+' means a read belongs to a transcript on '+' strand of genome. '-' means a read belongs to a transcript on '-' strand of genome". I am not sure what you are using downstream for counting/fpkm, so it's hard to say if it will actually have any impact at all. If you are using cufflinks, you can actually get away with not setting this option, as cufflinks tries to guess which strand your data is on based on how reads are aligning to the annotation you provide. You can read this section of the cufflinks docs for some details: http://cole-trapnell-lab.github.io/c...#library-types.

                    tl;dr, that setting should not affect the actual alignment rates.
