  • strand-specific libraries / firststrand / secondstrand

    Hey,

    I'm still very uncertain when dealing with strand-specific RNA-Seq data, especially when using TopHat2 and Cufflinks, as these tools use the strand information through their library-type setting.

    I found this table for the TopHat2 / Cufflinks library type options: http://www.nature.com/nprot/journal/...12.016_T1.html

    In my data I can clearly see that the R1 (forward) read maps to the sense/coding strand and the R2 (reverse) read maps to the antisense strand.

    Illustration:

    a) gene located on wat (+) strand

                        R1
                        ----->
    --------------[############# Gene ##############]-------------------- wat (+)
    --------------------------------------------------------------------- cri (-)
                                            <-----
                                                R2


    b) gene located on cri (-) strand

                        R2
                        ----->
    --------------------------------------------------------------------- wat (+)
    --------------[############# Gene ##############]-------------------- cri (-)
                                            <-----
                                                R1


    This would mean (according to my link) that I have fr-secondstrand, since the table describes it as "the leftmost end of the fragment (in transcript coordinates) is the first sequenced".
    Am I correct with this assumption?

    What I still do not get are the terms "firststrand" and "secondstrand" themselves.

    My understanding of the library prep is the following (leaving out fragmentation):

    1) Transcription

    5' [###########Gene############] 3' coding strand
    3' -------------------------------------------------- 5' template strand

    5' -------------------------------------------------- 3' mRNA


    2) Adapter ligation (let's assume the 5' adapter sequence is only AATT and the 3' adapter sequence only GGCC)

    5' AATT------------------------------------------------GGCC 3' mRNA+Adapters


    3) 1st strand synthesis

    5' AATT------------------------------------------------GGCC 3' mRNA+Adapters
    3' TTAA------------------------------------------------CCGG 5' 1st cDNA


    4) 2nd strand synthesis

    5' AATT------------------------------------------------GGCC 3' 2nd cDNA <---- identical (U->T) to mRNA
    3' TTAA------------------------------------------------CCGG 5' 1st cDNA


    Let's skip the PCR.


    5a) Sequencing 1st cDNA strand

    5'        SeqPrimer----->
    3' TTAA------------------------------------------------CCGG 5' 1st cDNA

    As I see it, I now get a read that is identical to part of the mRNA sequence at its left (5') end.


    5b) Sequencing 2nd cDNA strand

    5' AATT------------------------------------------------GGCC 3' 2nd cDNA
                                             <-----remirPqeS          5'

    Now I should get a read whose reverse complement is identical to part of the mRNA sequence at its right (3') end.
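    To double-check the reasoning in steps 3) to 5b), here is a toy simulation in Python. It is only a sketch under simplifying assumptions: the insert between the AATT/GGCC "adapters" is made up, every strand is written 5'->3', and sequencing is modelled simply as extending a primer along the template from its 3' end.

```python
# Toy model of the library prep above; all strands are written 5'->3'.
# The insert sequence between the toy adapters is made up for illustration.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence (U already written as T)."""
    return seq.translate(COMPLEMENT)[::-1]


def read_from_template(template: str, length: int) -> str:
    """Sequence produced by extending a primer along `template` from its 3' end.

    The new strand is complementary and antiparallel to the template, so it
    equals the start of the template's reverse complement.
    """
    return reverse_complement(template)[:length]


mrna = "AATT" + "ACGTTGCAGGTC" + "GGCC"       # step 2: mRNA + toy adapters
first_cdna = reverse_complement(mrna)         # step 3: 1st cDNA strand
second_cdna = reverse_complement(first_cdna)  # step 4: 2nd cDNA strand

read_5a = read_from_template(first_cdna, 8)   # step 5a: 1st cDNA as template
read_5b = read_from_template(second_cdna, 8)  # step 5b: 2nd cDNA as template

print(second_cdna == mrna)                       # True: 2nd cDNA equals the mRNA
print(read_5a == mrna[:8])                       # True: left (5') end of the mRNA
print(reverse_complement(read_5b) == mrna[-8:])  # True: right (3') end, reverse-complemented
```

    In other words, a read generated with the 1st cDNA strand as template has the same sequence as the 2nd cDNA strand (and the mRNA).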




    With this understanding of the library prep, I would say that if my R1 (forward) read maps to the sense/coding strand, I must have sequenced the 1st cDNA strand first; but according to my link it must have been "secondstrand".

    I hope someone can follow this and spot whether I am misinterpreting the firststrand/secondstrand terms or the library prep.

    Thanks in advance

    Mario
    Last edited by Mchicken; 07-15-2015, 06:06 AM.

  • #2
    It is better to determine this setting empirically:
    run the TopHat+Cufflinks pipeline twice, once with fr-firststrand and once with fr-secondstrand.
    Assuming your annotation file roughly matches your library,
    the run with much larger alignment and FPKM numbers uses the correct option for your library prep method.
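    A minimal sketch of the FPKM half of that comparison in Python, assuming each run left a Cufflinks genes.fpkm_tracking table (tab-separated, with a header that includes an FPKM column). The run names and numbers below are made up; the sketch sums FPKM per run and picks the library type with the largest total, and does not look at alignment counts.

```python
import csv
import io
from typing import Dict, TextIO


def total_fpkm(handle: TextIO) -> float:
    """Sum the FPKM column of a tab-separated genes.fpkm_tracking table."""
    reader = csv.DictReader(handle, delimiter="\t")
    return sum(float(row["FPKM"]) for row in reader)


def pick_library_type(runs: Dict[str, TextIO]) -> str:
    """Return the library type whose run has the largest summed FPKM."""
    totals = {lib_type: total_fpkm(handle) for lib_type, handle in runs.items()}
    return max(totals, key=lambda lib_type: totals[lib_type])


# Made-up tables standing in for e.g. open("run_firststrand/genes.fpkm_tracking").
first = io.StringIO("gene_id\tFPKM\nA\t0.5\nB\t1.2\n")
second = io.StringIO("gene_id\tFPKM\nA\t40.1\nB\t85.3\n")
print(pick_library_type({"fr-firststrand": first, "fr-secondstrand": second}))
```

    Summed FPKM is a crude score; it is only meant to make the "much larger numbers" check concrete.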



    • #3
      First of all, thanks for your advice. I had already read about determining the library type this way.
      Nevertheless, there should be a logical explanation. The company that sequenced our samples told us yesterday that the R1 read does indeed correspond to the sense/coding strand, as I observed when I mapped my paired-end data with TopHat2 (using library-type fr-unstranded).



      • #4
        Honestly... when I first wrote the code to handle firststrand/secondstrand, it took me a week of going back and forth and talking to different people who make libraries, because the descriptions in the Tuxedo package are so incredibly confusing. They should be named clearly, as in:

        READ1-PLUS protocol and READ1-MINUS protocol, or READ1-SENSE, or something like that.

        Every time I am asked questions about this I have to go back to the comments in my source code because the names are so vague and the official descriptions so opaque as to be meaningless.
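        As a sketch of what such clearer names would encode, here is a small Python table mapping the paired-end TopHat/Cufflinks library types to the orientation of read 1 relative to the transcript, following the TopHat manual's descriptions (fr-firststrand: rightmost end of the fragment sequenced first, e.g. dUTP; fr-secondstrand: leftmost end sequenced first, e.g. ligation). The helper function is a hypothetical illustration, not part of any tool.

```python
from typing import Optional

# Orientation of read 1 relative to the transcript for the paired-end
# TopHat/Cufflinks --library-type values (read 2 is always the opposite).
READ1_ORIENTATION = {
    "fr-unstranded": None,          # strand information not used
    "fr-firststrand": "antisense",  # e.g. dUTP: read 1 = reverse complement of transcript
    "fr-secondstrand": "sense",     # e.g. ligation: read 1 = transcript sequence
}


def expected_read1_strand(library_type: str, gene_strand: str) -> Optional[str]:
    """Genomic strand ('+' or '-') that read 1 should align to, or None if unstranded."""
    orientation = READ1_ORIENTATION[library_type]
    if orientation is None:
        return None
    if orientation == "sense":
        return gene_strand
    return "-" if gene_strand == "+" else "+"


# Diagrams a) and b) in the first post: read 1 lies on the same strand as the gene.
print(expected_read1_strand("fr-secondstrand", "+"))  # +
print(expected_read1_strand("fr-secondstrand", "-"))  # -
```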



        • #5
          Brian's right; the terminology is confusing.

          Regarding your first question, the orientation of the gene on the DNA (Watson or Crick strand) is irrelevant. The quoted statement ["the leftmost end of the fragment (in transcript coordinates) is the first sequenced"] indicates that read 1 runs in the 5'->3' orientation of the mRNA.

          As for your second question, strandedness (for TopHat) refers to the sequence being generated, i.e. the strand the read's sequence matches, not the strand used as the template. In diagram 5a, the first cDNA strand is the template, which means that the read sequence is identical to the second cDNA strand.



          • #6
            Okay now to summarize:

            In my case, indeed the library-type is fr-secondstrand as the R1 (forward) read maps in 5' to 3' direction of the mRNA.

            And the reason to call it fr-secondstrand is that the first cDNA strand only served as template for the generation of the R1 read, which is identical to the "second strand" (leading to the name fr-secondstrand).

            Up to now I used fr-unstranded as the library-type parameter, which also gave me good results. In future I will use the correct library type, and I hope this will improve my results further.

            Thank you very much, guys; this issue was a mystery to me for a long time and now I finally get it.



            • #7
              Hi guys,
              I apologize for reviving the thread, but I am also a bit confused about stranded RNA-seq.
              I have some Illumina PE data that is stranded, but I don't know how the libraries were generated. I received BAM files aligned with TopHat, so I used RSeQC's 'infer_experiment.py' script to tell me how the libraries are stranded.
              For one of them I get 1++,1--,2+-,2-+ and for the other 1+-,1-+,2++,2--. Now my problem is linking this information to TopHat's fr-firststrand or fr-secondstrand. From what I have read so far on the web, it seems to me that:
              - fr-secondstrand corresponds to 1++,1--,2+-,2-+
              - fr-firststrand corresponds to 1+-,1-+,2++,2--

              Is that right?

              I ask because I wonder whether the alignment could be improved by using the appropriate library type. So far the default (unstranded) was used.

              Thank you for your help and time.
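              For reference, a small Python sketch of that lookup. It assumes infer_experiment.py reports paired-end results as lines of the form Fraction of reads explained by "1++,1--,2+-,2-+": 0.95, and the 0.8 cutoff for calling a library stranded is an arbitrary, hypothetical choice rather than an RSeQC default. The mapping follows RSeQC's explanation of the rule strings: in "1++,1--,2+-,2-+" read 1 lies on the same strand as the gene (fr-secondstrand), in "1+-,1-+,2++,2--" on the opposite strand (fr-firststrand). The example numbers are made up.

```python
import re
from typing import Dict

# RSeQC paired-end rule string -> TopHat --library-type.
RULE_TO_LIBRARY_TYPE = {
    "1++,1--,2+-,2-+": "fr-secondstrand",  # read 1 on the same strand as the gene
    "1+-,1-+,2++,2--": "fr-firststrand",   # read 1 on the opposite strand
}

# Assumed line format of the infer_experiment.py report.
FRACTION_LINE = re.compile(r'Fraction of reads explained by "([^"]+)": ([0-9.]+)')


def parse_fractions(report: str) -> Dict[str, float]:
    """Collect rule -> fraction pairs from an infer_experiment.py report."""
    return {rule: float(value) for rule, value in FRACTION_LINE.findall(report)}


def infer_library_type(report: str, cutoff: float = 0.8) -> str:
    """Pick a library type; `cutoff` is a hypothetical threshold, not an RSeQC setting."""
    fractions = parse_fractions(report)
    for rule, library_type in RULE_TO_LIBRARY_TYPE.items():
        if fractions.get(rule, 0.0) >= cutoff:
            return library_type
    return "fr-unstranded"


example = """This is PairEnd Data
Fraction of reads failed to determine: 0.0172
Fraction of reads explained by "1++,1--,2+-,2-+": 0.0188
Fraction of reads explained by "1+-,1-+,2++,2--": 0.9640
"""
print(infer_library_type(example))  # fr-firststrand
```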
