  • TopHat and Cufflinks for non-strand-specific reads

    Hi all,

    I have questions about the strand information reported by TopHat and Cufflinks in NON-strand-specific experiments. I searched the forum and the web for answers but did not find any, so I am posting here.

    I have Illumina RNA-seq reads from a library that was prepared with a NON-strand-specific protocol.

    1. The TopHat manual says that TopHat will treat reads as strand-specific. Which option should I use when running TopHat if my reads are not strand-specific?

    2. After TopHat, will the BAM files contain a strand for all reads, or only for junction reads?
    In the BAM file, will the strand follow the splice junction orientation or the actual strand the read was mapped to?

    3. Cufflinks gives “a guess” at the strand of the transcript. How is this guess made?

    Thanks a lot.

  • #2
    Hi gfmgfm,
    Did you manage to get an answer to your question yet? I'm especially interested in number one. I use Cufflinks for expression analysis, and I'm not sure whether it uses reads from both strands to calculate the FPKM value or only the reads on the sense strand.

    Thanks

    • #3
      From the manual:
      --library-type: TopHat will treat the reads as strand specific. Every read alignment will have an XS attribute tag. Consider supplying library type options below to select the correct RNA-seq protocol.
      If you look below at the possible flags:
      Library type: fr-unstranded
      Examples: Standard Illumina
      Description: Reads from the left-most end of the fragment (in transcript coordinates) map to the transcript strand, and the right-most end maps to the opposite strand.

      Library type: fr-firststrand
      Examples: dUTP, NSR, NNSR
      Description: Same as above except we enforce the rule that the right-most end of the fragment (in transcript coordinates) is the first sequenced (or only sequenced for single-end reads). Equivalently, it is assumed that only the strand generated during first strand synthesis is sequenced.

      Library type: fr-secondstrand
      Examples: Ligation, Standard SOLiD
      Description: Same as above except we enforce the rule that the left-most end of the fragment (in transcript coordinates) is the first sequenced (or only sequenced for single-end reads). Equivalently, it is assumed that only the strand generated during second strand synthesis is sequenced.
      In other words, even though the manual says that TopHat will treat all reads as strand-specific, the first option, fr-unstranded, gives you a strand-unaware alignment.

      After TopHat – in the bam files – Will I get strand for all reads? Or only for junction read?
      You will get a strand for all reads. Even a read that does not span a splice junction can usually be mapped uniquely to the + or - strand of the DNA that the RNA came from. For example:

      DNA
      (+) ATGCCGAGAGAGAGTTCAGAGAGATTCG
      (-) TACGGCTCTCTCTCAAGTCTCTCTAAGC

      Read
      GCCGAGAGAG

      This read maps to the + strand only and is reported to you as +.
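The toy example above can be checked with a short Python sketch. It is only an illustration of exact matching against a known sequence, not how TopHat aligns reads; the function names are made up for this example.

```python
# Complement table for the four DNA bases.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def mapping_strands(read, plus_strand):
    """Return the set of strands ('+', '-') on which the read matches exactly.

    A read matches the - strand when its reverse complement
    occurs in the + strand sequence.
    """
    strands = set()
    if read in plus_strand:
        strands.add("+")
    if reverse_complement(read) in plus_strand:
        strands.add("-")
    return strands


# Sequences copied from the example in the post.
PLUS = "ATGCCGAGAGAGAGTTCAGAGAGATTCG"
READ = "GCCGAGAGAG"

print(mapping_strands(READ, PLUS))
```

For the sequences in the post this reports the + strand only. A real aligner also has to handle mismatches, gaps, and repeats, which is why the answer says "usually".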
      In the bam file – will I have a strand according to the splice junction orientation or according the actual strand the read was mapped to?
      Hopefully this will be the same, since the splice site should be in the same "direction" as your actual read.
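To see both pieces of strand information for a given alignment, one option is to read them straight from the SAM record. The sketch below assumes standard SAM columns: bit 0x10 of the FLAG field marks a read aligned as its reverse complement, and the XS:A tag, when present, holds the transcript strand described in the manual quote above. The two SAM lines are invented for illustration and are not TopHat output.

```python
def strand_info(sam_line):
    """Return (mapping_strand, xs_strand) for one SAM alignment line.

    mapping_strand comes from FLAG bit 0x10 (read reverse-complemented).
    xs_strand is the value of the XS:A tag, or None if the tag is absent.
    """
    fields = sam_line.rstrip("\n").split("\t")
    flag = int(fields[1])
    mapping_strand = "-" if flag & 0x10 else "+"
    xs_strand = None
    # Optional tags start at the 12th column.
    for tag in fields[11:]:
        if tag.startswith("XS:A:"):
            xs_strand = tag[len("XS:A:"):]
    return mapping_strand, xs_strand


# Hypothetical records: an unspliced read without XS, and a spliced read
# aligned to the reverse strand whose junction is on the + strand.
UNSPLICED = "read1\t0\tchr1\t100\t50\t10M\t*\t0\t0\tGCCGAGAGAG\tIIIIIIIIII\tNM:i:0"
SPLICED = "read2\t16\tchr1\t200\t50\t5M100N5M\t*\t0\t0\tGCCGAGAGAG\tIIIIIIIIII\tXS:A:+\tNM:i:0"

print(strand_info(UNSPLICED))
print(strand_info(SPLICED))
```

In the second invented record the two values differ, which is the situation the question asks about. Whether a given read carries an XS tag in your own files is easiest to confirm by inspecting a few records.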

      3. Cufflinks gives “a guess” to the strand of the transcript. How is this guess made?
      Look at the information in the Cufflinks manual and on its "how it works" page. As mentioned a lot on this forum, though, Cufflinks is very far from perfect, and I would strongly recommend running it with a reference annotation.

      • #4
        Thanks dvanic for your reply.
        So is it right that the '--library-type' option is used only in TopHat for junction finding and has no effect on Cufflinks and Cuffdiff (even though they have the same option)?

        I analyzed my strand-specific SOLiD reads (2 conditions with 3 replicates each, around 100M paired-end reads in total, from a fungus) once using fr-secondstrand and once using fr-unstranded, and the only differences I found were minor variations in the junction positions.

        I would have expected that with the "fr-secondstrand" option, only reads that align to the sense strand of the gene/transcript would be taken into account for the expression calculation. However, this doesn't seem to be the case, because there are no major differences between "fr-secondstrand" and "fr-unstranded". This is true even for certain genes that have only antisense reads aligned to them.

        It just seems that there is not much use in using stranded information, or am I wrong?
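The expectation described in this post can be written down as a small Python sketch. It encodes only the library-type rules quoted from the manual earlier in the thread, applied to single-end reads; it is not a description of what Cufflinks or Cuffdiff do internally, and the function names and read strands are hypothetical.

```python
def is_compatible(read_strand, gene_strand, library_type):
    """Apply the quoted library-type rule to one single-end read."""
    if library_type == "fr-unstranded":
        # Strand is ignored, so every read counts.
        return True
    if library_type == "fr-secondstrand":
        # The sequenced end maps to the transcript strand.
        return read_strand == gene_strand
    if library_type == "fr-firststrand":
        # The sequenced end maps to the opposite strand.
        return read_strand != gene_strand
    raise ValueError("unknown library type: " + library_type)


def count_compatible(read_strands, gene_strand, library_type):
    """Count the reads that would be assigned to the gene under the rule."""
    return sum(
        is_compatible(strand, gene_strand, library_type)
        for strand in read_strands
    )


# Hypothetical gene on the + strand covered only by antisense reads.
ANTISENSE_ONLY = ["-", "-", "-", "-"]

for lib in ("fr-unstranded", "fr-secondstrand"):
    print(lib, count_compatible(ANTISENSE_ONLY, "+", lib))
```

Under these rules a gene with only antisense reads would get every read with fr-unstranded and none with fr-secondstrand, so near-identical results between the two runs are what prompted the question.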
