  • gfmgfm
    Member
    • Jun 2010
    • 64

    TopHat and Cufflinks with non-strand-specific reads

    Hi all,

    I have questions about the strand information reported by TopHat and Cufflinks for NON-strand-specific experiments. I searched the forum and the web for answers without success, so I am posting here.

    I have Illumina RNA-seq reads from a library that was prepared with a NON-strand-specific protocol.

    1. I saw in the TopHat manual that TopHat will treat reads as strand specific. What option should I use when running TopHat if my reads are not strand specific?

    2. After TopHat, in the BAM files, will I get a strand for all reads, or only for junction reads?
    In the BAM file, will the strand follow the splice junction orientation or the actual strand the read was mapped to?

    3. Cufflinks gives “a guess” to the strand of the transcript. How is this guess made?

    Thanks a lot.
  • RFo
    Junior Member
    • May 2012
    • 5

    #2
    Hi gfmgfm,
    Did you ever get an answer to your question? I'm especially interested in number one... I use Cufflinks for expression analysis and I'm not sure whether it uses reads from both strands to calculate the FPKM value or only the reads on the sense strand...

    Thanks


    • dvanic
      Member
      • Jan 2012
      • 61

      #3
      From the manual:
      "--library-type TopHat will treat the reads as strand specific. Every read alignment will have an XS attribute tag. Consider supplying library type options below to select the correct RNA-seq protocol."
      If you look below at the possible flags:
      Library Type | Examples | Description
      fr-unstranded | Standard Illumina | Reads from the left-most end of the fragment (in transcript coordinates) map to the transcript strand, and the right-most end maps to the opposite strand.
      fr-firststrand | dUTP, NSR, NNSR | Same as above except we enforce the rule that the right-most end of the fragment (in transcript coordinates) is the first sequenced (or only sequenced for single-end reads). Equivalently, it is assumed that only the strand generated during first strand synthesis is sequenced.
      fr-secondstrand | Ligation, Standard SOLiD | Same as above except we enforce the rule that the left-most end of the fragment (in transcript coordinates) is the first sequenced (or only sequenced for single-end reads). Equivalently, it is assumed that only the strand generated during second strand synthesis is sequenced.
      In other words, even though the manual says that TopHat will treat all reads as stranded, the first option (fr-unstranded) gives you a strand-unaware alignment.
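For the first question, the excerpt above points to fr-unstranded for a non-strand-specific library. Below is a minimal sketch of assembling such a TopHat command line in Python. The index prefix, read file names, and output directory are hypothetical placeholders, and the helper build_tophat_command is illustrative only, not part of TopHat.

```python
# Library types listed in the manual excerpt above.
LIBRARY_TYPES = ("fr-unstranded", "fr-firststrand", "fr-secondstrand")


def build_tophat_command(library_type, index_prefix, reads, output_dir="tophat_out"):
    """Return an argument list for a TopHat run (hypothetical helper).

    index_prefix, reads and output_dir are placeholders chosen for this
    example; substitute the paths of your own experiment.
    """
    if library_type not in LIBRARY_TYPES:
        raise ValueError(f"unknown library type: {library_type}")
    # General form: tophat [options] <bowtie_index> <reads_1> [reads_2]
    return ["tophat", "--library-type", library_type,
            "-o", output_dir, index_prefix, *reads]


# Non-strand-specific paired-end Illumina library (file names are made up).
cmd = build_tophat_command("fr-unstranded", "genome_index",
                           ["reads_1.fastq", "reads_2.fastq"])
print(" ".join(cmd))
```

The list can be passed to subprocess.run(cmd, check=True) once the placeholders point at real files.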

      After TopHat – in the bam files – Will I get strand for all reads? Or only for junction read?
      You will get a strand for all reads. Even if a read does not cross a splice junction, it can usually be mapped uniquely to the + or - strand of the DNA that the RNA came from. For example:

      DNA
      (+) ATGCCGAGAGAGAGTTCAGAGAGATTCG
      (-) TACGGCTCTCTCTCAAGTCTCTCTAAGC

      Read
      GCCGAGAGAG

      This read will map to the + strand only, and be reported to you as +.
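The example above can be checked with a short Python sketch that looks for exact matches of the read, and of its reverse complement, on the + strand sequence. This is a toy exact-match search, not how TopHat or Bowtie align reads, and the function names are made up for illustration. It covers only the alignment orientation; the XS transcript-strand tag is a separate piece of information.

```python
def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence (A, C, G, T only)."""
    complement = {"A": "T", "C": "G", "G": "C", "T": "A"}
    return "".join(complement[base] for base in reversed(seq))


def find_alignment_strands(reference_plus, read):
    """List (strand, 0-based position) for every exact match of the read.

    A '+' hit means the read matches the + strand as written; a '-' hit
    means its reverse complement does, i.e. the read matches the - strand.
    """
    hits = []
    for strand, query in (("+", read), ("-", reverse_complement(read))):
        pos = reference_plus.find(query)
        while pos != -1:
            hits.append((strand, pos))
            pos = reference_plus.find(query, pos + 1)
    return hits


plus_strand = "ATGCCGAGAGAGAGTTCAGAGAGATTCG"  # (+) strand from the post
hits = find_alignment_strands(plus_strand, "GCCGAGAGAG")
print(hits)  # [('+', 2)]
```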
      In the bam file, will I have a strand according to the splice junction orientation or according to the actual strand the read was mapped to?
      Hopefully this will be the same, since the splice site should be in the same "direction" as your actual read.

      3. Cufflinks gives “a guess” to the strand of the transcript. How is this guess made?
      Look at the Cufflinks manual and its "How it works" page. But, as has been mentioned a lot on this forum, Cufflinks is very far from perfect, and I would strongly recommend running it with a reference annotation.
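As background for the third question: for unstranded data, a transcript strand is commonly inferred from spliced reads, because a canonical GT...AG intron of a + strand transcript reads as CT...AC on the reference when the transcript is on the - strand. The sketch below illustrates that general idea only. It is an assumption-laden toy, not the actual TopHat or Cufflinks implementation, and the function name and sequences are made up.

```python
def strand_from_intron(reference_plus, intron_start, intron_end):
    """Guess the transcript strand from the dinucleotides flanking an intron.

    Coordinates are 0-based on the + strand reference, with intron_start
    inclusive and intron_end exclusive. Only the canonical GT-AG motif is
    considered in this toy; anything else returns None (undetermined).
    """
    donor = reference_plus[intron_start:intron_start + 2]
    acceptor = reference_plus[intron_end - 2:intron_end]
    if (donor, acceptor) == ("GT", "AG"):
        return "+"
    if (donor, acceptor) == ("CT", "AC"):
        return "-"  # reverse complement of GT...AG
    return None


# Made-up references: 5 bp exon, 14 bp intron, 5 bp exon.
ref_plus_gene = "AAACC" + "GTAAGTTTTTTCAG" + "GGTTT"
ref_minus_gene = "AAACC" + "CTGAAAAAACTTAC" + "GGTTT"

print(strand_from_intron(ref_plus_gene, 5, 19))   # +
print(strand_from_intron(ref_minus_gene, 5, 19))  # -
```

Reads that do not span an intron carry no such motif, so this approach by itself says nothing about their transcript strand.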


      • RFo
        Junior Member
        • May 2012
        • 5

        #4
        Thanks dvanic for your reply.
        So is it right that the '--library-type' option is used only in TopHat for the junction finding and has no effect on Cufflinks and Cuffdiff (even though they have the same option)?

        I analyzed my strand-specific SOLiD reads (2 conditions with 3 replicates each, around 100M paired-end reads in total, from a fungus) once using fr-secondstrand and once using fr-unstranded, and the only differences I saw were some minor variations in the junction positions.

        I would have expected that with the "fr-secondstrand" option, only reads that align to the sense strand of the gene/transcript would be taken into account for the expression calculation. However, this doesn't seem to be the case, because there are no major differences between "fr-secondstrand" and "fr-unstranded". This is true even for certain genes which have only anti-sense reads aligned to them.

        It just seems that there is not much use in using stranded information, or am I wrong?
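The expectation described in this post can be written down as a toy counting rule. The sketch below assumes single-end reads that all overlap one gene and applies the library-type definitions quoted earlier in the thread. It shows the expected behaviour only; it is not how Cufflinks computes FPKM, and count_reads is a hypothetical helper.

```python
def count_reads(gene_strand, read_strands, library_type):
    """Count reads attributed to a gene under a given library type.

    Toy model for single-end reads that all overlap the gene:
      fr-unstranded   -> every read counts, whatever its strand
      fr-secondstrand -> the read must lie on the gene's strand
      fr-firststrand  -> the read must lie on the opposite strand
    """
    if library_type == "fr-unstranded":
        return len(read_strands)
    if library_type == "fr-secondstrand":
        return sum(1 for s in read_strands if s == gene_strand)
    if library_type == "fr-firststrand":
        return sum(1 for s in read_strands if s != gene_strand)
    raise ValueError(f"unknown library type: {library_type}")


# A + strand gene covered only by anti-sense (- strand) reads.
antisense_only = ["-", "-", "-", "-"]
print(count_reads("+", antisense_only, "fr-unstranded"))    # 4
print(count_reads("+", antisense_only, "fr-secondstrand"))  # 0
```

Under this rule, a gene with only anti-sense reads would drop to zero with fr-secondstrand, which is the difference the post expected to see and did not.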
