SEQanswers

Go Back   SEQanswers > Sequencing Technologies/Companies > Illumina/Solexa



Similar Threads
Thread Thread Starter Forum Replies Last Post
Analysis of Directional mRNA-seq data / Illumina jmtepp RNA Sequencing 7 02-26-2014 11:13 PM
Directional RNA-seq: Illumina Tru-seq versus dUTP based method jazz Sample Prep / Library Generation 35 06-06-2013 10:50 AM
RNA-Seq: Directional RNA deep sequencing sheds new light on the transcriptional respo Newsbot! Literature Watch 0 06-30-2011 02:00 AM
Directional RNA Seq huguesparri Illumina/Solexa 28 06-07-2011 05:56 AM
Illumina directional RNA-seq protocol Herve Illumina/Solexa 10 06-13-2010 07:18 AM

Reply
 
Thread Tools
Old 06-04-2013, 08:51 AM   #41
carmeyeii
Senior Member
 
Location: Mexico

Join Date: Mar 2011
Posts: 137
Default

Quote:
Originally Posted by flobpf View Post

UPDATE June 10, 2011: I contacted Illumina and they were confused too. However, finally they (and people at TopHat and our sequencing center) resolved the issue. The reads that come out of the machine have the same sequence as the CODING strand of the DNA and not the template strand.
Is this true?

If that is the case, then

a) the illustration above should portray the complementary strand to the RNA being used as template to synthesize (and read) the actual mRNA molecule by the sequencer.

or

b) The chemistry is as portrayed by the ilustration, but the sequencer translates the base fluorescing to the complementary base, in order to output a read that is the actual sequence of the mRNA molecule?



C
carmeyeii is offline   Reply With Quote
Old 06-05-2013, 07:26 AM   #42
kmcarr
Senior Member
 
Location: USA, Midwest

Join Date: May 2008
Posts: 1,150
Default

Quote:
Originally Posted by carmeyeii View Post
Quote:
Quote: Originally Posted by flobpf

UPDATE June 10, 2011: I contacted Illumina and they were confused too. However, finally they (and people at TopHat and our sequencing center) resolved the issue. The reads that come out of the machine have the same sequence as the CODING strand of the DNA and not the template strand.
Is this true?

If that is the case, then

a) the illustration above should portray the complementary strand to the RNA being used as template to synthesize (and read) the actual mRNA molecule by the sequencer.

or

b) The chemistry is as portrayed by the ilustration, but the sequencer translates the base fluorescing to the complementary base, in order to output a read that is the actual sequence of the mRNA molecule?



C
That information is from two years ago (which is a couple of epochs in NGS time). At that time (I believe) Illumina used a ligation based protocol for directional RNA-Seq. For that library prep protocol the correct --library-type option is fr-secondstrand.

Now the Illumina TruSeq stranded RNA-Seq kits use a dUTP second strand marking protocol (like ScriptSeq). The correct option for this protocol is fr-firststrand.

You need to make certain you know what library prep protocol was used before trying to interpret strandedness.
kmcarr is offline   Reply With Quote
Old 06-05-2013, 11:54 AM   #43
carmeyeii
Senior Member
 
Location: Mexico

Join Date: Mar 2011
Posts: 137
Default C

Quote:
Originally Posted by kmcarr View Post
Now the Illumina TruSeq stranded RNA-Seq kits use a dUTP second strand marking protocol (like ScriptSeq).
kcarr, I thought ScriptSeq's directional protocol is actually tag-based?
I think it doesn't involver dUTP, does it? -- at least ScriptSeq v2 doesn't?

Best,
C
carmeyeii is offline   Reply With Quote
Old 06-05-2013, 12:05 PM   #44
kmcarr
Senior Member
 
Location: USA, Midwest

Join Date: May 2008
Posts: 1,150
Default

Quote:
Originally Posted by carmeyeii View Post
kcarr, I thought ScriptSeq's directional protocol is actually tag-based?
I think it doesn't involver dUTP, does it? -- at least ScriptSeq v2 doesn't?

Best,
C
You're correct; sorry, brain lock on my part.
kmcarr is offline   Reply With Quote
Old 06-05-2013, 12:09 PM   #45
carmeyeii
Senior Member
 
Location: Mexico

Join Date: Mar 2011
Posts: 137
Default

No prob.
OK, so just to make sure I'm on the right track -- the sequence that is "spit out" by the sequencer is the actual sequence as seen by the camera... i.e, no base translation, just raw fluorescence -> letter .

So if the first read of a pair is the "first [cDNA] strand", THAT one was the strand synthesized during the first round of sequencing, using as template the strand corresponding to the RNA molecule sequence and direction.

C
carmeyeii is offline   Reply With Quote
Old 06-05-2013, 12:30 PM   #46
kmcarr
Senior Member
 
Location: USA, Midwest

Join Date: May 2008
Posts: 1,150
Default

Quote:
Originally Posted by carmeyeii View Post
No prob.
OK, so just to make sure I'm on the right track -- the sequence that is "spit out" by the sequencer is the actual sequence as seen by the camera... i.e, no base translation, just raw fluorescence -> letter .
Correct.

Quote:
So if the first read of a pair is the "first [cDNA] strand", THAT one was the strand synthesized during the first round of sequencing, using as template the strand corresponding to the RNA molecule sequence and direction.

C
The first cDNA strand synthesized uses the actual RNA molecule as its template, so it seems a little convoluted to say "using as template the strand corresponding to the RNA molecule sequence and direction." (which almost made my head spin around like Linda Blair ).
kmcarr is offline   Reply With Quote
Old 06-05-2013, 12:33 PM   #47
carmeyeii
Senior Member
 
Location: Mexico

Join Date: Mar 2011
Posts: 137
Red face


I know - my bad. i just wanted to say that it is the original RNA molecule, without it being "actually the RNA molecule", which of course we all know because this is cDNA we're talking about.
carmeyeii is offline   Reply With Quote
Old 07-02-2013, 09:56 PM   #48
dvanic
Member
 
Location: Sydney, Australia

Join Date: Jan 2012
Posts: 61
Default

Quote:
Now the Illumina TruSeq stranded RNA-Seq kits use a dUTP second strand marking protocol (like ScriptSeq). The correct option for this protocol is fr-firststrand.
And the correct option for htseq-count is --stranded=reverse?
dvanic is offline   Reply With Quote
Old 07-03-2013, 02:32 AM   #49
kmcarr
Senior Member
 
Location: USA, Midwest

Join Date: May 2008
Posts: 1,150
Default

Quote:
Now the Illumina TruSeq stranded RNA-Seq kits use a dUTP second strand marking protocol (like ScriptSeq). The correct option for this protocol is fr-firststrand.
Quote:
Originally Posted by dvanic View Post
And the correct option for htseq-count is --stranded=reverse?
No, for this protocol you should use --stranded=yes. I was wrong.

EDIT:

Sorry, I was wrong in my original post, and dvanic was correct. If your library was prepared using the Illumina TruSeq Stranded RNA Kit use --stranded reverse in htseq-count.

Last edited by kmcarr; 04-03-2014 at 04:33 AM.
kmcarr is offline   Reply With Quote
Old 07-03-2013, 09:33 PM   #50
dvanic
Member
 
Location: Sydney, Australia

Join Date: Jan 2012
Posts: 61
Post Sorry, but I think you're wrong!

I am using the currently available Illumina TrueSeq Stranded Total RNA Sample Preparation kit, the manual for which is Illumina part number 15031048 Rev. D. (googling that gives you the manual)

First off - QC; I use rseqc as one of my QC tools and it shows me that my "strandedness" is relatively clean:
Code:
This is PairEnd Data
Fraction of reads explained by "1++,1--,2+-,2-+": 0.0375
Fraction of reads explained by "1+-,1-+,2++,2--": 0.9625
Fraction of reads explained by other combinations: 0.0000
So I can assume that for highly expressed housekeeping genes looking at the flags should be relatively informative.

/not to mention that rseqc is actually giving me the answer (facepalm) I am looking for (looked at this after writing everything below):

Quote:
We can infer that this is a strand-specific RNA-seq data. Strandness of read1 is opposite to that of gene model, while strandness of read2 is is consistent with the strand of reference gene model.
1+-,1-+,2++,2––
read1 mapped to ‘+’ strand indicates parental gene on ‘-‘ strand
read1 mapped to ‘-‘ strand indicates parental gene on ‘+’ strand
read2 mapped to ‘+’ strand indicates parental gene on ‘+’ strand
read2 mapped to ‘-‘ strand indicates parental gene on ‘-‘ strand
When I pull out the reads from the human gapdh locus (irrespective of strand, using samtools) and look at them, I see that the majority of tags are 163/83:
Code:
# Count  Flag
      1 345
      6 153
      7 73
     17 145
     17 97
     53 385
     76 417
     88 339
     88 419
     90 337
     93 369
    136 129
    158 99
    159 147
    187 113
    197 137
    753 89
   1796 161
   1859 81
  20382 163
  20382 83
These tags mean:
  • 163 - read paired; read mapped in proper pair; mate reverse strand; second in pair - i.e. 2nd in pair is on (+) strand
  • 83 - read paired; read mapped in proper pair; read reverse strand; first in pair - i.e. 1st in pair is on (-) strand

GAPDH in humans is on the (+) strand, so looking at Simon's response in the thread above I would think it's --stranded=reverse:

Quote:
The first mate aligns to the strand opposite to the gene, so you need --stranded=reverse.
If I look at a case similar to Nico's, where the gene is on the (-) strand (tfrc, not as highly expressed as GAPDH in my dataset):
Code:
# Count  Flag
      1 129
      1 321
      1 353
      1 355
      1 403
      3 433
      4 401
     13 153
     14 65
     19 177
     50 73
    280 97
    329 145
    782 99
    815 147
Most of the reads are 99/146, just like in Nico's example...
So looking purely at the data + Simon's comment above, it seems like I should be using --stranded=reverse


__________________________________________________________________________


When I use htseq-count on my sam files from the gapdh locus, if I use stranded=yes:
Code:
cat gapdh_yes.counttable1 | awk '{if ($2 != 0) print $0}'
ENSG00000010295.14	18
ENSG00000111640.9	18
no_feature	23676
alignment_not_unique	588
When I use htseq-count on my sam files from just the gapdh locus, if I use stranded=reverse:
Code:
cat gapdh_reverse.counttable1 | awk '{if ($2 != 0) print $0}'
ENSG00000010295.14	103
ENSG00000111640.9	23279
no_feature	330
alignment_not_unique	588
ENSG00000010295.14 is IFFO1, a much more lowly expressed gene on the (-) strand downstream of GAPDH.


__________________________________________________________________________
BUT I would like to logically understand what's going on.
Looking at the information on the Illumina website, it says:
Quote:
How do the TruSeq Stranded mRNA and Total RNA protocols generate stranded cDNA?
Strand specificity is achieved by replacing dTTP with dUTP in the Second Strand Marking Mix (SMM). The incorporation of dUTP in second strand synthesis effectively quenches the second strand during amplification, since the polymerase used in the assay will not incorporate past this nucleotide. Further specificity is achieved by addition of Actinomycin D to the First Strand Master Mix Act D (FSA). Actinomycin prevents spurious DNA dependent synthesis during first strand synthesis, while allowing RNA dependent synthesis.

How do you ensure that different adapters are ligated to each end of a DNA fragment using this protocol?
Illumina’s proprietary method ensures ligation of two different adapters in the required orientation to opposing ends of a DNA fragment. PCR selects for these and finalizes the construct ready for hybridizing onto the flow cell surface. Adapter sequences can be determined by sequencing the ligation fragments, but sequence information alone is not sufficient to uncover the method.

How is strandedness maintained after DNA amplification?
Strandedness is maintained via the directionality of the adapters. The p7 adapter will be on the 3' end of the cDNA strand. As a consequence, the cDNA strand is sequenced. Second strand synthesis is performed using dUTP in place of dUTP and a high fidelity polymerase that cannot process through dUTP template is used for PCR enrichment. As a result, only the first strand product is amplified and the strand information is retained based on the p5 and p7 adapter orientation.
The P5 end is sequenced first, then the P7; another name for the P5 is the "Universal adapter" (UNad) and P7 is "Index-containing adapter" (IndexAd), because it has the index for de-multiplexing multiple libraries run on the same lane.

Now for some ASCII art:

Code:
RNA 	5' ~~~~~~~~~~~~~~~~~~~~~~~~ 3'

cDNA-1	3' ------------------------ 5' /Illumina 1st strand synthesis
cDNA-2	5' ---U---U---U---U-------- 3' /Illumina 2nd strand synthesis
# what you have at the end of the "Synthesize Second Strand cDNA" step pp. 69 of the manual

# Next step - Adenylate 3' Ends
A single ‘A’ nucleotide is added to the 3’ ends of the blunt fragments to prevent them from ligating to one another during the adapter ligation reaction. A corresponding single‘T’ nucleotide on the 3’ end of the adapter provides a complementary overhang for ligating the adapter to the fragment. This strategy ensures a low rate of chimera (concatenated template) formation.

Code:
# have in tube:
cDNA-1	3' A------------------------ 5' /Illumina 1st strand + A
cDNA-2	5' ---U---U---U---U--------A 3' /Illumina 2nd strand + A
# Ligate Adapters
This process ligates multiple indexing adapters to the ends of the ds cDNA, preparing them for hybridization onto a flow cell.
Code:
# have in tube:
cDNA-1	3' IndexAd-A-------------------------UNad 5' /Illumina 1st strand + A
cDNA-2	5' UnAd--U---U---U---U----------A IndexAd 3' /Illumina 2nd strand + A
As said above in the Illumina documentation, "the p7 (indexing) adapter will be on the 3' end of the cDNA strand"

# PCR-amplify, using a polymerase that doesn't amplify from the dUTP containing-strand:
# have in tube:
Code:
cDNA-1	3' IndexAd-A-----------------------UNad 5'
copy 	5' IndexAd-T-----------------------UNad 3'
# Sequence, starting from adapter P5 (UNad) then P7 (IndexAd).
# Following the schematic here (http://nextgen.mgh.harvard.edu/IlluminaChemistry.html), only one of those strands would be sequenced in the 1st round of sequencing - the one that is "stuck" to the flowcell by the indexing adapter end, so "copy" in my diagram:
Code:
stuck to flowcell-copy - 5' IndexAd-T-----------------------UNad 3'
											   		 <------UNad 5'

Results in Read 1 being  3' IndexAd-A---------------------- 5'
This means the sequence of read 1 is identical to that of cDNA-1 above, meaning that is the complement of the original RNA - which is indeed what I see in the case of GAPDH and TFRC - 1st in pair is on opposite strand to the gene.

# Then we have bridge amplification, followed by cleavage at the P7-attached fragments, meaning the sequence of the attached fragment is now that of "cDNA1"
In Read 2 we sequence off the P7 adapter:
Code:
stuck to flowcell-cDNA1 - 5'-UNad--------------------------T-IndexAd 3'
													<------A-IndexAd 5'

Results in read2 being 	 3'-UNad---------------------------A 5'
I.e. the sequence of the read is complementary to cDNA1, or the sequence of the original RNA, which is what we see: GAPDH read 2 is on the + strand, and TFRC is on the - strand, just like the gene annotations.
dvanic is offline   Reply With Quote
Old 07-11-2013, 08:50 PM   #51
jp.
Senior Member
 
Location: NikoNarita.jp

Join Date: Jul 2013
Posts: 142
Red face Single Cell RNA seq

Hi to all
this is my first post in this forum, and I am sorry my english is not good so please dont mind. I am asking some guidance for RNA-seq differential expression.
I want to do single cell sequencing (Illumina, PE) of human sample (before and after chemotherapy) to study differential expression, transcript discover using tophat, Cufflinks. I want to ask your openion :
1. How much the seq read length should be (long or short) (100, 151, or more) ?
2. How many replicate per sample ? for example, 1 sample of liver cells (before chemotherapy); how many replicates should be taken for sequencing (one time sequencing the sample or three times sequencing) ?
3. How long will it take to finish 20 sample sequencing ?
4. Is sequencing procedure for single-cell is same as normal sequencing ?

Waiting for your kind replies.
Thank you
Nakiyami jp.
jp. is offline   Reply With Quote
Old 04-03-2014, 01:35 AM   #52
super0925
Senior Member
 
Location: UK

Join Date: Feb 2014
Posts: 206
Default

Quote:
Originally Posted by flobpf View Post
I think that the Illumina directional sequencing protocol only does first strand sequencing. So fr-firststrand may be the option to use for TopHat and Cufflinks. Similarly, if you want to find out # of reads mapping to annotated genes using htseq-counts function and want to know both sense and antisense expression of a gene, one would need to use the option --stranded=no else it will only give sense expression.
Hi flobpf,
I am a rookie in RNA-Seq. Now I am playing the analysis on two single-ended reads datasets, one is from Illumina, the other is from Proton.
I have asked the people doing the NGS, so far I know that they are not sure which library type (fr-unstranded,fr-firststranded,fr-secondstranded), one thing with the library prep they use though is that it retains polarity, so there is directionality in the sequences.
From this post, in my cases, should I use tophat/cufflinks(fr-firststrand) and htseq-counts(--stranded=reverse ) option? I am sorry for the naive question.Thanks!

Last edited by super0925; 04-03-2014 at 01:51 AM.
super0925 is offline   Reply With Quote
Old 04-03-2014, 04:06 AM   #53
NicoBxl
not just another member
 
Location: Belgium

Join Date: Aug 2010
Posts: 263
Default

if it's single-end, you don't have to precise the type of library.
NicoBxl is offline   Reply With Quote
Old 04-03-2014, 04:43 AM   #54
kmcarr
Senior Member
 
Location: USA, Midwest

Join Date: May 2008
Posts: 1,150
Default

Quote:
Originally Posted by NicoBxl View Post
if it's single-end, you don't have to precise the type of library.
Library strandedness still matters when you are aligning single end reads. For Illumina TruSeq Stranded RNA libraries single end reads match the anti-sense strand of mRNA (aka reverse, firststrand of cDNA).

Tophat/cufflinks defaults to --fr-unstranded. Stranded, single end reads would align o.k. with this setting but you should use --fr-firststrand.

htseq-count defaults to --stranded=yes. For TruSeq Stranded RNA libraries this default is not appropriate and you should change it to --stranded=reverse, even if your reads are single-end.
kmcarr is offline   Reply With Quote
Old 03-13-2015, 10:08 AM   #55
Wei-HD
Member
 
Location: Germany

Join Date: Oct 2009
Posts: 59
Default

Hi all,

I have 2 x 50 bp paired end data, library preparation from NEBNext® Ultra™ Directional RNA Library Prep Kit for Illumina®, dUTP method.

Data as: seq_1.txt and seq_2.txt

From the discussion in this post, seq_1.txt contains the sequence of the antisense strand and locates at the 3' end of the fragment, seq_2.txt contains sequence of sense strand and the 5' end.

So for Tophat, I will use this option: --library-type fr-firststrand. But I have one question about the order of the two files? I am not sure if the order of the files will give completely different results.

tophat --library-type fr-firststrand seq_1.txt seq_2.txt
or
tophat --library-type fr-firststrand seq_2.txt seq_1.txt

And for htseq-count, I think I should use stranded=reverse option, because my first reads are from the antisense (opposite) strand.

Thanks a lot for any comments.
Wei-HD is offline   Reply With Quote
Old 03-13-2015, 10:13 AM   #56
fanli
Senior Member
 
Location: California

Join Date: Jul 2014
Posts: 198
Default

You want
Code:
tophat --library-type fr-firststrand seq_1.txt seq_2.txt
Of course, I'd check that your libraries are indeed in the correct orientation by doing what dvanic did. I recently found out that my Ovation libraries (also using dUTP method) were in fact fr-secondstrand and stranded=yes
fanli is offline   Reply With Quote
Old 03-13-2015, 10:23 AM   #57
Wei-HD
Member
 
Location: Germany

Join Date: Oct 2009
Posts: 59
Default

Thanks fanli! I have checked my reads. Seq_1.txt is from the antisense strand Seq_2.txt is from the sense strand on UCSC.

Then I do tophat --library-type fr-firststrand seq_1.txt seq_2.txt and stranded=reverse

I think tophat --library-type fr-firststrand seq_2.txt seq_1.txt will end up with less reads mapped.
Wei-HD is offline   Reply With Quote
Old 03-13-2015, 10:51 AM   #58
fanli
Senior Member
 
Location: California

Join Date: Jul 2014
Posts: 198
Default

Quote:
Then I do tophat --library-type fr-firststrand seq_1.txt seq_2.txt and stranded=reverse
Yep. Tbh, I'm not sure what happens if you reverse the r1 and r2 files - I suppose most of the pairs get thrown away for being discordantly paired but have never tried.
fanli is offline   Reply With Quote
Old 11-19-2015, 03:18 PM   #59
son_nexg
Junior Member
 
Location: Australia

Join Date: Jul 2011
Posts: 8
Default

Hi Guys

I did a tophat2 alignment of the stranded RNA-seq data generated with Trueseq libraries and I am getting strange/random distribution of the reads between sense and antisense strand:


End 1 Sense End 1 Antisense End 2 Sense End 2 Antisense End 1 % Sense End 2 % Sense
590,379 30,087,178 29,906,269 588,963 1.924 98.069
559,882 29,395,607 29,103,746 557,860 1.869 98.119
521,844 27,260,014 27,021,895 516,739 1.878 98.124
28,392,839 508,446 508,950 28,645,451 98.241 1.746
447,074 25,369,572 25,145,640 444,053 1.732 98.265
25,655,301 524,163 529,995 26,208,516 97.998 1.982
569,612 27,917,987 27,640,016 566,733 2 97.991
26,829,925 567,503 571,852 27,104,313 97.929 2.066
28,741,488 507,667 509,982 29,116,876 98.264 1.721
657,904 30,651,762 30,227,867 652,430 2.101 97.887
515,548 27,660,113 27,433,598 515,549 1.83 98.155
27,514,719 519,533 519,904 27,667,448 98.147 1.844


I would have expected majority of the reads mapping only to one of the strands and not randomly switching stranded as above.

Any help with this will be appreciated.

Thanks.
son_nexg is offline   Reply With Quote
Reply

Tags
cufflinks, directional rna-seq, illumina, rna-seq, tophat

Thread Tools

Posting Rules
You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts

BB code is On
Smilies are On
[IMG] code is On
HTML code is Off




All times are GMT -8. The time now is 07:54 PM.


Powered by vBulletin® Version 3.8.9
Copyright ©2000 - 2018, vBulletin Solutions, Inc.
Single Sign On provided by vBSSO