Thread | Thread Starter | Forum | Replies | Last Post |
Analysis of Directional mRNA-seq data / Illumina | jmtepp | RNA Sequencing | 7 | 02-27-2014 12:13 AM |
Directional RNA-seq: Illumina Tru-seq versus dUTP based method | jazz | Sample Prep / Library Generation | 35 | 06-06-2013 11:50 AM |
RNA-Seq: Directional RNA deep sequencing sheds new light on the transcriptional respo | Newsbot! | Literature Watch | 0 | 06-30-2011 03:00 AM |
Directional RNA Seq | huguesparri | Illumina/Solexa | 28 | 06-07-2011 06:56 AM |
Illumina directional RNA-seq protocol | Herve | Illumina/Solexa | 10 | 06-13-2010 08:18 AM |
#41 | |
Senior Member
Location: Mexico Join Date: Mar 2011
Posts: 137
|
Quote:
If that is the case, then either a) the illustration above should portray the strand complementary to the RNA being used as the template to synthesize (and read) the actual mRNA sequence in the sequencer, or b) the chemistry is as portrayed by the illustration, but the sequencer translates each fluorescing base to its complementary base in order to output a read that is the actual sequence of the mRNA molecule? C |
|
#42 | ||
Senior Member
Location: USA, Midwest Join Date: May 2008
Posts: 1,178
|
Quote:
Now the Illumina TruSeq stranded RNA-Seq kits use a dUTP second strand marking protocol (like ScriptSeq). The correct option for this protocol is fr-firststrand. You need to make certain you know what library prep protocol was used before trying to interpret strandedness. |
||
#43 | |
Senior Member
Location: Mexico Join Date: Mar 2011
Posts: 137
|
Quote:
I think it doesn't involve dUTP, does it? -- at least ScriptSeq v2 doesn't? Best, C |
|
#44 | |
Senior Member
Location: USA, Midwest Join Date: May 2008
Posts: 1,178
|
Quote:
|
|
#45 |
Senior Member
Location: Mexico Join Date: Mar 2011
Posts: 137
|
No prob.
OK, so just to make sure I'm on the right track -- the sequence that is "spit out" by the sequencer is the actual sequence as seen by the camera, i.e., no base translation, just raw fluorescence -> letter. So if the first read of a pair is the "first [cDNA] strand", THAT one was the strand synthesized during the first round of sequencing, using as template the strand corresponding to the RNA molecule sequence and direction. C |
#46 | ||
Senior Member
Location: USA, Midwest Join Date: May 2008
Posts: 1,178
|
Quote:
Quote:
||
#47 |
Senior Member
Location: Mexico Join Date: Mar 2011
Posts: 137
|
I know - my bad. I just wanted to say that it is the original RNA molecule, without it being "actually the RNA molecule", which of course we all know because this is cDNA we're talking about. |
#48 | |
Member
Location: Sydney, Australia Join Date: Jan 2012
Posts: 61
|
Quote:
|
|
#49 | |
Senior Member
Location: USA, Midwest Join Date: May 2008
Posts: 1,178
|
Quote:
EDIT: Sorry, I was wrong in my original post, and dvanic was correct. If your library was prepared using the Illumina TruSeq Stranded RNA Kit, use --stranded=reverse in htseq-count. Last edited by kmcarr; 04-03-2014 at 05:33 AM. |
|
#50 | |||
Member
Location: Sydney, Australia Join Date: Jan 2012
Posts: 61
|
I am using the currently available Illumina TruSeq Stranded Total RNA Sample Preparation kit, the manual for which is Illumina part number 15031048 Rev. D (googling that gives you the manual).
First off - QC: I use rseqc as one of my QC tools, and it shows me that my "strandedness" is relatively clean:
Code:
This is PairEnd Data
Fraction of reads explained by "1++,1--,2+-,2-+": 0.0375
Fraction of reads explained by "1+-,1-+,2++,2--": 0.9625
Fraction of reads explained by other combinations: 0.0000
(Not to mention that rseqc is actually giving me the answer I am looking for (facepalm); I only looked at this after writing everything below.) Quote:
Code:
# Count Flag
1 345
6 153
7 73
17 145
17 97
53 385
76 417
88 339
88 419
90 337
93 369
136 129
158 99
159 147
187 113
197 137
753 89
1796 161
1859 81
20382 163
20382 83
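To read the flag tally above, it helps to decode each SAM flag into "which read of the pair" and "which strand it aligned to". The sketch below uses only the standard flag bits from the SAM specification; the function name `describe_flag` is made up for illustration and is not part of any tool mentioned in this thread.

```python
# Standard SAM flag bits (values defined by the SAM specification).
PROPER_PAIR = 0x2   # both mates aligned as a proper pair
REVERSE = 0x10      # this read aligned to the reverse (-) strand
READ1 = 0x40        # first read of the pair
READ2 = 0x80        # second read of the pair


def describe_flag(flag):
    """Return (which_read, strand, is_proper_pair) for a paired-end SAM flag.

    Hypothetical helper for illustration only.
    """
    if flag & READ1:
        which = "read1"
    elif flag & READ2:
        which = "read2"
    else:
        which = "unknown"
    strand = "-" if flag & REVERSE else "+"
    return which, strand, bool(flag & PROPER_PAIR)


# The two dominant flags in the GAPDH tally, plus 99 and 147 for comparison.
for flag in (163, 83, 99, 147):
    print(flag, describe_flag(flag))
```

Under this decoding, the two dominant flags at the GAPDH locus (163 and 83, 20382 reads each) are read 2 on the + strand and read 1 on the - strand. Since GAPDH is on the (+) strand, that is the pattern expected when read 1 is antisense to the transcript.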
GAPDH in humans is on the (+) strand, so looking at Simon's response in the thread above I would think it's --stranded=reverse: Quote:
Code:
# Count Flag
1 129
1 321
1 353
1 355
1 403
3 433
4 401
13 153
14 65
19 177
50 73
280 97
329 145
782 99
815 147
So looking purely at the data + Simon's comment above, it seems like I should be using --stranded=reverse.
__________________________________________________________________________
When I use htseq-count on my SAM files from the GAPDH locus, if I use --stranded=yes:
Code:
cat gapdh_yes.counttable1 | awk '{if ($2 != 0) print $0}'
ENSG00000010295.14 18
ENSG00000111640.9 18
no_feature 23676
alignment_not_unique 588
And with --stranded=reverse:
Code:
cat gapdh_reverse.counttable1 | awk '{if ($2 != 0) print $0}'
ENSG00000010295.14 103
ENSG00000111640.9 23279
no_feature 330
alignment_not_unique 588
__________________________________________________________________________
BUT I would like to logically understand what's going on. Looking at the information on the Illumina website, it says: Quote:
Now for some ASCII art: Code:
RNA     5' ~~~~~~~~~~~~~~~~~~~~~~~~ 3'
cDNA-1  3' ------------------------ 5'  /Illumina 1st strand synthesis
cDNA-2  5' ---U---U---U---U-------- 3'  /Illumina 2nd strand synthesis
# Next step - Adenylate 3' Ends
A single ‘A’ nucleotide is added to the 3’ ends of the blunt fragments to prevent them from ligating to one another during the adapter ligation reaction. A corresponding single ‘T’ nucleotide on the 3’ end of the adapter provides a complementary overhang for ligating the adapter to the fragment. This strategy ensures a low rate of chimera (concatenated template) formation.
Code:
# have in tube:
cDNA-1  3' A------------------------ 5'  /Illumina 1st strand + A
cDNA-2  5' ---U---U---U---U--------A 3'  /Illumina 2nd strand + A
This process ligates multiple indexing adapters to the ends of the ds cDNA, preparing them for hybridization onto a flow cell.
Code:
# have in tube:
cDNA-1  3' IndexAd-A-------------------------UNad 5'  /Illumina 1st strand + A
cDNA-2  5' UNad--U---U---U---U----------A IndexAd 3'  /Illumina 2nd strand + A
# PCR-amplify, using a polymerase that doesn't amplify from the dUTP-containing strand:
# have in tube:
Code:
cDNA-1  3' IndexAd-A-----------------------UNad 5'
copy    5' IndexAd-T-----------------------UNad 3'
# Following the schematic here (http://nextgen.mgh.harvard.edu/IlluminaChemistry.html), only one of those strands would be sequenced in the 1st round of sequencing - the one that is "stuck" to the flowcell by the indexing adapter end, so "copy" in my diagram:
Code:
stuck to flowcell-copy - 5' IndexAd-T-----------------------UNad 3'
                                                     <------UNad 5'
Results in Read 1 being 3' IndexAd-A---------------------- 5'
# Then we have bridge amplification, followed by cleavage at the P7-attached fragments, meaning the sequence of the attached fragment is now that of "cDNA1". In Read 2 we sequence off the P7 adapter:
Code:
stuck to flowcell-cDNA1 - 5'-UNad--------------------------T-IndexAd 3'
                                                    <------A-IndexAd 5'
Results in read2 being 3'-UNad---------------------------A 5' |
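The net effect of the walkthrough above can be sketched in a few lines, under a deliberately simplified model: adapters, A-tailing and the flow-cell chemistry are ignored, and only the first-strand cDNA is assumed to survive amplification. The function names `revcomp` and `dutp_read_pair` and the toy sequence in the usage note are made up for illustration.

```python
# Map each DNA base to its complement.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq):
    """Reverse complement of a DNA sequence written 5'->3'."""
    return seq.translate(COMPLEMENT)[::-1]


def dutp_read_pair(fragment_sense, read_len):
    """Simplified model of a dUTP-stranded paired-end read pair.

    fragment_sense: the RNA fragment written as DNA letters, 5'->3'.
    Assumption: only the first-strand cDNA (antisense) is amplified, so
    read 1 reports the antisense strand starting from the fragment's
    3' end, and read 2 reports the sense strand from its 5' end.
    """
    first_strand = revcomp(fragment_sense)  # cDNA-1, written 5'->3'
    read1 = first_strand[:read_len]         # antisense to the transcript
    read2 = fragment_sense[:read_len]       # same sense as the transcript
    return read1, read2


read1, read2 = dutp_read_pair("ATGGCCAAGT", 4)
print(read1, read2)
```

In this toy model read 1 is the reverse complement of the last bases of the fragment and read 2 matches its first bases, which is the orientation that corresponds to fr-firststrand in tophat and --stranded=reverse in htseq-count.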
|||
#51 |
Senior Member
Location: NikoNarita.jp Join Date: Jul 2013
Posts: 142
|
Hi to all,
This is my first post in this forum, and I am sorry that my English is not good, so please bear with me. I am asking for some guidance on RNA-seq differential expression. I want to do single-cell sequencing (Illumina, PE) of human samples (before and after chemotherapy) to study differential expression and transcript discovery using TopHat and Cufflinks. I would like your opinion on the following:
1. How long should the sequencing reads be (100, 151, or more)?
2. How many replicates per sample? For example, for 1 sample of liver cells (before chemotherapy), how many replicates should be sequenced (sequence the sample once, or three times)?
3. How long will it take to finish sequencing 20 samples?
4. Is the sequencing procedure for single cells the same as for normal sequencing?
Waiting for your kind replies. Thank you, Nakiyami jp. |
#52 | |
Senior Member
Location: UK Join Date: Feb 2014
Posts: 206
|
Quote:
I am a rookie in RNA-Seq. I am currently analysing two single-end read datasets, one from Illumina and the other from Proton. I have asked the people who did the sequencing, but so far they are not sure which library type was used (fr-unstranded, fr-firststrand, fr-secondstrand). One thing they do know about the library prep is that it retains polarity, so there is directionality in the sequences. Based on this thread, in my case should I use the fr-firststrand option for tophat/cufflinks and the --stranded=reverse option for htseq-count? I am sorry for the naive question. Thanks! Last edited by super0925; 04-03-2014 at 02:51 AM. |
|
#53 |
not just another member
Location: Belgium Join Date: Aug 2010
Posts: 264
|
If it's single-end, you don't have to specify the library type.
|
#54 | |
Senior Member
Location: USA, Midwest Join Date: May 2008
Posts: 1,178
|
Quote:
Tophat/cufflinks defaults to --library-type fr-unstranded. Stranded, single-end reads would align OK with this setting, but you should use --library-type fr-firststrand. htseq-count defaults to --stranded=yes. For TruSeq Stranded RNA libraries this default is not appropriate and you should change it to --stranded=reverse, even if your reads are single-end. |
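When the library prep is not known for certain, the rseqc fractions shown in post #50 can be turned into a suggestion for both tools. The following is a rough sketch, not a definitive rule: the function name `suggest_options` and the 0.8 cutoff are assumptions chosen for illustration, while the option values themselves (fr-unstranded / fr-firststrand / fr-secondstrand for tophat's --library-type, and yes / no / reverse for htseq-count's --stranded) are the ones discussed in this thread.

```python
def suggest_options(frac_read1_same, frac_read1_opposite, min_fraction=0.8):
    """Suggest strandedness settings from rseqc-style fractions.

    frac_read1_same:     fraction explained by "1++,1--,2+-,2-+"
                         (read 1 on the same strand as the gene)
    frac_read1_opposite: fraction explained by "1+-,1-+,2++,2--"
                         (read 1 on the opposite strand)
    min_fraction:        hypothetical cutoff, an assumption of this sketch
    """
    if frac_read1_opposite >= min_fraction:
        # dUTP-style libraries such as TruSeq Stranded
        return {"tophat_library_type": "fr-firststrand",
                "htseq_count_stranded": "reverse"}
    if frac_read1_same >= min_fraction:
        return {"tophat_library_type": "fr-secondstrand",
                "htseq_count_stranded": "yes"}
    # Neither orientation dominates: treat the library as unstranded.
    return {"tophat_library_type": "fr-unstranded",
            "htseq_count_stranded": "no"}


# Fractions reported in post #50 for a TruSeq Stranded Total RNA library.
print(suggest_options(0.0375, 0.9625))
```

A data-driven check like this is only a sanity test; knowing which library prep protocol was actually used remains the safer basis for choosing the options.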
|
#55 |
Member
Location: Germany Join Date: Oct 2009
Posts: 59
|
Hi all,
I have 2 x 50 bp paired-end data; the library was prepared with the NEBNext® Ultra™ Directional RNA Library Prep Kit for Illumina® (dUTP method). The data files are seq_1.txt and seq_2.txt. From the discussion in this thread, seq_1.txt contains the sequence of the antisense strand and is located at the 3' end of the fragment, while seq_2.txt contains the sequence of the sense strand and the 5' end. So for Tophat, I will use this option: --library-type fr-firststrand. But I have one question about the order of the two files. I am not sure whether the order of the files will give completely different results:
tophat --library-type fr-firststrand seq_1.txt seq_2.txt
or
tophat --library-type fr-firststrand seq_2.txt seq_1.txt
And for htseq-count, I think I should use the --stranded=reverse option, because my first reads are from the antisense (opposite) strand. Thanks a lot for any comments. |
#56 |
Senior Member
Location: California Join Date: Jul 2014
Posts: 198
|
You want
Code:
tophat --library-type fr-firststrand seq_1.txt seq_2.txt |
#57 |
Member
Location: Germany Join Date: Oct 2009
Posts: 59
|
Thanks fanli! I have checked my reads on UCSC: seq_1.txt is from the antisense strand and seq_2.txt is from the sense strand.
So I will run tophat --library-type fr-firststrand seq_1.txt seq_2.txt and use --stranded=reverse. I think tophat --library-type fr-firststrand seq_2.txt seq_1.txt would end up with fewer reads mapped. |
#58 | |
Senior Member
Location: California Join Date: Jul 2014
Posts: 198
|
Quote:
|
|
#59 |
Junior Member
Location: Australia Join Date: Jul 2011
Posts: 8
|
Hi Guys
I did a tophat2 alignment of stranded RNA-seq data generated with TruSeq libraries, and I am getting a strange/random distribution of the reads between the sense and antisense strands:
End 1 Sense | End 1 Antisense | End 2 Sense | End 2 Antisense | End 1 % Sense | End 2 % Sense |
590,379 | 30,087,178 | 29,906,269 | 588,963 | 1.924 | 98.069 |
559,882 | 29,395,607 | 29,103,746 | 557,860 | 1.869 | 98.119 |
521,844 | 27,260,014 | 27,021,895 | 516,739 | 1.878 | 98.124 |
28,392,839 | 508,446 | 508,950 | 28,645,451 | 98.241 | 1.746 |
447,074 | 25,369,572 | 25,145,640 | 444,053 | 1.732 | 98.265 |
25,655,301 | 524,163 | 529,995 | 26,208,516 | 97.998 | 1.982 |
569,612 | 27,917,987 | 27,640,016 | 566,733 | 2 | 97.991 |
26,829,925 | 567,503 | 571,852 | 27,104,313 | 97.929 | 2.066 |
28,741,488 | 507,667 | 509,982 | 29,116,876 | 98.264 | 1.721 |
657,904 | 30,651,762 | 30,227,867 | 652,430 | 2.101 | 97.887 |
515,548 | 27,660,113 | 27,433,598 | 515,549 | 1.83 | 98.155 |
27,514,719 | 519,533 | 519,904 | 27,667,448 | 98.147 | 1.844 |
I would have expected the majority of the reads to map to only one of the strands, not to switch strands randomly as above. Any help with this will be appreciated. Thanks. |
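One way to make the pattern in the table above explicit is to label each row by the orientation of End 1. This is only a sketch: it assumes each row is one sample, the 90% cutoff and the function name `classify` are made up for illustration, and the "End 1 % Sense" values are copied from the table. It shows which rows disagree but does not diagnose why.

```python
from collections import Counter

# "End 1 % Sense" column from the table, in row order.
END1_PCT_SENSE = [1.924, 1.869, 1.878, 98.241, 1.732, 97.998,
                  2.0, 97.929, 98.264, 2.101, 1.83, 98.147]


def classify(end1_pct_sense, cutoff=90.0):
    """Label a row by where End 1 mostly maps (cutoff is an assumption)."""
    if end1_pct_sense >= cutoff:
        return "read 1 sense"
    if end1_pct_sense <= 100.0 - cutoff:
        return "read 1 antisense"
    return "ambiguous"


labels = [classify(pct) for pct in END1_PCT_SENSE]
counts = Counter(labels)

for row, (pct, label) in enumerate(zip(END1_PCT_SENSE, labels), start=1):
    print(f"row {row}: End 1 {pct}% sense -> {label}")
print(dict(counts))
```

Every row is cleanly stranded (about 98% one way or the other), so the library prep itself looks consistent; what differs between rows is which end carries the sense orientation.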
Tags |
cufflinks, directional rna-seq, illumina, rna-seq, tophat |