SEQanswers

Old 06-10-2011, 08:16 AM   #21
flobpf
Member
 
Location: USA

Join Date: Apr 2010
Posts: 76
Talking Resolved

UPDATE June 10, 2011: I contacted Illumina and they were confused too. However, finally they (and people at TopHat and our sequencing center) resolved the issue. The reads that come out of the machine have the same sequence as the CODING strand of the DNA and not the template strand.

The correct option to use for Cufflinks is fr-secondstrand
Flobpf
Old 06-10-2011, 09:25 PM   #22
marcowanger
Senior Member
 
Location: Hong Kong

Join Date: Dec 2008
Posts: 350
Default

Flobpf, nice to hear from you again.

Could you kindly illustrate this? I got the protocol from our sequencing center and it seems to be the other way round. Maybe we have different protocols?

Marco

Quote:
Originally Posted by flobpf View Post
UPDATE June 10, 2011: I contacted Illumina and they were confused too. However, finally they (and people at TopHat and our sequencing center) resolved the issue. The reads that come out of the machine have the same sequence as the CODING strand of the DNA and not the template strand.

The correct option to use for Cufflinks is fr-secondstrand
Flobpf
Old 06-15-2011, 10:26 AM   #23
flobpf
Member
 
Location: USA

Join Date: Apr 2010
Posts: 76
Default Depends on method

Quote:
Originally Posted by marcowanger View Post
Flobpf, nice to hear from you again.

Could you kindly illustrate this? I got the protocol from our sequencing center and it seems to be the other way round. Maybe we have different protocols?

Marco
It depends on what method you are using. If you are using the Illumina directional sequencing protocol, you get the coding strand sequence. With dUTP, it's the template strand. The correct options to use are now noted here:
http://cufflinks.cbcb.umd.edu/manual.html#library
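
As a rough summary of the rule above, the lookup below maps a library-prep method to the Cufflinks/TopHat `--library-type` value. The dictionary keys and the function name are made up for illustration; the values are the option names listed in the Cufflinks manual linked above.

```python
# Hypothetical lookup: protocol label -> Cufflinks/TopHat --library-type value.
# The keys are illustrative labels, not names used by any tool.
LIBRARY_TYPE = {
    "illumina-directional": "fr-secondstrand",  # reads match the coding strand
    "dutp": "fr-firststrand",                   # reads match the template strand
    "unstranded": "fr-unstranded",              # no strand information kept
}


def library_type(protocol):
    """Return the --library-type value for a protocol label (case-insensitive)."""
    try:
        return LIBRARY_TYPE[protocol.lower()]
    except KeyError:
        raise ValueError("unknown protocol: %r" % protocol)


print(library_type("dUTP"))  # fr-firststrand
```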
Old 06-16-2011, 09:57 PM   #24
marcowanger
Senior Member
 
Location: Hong Kong

Join Date: Dec 2008
Posts: 350
Talking

That makes more sense. We used dUTP, so we got the template strand.

Quote:
Originally Posted by flobpf View Post
It depends on what method you are using. If you are using the Illumina directional sequencing protocol, you get the coding strand sequence. With dUTP, it's the template strand. The correct options to use are now noted here:
http://cufflinks.cbcb.umd.edu/manual.html#library
Old 06-29-2011, 01:01 AM   #25
marcowanger
Senior Member
 
Location: Hong Kong

Join Date: Dec 2008
Posts: 350
Default

Dear Simon,

How can I modify the code to reverse the --strand from counting coding strand to template strand?

Quote:
Originally Posted by Simon Anders View Post
Just to quickly clarify:

htseq-count does not look at the 'XS' optional field, only at the strand information according to the FLAG field. The strand information in the FLAG field (bit 0x10) specifies whether the sequence as read by the sequencer and as found in the FASTQ file has the same orientation as the sequence in the reference FASTA file ("+" strand, bit cleared), or whether the sequences in the FASTQ and FASTA files are reverse-complements of each other ("-" strand, bit set).

With "--stranded=no", htseq-count ignores the strand bit. With "--stranded=yes" and single-end reads, it will count a read only if the alignment strand according to the FLAG field is the same as the strand given for the gene/exon in the GFF file. For paired-end data, the strands have to be the same for the mate from the first sequencing pass and opposite for the mate from the second pass.

I guess I should add a new option "--stranded=reverse" that reverses this, i.e., counts reads only if, for single-end data, the strands given in the FLAG field and the GFF file are opposite, and, for paired-end data, if the first-pass mate has the opposite strand and the second-pass mate the same strand as the gene.

I hope this makes sense.

Simon
Old 06-29-2011, 01:08 AM   #26
Simon Anders
Senior Member
 
Location: Heidelberg, Germany

Join Date: Feb 2010
Posts: 991
Default

I've since added the '--stranded=reverse' option promised above.
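
The three settings can be summarised in a few lines of Python. This is only a sketch of the strand test as described in the post quoted above, not the code htseq-count actually runs, and it ignores the feature-overlap step entirely; the function and argument names are invented for illustration.

```python
def read_is_counted(read_reverse, gene_strand, stranded="yes", second_mate=False):
    """Strand test only (hypothetical helper, not htseq-count's own code).

    read_reverse: True if FLAG bit 0x10 is set for this alignment
    gene_strand:  "+" or "-" as given in the GFF/GTF file
    stranded:     "yes", "no" or "reverse"
    second_mate:  True for the mate from the second sequencing pass
    """
    if stranded == "no":
        return True  # the strand bit is ignored
    read_strand = "-" if read_reverse else "+"
    # The second mate is expected on the opposite strand, so flip it first.
    if second_mate:
        read_strand = "+" if read_strand == "-" else "-"
    same = read_strand == gene_strand
    if stranded == "yes":
        return same
    if stranded == "reverse":
        return not same
    raise ValueError("stranded must be 'yes', 'no' or 'reverse'")


# A first mate on the plus strand over a minus-strand gene:
print(read_is_counted(False, "-", "yes"))      # False
print(read_is_counted(False, "-", "reverse"))  # True
```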
Old 06-29-2011, 01:46 AM   #27
marcowanger
Senior Member
 
Location: Hong Kong

Join Date: Dec 2008
Posts: 350
Default

Dear Simon, I just downloaded 0.5.1.p3.

It seems there is no --stranded=reverse?


Quote:
Options:
  -h, --help            show this help message and exit
  -m MODE, --mode=MODE  mode to handle reads overlapping more than one
                        feature(choices: union, intersection-strict,
                        intersection-nonempty; default: union)
  -s STRANDED, --stranded=STRANDED
                        whether the data is from a strand-specific assay
                        (default: yes)
Is it still a hidden function?
Quote:
Originally Posted by Simon Anders View Post
I've since added the '--stranded=reverse' option promised above.
Old 06-29-2011, 01:52 AM   #28
marcowanger
Senior Member
 
Location: Hong Kong

Join Date: Dec 2008
Posts: 350
Default

This is the error message I got from 0.5.1 p3

Quote:
htseq-count: error: option -s: invalid choice: 'reverse' (choose from 'yes', 'no')
Old 06-29-2011, 05:53 AM   #29
Simon Anders
Senior Member
 
Location: Heidelberg, Germany

Join Date: Feb 2010
Posts: 991
Default

Sorry, it seems I forgot to upload the latest version. Please try with 0.5.3.
Old 08-21-2012, 09:41 AM   #30
billstevens
Senior Member
 
Location: Baltimore

Join Date: Mar 2012
Posts: 120
Default

I'm sorry, I just read this entire thread, and I still have no idea if my data is strand-specific or not. The core I send it to uses the old TruSeq RNA kit, http://epigenome.usc.edu/docs/resour...15008136_A.pdf

I was looking at my accepted_hits.bam file from Tophat to figure it out, but the second column doesn't have a 0 or a 16. Here is the actual output:

HWI-1KL118:23:C0J57ACXX:8:1308:11486:195428 pPR1s chr1 565039 3 100M = 565073 135 CCGTCATCTACTCTACCATCTTTGCAGGCACACTCATCACAGCGCTAAGCTCGCACTGATTTTTTACCTGAGTAGGCCTAGAAATAAACATGCTAGCTTT @CCDDFFFHHDFHIJJJIEHIJJJFHIGHIEIIHFGHBGIIGHGHGHGGHGJJJBGHJCHHEEEDCEECECCCCDCCCCCCDDDDDCDDDDDDCDDDACC NM:i:0 NH:i:2 CC:Z:chrM CP:i:4490 HI:i:0

Can anyone help?
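
The second column here looks like a FLAG written as letters rather than as a number, which is what older versions of `samtools view` printed with the `-X` option. Assuming that is where `pPR1s` comes from, a small decoder such as the sketch below turns it back into the usual integer. The letter table is assumed to match the one used by older samtools releases, so check it against your own version; the names in the code are made up for illustration.

```python
# Assumed letter-to-bit table for string-format FLAGs (older `samtools view -X`).
FLAG_LETTERS = {
    "p": 0x1,    # read is paired
    "P": 0x2,    # mapped in a proper pair
    "u": 0x4,    # read unmapped
    "U": 0x8,    # mate unmapped
    "r": 0x10,   # read on reverse strand
    "R": 0x20,   # mate on reverse strand
    "1": 0x40,   # first mate
    "2": 0x80,   # second mate
    "s": 0x100,  # secondary alignment
    "f": 0x200,  # failed quality checks
    "d": 0x400,  # duplicate
}


def flag_from_letters(text):
    """Convert a letter-coded FLAG such as 'pPR1s' to its integer value."""
    flag = 0
    for ch in text:
        flag |= FLAG_LETTERS[ch]
    return flag


flag = flag_from_letters("pPR1s")
print(flag, "reverse strand" if flag & 0x10 else "forward strand")
# 355 forward strand
```

Note that the FLAG only says which strand a read aligned to; by itself it cannot tell you whether the library was prepared with a strand-specific protocol.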
Old 08-21-2012, 04:59 PM   #31
kmcarr
Senior Member
 
Location: USA, Midwest

Join Date: May 2008
Posts: 1,147
Default

Quote:
Originally Posted by billstevens View Post
I'm sorry, I just read this entire thread, and I still have no idea if my data is strand-specific or not. The core I send it to uses the old TruSeq RNA kit, http://epigenome.usc.edu/docs/resour...15008136_A.pdf
Bill,

The TruSeq RNA protocol is NOT directional. The appropriate --library-type option for these libraries is fr-unstranded (which is the default for TopHat).
Old 08-21-2012, 05:07 PM   #32
billstevens
Senior Member
 
Location: Baltimore

Join Date: Mar 2012
Posts: 120
Default

Oh wow, thank you! So that means I needed to use the --stranded=no option using HTSeq? Correct?
Old 08-21-2012, 05:25 PM   #33
kmcarr
Senior Member
 
Location: USA, Midwest

Join Date: May 2008
Posts: 1,147
Default

Quote:
Originally Posted by billstevens View Post
Oh wow, thank you! So that means I needed to use the --stranded=no option using HTSeq? Correct?
That is correct.
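
To keep the TopHat/Cufflinks and htseq-count settings consistent, a small lookup can help. The pairing below follows the replies in this thread (unstranded data goes with `--stranded=no`, dUTP data needs `--stranded=reverse`); the dictionary, the function name and the file names are made up for illustration.

```python
# Hypothetical lookup: TopHat/Cufflinks --library-type -> htseq-count --stranded.
HTSEQ_STRANDED = {
    "fr-unstranded": "no",
    "fr-secondstrand": "yes",
    "fr-firststrand": "reverse",
}


def htseq_args(library_type, sam, gtf):
    """Build an htseq-count argument list (alignment file first, then annotation)."""
    return ["htseq-count", "--stranded=" + HTSEQ_STRANDED[library_type], sam, gtf]


# Placeholder file names; pass the list to subprocess.run() to actually run it.
print(" ".join(htseq_args("fr-unstranded", "reads.sam", "genes.gtf")))
# htseq-count --stranded=no reads.sam genes.gtf
```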
Old 09-05-2012, 08:23 PM   #34
carmeyeii
Senior Member
 
Location: Mexico

Join Date: Mar 2011
Posts: 137
Default

Hello!

I am analyzing a dataset which, from the Methods section, appears to be directional:

Quote:
RNA-seq libraries were constructed using Illumina (San Diego, CA) mRNA sequencing kits. Total RNA was subjected to two rounds of oligo-dT purification and then chemically fragmented to approximately 200 bases. Fragmented RNA was used for first-strand cDNA synthesis using random primers and SuperScript II. The second strand was then synthesized using RNaseH and DNA Pol I.
It was generated on the GAIIx.

After reading this post, I think it is safe to say that the reads on the fastq file correspond to the mRNA molecules that originated them, and therefore, to the coding strand of gene X in the genome as well. Is this correct?

After reviewing the library options for TopHat and Cufflinks, would you agree that the appropriate option for TopHat would be --library-type fr-secondstrand? And that in Cufflinks I should also indicate that this is a second-strand library?

Thanks very much!

Carmen
Old 09-05-2012, 09:30 PM   #35
amitm
Member
 
Location: Manchester, UK

Join Date: Feb 2011
Posts: 52
Default

Quote:
Originally Posted by carmeyeii View Post
After reading this post, I think it is safe to say that the reads on the fastq file correspond to the mRNA molecules that originated them, and therefore, to the coding strand of gene X in the genome as well. Is this correct?

After reviewing the library options for TopHat and Cufflinks, would you agree that the appropriate option for TopHat would be --library-type fr-secondstrand? And that in Cufflinks I should also indicate that this is a second-strand library?
Hello Carmen,
You are right about the first part.
However, the Methods excerpt you posted doesn't appear to describe strand-specific RNA sequencing. If a library is strand-specific, the protocol used to generate it is usually named, for example the dUTP method or the Illumina strand-specific protocol.
Look here for all such protocols: http://www.nature.com/nmeth/journal/...meth.1491.html

Then check whether any of them is mentioned in the Methods or Supplementary Information.
Old 09-06-2012, 07:57 PM   #36
carmeyeii
Senior Member
 
Location: Mexico

Join Date: Mar 2011
Posts: 137
Default

Thank you amitm.

After re-reading it is now clear that they did not use any strand-specific protocol.

Thanks for your help!

Carmen
Old 03-20-2013, 02:53 AM   #37
NicoBxl
not just another member
 
Location: Belgium

Join Date: Aug 2010
Posts: 263
Default

A little bump for this interesting thread.

I have a problem with my strand-specific data and htseq-count. I aligned my data (2x50bp, dUTP method) with STAR. After that I counted the reads with htseq-count:

htseq-count -s yes data.sam gtf.gtf > htseq.txt

But I get a read count of only 9 for the gene shown in the figure, and there are a lot of other genes with very low counts.

With -s no, the read counts seem OK.

Here is a read (and its mate) in SAM format that aligns to this gene (cf. figure below):

Code:
HWI-ST1172:65:C0RN7ACXX:1:2316:4226:51105	99	chr15	44109457	255	51M	=	44109544	136	TGTAAACGCCGTAGCCGGGGGTCACTGGATGAATCCTCCTCCTGTTCCTCA	[email protected]	NH:i:1	HI:i:1	AS:i:98	nM:i:0
HWI-ST1172:65:C0RN7ACXX:1:2316:4226:51105	147	chr15	44109544	255	49M2S	=	44109457	-136	TGAAATTCTTCATCCTCCTCATCTGAGGACTCCATAGGGGCATAGTCTGCN	EJJJJJIJIJJIJJIIJJJJJJJJJJIGDIIJJJJJJJHHGHHFFDD=4+#	NH:i:1	HI:i:1	AS:i:98	nM:i:0
So do I have to use -s reverse? But I don't understand: in the GTF file the gene is annotated on the minus strand, and my reads are also aligning to the minus strand. I must be missing something.

Thanks

N.

Old 03-20-2013, 05:17 AM   #38
Simon Anders
Senior Member
 
Location: Heidelberg, Germany

Join Date: Feb 2010
Posts: 991
Default

Well, let's parse your flag fields.

99d = 63h = 0110.0011b means: 1st mate, aligned to plus strand
147d = 93h = 1001.0011b means: 2nd mate, aligned to minus strand

There you have it. The first mate aligns to the strand opposite to the gene, so you need --stranded=reverse.
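
The same decoding can be done with a few bit tests. This is a minimal sketch using the standard SAM FLAG bits (0x10 read on reverse strand, 0x40 first mate, 0x80 second mate); the function name is made up for illustration.

```python
def describe_flag(flag):
    """Summarise mate number and alignment strand from a numeric SAM FLAG."""
    if flag & 0x40:
        mate = "1st mate"
    elif flag & 0x80:
        mate = "2nd mate"
    else:
        mate = "unpaired"
    strand = "minus" if flag & 0x10 else "plus"
    return "%s, aligned to %s strand" % (mate, strand)


for flag in (99, 147):
    print(flag, hex(flag), format(flag, "08b"), describe_flag(flag))
# 99 0x63 01100011 1st mate, aligned to plus strand
# 147 0x93 10010011 2nd mate, aligned to minus strand
```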
Old 03-20-2013, 05:22 AM   #39
NicoBxl
not just another member
 
Location: Belgium

Join Date: Aug 2010
Posts: 263
Default

But why is IGV coloring my reads in red (Color alignments by > first-of-pair strand)?
Old 03-20-2013, 05:28 AM   #40
Simon Anders
Senior Member
 
Location: Heidelberg, Germany

Join Date: Feb 2010
Posts: 991
Default

How do you know that red means plus and blue means minus? Maybe it's the other way round.

Also note that "first-of-pair" probably means that also the second read is coloured according to the orientation of the first mate.
Tags
cufflinks, directional rna-seq, illumina, rna-seq, tophat
