SEQanswers

Old 04-21-2013, 11:26 PM   #1
MichalO
Member
 
Location: CH

Join Date: Jan 2011
Posts: 10
Default truseq paired reads - what strandedness parameters in tophat?

For stranded, paired-end RNA-seq data from a library made with TruSeq, which TopHat parameter should be used so that strandedness is handled properly?

fr-firststrand or fr-secondstrand ?

I am a bit confused, as I have also seen hints about reverse-complementing the reads or swapping the order of R1 and R2 on the command line (http://www.biostars.org/p/64250/).

Thanks for the hints!
Old 04-22-2013, 08:03 AM   #2
kmcarr
Senior Member
 
Location: USA, Midwest

Join Date: May 2008
Posts: 1,154
Default

There are now a handful of library prep kits from Illumina with TruSeq in their name. Some are strand-specific, some are not; the strand-specific ones are fairly new. The only way to know for sure which one was used is to ask the person who prepared the sequencing library.

If they used the TruSeq Stranded kit the proper parameter is fr-firststrand.

If they used the earlier, unstranded TruSeq kit the proper parameter is fr-unstranded. This is the default setting for TopHat.

ETA: O.K. I just re-read your post more carefully and see that you did specifically ask about the TruSeq stranded protocol. As I said above the correct --library-type parameter is fr-firststrand.
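
The mapping described in this post can be written down as a small shell helper. This is only a sketch: the function name `tophat_library_type` and the kit labels `stranded` and `unstranded` are made up for illustration, while the two `--library-type` values are the ones given above.

```shell
# Hypothetical helper: map a TruSeq kit label to the TopHat --library-type
# value recommended in this thread. The labels "stranded" and "unstranded"
# are illustrative, not Illumina product names.
tophat_library_type() {
    case "$1" in
        stranded)   echo "fr-firststrand" ;;
        unstranded) echo "fr-unstranded" ;;   # TopHat's default
        *)          echo "unknown kit label: $1" >&2; return 1 ;;
    esac
}

# Example use (not run here; index and file names are placeholders):
#   tophat --library-type "$(tophat_library_type stranded)" -o out genome_index R1.fastq R2.fastq
```

Which kit was actually used still has to come from whoever prepared the library; the helper only encodes the answer once that is known.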

Last edited by kmcarr; 04-22-2013 at 08:13 AM. Reason: Reread original post.
Old 04-29-2013, 05:42 AM   #3
MichalO
Member
 
Location: CH

Join Date: Jan 2011
Posts: 10
Default

Thanks for the clear explanation!
I'm trying fr-firststrand now.
Old 12-08-2013, 02:49 PM   #4
danwiththeplan
Member
 
Location: Auckland

Join Date: Sep 2011
Posts: 72
Default Truseq stranded RNA: Tuxedo library parameters

I don't think this is correct.
From the TruSeq stranded RNA manual:

Quote:
The cleaved RNA fragments are copied into
first strand cDNA using reverse transcriptase and random primers, followed by second
strand cDNA synthesis using DNA Polymerase I and RNase H. These cDNA fragments
then have the addition of a single 'A' base and subsequent ligation of the adapter. The
products are purified and enriched with PCR to create the final cDNA library.
This implies to me that the second strand cDNA is what gets sequenced (as it has an A overhang and therefore gets the adaptor ligated). The Tuxedo definitions of the library types say:

Quote:
fr-unstranded (default). Examples: Standard Illumina. Reads from the left-most end of the fragment (in transcript coordinates) map to the transcript strand, and the right-most end maps to the opposite strand.
fr-firststrand. Examples: dUTP, NSR, NNSR. Same as above except we enforce the rule that the right-most end of the fragment (in transcript coordinates) is the first sequenced (or only sequenced for single-end reads). Equivalently, it is assumed that only the strand generated during first strand synthesis is sequenced.
fr-secondstrand. Examples: Directional Illumina (Ligation), Standard SOLiD. Same as above except we enforce the rule that the left-most end of the fragment (in transcript coordinates) is the first sequenced (or only sequenced for single-end reads). Equivalently, it is assumed that only the strand generated during second strand synthesis is sequenced.
To me this implies the correct parameter is fr-secondstrand
Old 12-08-2013, 05:38 PM   #5
kmcarr
Senior Member
 
Location: USA, Midwest

Join Date: May 2008
Posts: 1,154
Default

Quote:
Originally Posted by danwiththeplan View Post
I don't think this is correct.
From the TruSeq stranded RNA manual:
Quote:
The cleaved RNA fragments are copied into
first strand cDNA using reverse transcriptase and random primers, followed by second
strand cDNA synthesis using DNA Polymerase I and RNase H. These cDNA fragments
then have the addition of a single 'A' base and subsequent ligation of the adapter. The
products are purified and enriched with PCR to create the final cDNA library.
This implies to me that the second strand cDNA is what gets sequenced (as it has an A overhang and therefore gets the adaptor ligated).

The adapters, like the cDNA, are double stranded. Both strands of the cDNA are ligated to adapters.
Quote:
Tuxedo definitions of libraries say:
Quote:
fr-unstranded (default). Examples: Standard Illumina. Reads from the left-most end of the fragment (in transcript coordinates) map to the transcript strand, and the right-most end maps to the opposite strand.

fr-firststrand. Examples: dUTP, NSR, NNSR. Same as above except we enforce the rule that the right-most end of the fragment (in transcript coordinates) is the first sequenced (or only sequenced for single-end reads). Equivalently, it is assumed that only the strand generated during first strand synthesis is sequenced.

fr-secondstrand. Examples: Directional Illumina (Ligation), Standard SOLiD. Same as above except we enforce the rule that the left-most end of the fragment (in transcript coordinates) is the first sequenced (or only sequenced for single-end reads). Equivalently, it is assumed that only the strand generated during second strand synthesis is sequenced.
To me this implies the correct parameter is fr-secondstrand

In the Tuxedo documentation, the "Directional Illumina (Ligation)" protocol referred to is a very old Illumina protocol for creating strand-specific RNA-Seq libraries; it is no longer used. The TruSeq Stranded RNA-Seq kits use a fairly standard dUTP second-strand marking protocol. This means, as the Tuxedo documentation states, that you should specify your library type as "fr-firststrand".
Old 12-08-2013, 05:39 PM   #6
danwiththeplan
Member
 
Location: Auckland

Join Date: Sep 2011
Posts: 72
Default

Alright thanks for clearing that up.
Old 01-18-2014, 12:10 PM   #7
qserenali
Junior Member
 
Location: NJ, USA

Join Date: Apr 2013
Posts: 3
Default

For my RNA-Seq data, I subsequently learned that the data were generated using an unstranded protocol. Before I got this information, I used the following hint from the TopHat manual to try to figure things out.

"I am not sure which library type to use (fr-firststrand or fr-secondstrand), what should I do?

One possible way to figure out the correct library-type is to run TopHat with a small subset of the reads (e.g., 1M) as follows.
1) run TopHat with fr-firststrand and count the number of junctions in junctions.bed (one of the output files from TopHat)
2) run TopHat with fr-secondstrand and count the number of junctions in junctions.bed
Since the splice junction finding algorithm of TopHat makes use of library-type information (if provided), one of the two TopHat runs would result in many more splice junctions than the other one. You can then use the library type that gives more junctions. If this is not the case TopHat might not work well with your sequencing protocol. Please let us know more details about your protocol so we can add support for new library types."

Code:
$ wc -l < fr-firststrand/junctions.bed
151757
$ wc -l < fr-secondstrand/junctions.bed
151901
$ wc -l < fr-unstranded/junctions.bed
157557

Indeed, setting the fr-unstranded option gave the most junctions. However, I wonder whether this difference (about 4% more junctions than with either stranded setting) is in line with the "many more splice junctions" expected when the correct --library-type option is set.
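
The relative size of the difference can be computed from the three counts quoted above. A minimal sketch using awk; the function name `pct_more` is made up here.

```shell
# pct_more A B: print how much larger A is than B, as a percentage of B.
pct_more() {
    awk -v a="$1" -v b="$2" 'BEGIN { printf "%.1f\n", (a - b) / b * 100 }'
}

pct_more 157557 151757   # fr-unstranded vs fr-firststrand  -> 3.8
pct_more 157557 151901   # fr-unstranded vs fr-secondstrand -> 3.7
```

Both differences come out just under 4%, and the two stranded runs differ from each other by only 144 junctions, so none of the three settings stands out strongly in this test.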
Old 03-20-2014, 08:51 AM   #8
dbrg77
Junior Member
 
Location: Manchester

Join Date: Sep 2010
Posts: 6
Default

Check this analysis guide from Illumina. It should be:
fr-unstranded for regular (unstranded) TruSeq
fr-firststrand for TruSeq Stranded

http://res.illumina.com/documents/pr...ysistophat.pdf
Old 02-03-2015, 05:10 AM   #9
TomHarrop
Member
 
Location: New Zealand

Join Date: Jul 2014
Posts: 20
Default Help understanding the --library-type parameter for TruSeq Stranded libraries

Sorry to drag this thread to the top again but I'm not sure I understand why the fr-firststrand parameter is recommended for mapping Illumina TruSeq Stranded libraries.

From my understanding of the protocol I'm looking at, library generation proceeds as follows:

I. single-strand cDNA is synthesised from template RNA using RT. This will of course be reverse-complementary to the RNA.

II. a second cDNA strand is synthesised with the inclusion of dUTP instead of dTTP, thus the cDNA strand with the same orientation as the template RNA is tagged with dUTP.

III. During PCR enrichment of the library, a polymerase that cannot amplify through dUTP is used, thus only the 'first', non-dUTP cDNA strand (which is reverse-complementary to the original RNA) is available as a template.

From that I would have thought the correct choice would be fr-secondstrand. To test this, I took ~2 million reads from a library made with this protocol and performed the following tests with Tophat and htseq-count.

First, I mapped the reads using all four possible combinations of the --library-type parameter and the order of reads:

Code:
tophat -p 4 -o $OUTDIR.fs --mate-inner-dist 131 \
	--mate-std-dev 61 --max-intron-length 5000 \
	--library-type fr-firststrand --no-mixed \
	--transcriptome-index $INDEX $GENOME \
	$READS1 $READS2 

tophat -p 4 -o $OUTDIR.ss --mate-inner-dist 131 \
	--mate-std-dev 61 --max-intron-length 5000 \
	--library-type fr-secondstrand --no-mixed \
	--transcriptome-index $INDEX $GENOME \
	$READS1 $READS2 

tophat -p 4 -o $OUTDIR.fsrev --mate-inner-dist 131 \
	--mate-std-dev 61 --max-intron-length 5000 \
	--library-type fr-firststrand --no-mixed \
	--transcriptome-index $INDEX $GENOME \
	$READS2 $READS1

tophat -p 4 -o $OUTDIR.ssrev --mate-inner-dist 131 \
	--mate-std-dev 61 --max-intron-length 5000 \
	--library-type fr-secondstrand --no-mixed \
	--transcriptome-index $INDEX $GENOME \
	$READS2 $READS1
Looking at the junctions.bed files:

Code:
$ wc -l */junctions.bed
   85334 OUTDIR.fs/junctions.bed
   73254 OUTDIR.fsrev/junctions.bed
   73308 OUTDIR.ss/junctions.bed
   85334 OUTDIR.ssrev/junctions.bed
I get the most junctions using fr-firststrand READS1 READS2 or fr-secondstrand READS2 READS1. OK then... next I ran htseq-count on all four accepted_hits.bam files, and counted the number of reads mapping within genes for each of the above variations:

Code:
     FS   FSREV      SS   SSREV 
  42675 1219998   42117 1231029
So from this, it would appear that using fr-secondstrand and supplying the reads to tophat in reverse order is the way to go. But I can't really work out why this would be the case! Can anyone point out what I'm missing here?
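
The post does not show how the per-run totals were computed. One way to get such a number, assuming htseq-count's usual two-column output (feature ID, then count) with its summary counters at the end, is to sum the counts of all ordinary features. The function name `assigned_reads` and the file names in the usage comment are illustrative.

```shell
# assigned_reads FILE: sum the counts of all real features in htseq-count
# output, skipping the summary counters. The counters are matched with or
# without a leading "__" prefix.
assigned_reads() {
    awk '
        $1 ~ /^(__)?(no_feature|ambiguous|too_low_aQual|not_aligned|alignment_not_unique)$/ { next }
        NF >= 2 { total += $2 }
        END { printf "%.0f\n", total }
    ' "$1"
}

# Example use (file names are placeholders):
#   for run in fs fsrev ss ssrev; do
#       printf "%s\t%s\n" "$run" "$(assigned_reads "counts.$run.txt")"
#   done
```

A total like this depends on the --stranded setting passed to htseq-count as much as on the TopHat run, so it is worth recording both next to each number.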
Old 02-06-2015, 08:37 AM   #10
lm003
Junior Member
 
Location: New York

Join Date: Feb 2015
Posts: 3
Default

Hi TomHarrop,
I am new to RNA-seq and am running into the same problem. I aligned my reads with tophat2 using fr-firststrand READS1 READS2 and then ran htseq-count, and a lot of my reads are not counted (they end up in __no_feature).
Did you ever work out why more reads are counted when you use fr-secondstrand READS2 READS1?
I used the Illumina TruSeq Stranded polyA kit for paired-end sequencing.
Thanks

Old 02-06-2015, 08:43 AM   #11
lm003
Junior Member
 
Location: New York

Join Date: Feb 2015
Posts: 3
Default

TomHarrop which options did you use when you did htseq-count?
--stranded=yes
--stranded=reverse
not sure if it matters
Old 02-06-2015, 11:21 AM   #12
kmcarr
Senior Member
 
Location: USA, Midwest

Join Date: May 2008
Posts: 1,154
Default

Quote:
Originally Posted by lm003 View Post
TomHarrop which options did you use when you did htseq-count?
--stranded=yes
--stranded=reverse
not sure if it matters

Yes, it matters a great deal. For TruSeq Stranded libraries it is --library-type fr-firststrand for TopHat and --stranded=reverse for htseq-count. Both settings say the same thing: read #1 matches the orientation of the first-strand cDNA, which is the reverse complement of the mRNA.
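
The correspondence between the two tools can be sketched as a lookup. Only the fr-firststrand / reverse pairing is stated in this thread; the other two rows are filled in from the general meaning of htseq-count's --stranded option (yes means read 1 is on the same strand as the feature, no means strand is ignored) and should be checked against the versions you run. The function name `htseq_stranded` is made up for illustration.

```shell
# htseq_stranded LIBRARY_TYPE: print the htseq-count --stranded value that
# corresponds to a TopHat --library-type.
htseq_stranded() {
    case "$1" in
        fr-firststrand)  echo "reverse" ;;  # read 1 is antisense to the mRNA
        fr-secondstrand) echo "yes" ;;      # read 1 is sense to the mRNA
        fr-unstranded)   echo "no" ;;       # strand is not used for counting
        *) echo "unknown library type: $1" >&2; return 1 ;;
    esac
}

# Example use (file names are placeholders):
#   htseq-count --stranded="$(htseq_stranded fr-firststrand)" file.sam file.genes.gtf
```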
Old 02-06-2015, 11:35 AM   #13
lm003
Junior Member
 
Location: New York

Join Date: Feb 2015
Posts: 3
Default

Thank you!

I actually just compared the two; here are the outcomes.
I am convinced that --stranded=reverse is the right option for paired-end, first-strand libraries (made with the Illumina TruSeq kit).

htseq-count --stranded=reverse file.sam file.genes.gtf

no_feature 194094
ambiguous 6540
too_low_aQual 0
not_aligned 0
alignment_not_unique 0


htseq-count --stranded=yes file.sam file.genes.gtf

no_feature 2223636
ambiguous 200
too_low_aQual 0
not_aligned 0
alignment_not_unique 0
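
The same comparison can be scripted if the output of each htseq-count run is saved to a file. The sketch below assumes that has been done (htseq-count writes to standard output, so a redirect is needed); the function names and the file names in the usage comment are made up for illustration.

```shell
# no_feature_count FILE: print the no_feature counter from htseq-count output
# (name accepted with or without the "__" prefix); fail if it is missing.
no_feature_count() {
    awk '$1 ~ /^(__)?no_feature$/ { print $2; found = 1 }
         END { exit (found ? 0 : 1) }' "$1"
}

# fewer_no_feature FILE_A FILE_B: print whichever file has the smaller
# no_feature counter (FILE_A on a tie).
fewer_no_feature() {
    nf_a=$(no_feature_count "$1") || return 1
    nf_b=$(no_feature_count "$2") || return 1
    if [ "$nf_a" -le "$nf_b" ]; then echo "$1"; else echo "$2"; fi
}

# Example use (file names are placeholders):
#   htseq-count --stranded=reverse file.sam file.genes.gtf > counts.reverse.txt
#   htseq-count --stranded=yes     file.sam file.genes.gtf > counts.yes.txt
#   fewer_no_feature counts.reverse.txt counts.yes.txt
```

With the numbers above (194094 against 2223636), the --stranded=reverse run would be the one reported.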


Quote:
Originally Posted by kmcarr View Post
Yes, it matters a great deal. For TruSeq Stranded libraries it is --library-type fr-firststrand for TopHat and --stranded=reverse for htseq-count. What these are both saying is that read #1 matches the cDNA first-strand orientation, which is the reverse complement of the mRNA.
Old 02-06-2015, 11:47 AM   #14
deepseq
Junior Member
 
Location: california

Join Date: Apr 2011
Posts: 2
Question fastqc error (How do I post a new thread??)

$ fastqc abc.fastq
Output:
Started analysis of abc.fastq

Failed to process file abc.fastq
uk.ac.babraham.FastQC.Sequence.SequenceFormatException: ID line didn't start with '@'
at uk.ac.babraham.FastQC.Sequence.FastQFile.readNext(FastQFile.java:134)
at uk.ac.babraham.FastQC.Sequence.FastQFile.next(FastQFile.java:105)
at uk.ac.babraham.FastQC.Analysis.AnalysisRunner.run(AnalysisRunner.java:76)
at java.lang.Thread.run(Thread.java:724)

So, I looked at the file and it has @ in the first line and at the beginning of the reads....

$ head abc.fastq
@HWI-ST560:173:C5WEGACXX:2:1101:1907:2424 1:N:0:
ACGACGGTCTAAACCCTNGANNTCTCGGGNNNNNNNNNNNNNNNAAGAGCGNNNNNNNNGNNNTGCCGAGACCGATCTCGTATGCCGTCTTCT
+
[email protected]#08##00?FHGI################################################################
@HWI-ST560:173:C5WEGACXX:2:1101:1860:2478 1:N:0:
AGCGCTCCGCCAGGGCCGTGGGCCGACCCCGGNNNNNNNNNTNNGAGGGCCTCACTAAACC
+
FDBFHHIHIJJJJJIJIJFHIJBHBGGFFFBB#########,##,++<@-828>+:>>A>?
@HWI-ST560:173:C5WEGACXX:2:1101:2019:2431 1:N:0:
AAACCCCCGGGACGGGGNCCNGCGGGGCANNNNNNNNNNNNNNNGGGGGGGNNNNNNNNGNNNGTTTTCGGGGGGCCAGGGGAAGGGAGAAGG

What do I do to fix this? Sorry for posting here, but I don't see a tab to post a new thread somehow...
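
This question is not answered in the thread. Since head shows only the first few records, one thing that may be worth checking (an assumption, not a confirmed diagnosis) is whether a record further down the file is malformed, for example truncated or shifted by a line. The awk sketch below scans the whole file and reports the first place where the plain four-line FASTQ layout breaks; the function name `first_bad_fastq_record` is made up for illustration.

```shell
# first_bad_fastq_record FILE: print the line number and reason for the first
# record that breaks the 4-line FASTQ layout; print nothing if every record
# passes. Assumes an uncompressed file with unwrapped sequence lines.
first_bad_fastq_record() {
    awk '
        NR % 4 == 1 && substr($0, 1, 1) != "@" { print NR ": ID line does not start with @"; bad = 1; exit }
        NR % 4 == 2 { seqlen = length($0) }
        NR % 4 == 3 && substr($0, 1, 1) != "+" { print NR ": separator line does not start with +"; bad = 1; exit }
        NR % 4 == 0 && length($0) != seqlen { print NR ": quality length differs from sequence length"; bad = 1; exit }
        END { if (!bad && NR % 4 != 0) print NR ": file ends in the middle of a record" }
    ' "$1"
}

# Example use:
#   first_bad_fastq_record abc.fastq
```

If a line number is reported, looking at the lines around it (for instance with sed -n) should show whether the file was truncated or corrupted at that point.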
Old 02-09-2015, 01:23 AM   #15
TomHarrop
Member
 
Location: New Zealand

Join Date: Jul 2014
Posts: 20
Default

Quote:
Originally Posted by lm003 View Post
TomHarrop which options did you use when you did htseq-count?
--stranded=yes
--stranded=reverse
not sure if it matters
Ah, this looks like it was exactly what I was missing. I was using --stranded=yes. Thanks for catching that and kmcarr for confirming. I switched to --stranded=reverse and now I have:
Code:
     FS   FSREV      SS   SSREV 
1231029   42109 1220563   42675

Last edited by TomHarrop; 02-09-2015 at 01:51 AM. Reason: updated
Tags
paired, stranded, strandedness, tophat, truseq
