SEQanswers

Old 05-30-2012, 04:07 AM   #1
arabidopsis
Member
 
Location: Denmark

Join Date: Oct 2010
Posts: 13
Strand-specificity in paired-end data

Dear All,

I have a question about paired-end Illumina HiSeq100 data. My FASTQ files have normal average quality, but Bowtie maps only about 10% of the reads, whereas TopHat maps around 80%. Does anyone know why this happens?

Also, when I display TopHat BAM files in the UCSC browser, the left-hand reads are colored blue and the right-hand reads red. The same color scheme is used for single-end reads to show strand specificity, but paired-end reads in the UCSC browser have no visible strand indication. Or am I just not seeing it?
When the reads are assembled into transcripts by Cufflinks, each transcript is assigned to either the "plus" or the "minus" strand, and this generally agrees with the reference annotation. But some transcripts get the "antisense" class code (x), and I would like to check how many reads actually support them.
Does anyone know how to display paired-end reads in a strand-specific way?
Old 05-30-2012, 05:29 AM   #2
maubp
Peter (Biopython etc)
 
Location: Dundee, Scotland, UK

Join Date: Jul 2009
Posts: 1,539

Quote:
Originally Posted by arabidopsis View Post
Also, when I display TopHat BAM files in the UCSC browser, the left-hand reads are colored blue and the right-hand reads red. The same color scheme is used for single-end reads to show strand specificity, but paired-end reads in the UCSC browser have no visible strand indication. Or am I just not seeing it?
...
Does anyone know how to make paired-end reads look strand specific?
I'm not familiar with the UCSC browser settings, but you could try another BAM viewer for a 'second opinion', e.g. IGV or Tablet. I've used Tablet to show paired-end reads with its green/blue strand-specific colour scheme.
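
Independently of any viewer, the strand can also be read off the SAM FLAG field. Below is a minimal sketch, assuming the "FR First Strand" library type from the TopHat settings in this thread, where mate 1 is expected to align antisense to the transcript and mate 2 sense. The function name `transcript_strand` is made up for illustration; the bit values are the standard SAM flag bits.

```python
# Sketch only: infer the transcript strand of a paired-end read from its SAM FLAG,
# ASSUMING an "FR First Strand" (fr-firststrand) library. For other library types
# the mapping below would be different.
READ_REVERSE = 0x10    # read is aligned to the reverse strand
FIRST_IN_PAIR = 0x40   # read is mate 1
SECOND_IN_PAIR = 0x80  # read is mate 2


def transcript_strand(flag):
    """Return '+' or '-' for a paired read, or None if it is not marked as mate 1 or 2."""
    is_reverse = bool(flag & READ_REVERSE)
    if flag & FIRST_IN_PAIR:
        # Mate 1 is antisense to the transcript in this library type.
        return "+" if is_reverse else "-"
    if flag & SECOND_IN_PAIR:
        # Mate 2 has the same orientation as the transcript.
        return "-" if is_reverse else "+"
    return None


if __name__ == "__main__":
    # Four common flags for properly paired reads.
    for flag in (99, 147, 83, 163):
        print(flag, transcript_strand(flag))
```

Counting the '+' and '-' results over the reads that overlap a transcript with class code x would give a rough idea of how many reads actually support the antisense call.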
Old 05-30-2012, 05:37 AM   #3
schönblick
Junior Member
 
Location: Germany

Join Date: Nov 2008
Posts: 5

Quote:
Originally Posted by arabidopsis View Post
Dear All,

Bowtie maps only about 10% of the reads, whereas TopHat maps around 80%. Does anyone know why this happens?
Could you specify exactly what you did? TopHat uses Bowtie as its mapper.
Old 05-30-2012, 06:01 AM   #4
arabidopsis
Member
 
Location: Denmark

Join Date: Oct 2010
Posts: 13

Quote:
Originally Posted by schönblick View Post
Could you specify exactly what you did? TopHat uses Bowtie as its mapper.
TopHat settings:

RNA-Seq FASTQ file 94: Clip on Read_LL
Conditional (refGenomeSource) 0
Select a reference genome /galaxy/data/hg18/bowtie_index/hg18
Conditional (singlePaired) 1
RNA-Seq FASTQ file 97: Clip on Read_RR
Mean Inner Distance between Mate Pairs 200
Conditional (pParams) 1
Library Type FR First Strand
Std. Dev for Distance between Mate Pairs 20
Anchor length (at least 3) 8
Maximum number of mismatches that can appear in the anchor region of spliced alignment 0
The minimum intron length 70
The maximum intron length 500000
Conditional (indel_search) 1
Max insertion length. 3
Max deletion length. 3
Maximum number of alignments to be allowed 20
Minimum intron length that may be found during split-segment (default) search 50
Maximum intron length that may be found during split-segment (default) search 500000
Number of mismatches allowed in the initial read mapping 2
Number of mismatches allowed in each segment alignment for reads mapped independently 2
Minimum length of read segments 25
Conditional (own_junctions) 1
Conditional (closure_search) 1
Conditional (coverage_search) 0
Minimum intron length that may be found during coverage search 50
Maximum intron length that may be found during coverage search 20000
Use Microexon Search No

Bowtie settings:
Conditional (refGenomeSource) 0
Select a reference genome /galaxy/data/hg18/bowtie_index/hg18
Conditional (singlePaired) 1
Forward FASTQ file 19: P0037_N038-02_CGATGT_L004_R1_001.fastq
Reverse FASTQ file 18: P0037_N038-02_CGATGT_L004_R2_001.fastq
Maximum insert size for valid paired-end alignments (-X) 1000
The upstream/downstream mate orientation for valid paired-end alignment against the forward reference strand (--fr/--rf/--ff) FR (for Illumina)
Conditional (pParams) 1
Skip the first n pairs (-s) 0
Only align the first n pairs (-u) -1
Trim n bases from high-quality (left) end of each read before alignment (-5) 5
Trim n bases from low-quality (right) end of each read before alignment (-3) 20
Maximum number of mismatches permitted in the seed (-n) 3
Maximum permitted total of quality values at mismatched read positions (-e) 70
Seed length (-l) 28
Whether or not to round to the nearest 10 and saturating at 30 (--nomaqround) Round to nearest 10
Number of mismatches for SOAP-like alignment policy (-v) -1
Minimum insert size for valid paired-end alignments (-I) 0
Maximum number of attempts Bowtie will make to match an alignment for one mate with an alignment for the opposite mate (--pairtries) 100
Choose whether or not to attempt to align the forward reference strand (--nofw) Align against the forward reference strand
Choose whether or not to align against the reverse-complement reference strand (--norc) Align against the reverse-complement reference strand
Whether or not to try as hard as possible to find valid alignments when they exist (-y) Do not try hard
Report up to n valid alignments per pair (-k) 1
Whether or not to report all valid alignments per pair (-a) Do not report all valid alignments
Suppress all alignments for a pair if more than n reportable alignments exist (-m) -1
Write all reads with a number of valid alignments exceeding the limit set with the -m option to a file (--max) False
Write all reads that could not be aligned to a file (--un) True
Conditional (pBestOption) 0
Maximum number of backtracks permitted when aligning a read (--maxbts) 125
Override the offrate of the index to n (-o) -1
Seed for pseudo-random number generator (--seed) -1
Suppress the header in the output SAM file False

I used the Bowtie and TopHat versions available on the Galaxy Genome web platform. And the fact that TopHat is based on Bowtie makes the matter even more confusing...
Old 05-30-2012, 07:02 AM   #5
kmcarr
Senior Member
 
Location: USA, Midwest

Join Date: May 2008
Posts: 1,131

Quote:
Originally Posted by arabidopsis View Post
Dear All,

I have a question about paired-end Illumina HiSeq100 data. My FASTQ files have normal average quality, but Bowtie maps only about 10% of the reads, whereas TopHat maps around 80%. Does anyone know why this happens?
Because using Bowtie to directly map RNA-Seq data to a genomic reference is not appropriate. RNA-Seq reads do not (generally) map contiguously to their reference genome. They have to be "split" to span the introns which have been spliced out. Bowtie won't map split reads; TopHat does. The results you describe are not surprising at all. You need to use the correct tool for the job.
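
To illustrate the point about split reads, here is a toy sketch with made-up sequences. It is not how Bowtie or TopHat work internally; it only shows that a read spanning an exon-exon junction has no contiguous exact match in the genome, while its two halves do match once a gap for the intron is allowed. All sequences and the helper `split_hit` are hypothetical.

```python
# Toy illustration with invented sequences; real aligners use indexes and allow mismatches.
exon1 = "ACGTTGCAAGGCTA"
intron = "GTAAGTCCCCTTTTTTCTCTCAG"
exon2 = "TTGACCGGATACCA"
genome = exon1 + intron + exon2

# A read from the spliced transcript: last 6 bases of exon1 joined to first 6 of exon2.
read = exon1[-6:] + exon2[:6]


def split_hit(read, genome, min_anchor=3):
    """Try every split point; return (left_pos, right_pos, gap) for the first split
    whose two parts both match exactly with a gap of at least 1 base, else None."""
    for cut in range(min_anchor, len(read) - min_anchor + 1):
        left, right = read[:cut], read[cut:]
        left_pos = genome.find(left)
        if left_pos < 0:
            continue
        # Search for the right part strictly downstream of the left part.
        right_pos = genome.find(right, left_pos + len(left) + 1)
        if right_pos >= 0:
            return left_pos, right_pos, right_pos - (left_pos + len(left))
    return None


if __name__ == "__main__":
    print("contiguous match position:", genome.find(read))  # -1 means no match
    print("split match:", split_hit(read, genome))
```

In this toy case the gap found between the two parts equals the intron length, which is the kind of alignment a contiguous mapper cannot report.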
Old 05-30-2012, 07:12 AM   #6
arabidopsis
Member
 
Location: Denmark

Join Date: Oct 2010
Posts: 13

Quote:
Originally Posted by kmcarr View Post
Because using Bowtie to directly map RNA-Seq data to a genomic reference is not appropriate. RNA-Seq reads do not (generally) map contiguously to their reference genome. They have to be "split" to span the introns which have been spliced out. Bowtie won't map split reads; TopHat does. The results you describe are not surprising at all. You need to use the correct tool for the job.
kmcarr,

What you say is right, but when I map single-end reads with Bowtie it works fine: TopHat gives only 10-15% more mapped reads. For the current problem I also looked at the splice junction file produced by TopHat. It contains only about 10,000 entries. This cannot account for a 10-fold increase in the number of mapped reads, can it?
Old 05-30-2012, 07:29 AM   #7
kmcarr
Senior Member
 
Location: USA, Midwest

Join Date: May 2008
Posts: 1,131

Quote:
Originally Posted by arabidopsis View Post
kmcarr,

What you say is right, but when I map single-end reads with Bowtie it works fine: TopHat gives only 10-15% more mapped reads. For the current problem I also looked at the splice junction file produced by TopHat. It contains only about 10,000 entries. This cannot account for a 10-fold increase in the number of mapped reads, can it?
When working with paired-end data, Bowtie considers the pair as a whole when determining the validity of an alignment. Part of that consideration is the relative distance between the two reads when aligned to the genome. The limits for a pair to be considered valid are set by the -X and -I options, and in your example the maximum distance between paired reads was 1000 bp. If the forward and reverse reads map to different exons, they could be separated by a much greater distance than this, so the pair would not be reported as a valid alignment.
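
The distance check can be sketched as follows. This is a simplified illustration rather than Bowtie's actual code: the coordinates and the 75 bp read length (100 bp minus the 5 and 20 bases trimmed in the settings above, assuming 100 bp reads) are invented, and the span is measured over the whole fragment, from the leftmost to the rightmost aligned base of the two mates.

```python
# Simplified sketch of an insert-size check like the one controlled by -I / -X.
# All positions and lengths below are hypothetical.

def pair_span(pos1, len1, pos2, len2):
    """Genomic distance covered by a pair, from the leftmost to the rightmost aligned base."""
    start = min(pos1, pos2)
    end = max(pos1 + len1, pos2 + len2)
    return end - start


def is_valid_pair(span, min_insert=0, max_insert=1000):
    """True if the span lies within the allowed insert-size range (defaults mirror -I 0, -X 1000)."""
    return min_insert <= span <= max_insert


if __name__ == "__main__":
    read_len = 75  # assumed read length after trimming

    # Both mates in the same exon: the genomic span is about the fragment size.
    same_exon = pair_span(10_000, read_len, 10_200, read_len)
    print(same_exon, is_valid_pair(same_exon))

    # Mates in neighbouring exons separated by a hypothetical 5 kb intron.
    across_intron = pair_span(10_000, read_len, 15_200, read_len)
    print(across_intron, is_valid_pair(across_intron))
```

With these made-up numbers the first pair passes and the second is rejected, even though both fragments have the same length in the spliced transcript.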