
**maubp** · 05-30-2012, 05:29 AM

Originally posted by arabidopsis

Also, when I display TopHat BAM files in the UCSC browser, left-end reads are colored blue and right-end reads red. The same color scheme is used for single-end reads to show strand specificity, but paired-end reads in the UCSC browser have no visible strand indication. Or do I just not see it?
...
Does anyone know how to make paired-end reads look strand specific?

I'm not familiar with the UCSC browser settings, but you could try another BAM viewer for a 'second opinion', e.g. IGV or Tablet. I've used Tablet to show paired-end reads with its green/blue strand-specific colour scheme.
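A plausible reading of the blue/red pattern described above is that the browser colours each alignment by its own strand (SAM FLAG bit 0x10). In an FR pair the left mate is on + and the right mate on -, so the two mates of one fragment always get different colours and the transcript strand is hidden. Recovering the transcript strand needs the first/second-in-pair bits as well. The sketch below assumes an FR first-strand (dUTP-style) library, matching the "Library Type FR First Strand" setting used in this thread, where read 1 is antisense to the transcript; for a second-strand library the rule would be reversed. The function names are hypothetical.

```python
# SAM FLAG bits, as defined in the SAM format specification
PAIRED = 0x1           # template has multiple segments (paired-end)
READ_REVERSE = 0x10    # this read aligned to the reverse strand
FIRST_IN_PAIR = 0x40   # read 1 of the pair
SECOND_IN_PAIR = 0x80  # read 2 of the pair


def read_strand(flag: int) -> str:
    """Strand of the individual alignment (what a per-read colour shows)."""
    return "-" if flag & READ_REVERSE else "+"


def transcript_strand_fr_firststrand(flag: int) -> str:
    """Strand of the source transcript for an FR first-strand library
    (assumption): read 1 is antisense to the transcript, read 2 is sense."""
    if not flag & PAIRED:
        raise ValueError("expected a paired-end alignment")
    strand = read_strand(flag)
    if flag & FIRST_IN_PAIR:
        # Read 1 is the reverse complement of the transcript, so flip it.
        return "+" if strand == "-" else "-"
    if flag & SECOND_IN_PAIR:
        # Read 2 has the same orientation as the transcript.
        return strand
    raise ValueError("paired flag without a first/second-in-pair bit")


# Both mates of one fragment: read 1 aligned on '-', read 2 on '+'
for flag in (83, 163):
    print(flag, read_strand(flag), transcript_strand_fr_firststrand(flag))
```

Here flags 83 and 163 have opposite read strands but report the same transcript strand, which is the per-fragment colouring the question asks for.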

**schönblick** · 05-30-2012, 05:37 AM

Originally posted by arabidopsis

Dear All,

Bowtie can only map about 10% of the reads. However, TopHat maps around 80%. Does anyone know why this happens?

Could you specify exactly what you did? TopHat uses Bowtie as its mapper.

**arabidopsis** · 05-30-2012, 06:01 AM

Originally posted by schönblick

Could you specify exactly what you did? TopHat uses Bowtie as its mapper.

TopHat settings:

RNA-Seq FASTQ file 94: Clip on Read_LL
Conditional (refGenomeSource) 0
Select a reference genome /galaxy/data/hg18/bowtie_index/hg18
Conditional (singlePaired) 1
RNA-Seq FASTQ file 97: Clip on Read_RR
Mean Inner Distance between Mate Pairs 200
Conditional (pParams) 1
Library Type FR First Strand
Std. Dev for Distance between Mate Pairs 20
Anchor length (at least 3) 8
Maximum number of mismatches that can appear in the anchor region of spliced alignment 0
The minimum intron length 70
The maximum intron length 500000
Conditional (indel_search) 1
Max insertion length. 3
Max deletion length. 3
Maximum number of alignments to be allowed 20
Minimum intron length that may be found during split-segment (default) search 50
Maximum intron length that may be found during split-segment (default) search 500000
Number of mismatches allowed in the initial read mapping 2
Number of mismatches allowed in each segment alignment for reads mapped independently 2
Minimum length of read segments 25
Conditional (own_junctions) 1
Conditional (closure_search) 1
Conditional (coverage_search) 0
Minimum intron length that may be found during coverage search 50
Maximum intron length that may be found during coverage search 20000
Use Microexon Search No

Bowtie settings:
Conditional (refGenomeSource) 0
Select a reference genome /galaxy/data/hg18/bowtie_index/hg18
Conditional (singlePaired) 1
Forward FASTQ file 19: P0037_N038-02_CGATGT_L004_R1_001.fastq
Reverse FASTQ file 18: P0037_N038-02_CGATGT_L004_R2_001.fastq
Maximum insert size for valid paired-end alignments (-X) 1000
The upstream/downstream mate orientation for valid paired-end alignment against the forward reference strand (--fr/--rf/--ff) FR (for Illumina)
Conditional (pParams) 1
Skip the first n pairs (-s) 0
Only align the first n pairs (-u) -1
Trim n bases from high-quality (left) end of each read before alignment (-5) 5
Trim n bases from low-quality (right) end of each read before alignment (-3) 20
Maximum number of mismatches permitted in the seed (-n) 3
Maximum permitted total of quality values at mismatched read positions (-e) 70
Seed length (-l) 28
Whether or not to round to the nearest 10 and saturating at 30 (--nomaqround) Round to nearest 10
Number of mismatches for SOAP-like alignment policy (-v) -1
Minimum insert size for valid paired-end alignments (-I) 0
Maximum number of attempts Bowtie will make to match an alignment for one mate with an alignment for the opposite mate (--pairtries) 100
Choose whether or not to attempt to align the forward reference strand (--nofw) Align against the forward reference strand
Choose whether or not to align against the reverse-complement reference strand (--norc) Align against the reverse-complement reference strand
Whether or not to try as hard as possible to find valid alignments when they exist (-y) Do not try hard
Report up to n valid arguments per pair (-k) 1
Whether or not to report all valid alignments per pair (-a) Do not report all valid alignments
Suppress all alignments for a pair if more than n reportable alignments exist (-m) -1
Write all reads with a number of valid alignments exceeding the limit set with the -m option to a file (--max) False
Write all reads that could not be aligned to a file (--un) True
Conditional (pBestOption) 0
Maximum number of backtracks permitted when aligning a read (--maxbts) 125
Override the offrate of the index to n (-o) -1
Seed for pseudo-random number generator (--seed) -1
Suppress the header in the output SAM file False

I used the Bowtie and TopHat versions available on the Galaxy web platform. The fact that TopHat is based on Bowtie makes the matter even more confusing...
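To rerun the same Bowtie alignment outside Galaxy (for example to experiment with -X), the form above maps onto Bowtie 1 command-line flags; the flag names are the ones shown in parentheses in the Galaxy labels. This is a minimal sketch, not Galaxy's own wrapper: options left at their defaults in the form are omitted, `-S` is assumed for SAM output, and the output file names are hypothetical placeholders. The trimming settings also shorten every read; assuming 100 bp reads, `-5 5 -3 20` leaves 75 bp per mate.

```python
import shlex


def bowtie_paired_command(index, mate1, mate2, unaligned_out, sam_out):
    """Assemble a Bowtie 1 paired-end command mirroring the Galaxy form.
    Options left at their defaults in the form are omitted."""
    return [
        "bowtie",
        "-I", "0",              # minimum insert size for a valid pair
        "-X", "1000",           # maximum insert size for a valid pair
        "--fr",                 # upstream mate forward, downstream mate reverse
        "-5", "5",              # trim 5 bases from the 5' (left) end of each read
        "-3", "20",             # trim 20 bases from the 3' (right) end of each read
        "-n", "3",              # mismatches allowed in the seed
        "-e", "70",             # max summed quality at mismatched positions
        "-l", "28",             # seed length
        "--pairtries", "100",
        "-k", "1",              # report at most one alignment per pair
        "--maxbts", "125",
        "--un", unaligned_out,  # unaligned reads are written here
        "-S",                   # SAM output (header kept)
        index,
        "-1", mate1,
        "-2", mate2,
        sam_out,
    ]


def trimmed_length(read_length, trim5, trim3):
    """Read length left after -5/-3 trimming."""
    return max(read_length - trim5 - trim3, 0)


cmd = bowtie_paired_command(
    "/galaxy/data/hg18/bowtie_index/hg18",
    "P0037_N038-02_CGATGT_L004_R1_001.fastq",
    "P0037_N038-02_CGATGT_L004_R2_001.fastq",
    "unaligned.fastq",  # hypothetical output name
    "aligned.sam",      # hypothetical output name
)
print(shlex.join(cmd))
print(trimmed_length(100, 5, 20))  # assumes 100 bp reads
```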

**kmcarr** · 05-30-2012, 07:02 AM

Originally posted by arabidopsis

Dear All,

I have a question about paired-end Illumina HiSeq100 data. My FASTQ files have normal average quality, but Bowtie can only map about 10% of the reads, whereas TopHat maps around 80%. Does anyone know why this happens?

Because using Bowtie to map RNA-Seq data directly to a genomic reference is not appropriate. RNA-Seq reads do not (generally) map contiguously to their reference genome. They have to be "split" to span the introns that have been spliced out. Bowtie won't map split reads; TopHat does. The results you describe are not surprising at all. You need to use the correct tool for the job.
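The splitting point can be made concrete with a toy example. The sequences below are made up (not hg18 data): a read drawn across an exon-exon junction is contiguous in the spliced transcript but not in the genome, so an unspliced aligner has nothing contiguous to match it to. Real aligners tolerate a few mismatches, so reads that overhang a junction by only a few bases may still align; exact matching is used here only for simplicity.

```python
# Hypothetical toy sequences for illustration only.
exon1 = "ATGGCCTTACGA"
intron = "GTAAGTTTTTTTTTTCCCCAG"  # starts with GT and ends with AG, like a canonical intron
exon2 = "TTGCAGGCTAAC"

genome = exon1 + intron + exon2  # what an unspliced aligner searches
mrna = exon1 + exon2             # what the RNA-Seq fragment came from

# A 12-base read sampled across the exon1/exon2 junction of the mRNA
read = mrna[6:18]

print(read)            # TTACGATTGCAG
print(read in mrna)    # True: contiguous in the transcript
print(read in genome)  # False: its two halves are separated by the intron
```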

**arabidopsis** · 05-30-2012, 07:12 AM

Originally posted by kmcarr

Because using Bowtie to map RNA-Seq data directly to a genomic reference is not appropriate. RNA-Seq reads do not (generally) map contiguously to their reference genome. They have to be "split" to span the introns that have been spliced out. Bowtie won't map split reads; TopHat does. The results you describe are not surprising at all. You need to use the correct tool for the job.

kmcarr,

What you say is right, but when I map single-end reads with Bowtie it works fine: TopHat gives only 10-15% more mapped reads. For the current problem I also looked at the splice junction file produced by TopHat. It only contains about 10,000 regions. That cannot account for a 10-fold increase in the number of mapped reads, can it?

**kmcarr** · 05-30-2012, 07:29 AM

Originally posted by arabidopsis

kmcarr,

What you say is right, but when I map single-end reads with Bowtie it works fine: TopHat gives only 10-15% more mapped reads. For the current problem I also looked at the splice junction file produced by TopHat. It only contains about 10,000 regions. That cannot account for a 10-fold increase in the number of mapped reads, can it?

When working with paired-end data, Bowtie considers the pair as a whole when deciding whether an alignment is valid. Part of that decision is the distance between the two reads once they are aligned to the genome. The limits for a valid pair are set by the -I and -X options, and in your settings the maximum distance between paired reads was 1000 bp. If the forward and reverse reads map to different exons, they can be separated on the genome by a much greater distance than this, because the intron between them counts toward that distance.
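A minimal sketch of that pair-distance check, assuming the limit applies to the outer span of the pair (leftmost to rightmost aligned base), which is how the Bowtie manual describes -X, and assuming neither mate itself crosses a junction. The fragment and intron lengths are hypothetical. A 300 bp fragment that is contiguous in the transcript passes the 1000 bp limit, but the same fragment spans 2,300 bp of genome if a 2,000 bp intron lies between the mates. A single-end read has no mate-distance constraint at all, which may also be why the single-end runs looked fine.

```python
def genomic_span(fragment_length, intron_lengths=()):
    """Outer distance covered on the genome by a fragment whose mates are
    separated (in the genome) by the given introns."""
    return fragment_length + sum(intron_lengths)


def pair_is_valid(span, min_insert=0, max_insert=1000):
    """Mimic Bowtie's -I/-X window (defaults taken from the Galaxy settings)."""
    return min_insert <= span <= max_insert


fragment = 300  # hypothetical fragment length in transcript coordinates
print(pair_is_valid(genomic_span(fragment)))          # True: span 300
print(pair_is_valid(genomic_span(fragment, [2000])))  # False: span 2300
```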

strand-specificity in paired-end data

Comment

Comment

Comment

Comment

Comment

Comment

Latest Articles

ad_right_rmr

News