Low mapping percentage with TopHat2

DerSeb

Member

Join Date: Oct 2009
Posts: 44

Low mapping percentage with TopHat2

05-09-2012, 04:47 AM

Hi all,

I'm currently looking at TopHat2 output and wonder if I will switch from Tophat 1 to Tophat 2. I like many of the new features of TopHat, but I still have some problems with the new pipeline.

My data is Truseq PE 2x 100 bp, ~ 140 million reads per Sample. It's rat data mapped against rn4. The RNA has top Quality (RIN score 10) and the slides were loaded below the max capacity, so the quality is very good. Almost no reads were filtered out past the sequencing.

My old approach using tophat 1:

Code:

/usr/local/tophat-1.3.1/tophat -r 80 -p 4 --tmp-dir /tmp/Sample_1/ -o /Sample_1/ -G ./ensembl59_iGenomes.gtf rn4 reads1.fastq.gz reads2.fastq.gz \

This yields many mapped reads: ~ 130 million per sample.

Now I map again with Tophat2 (using bowtie 2 beta 5). This is the command:

Code:

../tophat-2.0.0.Linux_x86_64/tophat -r 80 -p 4 --tmp-dir /tmp/Sample_BN2/ -o /Sample_1/ -G ./ensembl59_iGenomes.gtf /bowtie2/rn4 reads1.fastq.gz reads2.fastq.gz

This will only yield ~ 70 million reads. This is too little in my opinion, so I tried a less stringent approach. (I don't want to map too stringent to avoid a genotype dependent bias, since some of my samples a more closely related to the reference than others.)

Code:

../tophat-2.0.0.Linux_x86_64/tophat -r 80 -p 4 --read-mismatches 6 -n 3 --fusion-search -M--tmp-dir /tmp/Sample_BN2/ -o /Sample_1/ -G ./ensembl59_iGenomes.gtf /bowtie2/rn4 reads1.fastq.gz reads2.fastq.gz

This should allow more mismatches in the reads mapped. It should also kick out reads that map to more than 20 positions. This is similar to the TopHat1 approach, so it shouldn't matter too much.

Still, this only yields ~ 1 million more reads than using the standard 2 mismatches.

A closer look into the logs:

run.log:
it seems that segment mapping takes place only with left_kept reads, never with right_kept reads

Code:

gzip -cd< /tmp/Sample_1//left_kept_reads.m2g_um_seg2.fq.z|/home/schaefer/bin/bowtie2-align -q -k 11 ....

Also the bowtie logs from right_kept_ are missing

Code:

ll | grep bowtie
bowtie.BN1_ACAGTG_L003_R1_002.fixmap.log
bowtie.BN1_ACAGTG_L003_R2_002.fixmap.log
bowtie_build.log
bowtie_inspect_recons.log
bowtie.left_kept_reads.fixmap.log
bowtie.left_kept_reads.m2g_um_seg1.fixmap.log
bowtie.left_kept_reads.m2g_um_seg2.fixmap.log
bowtie.left_kept_reads.m2g_um_seg3.fixmap.log
bowtie.left_kept_reads.m2g_um_seg4.fixmap.log
bowtie.right_kept_reads.fixmap.log
bowtie.right_kept_reads.m2g_um_unmapped.fixmap.log

From the left segment mapping, ~ 80% of them align.

The right reads seem to be aligned only once:

Code:

cat bowtie.right_kept_reads.fixmap.log
73139621 reads; of these:
  73139621 (100.00%) were unpaired; of these:
    38776646 (53.02%) aligned 0 times
    34362975 (46.98%) aligned exactly 1 time
    0 (0.00%) aligned >1 times

So I'm wondering if this is the cause for the low percentage of mapping reads.... still, from those reads that map, almost all are also properly paired.

There is some error in the tophat.log, but I don't think this was a problem:

Code:

[2012-05-08 17:39:45] Beginning TopHat run (v2.0.0)
-----------------------------------------------
[2012-05-08 17:39:45] Checking for Bowtie
		  Bowtie version:	 2.0.0.5
[2012-05-08 17:39:45] Checking for Samtools
		Samtools version:	 0.1.17.0
[2012-05-08 17:39:45] Checking for Bowtie index files
[2012-05-08 17:39:46] Checking for reference FASTA file
	Warning: Could not find FASTA file /bowtie2/rn4.fa
[2012-05-08 17:39:46] Reconstituting reference FASTA file from Bowtie index
  Executing: /home/schaefer/bin/bowtie2-inspect /bowtie2/rn4 > /tmp/Sample_1//rn4.fa
[2012-05-08 17:44:08] Generating SAM header for ./annotation/bowtie2/rn4
	format:		 fastq
	quality scale:	 phred33 (default)
[2012-05-08 17:44:44] Reading known junctions from GTF file
[2012-05-08 17:44:53] Pre-filtering multi-mapped left reads
[2012-05-08 17:44:53] Mapping BN1_ACAGTG_L003_R1_002 against rn4 with Bowtie2 
Parse error at line 921680: sequence and quality are inconsistent
[2012-05-08 17:52:11] Pre-filtering multi-mapped right reads
[2012-05-08 17:52:11] Mapping BN1_ACAGTG_L003_R2_002 against rn4 with Bowtie2 
Parse error at line 359925: sequence and quality are inconsistent
[2012-05-08 17:55:10] Preparing reads
	 left reads: min. length=101, count=73192247
	right reads: min. length=101, count=73139621
[2012-05-08 19:02:11] Creating transcriptome data files..
[2012-05-08 19:02:56] Building Bowtie index from ensembl59_iGenomes.fa
[2012-05-08 19:10:49] Mapping left_kept_reads against transcriptome ensembl59_iGenomes with Bowtie2 
[2012-05-08 21:05:21] Mapping right_kept_reads against transcriptome ensembl59_iGenomes with Bowtie2 
[2012-05-08 23:00:12] Converting left_kept_reads.m2g to genomic coordinates (map2gtf)
[2012-05-08 23:17:24] Converting right_kept_reads.m2g to genomic coordinates (map2gtf)
[2012-05-08 23:34:31] Resuming TopHat pipeline with unmapped reads
[2012-05-08 23:44:44] Retrieving sequences for splices
[2012-05-08 23:47:59] Indexing splices
[2012-05-08 23:50:34] Mapping left_kept_reads.m2g_um_unmapped against segment_juncs with Bowtie2 (1/1)
[2012-05-08 23:50:45] Processing bowtie hits
[2012-05-09 00:10:52] Mapping right_kept_reads.m2g_um_unmapped against segment_juncs with Bowtie2 (1/1)
[2012-05-09 00:11:03] Processing bowtie hits
[2012-05-09 00:28:57] Reporting output tracks
-----------------------------------------------
[2012-05-09 02:14:18] Run complete: 08:34:33 elapsed

I'm now trying to use bowtie 1 and also bowtie2 beta 6 and let you know if things improved.

If anyone has an idea what might be the problem, let me know.

Best,
Seb

Tags: None

DerSeb

Member

Join Date: Oct 2009

Posts: 44
- Share
- Tweet
#2

05-15-2012, 12:31 AM

Just a brief update: This errors were due to bowtie2 beta5. Using beta6, everything works as expected.
Comment
bharati

Member

Join Date: Mar 2012

Posts: 38
- Share
- Tweet
#3

06-05-2012, 04:21 AM

does indexing of bowtie2 beta5 and beta6 differ from each other or are same?
Comment
bharati

Member

Join Date: Mar 2012

Posts: 38
- Share
- Tweet
#4

06-05-2012, 05:35 AM

After preparing reads by tophat v2.0.0 (Bowtie2.0.0.6) why does the no of left_reads and right_reads vary?

pls reply
Comment

Previous template Next

Essential Discoveries and Tools in Epitranscriptomics

by seqadmin

The field of epigenetics has traditionally concentrated more on DNA and how changes like methylation and phosphorylation of histones impact gene expression and regulation. However, our increased understanding of RNA modifications and their importance in cellular processes has led to a rise in epitranscriptomics research. “Epitranscriptomics brings together the concepts of epigenetics and gene expression,” explained Adrien Leger, PhD, Principal Research Scientist...
- Channel: Articles
04-22-2024, 07:01 AM

Topics	Statistics	Last Post
Enhanced Neoantigen Detection: Introducing NeoHunter by seqadmin Started by seqadmin, Today, 07:17 AM	0 responses 2 views 0 likes	Last Post by seqadmin Today, 07:17 AM
A Close Examination at Probiotic-Related Bacteremia by seqadmin Started by seqadmin, 05-02-2024, 08:06 AM	0 responses 19 views 0 likes	Last Post by seqadmin 05-02-2024, 08:06 AM
Expanded Genetic Insights into Blood Pressure Regulation by seqadmin Started by seqadmin, 04-30-2024, 12:17 PM	0 responses 20 views 0 likes	Last Post by seqadmin 04-30-2024, 12:17 PM
The Role of Enhancers in Defining Cell Fate by seqadmin Started by seqadmin, 04-29-2024, 10:49 AM	0 responses 28 views 0 likes	Last Post by seqadmin 04-29-2024, 10:49 AM

Seqanswers Leaderboard Ad

Announcement

Low mapping percentage with TopHat2

Comment

Comment

Comment

Latest Articles

ad_right_rmr

News