Hi all,
I'm currently looking at TopHat2 output and wonder if I will switch from Tophat 1 to Tophat 2. I like many of the new features of TopHat, but I still have some problems with the new pipeline.
My data is Truseq PE 2x 100 bp, ~ 140 million reads per Sample. It's rat data mapped against rn4. The RNA has top Quality (RIN score 10) and the slides were loaded below the max capacity, so the quality is very good. Almost no reads were filtered out past the sequencing.
My old approach using tophat 1:
This yields many mapped reads: ~ 130 million per sample.
Now I map again with Tophat2 (using bowtie 2 beta 5). This is the command:
This will only yield ~ 70 million reads. This is too little in my opinion, so I tried a less stringent approach. (I don't want to map too stringent to avoid a genotype dependent bias, since some of my samples a more closely related to the reference than others.)
This should allow more mismatches in the reads mapped. It should also kick out reads that map to more than 20 positions. This is similar to the TopHat1 approach, so it shouldn't matter too much.
Still, this only yields ~ 1 million more reads than using the standard 2 mismatches.
A closer look into the logs:
run.log:
it seems that segment mapping takes place only with left_kept reads, never with right_kept reads
Also the bowtie logs from right_kept_ are missing
From the left segment mapping, ~ 80% of them align.
The right reads seem to be aligned only once:
So I'm wondering if this is the cause for the low percentage of mapping reads.... still, from those reads that map, almost all are also properly paired.
There is some error in the tophat.log, but I don't think this was a problem:
I'm now trying to use bowtie 1 and also bowtie2 beta 6 and let you know if things improved.
If anyone has an idea what might be the problem, let me know.
Best,
Seb
I'm currently looking at TopHat2 output and wonder if I will switch from Tophat 1 to Tophat 2. I like many of the new features of TopHat, but I still have some problems with the new pipeline.
My data is Truseq PE 2x 100 bp, ~ 140 million reads per Sample. It's rat data mapped against rn4. The RNA has top Quality (RIN score 10) and the slides were loaded below the max capacity, so the quality is very good. Almost no reads were filtered out past the sequencing.
My old approach using tophat 1:
Code:
/usr/local/tophat-1.3.1/tophat -r 80 -p 4 --tmp-dir /tmp/Sample_1/ -o /Sample_1/ -G ./ensembl59_iGenomes.gtf rn4 reads1.fastq.gz reads2.fastq.gz \
Now I map again with Tophat2 (using bowtie 2 beta 5). This is the command:
Code:
../tophat-2.0.0.Linux_x86_64/tophat -r 80 -p 4 --tmp-dir /tmp/Sample_BN2/ -o /Sample_1/ -G ./ensembl59_iGenomes.gtf /bowtie2/rn4 reads1.fastq.gz reads2.fastq.gz
Code:
../tophat-2.0.0.Linux_x86_64/tophat -r 80 -p 4 --read-mismatches 6 -n 3 --fusion-search -M--tmp-dir /tmp/Sample_BN2/ -o /Sample_1/ -G ./ensembl59_iGenomes.gtf /bowtie2/rn4 reads1.fastq.gz reads2.fastq.gz
Still, this only yields ~ 1 million more reads than using the standard 2 mismatches.
A closer look into the logs:
run.log:
it seems that segment mapping takes place only with left_kept reads, never with right_kept reads
Code:
gzip -cd< /tmp/Sample_1//left_kept_reads.m2g_um_seg2.fq.z|/home/schaefer/bin/bowtie2-align -q -k 11 ....
Code:
ll | grep bowtie bowtie.BN1_ACAGTG_L003_R1_002.fixmap.log bowtie.BN1_ACAGTG_L003_R2_002.fixmap.log bowtie_build.log bowtie_inspect_recons.log bowtie.left_kept_reads.fixmap.log bowtie.left_kept_reads.m2g_um_seg1.fixmap.log bowtie.left_kept_reads.m2g_um_seg2.fixmap.log bowtie.left_kept_reads.m2g_um_seg3.fixmap.log bowtie.left_kept_reads.m2g_um_seg4.fixmap.log bowtie.right_kept_reads.fixmap.log bowtie.right_kept_reads.m2g_um_unmapped.fixmap.log
The right reads seem to be aligned only once:
Code:
cat bowtie.right_kept_reads.fixmap.log 73139621 reads; of these: 73139621 (100.00%) were unpaired; of these: 38776646 (53.02%) aligned 0 times 34362975 (46.98%) aligned exactly 1 time 0 (0.00%) aligned >1 times
There is some error in the tophat.log, but I don't think this was a problem:
Code:
[2012-05-08 17:39:45] Beginning TopHat run (v2.0.0) ----------------------------------------------- [2012-05-08 17:39:45] Checking for Bowtie Bowtie version: 2.0.0.5 [2012-05-08 17:39:45] Checking for Samtools Samtools version: 0.1.17.0 [2012-05-08 17:39:45] Checking for Bowtie index files [2012-05-08 17:39:46] Checking for reference FASTA file Warning: Could not find FASTA file /bowtie2/rn4.fa [2012-05-08 17:39:46] Reconstituting reference FASTA file from Bowtie index Executing: /home/schaefer/bin/bowtie2-inspect /bowtie2/rn4 > /tmp/Sample_1//rn4.fa [2012-05-08 17:44:08] Generating SAM header for ./annotation/bowtie2/rn4 format: fastq quality scale: phred33 (default) [2012-05-08 17:44:44] Reading known junctions from GTF file [2012-05-08 17:44:53] Pre-filtering multi-mapped left reads [2012-05-08 17:44:53] Mapping BN1_ACAGTG_L003_R1_002 against rn4 with Bowtie2 Parse error at line 921680: sequence and quality are inconsistent [2012-05-08 17:52:11] Pre-filtering multi-mapped right reads [2012-05-08 17:52:11] Mapping BN1_ACAGTG_L003_R2_002 against rn4 with Bowtie2 Parse error at line 359925: sequence and quality are inconsistent [2012-05-08 17:55:10] Preparing reads left reads: min. length=101, count=73192247 right reads: min. length=101, count=73139621 [2012-05-08 19:02:11] Creating transcriptome data files.. [2012-05-08 19:02:56] Building Bowtie index from ensembl59_iGenomes.fa [2012-05-08 19:10:49] Mapping left_kept_reads against transcriptome ensembl59_iGenomes with Bowtie2 [2012-05-08 21:05:21] Mapping right_kept_reads against transcriptome ensembl59_iGenomes with Bowtie2 [2012-05-08 23:00:12] Converting left_kept_reads.m2g to genomic coordinates (map2gtf) [2012-05-08 23:17:24] Converting right_kept_reads.m2g to genomic coordinates (map2gtf) [2012-05-08 23:34:31] Resuming TopHat pipeline with unmapped reads [2012-05-08 23:44:44] Retrieving sequences for splices [2012-05-08 23:47:59] Indexing splices [2012-05-08 23:50:34] Mapping left_kept_reads.m2g_um_unmapped against segment_juncs with Bowtie2 (1/1) [2012-05-08 23:50:45] Processing bowtie hits [2012-05-09 00:10:52] Mapping right_kept_reads.m2g_um_unmapped against segment_juncs with Bowtie2 (1/1) [2012-05-09 00:11:03] Processing bowtie hits [2012-05-09 00:28:57] Reporting output tracks ----------------------------------------------- [2012-05-09 02:14:18] Run complete: 08:34:33 elapsed
If anyone has an idea what might be the problem, let me know.
Best,
Seb
Comment