SEQanswers

02-07-2013, 02:47 AM   #1
nr23
Member
 
Location: Ireland

Join Date: Oct 2012
Posts: 42
Same species mapping problem with Tophat

Hi,

I'm working with Illumina PE (100bp) RNA-Seq reads from Xenopus laevis. I previously had a lot of trouble mapping them to the X.tropicalis genome with bowtie/tophat (very low % reads mapped, and almost 0% reads 'properly paired'). Assuming this was due to mismatches, which I could not get around because of the limit of N-3 mismatches per segment in bowtie, I switched to STAMPY (http://www.well.ox.ac.uk/project-stampy), which allows multiple mismatches, and achieved very good results.

Recently the X.laevis genome has been released. I've tried re-mapping my reads using tophat/bowtie, but still get the same results (<20% reads mapping and a very low fraction 'properly paired'). This is really confusing: I would expect the occasional mismatch due to allelic differences, but should still see almost all of my reads mapping.

In addition, on inspecting the bowtie log files, I can see that ~75% of both left and right reads map. The trouble seems to be with the way tophat interprets the alignments produced by bowtie: tophat seems to keep only a very small fraction of the reads (6M reads / ~ 90M), and samtools flagstat reports 100% mapped for the reads it does keep.

I'll paste some of the stats I'm seeing below:

Log file from bowtie run (X.laevis reads vs X.laevis genome):

logs> more bowtie.left_kept_reads.fixmap.log
# reads processed: 31151246
# reads with at least one reported alignment: 22576653 (72.47%)
# reads that failed to align: 8249899 (26.48%)
# reads with alignments suppressed due to -m: 324694 (1.04%)

logs> more bowtie.right_kept_reads.fixmap.log
# reads processed: 33478582
# reads with at least one reported alignment: 24249964 (72.43%)
# reads that failed to align: 8880054 (26.52%)
# reads with alignments suppressed due to -m: 348564 (1.04%)
Reported 30987873 alignments to 1 output stream(s)
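
For anyone wanting to compare these log numbers across runs, here is a minimal Python sketch that pulls the counts out of a log in the format shown above. The function name `parse_bowtie_log` and the embedded `LEFT_LOG` sample (copied from the left-read log in this post) are illustrative assumptions, not part of bowtie or tophat.

```python
import re

def parse_bowtie_log(text):
    """Return {label: count} for each '# label: count' line in a bowtie log."""
    counts = {}
    for line in text.splitlines():
        # Lines look like '# reads processed: 31151246' with an optional '(xx.xx%)'.
        match = re.match(r"#\s*(.+?):\s*(\d+)", line)
        if match:
            counts[match.group(1)] = int(match.group(2))
    return counts

# Sample text copied from the left-read log quoted in this post.
LEFT_LOG = """\
# reads processed: 31151246
# reads with at least one reported alignment: 22576653 (72.47%)
# reads that failed to align: 8249899 (26.48%)
# reads with alignments suppressed due to -m: 324694 (1.04%)
"""

left = parse_bowtie_log(LEFT_LOG)
aligned = left["reads with at least one reported alignment"]
total = left["reads processed"]
print(f"left reads aligned: {aligned / total:.2%}")
```

The three outcome categories in the log sum to the number of reads processed, which is a quick sanity check that no line was missed when parsing.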


Samtools flagstat on same tophat run:

> samtools flagstat accepted_hits.bam
6401438 + 0 in total (QC-passed reads + QC-failed reads)
0 + 0 duplicates
6401438 + 0 mapped (100.00%:-nan%)
6401438 + 0 paired in sequencing
2050216 + 0 read1
4351222 + 0 read2
10784 + 0 properly paired (0.17%:-nan%)
205892 + 0 with itself and mate mapped
6195546 + 0 singletons (96.78%:-nan%)
0 + 0 with mate mapped to a different chr
0 + 0 with mate mapped to a different chr (mapQ>=5)
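
The gap between the two sets of numbers can be put side by side with a little arithmetic. This is only a sketch: the counts are copied from the logs and flagstat output in this post, the helper `fraction` is hypothetical, and it assumes that bowtie's per-read counts and flagstat's record counts are roughly comparable, which may not hold exactly (for example with multi-mapped reads).

```python
def fraction(part, whole):
    """Return part/whole as a float, or 0.0 when whole is zero."""
    return part / whole if whole else 0.0

# Reads with at least one reported alignment, left + right bowtie logs.
bowtie_aligned = 22576653 + 24249964

# Counts from 'samtools flagstat accepted_hits.bam'.
accepted_hits = 6401438
properly_paired = 10784
both_mates_mapped = 205892
singletons = 6195546

print(f"bowtie-aligned reads (left + right): {bowtie_aligned}")
print(f"accepted_hits / bowtie-aligned: {fraction(accepted_hits, bowtie_aligned):.2%}")
print(f"properly paired / accepted_hits: {fraction(properly_paired, accepted_hits):.2%}")
print(f"singletons / accepted_hits: {fraction(singletons, accepted_hits):.2%}")
```

Under that assumption, only a small share of the reads bowtie aligned end up in accepted_hits.bam, and nearly all of those are singletons, which matches the symptom described above.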


I'm really stumped with this - my STAMPY results are great (~85% reads mapped, ~70% reads paired properly) and eyeballing the results in IGV confirms that reads stack up nicely across expressed regions, and contain very few mismatches.

Any help would be tremendously appreciated!

Many thanks and all the best,

Nick