SEQanswers

01-13-2013, 09:22 PM   #1
amarth
Member
 
Location: Mexico City

Join Date: Dec 2012
Posts: 14
Getting just a few alignments with Tophat2

I ran an alignment on a 1.1 GB file of RNA-seq reads in FASTQ (Illumina) format. The reference was in FASTA, so I decided to convert the large reads file to FASTA as well...

Then I built the Bowtie index and started the run. When TopHat finished, I looked in the output folder and saw that the aligned sequences were in a 28 MB *.bam file, while the *rejected* sequences were in an almost 350 MB *.bam file... I assume that file size is proportional to the number of sequences.

I'm a rookie in bioinformatics, so my question is: is it normal to get file sizes like these for the two files, or am I doing something wrong?
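
File size is only a rough proxy for read count, because BAM is compressed and the two files need not store the same fields per read. A minimal sketch for counting reads directly, assuming a standard FASTQ file with exactly four lines per record; the function names and file names are placeholders, not anything from the original run:

```shell
# Count reads in a FASTQ file, assuming exactly four lines per record
# (no wrapped sequence or quality lines). Function name is illustrative.
count_fastq_reads() {
    awk 'END { print NR / 4 }' "$1"
}

# Count records in a BAM file. Requires samtools on the PATH.
count_bam_records() {
    samtools view -c "$1"
}

# Example (file names are placeholders):
#   count_fastq_reads reads.fastq
#   count_bam_records tophat_out/accepted_hits.bam
```

Note that `samtools view -c` counts alignment records, so a read that aligns to several locations is counted more than once in the aligned file.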
01-13-2013, 09:30 PM   #2
chadn737
Senior Member
 
Location: US

Join Date: Jan 2009
Posts: 392

1) Do not convert the reads to FASTA. That is unnecessary; TopHat takes FASTQ files. Your reference will always be in FASTA, and that is not an issue.

2) It sounds like you did have a lot of unmapped reads, but it's impossible to diagnose the issue from what you have described. Try providing more information, such as the exact commands you ran TopHat with or some examples of unmapped reads.
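
One way to pull out a few example unmapped reads is sketched below. It assumes samtools is installed and that the unmapped reads are in a BAM file in the TopHat output folder; the function name, the path, and the default of 5 records are illustrative assumptions:

```shell
# Print the first few records from a BAM file of unmapped reads.
# Requires samtools; the default of 5 records is arbitrary.
show_unmapped_examples() {
    bam_file=$1
    n_records=${2:-5}
    samtools view "$bam_file" | head -n "$n_records"
}

# Example (assumes the unmapped reads were written to tophat_out/unmapped.bam):
#   show_unmapped_examples tophat_out/unmapped.bam
```

Posting a handful of these records along with the exact command line usually gives others enough to work with.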
01-13-2013, 10:19 PM   #3
amarth
Member
 
Location: Mexico City

Join Date: Dec 2012
Posts: 14

1) Thanks for the advice,

2) Could it be the sequence quality?

The TopHat2 command I ran was:
Quote:
tophat2 --sequence-length 100 --max-insertion-length 3 --max-deletion-length 3 reference index3.fasta
Thanks so much.
01-14-2013, 12:00 AM   #4
EGrassi
Member
 
Location: Turin, Italy

Join Date: Oct 2010
Posts: 66

What does the output of samtools flagstat on accepted_hits.bam and on the rejected-reads file look like?

Just to mention it: I had some paired-end data with FastQC results similar to yours, and quality trimming got me nowhere. The problem was the -r/--mate-std-dev parameters: it seems that in some cases TopHat really needs them (-r 300 --mate-std-dev 50 gave me 60% properly paired reads, against 5% without them... I will never trust the FAQ/manual again).
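
A small sketch for summarising flagstat output, assuming the usual `N + M in total ...` and `N + M mapped (...)` line layout printed by `samtools flagstat`; the function name and the file path in the example are placeholders:

```shell
# Read `samtools flagstat` text on stdin and print the QC-passed
# total and mapped counts. Assumes the usual "N + M in total ..." and
# "N + M mapped (...)" lines; other lines are ignored.
flagstat_counts() {
    awk '
        $4 == "in" && $5 == "total" { total = $1 }
        $4 == "mapped"              { mapped = $1 }
        END { printf "total=%d mapped=%d\n", total, mapped }
    '
}

# Example (requires samtools; path is a placeholder):
#   samtools flagstat tophat_out/accepted_hits.bam | flagstat_counts
```

Running it on both BAM files gives record counts that can be compared directly, which is more reliable than comparing file sizes.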
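
For paired-end data only, the options mentioned above could be passed roughly as follows. This is a sketch, not the command from the post: the index prefix, read file names, output folder, and function names are placeholders. In the TopHat manual, `-r` is the short form of `--mate-inner-dist`, the expected distance between the inner ends of the two mates, which is the fragment length minus both read lengths:

```shell
# Inner mate distance = fragment length - 2 * read length.
inner_distance() {
    echo $(( $1 - 2 * $2 ))
}

# Run TopHat2 on paired-end reads with an explicit inner distance and
# standard deviation. Output folder name is a placeholder.
align_paired() {
    index_prefix=$1
    reads_1=$2
    reads_2=$3
    inner_dist=$4
    std_dev=$5
    tophat2 -r "$inner_dist" --mate-std-dev "$std_dev" -o tophat_paired \
        "$index_prefix" "$reads_1" "$reads_2"
}

# Example, using the values quoted in the post:
#   align_paired genome_index reads_1.fastq reads_2.fastq 300 50
```

The right values depend on the library preparation, so they are best taken from the sequencing provider or estimated from a pilot alignment rather than copied from here.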
01-14-2013, 09:01 AM   #5
kmcarr
Senior Member
 
Location: USA, Midwest

Join Date: May 2008
Posts: 1,177

Quote:
Originally Posted by amarth View Post
2) Could it be the sequence quality?
The quality of your reads looks fine.
Quote:
The TopHat2 command I ran was:
Code:
tophat2 --sequence-length 100 --max-insertion-length 3 --max-deletion-length 3 reference index3.fasta
Thanks so much.
There is no '--sequence-length' option for TopHat. Did you mean '--segment-length'? If so, 100 (presumably the full length of your reads) is not an appropriate setting. The default value for --segment-length (25) is appropriate for most cases.

To diagnose the problem, start with only the default options and work outward from there.
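
A baseline run with default options might look like the sketch below. The index prefix, reads file, and output folder are placeholders; the general form is `tophat2 [options] <index_prefix> <reads>`, with the reads left in FASTQ:

```shell
# Baseline run: TopHat2 with default options on the original FASTQ reads.
align_with_defaults() {
    index_prefix=$1   # Bowtie index prefix, not the FASTA file name
    reads_fastq=$2    # reads left in FASTQ format
    out_dir=$3        # output folder for this run
    tophat2 -o "$out_dir" "$index_prefix" "$reads_fastq"
}

# Example (names are placeholders):
#   align_with_defaults genome_index reads.fastq tophat_default
```

Writing each trial to its own output folder makes it easy to add one option at a time and compare the results against this baseline.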

Tags
alignment problem, bowtie, fastq bam sam, tophat

All times are GMT -8.