SEQanswers

05-19-2010, 07:07 AM   #1
gen2prot
Member
 
Location: Hyderabad, India

Join Date: Apr 2010
Posts: 64
Cannot understand Tophat output... Help!

Hello All,

I built a Bowtie index from Drosophila gene sequences using the bowtie-build command. This worked fine and I got the six index files. I then ran TopHat from the same folder using the following command:

tophat --solexa1.3-quals -p 4 GeneIndex path_to_reads

My reads are unpaired. The output says that the TopHat run was OK, and there is nothing in the error file. However, in the tophat_out folder I get the following files and directories:

File: GeneIndex.fa
File: left_kept_reads.fq
Folders: log, tmp

Within these folders I do not find the files accepted.sam and junctions.bed. What am I doing wrong? Any suggestions?

Thank you
Abhijit
05-19-2010, 01:06 PM   #2
gen2prot

Hello,

TopHat seems to have run properly; the error output has no warnings. However, the accepted.sam file shows no hits. The junctions.bed file is 1.3 MB. Has anyone faced a similar problem?
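
One way to confirm that the SAM file really holds no alignments is to count its records. The following is a minimal sketch in Python, not part of TopHat: the function name `count_sam_records` is made up for illustration, and the path in the usage note is only the file name mentioned in this post. It relies on two conventions of the SAM format: header lines start with "@", and bit 0x4 of the FLAG column marks an unmapped read.

```python
def count_sam_records(lines):
    """Return (header_lines, mapped, unmapped) for an iterable of SAM lines."""
    header = mapped = unmapped = 0
    for line in lines:
        line = line.rstrip("\r\n")
        if not line:
            continue  # skip blank lines
        if line.startswith("@"):
            header += 1  # SAM header line (@HD, @SQ, @PG, ...)
            continue
        fields = line.split("\t")
        flag = int(fields[1])  # FLAG is the second tab-separated column
        if flag & 0x4:
            unmapped += 1  # bit 0x4 set: read is unmapped
        else:
            mapped += 1
    return header, mapped, unmapped


if __name__ == "__main__":
    import sys

    # Hypothetical usage: python count_sam.py tophat_out/accepted.sam
    if len(sys.argv) == 2:
        with open(sys.argv[1]) as handle:
            print(count_sam_records(handle))
```

If the mapped count is zero while the header lists the reference sequences, the run completed but nothing aligned, which points at the index or the reads rather than at a crash.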

Thanks
Abhijit
05-19-2010, 01:54 PM   #3
gen2prot

Hello,

Following up on this, I wanted to know whether indexing all genes is possible in the first place. My genes.fasta file looks something like this:

Code:
>FBgn0034974 type=gene; loc=2R:19969255..19973683; ID=FBgn0034974; name=CG16786; dbxref=FlyBase:FBgn0034974,FlyBase:FBan0016786,FlyBase_Annotation_IDs:CG16786,GB:BI363616,GB:BT001664,GB_protein:AAN71419,GB_protein:AAF47161,GB_protein:AAM68305,UniProt/TrEMBL:Q7JRF0,INTERPRO:IPR011071,EntrezGene:37856,BIOGRID:63435,DroID:FBgn0034974,DRSC:FBgn0034974,FlyAtlas:CG16786-RA,flyexpress:FBgn0034974,FlyMine:FBgn0034974,GenomeRNAi_gene:37856,modMine:FBgn0034974; derived_computed_cyto=60B8-60B9%3B Limits computationally determined from genome sequence between @P{lacW}Phm<up>k07623</up>@%26@P{lacW}tsr<up>k05633</up>@ and @P{EP}EP503@; gbunit=AE013599; MD5=4a28df05c5f7a49b8fd75a28e3b5759e; length=4429; release=r5.27; species=Dmel; 
CGGATTCGGATTCAGATTCACATTCAGATTCAGATACGTTCGGTTTGGGA
TTCGGATTCATTCGTTGCCACTCCAGCTCTATGCTCCGCGTTGGACCCAC
CGATAGCTTGGCTTTCTGCTACAGTTTCATAATTGTCTCGGCCAGCAGCA
GCGGAGTTCATGATTTCGCTCGGAATATGTTTTAGCCAGATCAGTGCTTG
GAAAATGCACTTTTGAGCGTGTACGTGTATGTGGCAAGTAGCTGGCGAAC
GTGAATGAAAACATGAGCTGCCACTGAACGAAACCCACTCTCGAGCTGGA
AGTGCAAGTGAGTTATCCCGCGGAAGAAAAGAAACTGAATTGATTACCAT
TACCATTCGCGGAGTAGCAGTCTCGGAATTAAATACCAACGACCCAGACA
ATACCGAGCCCAGTTCCAAGCTGGAGGCTCAAGCCTTTCTCTATTCAATG

Do I need to rebuild the index from a modified FASTA file with shorter header information? The header contains a lot of characters that I do not understand.

thanks
Abhijit
05-20-2010, 02:26 AM   #4
Thomas Doktor
Senior Member
 
Location: University of Southern Denmark (SDU), Denmark

Join Date: Apr 2009
Posts: 105

Hi,

You should build a new Bowtie index of the Drosophila genome rather than of the individual genes, as TopHat is designed to align RNA-seq reads against a full genome. This might explain TopHat's behaviour, although it should still have aligned some reads. Perhaps the characters in the FASTA headers are causing trouble, or there are too many contigs for TopHat to handle well.
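
Both suspicions (odd header characters, many contigs) can be checked before rebuilding anything. The following Python sketch counts the FASTA records and flags headers that contain anything beyond letters, digits, underscore, dot and hyphen. That "safe" character set is an assumption chosen for illustration, not a documented TopHat or Bowtie rule, and the function name `summarize_fasta_headers` is hypothetical.

```python
import re

# Assumed conservative pattern for a "plain" sequence name (not a TopHat rule).
SAFE_ID = re.compile(r"^[A-Za-z0-9_.\-]+$")


def summarize_fasta_headers(lines):
    """Count FASTA records and headers with characters outside SAFE_ID."""
    n_records = 0
    n_suspicious = 0
    longest = 0
    for line in lines:
        if not line.startswith(">"):
            continue  # sequence line, not a header
        header = line[1:].rstrip("\r\n")
        n_records += 1
        longest = max(longest, len(header))
        if not SAFE_ID.match(header):
            n_suspicious += 1  # spaces, ';', '@', '{' and so on
    return {
        "records": n_records,
        "suspicious": n_suspicious,
        "longest_header": longest,
    }


if __name__ == "__main__":
    import sys

    # Hypothetical usage: python check_headers.py genes.fasta
    if len(sys.argv) == 2:
        with open(sys.argv[1]) as handle:
            print(summarize_fasta_headers(handle))
```

For a gene file like the one quoted above, every header would be reported as suspicious, and the record count shows how many separate reference sequences the index would contain.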
05-20-2010, 07:05 AM   #5
gen2prot

Hi Thomas,

I ran TopHat on the chromosomes and it works wonderfully. I think the FASTA header might be to blame, since it contains characters such as %@><{} etc. It looked to me like some sort of construct info. Anyway, I removed everything except the name of the sequence and am building the gene index again. Let's see. However, the gene file that I am building the index from is 85 MB in size and contains 14964 genes. Do you think this may cause a problem? Thanks for your help.
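
For anyone wanting to do the same header clean-up, here is a minimal Python sketch. It assumes "the name of the sequence" means the first whitespace-delimited token of each header (the FlyBase ID, e.g. FBgn0034974); the function name `simplify_fasta_headers` and the file names in the usage comment are placeholders, not the exact procedure used here.

```python
def simplify_fasta_headers(lines):
    """Yield FASTA lines with each header cut down to its first token."""
    for line in lines:
        if line.startswith(">"):
            tokens = line[1:].split()
            name = tokens[0] if tokens else ""  # keep only the leading ID
            yield ">" + name + "\n"
        else:
            yield line  # sequence lines pass through unchanged


if __name__ == "__main__":
    import sys

    # Hypothetical usage: python simplify_headers.py genes.fasta genes.short.fasta
    if len(sys.argv) == 3:
        with open(sys.argv[1]) as src, open(sys.argv[2], "w") as dst:
            dst.writelines(simplify_fasta_headers(src))
```

The index then has to be rebuilt from the rewritten file, since the sequence names are stored in the index when it is built.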

Abhijit
All times are GMT -8.

