
**gen2prot** · 05-19-2010, 01:06 PM

Hello,

TopHat seems to have run properly, and the error output has no warnings. However, the accepted.sam file shows no hits. The junctions.bed file is 1.3 MB. Has anyone faced a similar problem?

Thanks
Abhijit
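One quick way to confirm that a SAM file really contains no hits is to count its alignment records directly. The following is only a minimal sketch: the function name `count_sam_records` and the sample lines are made up for illustration, and it relies only on the SAM convention that header lines start with `@` and that flag bit 0x4 marks an unmapped read.

```python
def count_sam_records(lines):
    """Return (total, mapped) counts of alignment records in SAM-formatted lines."""
    total = 0
    mapped = 0
    for line in lines:
        if not line.strip() or line.startswith("@"):
            continue  # skip blank lines and SAM header lines
        fields = line.rstrip("\n").split("\t")
        flag = int(fields[1])  # second column is the bitwise FLAG
        total += 1
        if not flag & 0x4:  # bit 0x4 set means the read is unmapped
            mapped += 1
    return total, mapped


# Illustrative, made-up records (not taken from the thread)
sample = [
    "@HD\tVN:1.0\tSO:unsorted\n",
    "read1\t0\t2R\t100\t255\t10M\t*\t0\t0\tACGTACGTAC\tIIIIIIIIII\n",
    "read2\t4\t*\t0\t0\t*\t*\t0\t0\tACGTACGTAC\tIIIIIIIIII\n",
]
total, mapped = count_sam_records(sample)
print(total, mapped)
```

To run it on real output, pass an open file handle instead of the sample list, for example `count_sam_records(open("accepted.sam"))`.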

**gen2prot** · 05-19-2010, 01:54 PM

Hello,

Following up on this, I wanted to know whether indexing all genes is possible in the first place. My genes.fasta file looks something like this:

Code:

>FBgn0034974 type=gene; loc=2R:19969255..19973683; ID=FBgn0034974; name=CG16786; dbxref=FlyBase:FBgn0034974,FlyBase:FBan0016786,FlyBase_Annotation_IDs:CG16786,GB:BI363616,GB:BT001664,GB_protein:AAN71419,GB_protein:AAF47161,GB_protein:AAM68305,UniProt/TrEMBL:Q7JRF0,INTERPRO:IPR011071,EntrezGene:37856,BIOGRID:63435,DroID:FBgn0034974,DRSC:FBgn0034974,FlyAtlas:CG16786-RA,flyexpress:FBgn0034974,FlyMine:FBgn0034974,GenomeRNAi_gene:37856,modMine:FBgn0034974; derived_computed_cyto=60B8-60B9%3B Limits computationally determined from genome sequence between @P{lacW}Phm<up>k07623</up>@%26@P{lacW}tsr<up>k05633</up>@ and @P{EP}EP503@; gbunit=AE013599; MD5=4a28df05c5f7a49b8fd75a28e3b5759e; length=4429; release=r5.27; species=Dmel; 
CGGATTCGGATTCAGATTCACATTCAGATTCAGATACGTTCGGTTTGGGA
TTCGGATTCATTCGTTGCCACTCCAGCTCTATGCTCCGCGTTGGACCCAC
CGATAGCTTGGCTTTCTGCTACAGTTTCATAATTGTCTCGGCCAGCAGCA
GCGGAGTTCATGATTTCGCTCGGAATATGTTTTAGCCAGATCAGTGCTTG
GAAAATGCACTTTTGAGCGTGTACGTGTATGTGGCAAGTAGCTGGCGAAC
GTGAATGAAAACATGAGCTGCCACTGAACGAAACCCACTCTCGAGCTGGA
AGTGCAAGTGAGTTATCCCGCGGAAGAAAAGAAACTGAATTGATTACCAT
TACCATTCGCGGAGTAGCAGTCTCGGAATTAAATACCAACGACCCAGACA
ATACCGAGCCCAGTTCCAAGCTGGAGGCTCAAGCCTTTCTCTATTCAATG

Do I need to re-import a modified FASTA file that has shorter header information? There are a lot of characters in the header that I cannot understand.

thanks
Abhijit

**Thomas Doktor** · 05-20-2010, 02:26 AM

Hi,

You should build a new Bowtie index of the Drosophila genome rather than of the individual genes, as TopHat is designed to align RNA-seq reads against a full genome. This might explain TopHat's behaviour, although it should still have aligned some reads. Perhaps the characters in the FASTA headers are causing trouble, or there are too many contigs for TopHat to handle well.

**gen2prot** · 05-20-2010, 07:05 AM

Hi Thomas,

I ran TopHat on the chromosomes and it works wonderfully. I think the FASTA header might be the one to blame, since it contains characters such as %@><{} etc. It looked to me like some sort of construct info. Anyway, I removed everything except the name of the sequence and am building the gene index again. Let's see. However, the gene file that I am building the index from is 85 MB in size and contains 14964 genes. Do you think this may cause a problem? Thanks for your help.

Abhijit
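The header clean-up described above (keeping only the sequence name) could be sketched roughly as follows. This is an illustration, not the script actually used in the thread: the function name `simplify_fasta_headers` is hypothetical, and it assumes the sequence name is the first whitespace-delimited token after `>`, as in the FlyBase header shown earlier.

```python
import io


def simplify_fasta_headers(src, dst):
    """Copy FASTA text from src to dst, keeping only the first
    whitespace-delimited token of each header line.

    Returns the number of header lines rewritten.
    """
    count = 0
    for line in src:
        if line.startswith(">"):
            # ">FBgn0034974 type=gene; ..." becomes ">FBgn0034974"
            fields = line[1:].split()
            name = fields[0] if fields else ""
            dst.write(">" + name + "\n")
            count += 1
        else:
            dst.write(line)  # sequence lines are copied unchanged
    return count


# Small in-memory demonstration using a shortened version of the header above
example_seq = "CGGATTCGGATTCAGATTCACATTCAGATTCAGATACGTTCGGTTTGGGA"
example = io.StringIO(
    ">FBgn0034974 type=gene; loc=2R:19969255..19973683; ID=FBgn0034974;\n"
    + example_seq + "\n"
)
cleaned = io.StringIO()
n_records = simplify_fasta_headers(example, cleaned)
print(cleaned.getvalue())
```

For real files, the same function can be called with two open file handles, for example reading `genes.fasta` and writing to a new file (the output name, such as `genes.clean.fasta`, is only a placeholder) before rebuilding the index.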


Cannot understand Tophat output... Help!
