**dariober** · 12-01-2010, 02:03 AM

Hi,

I don't know if it matters, but the lines with exon feature in your GTF file don't have the attribute 'exon_number' in the attributes column (rightmost). I'm not sure if Tophat needs the 'exon_number' to determine where the splice junctions are. The GTF I use looks like this:

Code:

5	protein_coding	exon	60680	60854	.	-	.	 gene_id "ENSSSCG00000000001"; transcript_id "ENSSSCT00000000001"; exon_number "1";
5	protein_coding	CDS	60680	60854	.	-	0	 gene_id "ENSSSCG00000000001"; transcript_id "ENSSSCT00000000001"; exon_number "1"; protein_id "ENSSSCP00000000001";
5	protein_coding	exon	59106	59218	.	-	.	 gene_id "ENSSSCG00000000001"; transcript_id "ENSSSCT00000000001"; exon_number "2";
5	protein_coding	CDS	59106	59218	.	-	2	 gene_id "ENSSSCG00000000001"; transcript_id "ENSSSCT00000000001"; exon_number "2"; protein_id "ENSSSCP00000000001";

Where did you get your GTF from?

All the best
Dario

**silin284** · 12-03-2010, 11:40 AM

thanks dariober,

It seems the exon number is not the problem.

My GTF has genes on chromosome0 (unassembled sequences) and the reference genome (bowtie index) does not. Removing the chromosome0 genes from the GTF, or adding chromosome0 to the reference genome, solved the problem.
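
The fix above can be sketched in Python. This is only a sketch under stated assumptions, not part of TopHat: the function names `gtf_seqnames` and `keep_known_seqnames` are made up for illustration, and `reference_names` is assumed to be a set of sequence names you have collected yourself (for example from the output of `bowtie-inspect -n`, which comes up later in this thread; depending on how the index was built, those names may need trimming to the first word).

```python
from typing import Iterable, Iterator, Set


def gtf_seqnames(lines: Iterable[str]) -> Set[str]:
    """Collect the distinct values of column 1 (seqname) from GTF lines."""
    names = set()
    for line in lines:
        if line.startswith("#") or not line.strip():
            continue  # skip comment and blank lines
        names.add(line.split("\t", 1)[0])
    return names


def keep_known_seqnames(lines: Iterable[str],
                        reference_names: Set[str]) -> Iterator[str]:
    """Yield only the GTF lines whose seqname exists in the reference.

    Comment and blank lines are passed through unchanged.
    """
    for line in lines:
        if line.startswith("#") or not line.strip():
            yield line
        elif line.split("\t", 1)[0] in reference_names:
            yield line
```

Comparing `gtf_seqnames(...)` with the reference names first shows which sequences are missing; `keep_known_seqnames` then writes out a GTF restricted to sequences the index actually contains.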

**marcora** · 12-06-2010, 01:14 PM

Originally posted by dariober:

> Hi,
>
> I don't know if it matters, but the lines with exon feature in your GTF file don't have the attribute 'exon_number' in the attributes column (rightmost). I'm not sure if Tophat needs the 'exon_number' to determine where the splice junctions are. The GTF I use looks like this:
>
> Code:
>
> 5	protein_coding	exon	60680	60854	.	-	.	 gene_id "ENSSSCG00000000001"; transcript_id "ENSSSCT00000000001"; exon_number "1";
> 5	protein_coding	CDS	60680	60854	.	-	0	 gene_id "ENSSSCG00000000001"; transcript_id "ENSSSCT00000000001"; exon_number "1"; protein_id "ENSSSCP00000000001";
> 5	protein_coding	exon	59106	59218	.	-	.	 gene_id "ENSSSCG00000000001"; transcript_id "ENSSSCT00000000001"; exon_number "2";
> 5	protein_coding	CDS	59106	59218	.	-	2	 gene_id "ENSSSCG00000000001"; transcript_id "ENSSSCT00000000001"; exon_number "2"; protein_id "ENSSSCP00000000001";
>
> Where did you get your GTF from?
>
> All the best
> Dario

Hi Dario,

it looks like you are using the ENSEMBL gtf file from here, is that correct?

I am trying to make it work with mm9 or m_musculus_ncbi37 bowtie indexes from the bowtie website without any luck (I am still getting the "TopHat did not find any junctions in GTF file" warning).

What bowtie index are you using? If you made your own, could you share how?

Thank you very much!

**epigen** · 12-07-2010, 05:23 AM

chromosome name issue?

Originally posted by marcora:

> Hi Dario,
>
> it looks like you are using the ENSEMBL gtf file from here, is that correct?
>
> I am trying to make it work with mm9 or m_musculus_ncbi37 bowtie indexes from the bowtie website without any luck (I am still getting the "TopHat did not find any junctions in GTF file" warning).
>
> What bowtie index are you using? If you made your own, could you share how?
>
> Thank you very much!

The ENSEMBL gtf is missing the "chr" in front of the chromosome number that is present in the bowtie indexes and the reference genome (fasta format). Try adding "chr" and see if it works then.
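
A minimal sketch of the renaming suggested above, assuming the only mismatch is a missing "chr" prefix in column 1. The helper name `add_chr_prefix` is hypothetical. Note that a plain prefix does not cover every case: Ensembl names the mitochondrial sequence "MT" while UCSC uses "chrM", so such names would need their own mapping.

```python
def add_chr_prefix(line: str, prefix: str = "chr") -> str:
    """Prepend `prefix` to the seqname (column 1) of one GTF line.

    Comment lines, blank lines, and lines that already carry the
    prefix are returned unchanged.
    """
    if line.startswith("#") or not line.strip():
        return line
    seqname, sep, rest = line.partition("\t")
    if seqname.startswith(prefix):
        return line
    return prefix + seqname + sep + rest
```

Applied line by line to the Ensembl file, this produces a GTF whose chromosome names follow the chr1, chr2, ... convention used by the UCSC-based indexes.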

**AdamB** · 12-07-2010, 06:34 AM

Originally posted by epigen:

> The ENSEMBL gtf is missing the "chr" in front of the chromosome number that is present in the bowtie indexes and the reference genome (fasta format). Try adding "chr" and see if it works then.

This worked for me when I was trying to use a gtf from Ensembl.

**marcora** · 12-07-2010, 07:34 AM

Originally posted by epigen:

> The ENSEMBL gtf is missing the "chr" in front of the chromosome number that is present in the bowtie indexes and the reference genome (fasta format). Try adding "chr" and see if it works then.

Does that mean that you are using the mm9 prepackaged bowtie index which contains chr1,chr2,etc?

Thank you for your suggestion.

**epigen** · 12-07-2010, 10:32 AM

Originally posted by marcora:

> Does that mean that you are using the mm9 prepackaged bowtie index which contains chr1,chr2,etc?

I don't use it, I built my own, but the Bowtie homepage says "M. musculus, UCSC mm9", which is the same genome I'm using, with chr1,chr2,etc. NCBI has the same format as far as I know, only Ensembl makes an exception.

**marcora** · 12-07-2010, 03:20 PM

Originally posted by epigen:

> I don't use it, I built my own, but the Bowtie homepage says "M. musculus, UCSC mm9", which is the same genome I'm using, with chr1,chr2,etc. NCBI has the same format as far as I know, only Ensembl makes an exception.

Adding chr in front of each line of the ENSEMBL GTF file doesn't fix the problem.

Any other idea?

**Bacilo** · 01-10-2011, 04:05 AM

I have the same problem. I made my own index using the GRCh37 genome downloaded from Ensembl. The chromosome names, when I check with bowtie-inspect -n, are 1,2,3...X,Y, and the names in the Ensembl GTF file are the same, but I get the same error message (Warning: TopHat did not find any junctions in GTF file). I have also used the UCSC index and GTF file, and that works. This is the Ensembl GTF file:

11 pseudogene exon 75780 76143 . + . gene_id "ENSG00000253826"; transcript_id "ENST00000519787"; exon_number "1"; gene_name "RP11-304M2.1"; transcript_name "RP11-304M2.1-001";
11 processed_transcript exon 86612 87605 . - . gene_id "ENSG00000224777"; transcript_id "ENST00000521196"; exon_number "1"; gene_name "AC069287.4"; transcript_name "AC069287.4-002";
11 processed_transcript exon 86649 87586 . - . gene_id "ENSG00000224777"; transcript_id "ENST00000424047"; exon_number "1"; gene_name "AC069287.4"; transcript_name "AC069287.4-001";
11 protein_coding exon 129060 129388 . - . gene_id "ENSG00000230724"; transcript_id "ENST00000382784"; exon_number "1"; gene_name "AC069287.3"; transcript_name "AC069287.3-201";
11 protein_coding CDS 129060 129388 . - 0 gene_id "ENSG00000230724"; transcript_id "ENST00000382784"; exon_number "1"; gene_name "AC069287.3"; transcript_name "AC069287.3-201"; protein_id "ENSP00000372234";
11 protein_coding start_codon 129386 129388 . - 0 gene_id "ENSG00000230724"; transcript_id "ENST00000382784"; exon_number "1"; gene_name "AC069287.3"; transcript_name "AC069287.3-201";
11 protein_coding exon 127926 128376 . - . gene_id "ENSG00000230724"; transcript_id "ENST00000382784"; exon_number "2"; gene_name "AC069287.3"; transcript_name "AC069287.3-201";
11 protein_coding CDS 127929 128376 . - 1 gene_id "ENSG00000230724"; transcript_id "ENST0

and this is the UCSC:

chr1 hg19_ensGene exon 66999066 66999090 0.000000 + . gene_id "ENST00000237247"; transcript_id "ENST00000237247";
chr1 hg19_ensGene start_codon 67000042 67000044 0.000000 + . gene_id "ENST00000237247"; transcript_id "ENST00000237247";
chr1 hg19_ensGene CDS 67000042 67000051 0.000000 + 0 gene_id "ENST00000237247"; transcript_id "ENST00000237247";
chr1 hg19_ensGene exon 66999929 67000051 0.000000 + . gene_id "ENST00000237247"; transcript_id "ENST00000237247";
chr1 hg19_ensGene CDS 67091530 67091593 0.000000 + 2 gene_id "ENST00000237247"; transcript_id "ENST00000237247";
chr1 hg19_ensGene exon 67091530 67091593 0.000000 + . gene_id "ENST00000237247"; transcript_id "ENST00000237247";
chr1 hg19_ensGene CDS 67098753 67098777 0.000000 + 1 gene_id "ENST00000237247"; transcript_id "ENST00000237247";
chr1 hg19_ensGene exon 67098753 67098777 0.000000 + . gene_id "ENST00000237247"; transcript_id "ENST00000237247";

Apart from the chromosome names and the attributes in the rightmost column, all fields are the same, except the 6th column, which is a dot in the Ensembl GTF and "0.000000" in the UCSC one, but I do not know whether this field is important or not.

Does anyone use Ensembl GTF file with success?

Thanks
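
For the column-by-column comparison in the post above, a small structural check may help. The sketch below assumes the usual GTF layout of nine tab-separated fields, with `gene_id` and `transcript_id` in the attributes column; column 6 is the score, where "." simply means no score was assigned. The helper `gtf_line_problems` is hypothetical, and whether TopHat itself rejects any of these conditions is not established in this thread.

```python
import re
from typing import List

# Matches attribute pairs of the form: key "value";
_ATTR = re.compile(r'(\S+) "([^"]*)";')


def gtf_line_problems(line: str) -> List[str]:
    """Return a list of structural problems found in one GTF line."""
    problems = []
    fields = line.rstrip("\n").split("\t")
    if len(fields) != 9:
        # Typical when tabs were converted to spaces by copy and paste.
        problems.append(
            f"expected 9 tab-separated fields, found {len(fields)}")
        return problems
    if not (fields[3].isdigit() and fields[4].isdigit()):
        problems.append("start/end are not integers")
    keys = {key for key, _ in _ATTR.findall(fields[8])}
    for required in ("gene_id", "transcript_id"):
        if required not in keys:
            problems.append(f"missing attribute {required}")
    return problems
```

Running this over the first few hundred lines of both files quickly shows whether one of them lost its tabs or lacks a required attribute, which is harder to see by eye in pasted, space-separated excerpts.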

**AdamB** · 01-10-2011, 04:18 AM

@Bacilo:

I'm not sure if I understand, but did you try changing the chromosome field in the Ensembl gtf to "chrX"?

**Bacilo** · 01-10-2011, 04:23 AM

The index and the GTF file have the same chromosome names, both without "chr", but I am going to try changing both.

thanks

**AdamB** · 01-10-2011, 04:29 AM

For me, adding "chr" to the chromosome field definitely fixed the problem.

**Bacilo** · 01-10-2011, 04:30 AM

I will tell you if that works. thanks

**marcora** · 01-10-2011, 05:02 AM

Originally posted by Bacilo:

> Does anyone use Ensembl GTF file with success?

After much struggling and with the help of a member of this forum I have finally been able to use Ensembl GTF files with TopHat.

Please find a detailed answer to your problem here!

Good luck!

tophat -G gene model annotations GTF format?