#1
Junior Member
Location: Davis
Join Date: Oct 2010
Posts: 3
Hello,
Does anyone know where I can download a GTF file that works with TopHat and its provided mm9 build? I downloaded the version from ftp://ftp.ensembl.org/pub/current/gtf/mus_musculus/ and keep getting the following error:

[Thu Oct 28 12:08:01 2010] Reading known junctions from GFF file
Warning: TopHat did not find any junctions in GFF file

I have even tried reformatting the file by adding "chr" to the start of the first column of every line (this changes chromosome names such as X or 18 to chrX or chr18).

At this point I would prefer to download a GTF build that works with TopHat v1.1.1, but I can also modify the file I have now if someone knows what needs to be changed.

A sample line from the GTF file:
18 protein_coding CDS 30483176 30483260 . + 0 gene_id "ENSMUSG00000033628"; transcript_id "ENSMUST00000115811"; exon_number "20"; gene_name "Pik3c3"; transcript_name "Pik3c3-004"; protein_id "ENSMUSP00000111478";

-Keith

#2
Senior Member
Location: Rochester, MN
Join Date: Mar 2009
Posts: 191
1) Go to the UCSC Table Browser: http://genome.ucsc.edu/cgi-bin/hgTab...a_doMainPage=1
2) Select the mouse genome assembly mm9
3) Select "Genes and Gene Prediction Tracks" in the Group section
4) Select the Ensembl Genes track
5) Under output format, select GTF
6) Give the output file a name
7) Click "get output"

#3
Junior Member
Location: Davis
Join Date: Oct 2010
Posts: 3
chr1 mm9_ensGene start_codon 134212807 134212809 0.000000 + . gene_id "ENSMUST00000072177"; transcript_id "ENSMUST00000072177";

Other information that might be helpful, the versions of the programs I am using:
TopHat: 1.1.1
Bowtie: 0.12.7
Cufflinks: 0.9.1
Myrna: 1.0.9
SAMtools: 0.1.8

#4
Senior Member
Location: Rochester, MN
Join Date: Mar 2009
Posts: 191
I just followed those instructions and it worked fine.

#5
Junior Member
Location: Davis
Join Date: Oct 2010
Posts: 3
tophat -p 4 -o DMSO_tophat_test -G /home/lab/Downloads/ENSEMBLE.genes.gtf --no-novel-juncs /home/lab/Tools/bowtie-0.12.7/indexes/mm9 /home/lab/Data/DMSO_Run/s_6_sequence.fq

The first 10 lines of the GTF file are:
chr1 mm9_ensGene start_codon 134212807 134212809 0.000000 + . gene_id "ENSMUST00000072177"; transcript_id "ENSMUST00000072177";
chr1 mm9_ensGene CDS 134212807 134213049 0.000000 + 0 gene_id "ENSMUST00000072177"; transcript_id "ENSMUST00000072177";
chr1 mm9_ensGene exon 134212703 134213049 0.000000 + . gene_id "ENSMUST00000072177"; transcript_id "ENSMUST00000072177";
chr1 mm9_ensGene CDS 134221530 134221650 0.000000 + 0 gene_id "ENSMUST00000072177"; transcript_id "ENSMUST00000072177";
chr1 mm9_ensGene exon 134221530 134221650 0.000000 + . gene_id "ENSMUST00000072177"; transcript_id "ENSMUST00000072177";
chr1 mm9_ensGene CDS 134222783 134222806 0.000000 + 2 gene_id "ENSMUST00000072177"; transcript_id "ENSMUST00000072177";
chr1 mm9_ensGene exon 134222783 134222806 0.000000 + . gene_id "ENSMUST00000072177"; transcript_id "ENSMUST00000072177";
chr1 mm9_ensGene CDS 134224274 134224425 0.000000 + 2 gene_id "ENSMUST00000072177"; transcript_id "ENSMUST00000072177";
chr1 mm9_ensGene exon 134224274 134224425 0.000000 + . gene_id "ENSMUST00000072177"; transcript_id "ENSMUST00000072177";
chr1 mm9_ensGene CDS 134224708 134224773 0.000000 + 0 gene_id "ENSMUST00000072177"; transcript_id "ENSMUST00000072177";

#6
Senior Member
Location: Rochester, MN
Join Date: Mar 2009
Posts: 191
Keith,
I was able to reproduce your error using the GTF lines that you supplied. However, using my human data, the process I described above works just fine. Something else you can try is to use this GTF instead. Try it and post back your results.

#7
Senior Member
Location: Rochester, MN
Join Date: Mar 2009
Posts: 191
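Code:
# Prefix "chr" to the chromosome name of every record; header/comment lines starting with # pass through unchanged
awk '/^#/ {print; next} {print "chr" $0}' Homo_sapiens.GRCh37.59.gtf > ENSEMBLE.gtf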
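A slightly more careful variant of that one-liner, sketched here as a hypothetical shell function (`add_chr_prefix` is not an existing tool), rewrites only the first tab-separated column and maps Ensembl's mitochondrial name `MT` to UCSC's `chrM`. It assumes the index you align against uses UCSC-style names. Unplaced or random scaffolds are named differently on the two sites, and this sketch only prefixes them with "chr", so they may still not match.

```shell
# add_chr_prefix GTF
# Hypothetical helper: convert Ensembl-style sequence names (1, 2, X, MT)
# in column 1 to UCSC style (chr1, chr2, chrX, chrM). Writes to stdout.
add_chr_prefix() {
  awk -F'\t' -v OFS='\t' '
    # Header/comment lines and blank lines are passed through untouched.
    /^#/ || NF == 0 { print; next }
    {
      if ($1 == "MT") $1 = "chrM"           # UCSC calls the mitochondrial genome chrM
      else if ($1 !~ /^chr/) $1 = "chr" $1  # names that already start with chr are left alone
      print                                 # a modified record is rebuilt with OFS, so tabs are kept
    }
  ' "$1"
}
```

Usage: `add_chr_prefix Homo_sapiens.GRCh37.59.gtf > ENSEMBLE.gtf`. Setting both the input and output separators to a tab keeps the attribute column (which contains spaces) intact as a single field.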

#8
Member
Location: Dublin
Join Date: Mar 2010
Posts: 19

#9
Member
Location: Pasadena, CA
Join Date: May 2009
Posts: 45
You are probably better off just giving TopHat a raw junctions file in the simple chr / left / right / strand format. Those always work, and it is relatively trivial to generate one from any annotation format. I have had GTF files rejected too, so I have switched to that format completely for all the genomes I work with when mapping with TopHat.
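Following that suggestion, a raw junctions file can be derived from the exon records of a GTF. The sketch below is a hypothetical shell function, not part of TopHat. It assumes the raw-junction format described in the TopHat manual for `-j`/`--raw-juncs`: one tab-delimited `<chrom> <left> <right> <+/->` record per line, where `left` is the zero-based coordinate of the last base of the upstream exon and `right` the zero-based coordinate of the first base of the downstream exon. Because GTF coordinates are 1-based and inclusive, both values come out as the GTF coordinate minus one.

```shell
# gtf_to_raw_juncs GTF
# Hypothetical helper: print TopHat-style raw junctions (chrom, left, right,
# strand) for every pair of consecutive exons within a transcript.
gtf_to_raw_juncs() {
  # 1) Keep exon rows and reduce each to: transcript_id chrom start end strand.
  awk -F'\t' -v OFS='\t' '
    $3 == "exon" && match($9, /transcript_id "[^"]*"/) {
      tid = substr($9, RSTART + 15, RLENGTH - 16)   # text between the quotes
      print tid, $1, $4, $5, $7
    }
  ' "$1" |
  # 2) Group rows by transcript and order each transcript by exon start.
  sort -t "$(printf '\t')" -k1,1 -k3,3n |
  # 3) Each consecutive exon pair on the same chromosome defines one junction.
  awk -F'\t' -v OFS='\t' '
    $1 == prev_tid && $2 == prev_chr {
      print $2, prev_end - 1, $3 - 1, $5   # 1-based GTF coordinates to zero-based ends
    }
    { prev_tid = $1; prev_chr = $2; prev_end = $4 }
  ' |
  # 4) Junctions shared by several transcripts are reported once.
  sort -u
}
```

Usage: `gtf_to_raw_juncs annotation.gtf > annotation.juncs`, then pass the result to TopHat with `-j annotation.juncs`. Requiring the same chromosome in step 3 avoids inventing junctions when one transcript ID appears on two chromosomes.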
|
#10
Member
Location: Dublin
Join Date: Mar 2010
Posts: 19

#11
Junior Member
Location: Paris
Join Date: Nov 2010
Posts: 4
Does anyone have a quick list of the most commonly used SAMtools command lines?
I'm really new to UNIX/Linux and would greatly appreciate it if someone could share a PDF or document of SAMtools commands. Thank you very much, and have a nice day.
Pawan

#12
Junior Member
Location: Tucson, AZ
Join Date: Oct 2010
Posts: 4
I had the same error message as the original post. In the logs subdirectory I found a file called "gtf_juncs.log" with the contents:
Code:
gtf_juncs v1.1.4 (1709)
---------------------------
Error: duplicate GFF ID 'ENSMUST00000127664' (or exons too far apart)!
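The log names only one transcript ID. One way to look for other IDs that could trigger the same error is to list transcript IDs whose exons sit on more than one chromosome or strand, since a single transcript normally occupies one locus. The sketch below is a hypothetical shell function (`find_multilocus_ids` is not a TopHat tool) that does only that. It assumes such cross-locus duplicates are what the error refers to, and it will not catch an ID repeated at two distant positions on the same chromosome and strand.

```shell
# find_multilocus_ids GTF
# Hypothetical helper: print transcript_ids whose exon records are spread
# over more than one chromosome/strand combination, one per line, sorted.
find_multilocus_ids() {
  awk -F'\t' '
    $3 == "exon" && match($9, /transcript_id "[^"]*"/) {
      tid = substr($9, RSTART + 15, RLENGTH - 16)   # text between the quotes
      loc = $1 ":" $7                               # chromosome plus strand
      if (!(tid in first_loc)) first_loc[tid] = loc
      else if (first_loc[tid] != loc) multi[tid] = 1
    }
    END { for (t in multi) print t }
  ' "$1" | sort
}
```

Usage: `find_multilocus_ids ENSEMBLE.genes.gtf`. Any IDs it prints could then be removed from the GTF or renamed before rerunning TopHat.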

#13
Senior Member
Location: China
Join Date: Sep 2009
Posts: 199
Hi KeithD,
Did you figure out the "Warning: TopHat did not find any junctions in GTF file" error? I'm getting the same error message as well. Thanks for any advice and for sharing.

#14
Senior Member
Location: China
Join Date: Sep 2009
Posts: 199
Hi nkwuji,
Would you mind sharing the script you wrote to convert the UCSC table data into a GTF file? I'm getting the same error message in Cufflinks as well. Thanks in advance.

#15
Senior Member
Location: China
Join Date: Sep 2009
Posts: 199
Hi nkwuji,
Do we need to prepare the junction file from the annotation GTF file from Ensembl or from UCSC? Thanks.

#16
Junior Member
Location: UK
Join Date: Jul 2014
Posts: 1
On Linux, the one-liner below creates a new file containing only the rows whose third column is CDS or exon (the columns are split on tabs, as in the GFF3 format):
Code:
awk -F'\t' '$3=="CDS" || $3=="exon"' myFile.gff3 > new_myFile.gff3