
**maximilianh** · 03-03-2011, 05:19 AM

Avoid changing the GTF file

I would caution against modifying the GTF file: that fix is likely to break as soon as Ensembl switches to the next version, and it doesn't work for other organisms.

I am running cufflinks and solved the problem differently: I indexed the original Ensembl genome, ran bowtie on it, converted the SAM file to BAM, sorted it, removed this weird transcript from the Ensembl GTF file and ran cufflinks normally, without any awk script.

Here is a log of what I did:

Code:

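# Build a colorspace (-C) bowtie index from the Ensembl genome FASTA
bowtie-build -C Mus_musculus.NCBIM37.61.dna.toplevel.fa Mus_musculus.NCBIM37.61.dna.toplevel_c
# Map the colorspace reads, keeping only unique hits (-m 1); Makefile syntax, hence $(...) and $$i
bowtie -f -C -m 1 -p4 --sam $(BOWTIEINDEX) $$i > $(MAPPEDREADS)/`basename $$i .txt`.sam
# Convert SAM to BAM
samtools view -Sb sam/SL005_R00002_RME033_01pg_F3.csfasta.sam > sam/SL005_R00002_RME033_01pg_F3.csfasta.bam
# Sort the BAM; the second argument is the output prefix
samtools sort SL005_R00002_RME033_01pg_F3.csfasta.bam SL005_R00002_RME033_01pg_F3.csfasta.sorted
512391
# Drop the oversized transcript from the annotation
grep -v ENSMUST00000127664 Mus_musculus.NCBIM37.61.gtf > Mus_musculus.NCBIM37.61.corrected.gtf
# Run cufflinks against the corrected annotation
~/software/cufflinks-0.9.3/cufflinks -G Mus_musculus.NCBIM37.61.corrected.gtf -v sam.old/SL005_R00002_RME033_01pg_F3.csfasta.sorted.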

**Auction** · 04-22-2011, 01:12 PM

Another solution is to go to tophat-1.2.0/src/gff.cpp
and change
const uint GFF_MAX_LOCUS = 4000000;
to
const uint GFF_MAX_LOCUS = 5000000;
then recompile tophat.

The Ensembl database indicates that transcript ENSMUST00000127664 spans 4.43 Mb, which is larger than the original cut-off of GFF_MAX_LOCUS = 4000000.
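To check in advance whether an annotation contains transcripts above that cut-off, something like the following might help. It is only a rough sketch: the toy GTF, the function name `long_transcripts` and the way the span is computed (largest end minus smallest start over all lines of a transcript) are my own assumptions, not how tophat itself measures a locus.

```shell
# Hypothetical toy GTF (tab-separated); point the function at your real file instead.
tmp=$(mktemp -d)
printf '1\tsrc\texon\t1000\t2000\t.\t+\t.\tgene_id "G1"; transcript_id "T_SHORT";\n' >  "$tmp/sample.gtf"
printf '1\tsrc\texon\t5000\t6000\t.\t+\t.\tgene_id "G1"; transcript_id "T_SHORT";\n' >> "$tmp/sample.gtf"
printf '2\tsrc\texon\t100\t200\t.\t-\t.\tgene_id "G2"; transcript_id "T_LONG";\n'    >> "$tmp/sample.gtf"
printf '2\tsrc\texon\t4500000\t4500100\t.\t-\t.\tgene_id "G2"; transcript_id "T_LONG";\n' >> "$tmp/sample.gtf"

# Print "transcript_id<TAB>span" for every transcript whose genomic span
# (max end - min start + 1) exceeds the limit. The default of 4000000
# mirrors the GFF_MAX_LOCUS value quoted above.
long_transcripts() {
  awk -F'\t' -v limit="${2:-4000000}" '
    !/^#/ && match($9, /transcript_id "[^"]+"/) {
      # skip the 15 characters of: transcript_id "
      id = substr($9, RSTART + 15, RLENGTH - 16)
      if (!(id in lo) || $4 + 0 < lo[id]) lo[id] = $4 + 0
      if (!(id in hi) || $5 + 0 > hi[id]) hi[id] = $5 + 0
    }
    END {
      for (id in lo) {
        span = hi[id] - lo[id] + 1
        if (span > limit) print id "\t" span
      }
    }' "$1"
}

long_transcripts "$tmp/sample.gtf"
```

On the toy file only T_LONG is reported; any ID reported on a real annotation would be a candidate for either removing the transcript or raising the constant.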

Originally posted by marcora

My bad! This last time when I ran tophat I didn't rename the cleaned GTF file to "mm9.ensembl" and therefore tophat couldn't find it. Surprisingly, instead of reporting a missing file, tophat gave me the same exact warning as before.

In conclusion, with the squeaky clean GTF file obtained from Mus_musculus.NCBIM37.60.gtf as such:

Code:

awk '{print "chr"$0}' Mus_musculus.NCBIM37.60.gtf | sed 's/chrMT/chrM/g' | awk '/^chr[1-9XYM]|^chr1[0-9]/' | grep -v "ENSMUST00000127664" > mm9.ensembl.gtf

I am finally able to run tophat against the ENSEMBL annotation.

You are my hero!

Thank you very much for your help.

**honey** · 04-22-2011, 02:11 PM

GTF

I will try the new approach in our next run; up till now we have been changing the GTF file.

**arielpaulson** · 05-09-2012, 12:09 PM

Additional error source

Just wanted to add that, as of tophat 1.4.0, there is another way to get this "did not find any junctions" error message. If your GTF has any entries with nonstandard strand symbols, for instance '*', parsing will apparently fail for the entire GTF, even though all other entries are OK.

Reading gtf_juncs.log will show you the offending line.
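If you would rather scan the GTF yourself before running tophat, a small awk filter can list the suspicious lines. This is a sketch under assumptions: the toy file and the function name `bad_strand_lines` are made up for illustration, and I am treating '+', '-' and '.' as the standard strand values from the GTF format; whether tophat accepts '.' is something I have not verified.

```shell
# Hypothetical toy GTF with one nonstandard strand ('*') on line 2.
sdir=$(mktemp -d)
printf '1\tsrc\texon\t100\t200\t.\t+\t.\tgene_id "G1"; transcript_id "T1";\n' >  "$sdir/strand.gtf"
printf '1\tsrc\texon\t300\t400\t.\t*\t.\tgene_id "G2"; transcript_id "T2";\n' >> "$sdir/strand.gtf"
printf '1\tsrc\texon\t500\t600\t.\t-\t.\tgene_id "G3"; transcript_id "T3";\n' >> "$sdir/strand.gtf"

# Report every non-comment line whose 7th (strand) column is not +, - or .
# Output format: line_number<TAB>strand<TAB>sequence_name
bad_strand_lines() {
  awk -F'\t' '!/^#/ && $7 != "+" && $7 != "-" && $7 != "." {
    print NR "\t" $7 "\t" $1
  }' "$1"
}

bad_strand_lines "$sdir/strand.gtf"
```

An empty result means every entry has one of the three expected strand symbols.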

**Charitra** · 04-14-2013, 07:56 PM

Originally posted by marcora

My bad! This last time when I ran tophat I didn't rename the cleaned GTF file to "mm9.ensembl" and therefore tophat couldn't find it. Surprisingly, instead of reporting a missing file, tophat gave me the same exact warning as before.

In conclusion, with the squeaky clean GTF file obtained from Mus_musculus.NCBIM37.60.gtf as such:

Code:

awk '{print "chr"$0}' Mus_musculus.NCBIM37.60.gtf | sed 's/chrMT/chrM/g' | awk '/^chr[1-9XYM]|^chr1[0-9]/' | grep -v "ENSMUST00000127664" > mm9.ensembl.gtf

I am finally able to run tophat against the ENSEMBL annotation.

You are my hero!

Thank you very much for your help.

Dear all,

Please help: I am getting the same error, shown below. I am using UCSC hg19 for everything; I downloaded and unpacked all the files but still get this error.
What should I do?

Warning: couldn't find fasta record for 'chr17_ctg5_hap1'!
Warning: couldn't find fasta record for 'chr17_gl000205_random'!
Warning: couldn't find fasta record for 'chr19_gl000209_random'!
Warning: couldn't find fasta record for 'chr1_gl000191_random'!
Warning: couldn't find fasta record for 'chr4_ctg9_hap1'!
Warning: couldn't find fasta record for 'chr4_gl000193_random'!
Warning: couldn't find fasta record for 'chr4_gl000194_random'!
Warning: couldn't find fasta record for 'chr6_apd_hap1'!
Warning: couldn't find fasta record for 'chr6_cox_hap2'!
Warning: couldn't find fasta record for 'chr6_dbb_hap3'!
Warning: couldn't find fasta record for 'chr6_mann_hap4'!
Warning: couldn't find fasta record for 'chr6_mcf_hap5'!
Warning: couldn't find fasta record for 'chr6_qbl_hap6'!
Warning: couldn't find fasta record for 'chr6_ssto_hap7'!
Warning: couldn't find fasta record for 'chr7_gl000195_random'!
Warning: couldn't find fasta record for 'chrUn_gl000211'!
Warning: couldn't find fasta record for 'chrUn_gl000212'!
Warning: couldn't find fasta record for 'chrUn_gl000218'!
Warning: couldn't find fasta record for 'chrUn_gl000219'!
Warning: couldn't find fasta record for 'chrUn_gl000220'!
Warning: couldn't find fasta record for 'chrUn_gl000222'!
Warning: couldn't find fasta record for 'chrUn_gl000223'!
Warning: couldn't find fasta record for 'chrUn_gl000228'!

**marcora** · 04-15-2013, 07:34 AM

The command that you referenced is for genome annotation files from ENSEMBL (release 60), not for UCSC files.
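A quick way to see this kind of mismatch is to compare the sequence names used in the annotation with the record names in the FASTA file. The sketch below is only an illustration: the toy files and the function names (`fasta_names`, `gtf_names`, `missing_in_fasta`) are hypothetical, and it assumes the FASTA record name is the first word after '>'.

```shell
# Hypothetical toy data: the GTF mentions a haplotype contig the FASTA lacks.
work=$(mktemp -d)
printf '>chr1 some description\nACGT\n' > "$work/genome.fa"
printf 'chr1\tsrc\texon\t1\t4\t.\t+\t.\tgene_id "G1"; transcript_id "T1";\n' > "$work/genes.gtf"
printf 'chr17_ctg5_hap1\tsrc\texon\t1\t4\t.\t+\t.\tgene_id "G2"; transcript_id "T2";\n' >> "$work/genes.gtf"

# Record names in the FASTA: text after '>' up to the first whitespace.
fasta_names() { grep '^>' "$1" | sed 's/^>//; s/[[:space:]].*$//' | sort -u; }

# Sequence names in the GTF: first column of every non-comment line.
gtf_names() { grep -v '^#' "$1" | cut -f1 | sort -u; }

# Names used in the GTF that have no record in the FASTA.
missing_in_fasta() {
  gtf_names "$1"   > "$work/gtf.names"
  fasta_names "$2" > "$work/fa.names"
  comm -23 "$work/gtf.names" "$work/fa.names"
}

missing_in_fasta "$work/genes.gtf" "$work/genome.fa"
```

Every name printed is one for which a "couldn't find fasta record" style warning would be expected; if the list contains all chromosomes, the two files most likely follow different naming schemes (for example Ensembl "1" versus UCSC "chr1").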

**Charitra** · 04-15-2013, 06:25 PM

Thank you so much for the guidance. I was so confused, and now that's a relief.
I am using UCSC data with the Ensembl commands; that's the problem.
Does this mean I should download the Ensembl GRCh37 files?
I am also unsure about the sequencing results: do you think the differential expression results from the Ensembl data/commands and from the UCSC data/commands will not differ significantly?

Is one of the two the better choice?

Kindly reply if you can.

**Charitra** · 04-16-2013, 05:22 PM

Please suggest something.
I have run into a known problem but have no solution. I tried downloading the data with wget from 'Ensembl GRCh37 17297 MB May 14 17:23' but got an error after 12 hours:

--2013-04-17 02:23:44-- ftp://igenome:*password*@ussd-ftp.il..._GRCh37.tar.gz
(try: 2) => `Homo_sapiens_Ensembl_GRCh37.tar.gz'
==> CWD not required.
==> SIZE Homo_sapiens_Ensembl_GRCh37.tar.gz ... Aborted

I searched and found that this problem has been posted a couple of times, but I can't find a solution.

Can somebody give me a hint?

**eastasiasnow** · 01-15-2014, 12:10 AM

One more thing to share

tophat2 2.0.10 still doesn't recognize gzipped files in the -G option, so uncompress the GTF file before you run the command. Otherwise you will first get the warning "TopHat did not find any junctions in GTF file", then the error "gtf_to_fasta returned an error".

I downloaded the FASTA and GTF files for Zea mays from the Ensembl FTP site; the files are gzipped to save space.
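For what it's worth, this is roughly how the decompression step could look if you want to keep the .gz around. It is a minimal sketch: the toy file and the helper name `plain_gtf` are hypothetical, and the tophat call itself is left out.

```shell
# Hypothetical toy annotation, gzipped the way the FTP downloads are.
gzdir=$(mktemp -d)
printf '1\tsrc\texon\t1\t4\t.\t+\t.\tgene_id "G1"; transcript_id "T1";\n' > "$gzdir/genes.gtf"
gzip "$gzdir/genes.gtf"    # leaves only genes.gtf.gz

# If the argument ends in .gz, write an uncompressed copy next to it
# (the archive is kept) and print the path of the plain file;
# otherwise print the path unchanged.
plain_gtf() {
  case "$1" in
    *.gz) out="${1%.gz}"; gzip -dc "$1" > "$out" && printf '%s\n' "$out" ;;
    *)    printf '%s\n' "$1" ;;
  esac
}

gtf=$(plain_gtf "$gzdir/genes.gtf.gz")
echo "uncompressed annotation to pass to -G: $gtf"
```

The same applies to the genome FASTA if your indexing step expects an uncompressed file.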
