SEQanswers

Old 07-27-2011, 11:59 AM   #1
zach
Junior Member
 
Location: New York

Join Date: Jun 2010
Posts: 5
Help needed for running cufflinks

Hi All,

I am new to this forum and to NGS. I have installed tophat/bowtie/cufflinks on my server and they run OK.

I encounter the following problem when running cufflinks.

$ cufflinks -G mm_9.gtf accepted_hits.bam
You are using Cufflinks v1.0.3, which is the most recent release.
[13:25:25] Loading reference annotation.
[13:25:31] Inspecting reads and determining fragment length distribution.
> Processing Locus chr16: 57266251-57292978 [***** ] 20%

It stops here forever. 2 out of 3 accepted_hits.bam files from tophat stop exactly here, at 57266251-57292978. The other one went through and produced the output described in the Cufflinks manual.

I checked the bam files and observed no difference.

Has anybody had similar issues? How am I supposed to fix it? Thanks.

zach

I looked at the size of these 3 bam files
Old 07-28-2011, 11:56 AM   #2
DZhang
Senior Member
 
Location: East Coast, US

Join Date: Jun 2010
Posts: 177

Here is the part of the Cufflinks FAQ that may solve your problem:

I'm trying to assemble a sample. Cufflinks is almost done, but it seems to be hanging at "99% complete". What's going on?

Cufflinks spawns threads for each locus to assemble and quantitate the "bundle" of reads in that locus. Some loci may have more reads and more complicated alternative splicing than others, which requires more CPU cycles. These bundles can continue processing long after all others have completed, leading to this behavior. You may be able to decrease the number of such bundles by masking out ribosomal and mitochondrial RNA using the -M/--mask-file option described in the Manual.
Old 08-02-2011, 08:23 AM   #3
zach
Junior Member
 
Location: New York

Join Date: Jun 2010
Posts: 5

DZhang,

Thanks. I used the tRNA genes as the mask file, but that did not work. Then I used the RepeatMasker file downloaded from UCSC, and it worked. My question is whether using RepeatMasker will affect the final results, since repeats can occur anywhere and within any gene (to my understanding).

How am I supposed to get a gtf file of ribosomal and mitochondrial RNA?

zach
Old 08-02-2011, 08:30 AM   #4
DZhang
Senior Member
 
Location: East Coast, US

Join Date: Jun 2010
Posts: 177

Hi Zach,

I am glad the RepeatMasker file solved the problem for you. Usually it should not impact the final results, as repetitive sequences are in general thought to contain less information and thus to have less impact.

For the ribo and mt RNA gtf file, what I usually do is manually create it from the master gtf file. It is not that difficult: just search for the gene names and copy the matching lines out to a separate file.
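The copy-out step described above could be scripted roughly as follows. This is only a sketch, not part of Cufflinks: the function name `extract_by_ids` is made up for illustration, and it assumes a tab-delimited GTF whose ninth column carries `gene_id "..."` and `transcript_id "..."` attributes. The list of IDs still has to come from your own lookup.

```python
import re

# Matches gene_id "X" or transcript_id "X" in the GTF attribute column.
ID_PATTERN = re.compile(r'(gene_id|transcript_id) "([^"]+)"')


def extract_by_ids(gtf_lines, wanted_ids):
    """Return the GTF lines whose gene_id or transcript_id is in wanted_ids.

    Hypothetical helper for illustration; gtf_lines can be an open file
    or any iterable of strings.
    """
    wanted = set(wanted_ids)
    kept = []
    for line in gtf_lines:
        if line.startswith("#") or not line.strip():
            continue  # skip comments and blank lines
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 9:
            continue  # not a complete GTF record
        ids = {value for _, value in ID_PATTERN.findall(fields[8])}
        if ids & wanted:
            kept.append(line)
    return kept
```

Writing the returned lines to a new file should give something that can be passed to the -M/--mask-file option.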

Hope this helps.

Douglas
www.contigexpress.com
Old 08-02-2011, 10:25 AM   #5
zach
Junior Member
 
Location: New York

Join Date: Jun 2010
Posts: 5

Douglas,

The gtf file for mouse from UCSC is in the following format. There is no direct label showing which gene is rRNA or mt RNA:

chr1 mm9_ensGene start_codon 134212807 134212809 0.000000 + . gene_id "ENSMUST00000072177"; transcript_id "ENSMUST00000072177";
chr1 mm9_ensGene CDS 134212807 134213049 0.000000 + 0 gene_id "ENSMUST00000072177"; transcript_id "ENSMUST00000072177";
chr1 mm9_ensGene exon 134212703 134213049 0.000000 + . gene_id "ENSMUST00000072177"; transcript_id "ENSMUST00000072177";
chr1 mm9_ensGene CDS 134221530 134221650 0.000000 + 0 gene_id "ENSMUST00000072177"; transcript_id "ENSMUST00000072177";
chr1 mm9_ensGene exon 134221530 134221650 0.000000 + . gene_id "ENSMUST00000072177"; transcript_id "ENSMUST00000072177";
chr1 mm9_ensGene CDS 134222783 134222806 0.000000 + 2 gene_id "ENSMUST00000072177"; transcript_id "ENSMUST00000072177";
chr1 mm9_ensGene exon 134222783 134222806 0.000000 + . gene_id "ENSMUST00000072177"; transcript_id "ENSMUST00000072177";
chr1 mm9_ensGene CDS 134224274 134224425 0.000000 + 2 gene_id "ENSMUST00000072177"; transcript_id "ENSMUST00000072177";
chr1 mm9_ensGene exon 134224274 134224425 0.000000 + . gene_id "ENSMUST00000072177"; transcript_id "ENSMUST00000072177";
chr1 mm9_ensGene CDS 134224708 134224773 0.000000 + 0 gene_id "ENSMUST00000072177"; transcript_id "ENSMUST00000072177";
chr1 mm9_ensGene exon 134224708 134224773 0.000000 + . gene_id "ENSMUST00000072177"; transcript_id "ENSMUST00000072177";
Old 08-02-2011, 10:38 AM   #6
DZhang
Senior Member
 
Location: East Coast, US

Join Date: Jun 2010
Posts: 177

Hi zach,

You need to check which gene IDs correspond to ribo/mt genes. For ribo genes, there are not too many, so you can check them manually. For mt genes, search the first column, as they will not reside on any of the nuclear chromosomes.
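The first-column search for mt genes might look like the sketch below. It assumes the mitochondrial sequence is named `chrM`, which is the UCSC convention; other annotation sources may use `MT` or `M`, so check the names in your own file first. The function name `lines_on_sequence` is made up for illustration.

```python
def lines_on_sequence(gtf_lines, names=("chrM",)):
    """Return the GTF lines whose first column is one of the given names.

    Hypothetical helper; "chrM" is an assumption based on UCSC naming.
    """
    names = set(names)
    kept = []
    for line in gtf_lines:
        if line.startswith("#"):
            continue  # skip comment lines
        seqname = line.split("\t", 1)[0]  # first tab-delimited column
        if seqname in names:
            kept.append(line)
    return kept
```

The result can be combined with the ribosomal entries to build a single mask file.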
Old 08-02-2011, 10:44 AM   #7
DZhang
Senior Member
 
Location: East Coast, US

Join Date: Jun 2010
Posts: 177

Go to the UCSC genome website and, in the "position or search term" field, enter "ribosomal RNA"; you will get a list of genes with their chromosome positions. Based on those positions you can find the gene IDs in your gtf file. I hope there is an easier way, but I do not work with the mouse genome often...
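Going from chromosome positions back to IDs in the gtf file could be done with a simple overlap check, sketched below. The function name `ids_overlapping` is made up for illustration, the regions are whatever positions you copy from the browser, and the sketch assumes they are 1-based inclusive coordinates like those in the gtf file.

```python
import re

TRANSCRIPT_ID = re.compile(r'transcript_id "([^"]+)"')


def ids_overlapping(gtf_lines, regions):
    """Return the set of transcript_ids whose features overlap any region.

    regions: iterable of (chrom, start, end) tuples, assumed 1-based and
    inclusive. Hypothetical helper for illustration.
    """
    by_chrom = {}
    for chrom, start, end in regions:
        by_chrom.setdefault(chrom, []).append((start, end))

    found = set()
    for line in gtf_lines:
        if line.startswith("#"):
            continue  # skip comment lines
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 9:
            continue  # not a complete GTF record
        chrom, start, end = fields[0], int(fields[3]), int(fields[4])
        for r_start, r_end in by_chrom.get(chrom, ()):
            # Two closed intervals overlap when each starts before the other ends.
            if start <= r_end and r_start <= end:
                match = TRANSCRIPT_ID.search(fields[8])
                if match:
                    found.add(match.group(1))
                break
    return found
```

This scans every region for every line, which is slow for a large region list but should be fine for the short list of ribosomal genes.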