#1
Junior Member
Location: New York
Join Date: Jun 2010
Posts: 5
Hi All,
I am new to this forum and to next-gen sequencing. I have installed TopHat, Bowtie and Cufflinks on my server, and they run OK. I encountered the following problem when running Cufflinks:

    $ cufflinks -G mm_9.gtf accepted_hits.bam
    You are using Cufflinks v1.0.3, which is the most recent release.
    [13:25:25] Loading reference annotation.
    [13:25:31] Inspecting reads and determining fragment length distribution.
    > Processing Locus chr16: 57266251-57292978 [***** ] 20%

It stops there forever. Two of my three accepted_hits.bam files from TopHat stop at exactly this locus (chr16:57266251-57292978). The third one went through and produced the results described in the Cufflinks manual. I checked the BAM files and saw no difference between them. Has anybody had a similar issue? How am I supposed to fix it?

Thanks,
Zach

I looked at the size of these 3 BAM files
#2
Senior Member
Location: East Coast, US
Join Date: Jun 2010
Posts: 177
Here is the part of the Cufflinks FAQ that may solve your problem:

"I'm trying to assemble a sample. Cufflinks is almost done, but it seems to be hanging at "99% complete". What's going on?

Cufflinks spawns threads for each locus to assemble and quantitate the "bundle" of reads in that locus. Some loci may have more reads and more complicated alternative splicing than others, which requires more CPU cycles. These bundles can continue processing long after all others have completed, leading to this behavior. You may be able to decrease the number of such bundles by masking out ribosomal and mitochondrial RNA using the -M/--mask-file option described in the Manual."
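In practice, the FAQ's suggestion amounts to adding -M to the original command. Below is a small Python sketch that assembles the argument list. It assumes a mask GTF already exists under the hypothetical name mask.gtf, and it uses only the -G and -M/--mask-file options mentioned in this thread:

```python
import shlex
from typing import List, Optional


def build_cufflinks_cmd(annotation_gtf: str, bam: str,
                        mask_gtf: Optional[str] = None) -> List[str]:
    """Assemble a Cufflinks argument list.

    -G passes the reference annotation (as in the original command);
    -M/--mask-file makes Cufflinks ignore reads that could come from
    the transcripts listed in mask_gtf (e.g. rRNA and chrM genes).
    """
    cmd = ["cufflinks", "-G", annotation_gtf]
    if mask_gtf is not None:
        cmd += ["-M", mask_gtf]
    cmd.append(bam)  # the alignment file goes last
    return cmd


if __name__ == "__main__":
    # "mask.gtf" is a hypothetical file name used for illustration
    cmd = build_cufflinks_cmd("mm_9.gtf", "accepted_hits.bam", "mask.gtf")
    print(shlex.join(cmd))
    # prints: cufflinks -G mm_9.gtf -M mask.gtf accepted_hits.bam
    # subprocess.run(cmd, check=True) would launch it if cufflinks is on PATH
```

Building a list rather than a single string avoids shell-quoting problems when file names contain spaces.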
#3
Junior Member
Location: New York
Join Date: Jun 2010
Posts: 5
Dzhang,
Thanks. I used the tRNA genes as the mask file, but that did not work. Then I used the RepeatMasker file downloaded from UCSC, and it worked. My question is whether using RepeatMasker will affect the final results, since (to my understanding) repeat elements can occur anywhere, including inside any gene. Also, how am I supposed to get a GTF file of the ribosomal and mitochondrial RNA genes?
Zach
#4
Senior Member
Location: East Coast, US
Join Date: Jun 2010
Posts: 177
Hi Zach,
I am glad RepeatMasker solved the problem for you. Usually it should not have much impact on the final results, as repetitive sequences are generally thought to carry less information and therefore matter less. For the rRNA and mtRNA GTF files, what I usually do is create the GTF file manually from the master GTF file. It is not that difficult: just search for the gene names and copy those lines out to a separate file. Hope this helps.
Douglas
www.contigexpress.com
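The manual search-and-copy step can also be scripted. The following is a minimal Python sketch. It assumes a tab-delimited GTF and that you have already collected the set of IDs to mask. The function names and the file names mm_9.gtf and mask.gtf are illustrative, not from any tool. The sketch copies every record whose gene_id is in that set:

```python
import re
from typing import Iterable, Iterator, Set

# Captures the value of an attribute such as: gene_id "ENSMUST00000072177";
GENE_ID_RE = re.compile(r'gene_id "([^"]+)"')


def select_by_gene_id(gtf_lines: Iterable[str], wanted_ids: Set[str]) -> Iterator[str]:
    """Yield GTF lines whose gene_id attribute is in wanted_ids."""
    for line in gtf_lines:
        if line.startswith("#"):
            continue  # skip comment/header lines
        match = GENE_ID_RE.search(line)
        if match and match.group(1) in wanted_ids:
            yield line


def write_mask(master_gtf: str, mask_gtf: str, wanted_ids: Set[str]) -> int:
    """Copy matching records from master_gtf to mask_gtf; return the count."""
    count = 0
    with open(master_gtf) as src, open(mask_gtf, "w") as dst:
        for line in select_by_gene_id(src, wanted_ids):
            dst.write(line)
            count += 1
    return count
```

For example, `write_mask("mm_9.gtf", "mask.gtf", ids)` copies the matching lines, where `ids` holds the IDs you identified. The resulting file can then be passed to Cufflinks with -M.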
#5
Junior Member
Location: New York
Join Date: Jun 2010
Posts: 5
Douglas,
The GTF file for mouse from UCSC is in the following format. There is no direct label of which genes are rRNA or mtRNA:

    chr1 mm9_ensGene start_codon 134212807 134212809 0.000000 + . gene_id "ENSMUST00000072177"; transcript_id "ENSMUST00000072177";
    chr1 mm9_ensGene CDS 134212807 134213049 0.000000 + 0 gene_id "ENSMUST00000072177"; transcript_id "ENSMUST00000072177";
    chr1 mm9_ensGene exon 134212703 134213049 0.000000 + . gene_id "ENSMUST00000072177"; transcript_id "ENSMUST00000072177";
    chr1 mm9_ensGene CDS 134221530 134221650 0.000000 + 0 gene_id "ENSMUST00000072177"; transcript_id "ENSMUST00000072177";
    chr1 mm9_ensGene exon 134221530 134221650 0.000000 + . gene_id "ENSMUST00000072177"; transcript_id "ENSMUST00000072177";
    chr1 mm9_ensGene CDS 134222783 134222806 0.000000 + 2 gene_id "ENSMUST00000072177"; transcript_id "ENSMUST00000072177";
    chr1 mm9_ensGene exon 134222783 134222806 0.000000 + . gene_id "ENSMUST00000072177"; transcript_id "ENSMUST00000072177";
    chr1 mm9_ensGene CDS 134224274 134224425 0.000000 + 2 gene_id "ENSMUST00000072177"; transcript_id "ENSMUST00000072177";
    chr1 mm9_ensGene exon 134224274 134224425 0.000000 + . gene_id "ENSMUST00000072177"; transcript_id "ENSMUST00000072177";
    chr1 mm9_ensGene CDS 134224708 134224773 0.000000 + 0 gene_id "ENSMUST00000072177"; transcript_id "ENSMUST00000072177";
    chr1 mm9_ensGene exon 134224708 134224773 0.000000 + . gene_id "ENSMUST00000072177"; transcript_id "ENSMUST00000072177";
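In these records the only attributes are gene_id and transcript_id, and gene_id simply repeats the Ensembl transcript ID. A script can therefore only select records by those IDs or by coordinates. The following is a minimal Python sketch that splits one record into the nine GTF columns plus an attribute dict. It assumes the real file is tab-delimited, as the GTF format requires, since the forum flattened the whitespace above. `GtfRecord` and `parse_gtf_line` are illustrative names:

```python
import re
from typing import Dict, NamedTuple

# Matches key "value"; pairs in the ninth (attribute) column
ATTR_RE = re.compile(r'(\S+)\s+"([^"]*)"\s*;')


class GtfRecord(NamedTuple):
    seqname: str
    source: str
    feature: str
    start: int  # 1-based, inclusive
    end: int    # 1-based, inclusive
    score: str
    strand: str
    frame: str
    attributes: Dict[str, str]


def parse_gtf_line(line: str) -> GtfRecord:
    """Parse one tab-delimited GTF line into a GtfRecord."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) != 9:
        raise ValueError(f"expected 9 tab-separated columns, got {len(fields)}")
    seqname, source, feature, start, end, score, strand, frame, attrs = fields
    return GtfRecord(seqname, source, feature, int(start), int(end),
                     score, strand, frame, dict(ATTR_RE.findall(attrs)))


if __name__ == "__main__":
    line = "\t".join(["chr1", "mm9_ensGene", "exon", "134212703", "134213049",
                      "0.000000", "+", ".",
                      'gene_id "ENSMUST00000072177"; transcript_id "ENSMUST00000072177";'])
    rec = parse_gtf_line(line)
    print(rec.feature, rec.start, rec.end, rec.attributes["gene_id"])
    # prints: exon 134212703 134213049 ENSMUST00000072177
```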
#6
Senior Member
Location: East Coast, US
Join Date: Jun 2010
Posts: 177
Hi Zach,
You need to check which gene IDs correspond to rRNA/mtRNA genes. There are not too many ribosomal genes, so you can check them manually. For mitochondrial genes, search the first column (the sequence name): they are on chrM rather than on any of the nuclear chromosomes.
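For the mitochondrial part, a first-column filter is enough. The following is a minimal Python sketch. It assumes the mitochondrial sequence is named chrM, as in UCSC assemblies such as mm9. Other annotation sources may use a different name (e.g. MT), so the name is a parameter. The function name is illustrative:

```python
from typing import Iterable, Iterator


def select_by_seqname(gtf_lines: Iterable[str], seqname: str = "chrM") -> Iterator[str]:
    """Yield GTF records whose first (seqname) column equals seqname."""
    for line in gtf_lines:
        if line.startswith("#"):
            continue  # skip comment/header lines
        # Only the first tab-delimited column is needed, so split once.
        # An exact comparison keeps e.g. "chr1" from matching "chr10".
        if line.split("\t", 1)[0] == seqname:
            yield line
```

A typical use is `dst.writelines(select_by_seqname(src))` with `src` opened on the master GTF and `dst` on the mask file being built. The chrM lines can go into the same mask file as the hand-picked rRNA records.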
#7
Senior Member
Location: East Coast, US
Join Date: Jun 2010
Posts: 177
Go to the UCSC Genome Browser website, enter "ribosomal RNA" in the "position or search term" field, and you will get a list of genes with their chromosome positions. Based on that, you can find the corresponding gene IDs in your GTF file. I hope there is an easier way, but I do not work with the mouse genome often...
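Once you have the chromosome positions from the UCSC search, you can match them against the GTF by interval overlap instead of by eye. The following is a minimal Python sketch. It assumes the positions have been copied into a list of (chrom, start, end) tuples with 1-based inclusive coordinates, the convention GTF uses. If you export positions as BED from the Table Browser, the start is 0-based and needs +1. The input list and function names are hypothetical:

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

Region = Tuple[str, int, int]  # (chrom, start, end), 1-based inclusive


def index_regions(regions: Iterable[Region]) -> Dict[str, List[Tuple[int, int]]]:
    """Group regions by chromosome so each record is only compared locally."""
    by_chrom: Dict[str, List[Tuple[int, int]]] = defaultdict(list)
    for chrom, start, end in regions:
        by_chrom[chrom].append((start, end))
    return by_chrom


def overlapping_gene_ids(gtf_lines: Iterable[str], regions: Iterable[Region]) -> List[str]:
    """Return gene_ids (first-seen order) of GTF records overlapping any region."""
    by_chrom = index_regions(regions)
    found: Dict[str, None] = {}  # insertion-ordered dict used as an ordered set
    for line in gtf_lines:
        if line.startswith("#"):
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 9:
            continue  # not a complete GTF record
        chrom, start, end = fields[0], int(fields[3]), int(fields[4])
        # Two closed intervals overlap when each starts before the other ends.
        if any(start <= r_end and r_start <= end
               for r_start, r_end in by_chrom.get(chrom, [])):
            for attr in fields[8].split(";"):
                key, _, value = attr.strip().partition(" ")
                if key == "gene_id":
                    found[value.strip('"')] = None
    return list(found)
```

The returned IDs can be fed into a gene_id filter to build the mask file. A linear scan per record is fine for the handful of rRNA loci involved. A large region list would call for sorted intervals or an interval tree.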