#1 |
Member
Location: Freiburg Join Date: Oct 2012
Posts: 56
Hi everyone,
I just got my first RNA-seq dataset (50 bp, paired-end) and am trying to analyze it the common TopHat - Cufflinks - Cuffdiff way. Specifically, I am using the pipeline suggested in the following Nat. Protoc. paper: "Differential gene and transcript expression analysis of RNA-seq experiments with TopHat and Cufflinks". However, I run into problems when I use cuffmerge. The annotation files I use are the ones for mm9 provided by Illumina, downloaded from the TopHat homepage.

The command I run:

    cuffmerge -g /home/dalgaard/genomes/mm9/Annotation/Genes/genes.gtf -s /home/dalgaard/genomes/mm9/Sequence/WholeGenomeFasta/genome.fa -p 8 assemblies.txt

assemblies.txt contains:

    /home/dalgaard/xx/sample01/sample01_tophat_out/sample01.cufflinks.out/transcripts.gtf
    /home/dalgaard/xx/sample02/sample02_tophat_out/sample02.cufflinks.out/transcripts.gtf

The error message, shown below, is that it cannot find the names of the chromosomes.

I really appreciate your help! Thanks a lot.

Kind regards,
Kevin Dalgaard

-------

    cufflinks -o ./merged_asm/ -F 0.05 -g /home/dalgaard/genomes/mm9/Annotation/Genes/genes.gtf -q --overhang-tolerance 200 --library-type=transfrags -A 0.0 --min-frags-per-transfrag 0 --no-5-extend -p 8 ./merged_asm/tmp/mergeSam_file9S5P0t
    [bam_header_read] EOF marker is absent.
    [bam_header_read] invalid BAM binary header (this is not a BAM file).
    File ./merged_asm/tmp/mergeSam_file9S5P0t doesn't appear to be a valid BAM file, trying SAM...
    [21:45:58] Loading reference annotation.
    [21:46:02] Inspecting reads and determining fragment length distribution.
    Processed 26894 loci.
    > Map Properties:
    > Normalized Map Mass: 71083.00
    > Raw Map Mass: 71083.00
    > Fragment Length Distribution: Truncated Gaussian (default)
    > Default Mean: 200
    > Default Std Dev: 80
    [21:46:03] Assembling transcripts and estimating abundances.
    Processed 26412 loci.
    [Sun Dec 2 18:39:40 2012] Comparing against reference file /home/dalgaard/refgenome/mm9.igenes.gtf
    Warning: Your version of Cufflinks is not up-to-date. It is recommended that you upgrade to Cufflinks v2.0.2 to benefit from the most recent features and bug fixes (http://cufflinks.cbcb.umd.edu).
    Warning: couldn't find fasta record for 'chr13_random'!
    Warning: couldn't find fasta record for 'chr17_random'!
    Warning: couldn't find fasta record for 'chr1_random'!
    Warning: couldn't find fasta record for 'chr4_random'!
    Warning: couldn't find fasta record for 'chr5_random'!
    Warning: couldn't find fasta record for 'chr7_random'!
    Warning: couldn't find fasta record for 'chr8_random'!
    Warning: couldn't find fasta record for 'chr9_random'!
    Warning: couldn't find fasta record for 'chrUn_random'!
    Warning: couldn't find fasta record for 'chrX_random'!
    Warning: couldn't find fasta record for 'chrY_random'!

Last edited by DonDolowy; 12-02-2012 at 01:12 PM.
#2 |
Junior Member
Location: Urbana, IL Join Date: Oct 2012
Posts: 4
Hello,
Did you find an answer to the "couldn't find fasta record for 'chr1_random'" warning? I've run into the same problem.
Thank you,
Joe
#3 |
Member
Location: Freiburg Join Date: Oct 2012
Posts: 56
What I decided to do was use the grep command to remove all lines containing "_random". That allows you to continue with your analysis.
#4 |
Member
Location: UK Join Date: Aug 2013
Posts: 31
Hello, which file did you remove the lines containing '_random' from, and how exactly do you do this with a grep command?
Thanks,
Alex
#5 |
Junior Member
Location: Shenzhen Join Date: Oct 2011
Posts: 4
I think it is because the chromosome names in the GTF you passed with '-g' differ from those in the genome FASTA file. You can check the chromosome names in both files, e.g. by running grep "_random" on the GTF and on the FASTA.
To solve this problem, you can remove all the transcripts associated with chr*_random from the GTF, then try the analysis again.
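A minimal sketch of that check and filter, not a definitive recipe. The file names and the toy records are assumptions made up for illustration (they are not real mm9 data); point `gtf` and `fasta` at your own files instead. The filter uses awk on the first (sequence name) column rather than a plain `grep -v`, so that a "_random" string elsewhere on a line does not cause it to be dropped.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Toy stand-ins for genes.gtf and genome.fa (hypothetical records, not real mm9 data).
workdir=$(mktemp -d)
gtf="$workdir/genes.gtf"
fasta="$workdir/genome.fa"
printf 'chr1\ttoy\texon\t100\t200\t.\t+\t.\tgene_id "g1"; transcript_id "t1";\n' > "$gtf"
printf 'chr1_random\ttoy\texon\t10\t50\t.\t+\t.\tgene_id "g2"; transcript_id "t2";\n' >> "$gtf"
printf 'chr2\ttoy\texon\t300\t400\t.\t-\t.\tgene_id "g3"; transcript_id "t3";\n' >> "$gtf"
printf '>chr1\nACGT\n>chr2\nTTGA\n' > "$fasta"

# Count annotation lines mentioning _random and FASTA headers naming a _random sequence.
# grep -c exits non-zero when the count is 0, hence the "|| true".
gtf_random=$(grep -c '_random' "$gtf" || true)
fasta_random=$(grep -c '^>.*_random' "$fasta" || true)
echo "GTF lines mentioning _random: $gtf_random"
echo "FASTA headers with _random:   $fasta_random"

# Keep only annotation lines whose sequence name (column 1) does not end in _random.
filtered="$workdir/genes.no_random.gtf"
awk -F'\t' '$1 !~ /_random$/' "$gtf" > "$filtered"
kept=$(awk 'END { print NR }' "$filtered")
echo "Lines kept after filtering:   $kept"
```

If the GTF count is non-zero while the FASTA count is zero, the annotation refers to sequences the FASTA does not contain, which is the situation the warnings describe.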
#6 |
Member
Location: UK Join Date: Aug 2013
Posts: 31
Thanks, that did remove some, but not all, of the error lines. And couldn't the sequences we are removing with grep be important?
Alex
#7 | |
Junior Member
Location: Shenzhen Join Date: Oct 2011
Posts: 4
So the best way is to make sure that the reference GTF and your analysis pipeline use the same version of the genome to locate the transcripts and do the alignment. You can download the mouse genome from UCSC here: http://hgdownload.cse.ucsc.edu/downloads.html#mouse, which could possibly solve the problem.
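One way to verify that the two files agree is to compare their sequence names directly. The following is a sketch under assumptions: the file names and toy records are hypothetical, and it assumes the FASTA sequence name is the header text up to the first space.

```shell
#!/usr/bin/env bash
set -euo pipefail
export LC_ALL=C  # keep sort and comm on the same collation order

# Toy stand-ins for the annotation and genome (hypothetical records, not real mm9 data).
workdir=$(mktemp -d)
gtf="$workdir/genes.gtf"
fasta="$workdir/genome.fa"
printf 'chr1\ttoy\texon\t100\t200\t.\t+\t.\tgene_id "g1";\n' > "$gtf"
printf 'chr1_random\ttoy\texon\t10\t50\t.\t+\t.\tgene_id "g2";\n' >> "$gtf"
printf 'chrUn_random\ttoy\texon\t5\t25\t.\t-\t.\tgene_id "g3";\n' >> "$gtf"
printf '>chr1\nACGT\n>chr2 toy description\nTTGA\n' > "$fasta"

# Sequence names used by the annotation: column 1, skipping comment lines.
grep -v '^#' "$gtf" | cut -f1 | sort -u > "$workdir/gtf_names.txt"

# Sequence names in the FASTA: header lines without '>' and without any description.
grep '^>' "$fasta" | sed 's/^>//; s/ .*//' | sort -u > "$workdir/fasta_names.txt"

# Names that appear in the GTF but have no record in the FASTA.
missing=$(comm -23 "$workdir/gtf_names.txt" "$workdir/fasta_names.txt")
echo "In GTF but not in FASTA:"
echo "$missing"
```

An empty list would mean every sequence named in the annotation has a FASTA record; any name printed here is a candidate for a "couldn't find fasta record" warning.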
#8 |
Member
Location: Freiburg Join Date: Oct 2012
Posts: 56
I just find it odd that if you download a certain iGenomes "package" (e.g. UCSC mm9), the genome.fa and genes.gtf do not correspond and you get this error.
Personally, I have just removed all lines containing "random". If I understand correctly, chr1_random just means that when the genome was assembled, these sequences were assigned to chromosome 1, but it is not known where exactly on chromosome 1 they belong. Maybe they are repetitive sequences.
#9 |
Devon Ryan
Location: Freiburg, Germany Join Date: Jul 2011
Posts: 3,480
Well, while it's odd that the iGenomes files don't always correspond, the error itself makes sense. I wouldn't recommend removing the *_random lines from either the reference or the annotation. Those sequences/features are actually in the genome, so leaving them out will bias alignment a bit (the magnitude of this effect is likely fairly small, of course).