Hi everyone,
I just got my first RNA-seq dataset (50 bp, paired-end) and am trying to analyze it with the common TopHat/Cufflinks/Cuffdiff workflow. Specifically, I am following the pipeline suggested in the Nat. Protoc. paper "Differential gene and transcript expression analysis of RNA-seq experiments with TopHat and Cufflinks".
However, I run into problems when I use cuffmerge.
The annotation files I use are the ones for mm9 provided by Illumina, downloaded from the TopHat homepage. The command I run is:
cuffmerge -g /home/dalgaard/genomes/mm9/Annotation/Genes/genes.gtf -s /home/dalgaard/genomes/mm9/Sequence/WholeGenomeFasta/genome.fa -p 8 assemblies.txt
assemblies.txt contains:
/home/dalgaard/xx/sample01/sample01_tophat_out/sample01.cufflinks.out/transcripts.gtf
/home/dalgaard/xx/sample02/sample02_tophat_out/sample02.cufflinks.out/transcripts.gtf
The error messages, shown below, say that it cannot find FASTA records for some of the chromosome names.
I really appreciate your help!
Thanks a lot.
Kind regards,
Kevin Dalgaard
-------
cufflinks -o ./merged_asm/ -F 0.05 -g /home/dalgaard/genomes/mm9/Annotation/Genes/genes.gtf -q --overhang-tolerance 200 --library-type=transfrags -A 0.0 --min-frags-per-transfrag 0 --no-5-extend -p 8 ./merged_asm/tmp/mergeSam_file9S5P0t
[bam_header_read] EOF marker is absent.
[bam_header_read] invalid BAM binary header (this is not a BAM file).
File ./merged_asm/tmp/mergeSam_file9S5P0t doesn't appear to be a valid BAM file, trying SAM...
[21:45:58] Loading reference annotation.
[21:46:02] Inspecting reads and determining fragment length distribution.
Processed 26894 loci.
> Map Properties:
> Normalized Map Mass: 71083.00
> Raw Map Mass: 71083.00
> Fragment Length Distribution: Truncated Gaussian (default)
> Default Mean: 200
> Default Std Dev: 80
[21:46:03] Assembling transcripts and estimating abundances.
Processed 26412 loci.
[Sun Dec 2 18:39:40 2012] Comparing against reference file /home/dalgaard/refgenome/mm9.igenes.gtf
Warning: Your version of Cufflinks is not up-to-date. It is recommended that you upgrade to Cufflinks v2.0.2 to benefit from the most recent features and bug fixes (http://cufflinks.cbcb.umd.edu).
Warning: couldn't find fasta record for 'chr13_random'!
Warning: couldn't find fasta record for 'chr17_random'!
Warning: couldn't find fasta record for 'chr1_random'!
Warning: couldn't find fasta record for 'chr4_random'!
Warning: couldn't find fasta record for 'chr5_random'!
Warning: couldn't find fasta record for 'chr7_random'!
Warning: couldn't find fasta record for 'chr8_random'!
Warning: couldn't find fasta record for 'chr9_random'!
Warning: couldn't find fasta record for 'chrUn_random'!
Warning: couldn't find fasta record for 'chrX_random'!
Warning: couldn't find fasta record for 'chrY_random'!
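One way to narrow this down is to list the sequence names that appear in the annotation but have no record in the genome FASTA. The following is only a minimal sketch, assuming the warnings above mean exactly that kind of name mismatch. The function names are hypothetical and not part of Cufflinks, and the tiny in-memory GTF and FASTA are made-up illustration data, not the real mm9 files.

```python
import io


def fasta_names(handle):
    """Collect sequence names (first word after '>') from a FASTA stream."""
    names = set()
    for line in handle:
        if line.startswith(">"):
            fields = line[1:].split()
            if fields:
                names.add(fields[0])
    return names


def gtf_seqnames(handle):
    """Collect the values of column 1 (seqname) from a GTF stream."""
    names = set()
    for line in handle:
        # Skip comment lines and blank lines.
        if line.startswith("#") or not line.strip():
            continue
        names.add(line.rstrip("\n").split("\t", 1)[0])
    return names


def missing_from_fasta(gtf_handle, fasta_handle):
    """Return, sorted, the GTF seqnames that have no FASTA record."""
    return sorted(gtf_seqnames(gtf_handle) - fasta_names(fasta_handle))


# Made-up example data (assumption: illustrative only).
demo_gtf = io.StringIO(
    "chr1\tdemo\texon\t100\t200\t.\t+\t.\tgene_id \"g1\"; transcript_id \"t1\";\n"
    "chr1_random\tdemo\texon\t10\t50\t.\t+\t.\tgene_id \"g2\"; transcript_id \"t2\";\n"
)
demo_fasta = io.StringIO(">chr1 demo sequence\nACGTACGT\n")

missing = missing_from_fasta(demo_gtf, demo_fasta)
print(missing)
```

To run it on real files, open genes.gtf and genome.fa with `open(...)` and pass the two file handles to `missing_from_fasta` in place of the demo streams. If the names it prints match the ones in the warnings, the annotation refers to sequences that are not present in the FASTA.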