Hi everyone,
I am working on a TopHat/Cufflinks differential expression pipeline. After running the whole pipeline, the resulting gene_exp.diff file does not contain any gene names. Also, there are about 13000 records in the transcript file, but the resulting diff file contains only about 2000 of them; the rest of its entries are all CUFF identifiers. My pipeline, transcript file and diff output follow. Any help is appreciated.
Tophat:
Code:
tophat -p 16 -r 175 --no-coverage-search -o $Path/run1/nacre/ --transcriptome-index=/transcriptome/ucsc/zv9_transcriptome /genomes/bwt2/danRer7 /fastq_files/nacre_R1_filtered.fastq /fastq_files/nacre_R2_filtered.fastq
tophat -p 16 -r 175 --no-coverage-search -o $Path/run1/tub/ --transcriptome-index=/transcriptome/ucsc/zv9_transcriptome /genomes/bwt2/danRer7 /fastq_files/tub_R1_filtered.fastq /fastq_files/tub_R2_filtered.fastq
Cufflinks:
Code:
nohup cufflinks -o $Path/run1/nacre/cuff1 -g /transcriptome/ucsc/zv9_transcriptome.gtf -p 16 $Path/run1/nacre/accepted_hits.bam
nohup cufflinks -o $Path/run1/tub/cuff1 -g /transcriptome/ucsc/zv9_transcriptome.gtf -p 16 $Path/run1/tub/accepted_hits.bam
Assembly1.txt file:
Code:
$path/tophat_run/full_test_runs/run1/nacre/cuff1/transcripts.gtf
$path/tophat_run/full_test_runs/run1/tub/cuff1/transcripts.gtf
Cuffmerge:
Code:
cuffmerge -o $path/run1/cuff_merge/cuff1 -g /scratchLocal/sac2026/transcriptome/ucsc/zv9_transcriptome.gtf -p 16 -s /genomes/bwt2/danRer7.fa $path/run1/assembly1.txt &
CuffDiff:
Code:
cuffdiff -o $path/run1/cuff_diff/cuff1/ -L nacre,tub -p 8 $path/run1/cuff_merge/cuff1/transcripts.gtf $path/run1/nacre/accepted_hits.bam $path/run1/tub/accepted_hits.bam
Transcript.gtf file downloaded from UCSC:
Code:
chr1 danRer7_refGene start_codon 50322025 50322027 0.000000 + . gene_id "NM_131426"; transcript_id "NM_131426";
chr1 danRer7_refGene CDS 50322025 50322231 0.000000 + 0 gene_id "NM_131426"; transcript_id "NM_131426";
chr1 danRer7_refGene exon 50321634 50322231 0.000000 + . gene_id "NM_131426"; transcript_id "NM_131426";
chr1 danRer7_refGene CDS 50323685 50323751 0.000000 + 0 gene_id "NM_131426"; transcript_id "NM_131426";
chr1 danRer7_refGene exon 50323685 50323751 0.000000 + . gene_id "NM_131426"; transcript_id "NM_131426";
chr1 danRer7_refGene CDS 50327723 50327850 0.000000 + 2 gene_id "NM_131426"; transcript_id "NM_131426";
chr1 danRer7_refGene exon 50327723 50327850 0.000000 + . gene_id "NM_131426"; transcript_id "NM_131426";
chr1 danRer7_refGene CDS 50376642 50376774 0.000000 + 0 gene_id "NM_131426"; transcript_id "NM_131426";
chr1 danRer7_refGene exon 50376642 50376774 0.000000 + . gene_id "NM_131426"; transcript_id "NM_131426";
chr1 danRer7_refGene CDS 50384689 50384782 0.000000 + 2 gene_id "NM_131426"; transcript_id "NM_131426";
chr1 danRer7_refGene exon 50384689 50384782 0.000000 + . gene_id "NM_131426"; transcript_id "NM_131426";
chr1 danRer7_refGene CDS 50384996 50385109 0.000000 + 1 gene_id "NM_131426"; transcript_id "NM_131426";
chr1 danRer7_refGene exon 50384996 50385109 0.000000 + . gene_id "NM_131426"; transcript_id "NM_131426";
chr1 danRer7_refGene CDS 50387282 50387444 0.000000 + 1 gene_id "NM_131426"; transcript_id "NM_131426";
chr1 danRer7_refGene exon 50387282 50387444 0.000000 + . gene_id "NM_131426"; transcript_id "NM_131426";
chr1 danRer7_refGene CDS 50388022 50388129 0.000000 + 0 gene_id "NM_131426"; transcript_id "NM_131426";
chr1 danRer7_refGene exon 50388022 50388129 0.000000 + . gene_id "NM_131426"; transcript_id "NM_131426";
chr1 danRer7_refGene CDS 50392531 50392579 0.000000 + 0 gene_id "NM_131426"; transcript_id "NM_131426";
chr1 danRer7_refGene exon 50392531 50392579 0.000000 + . gene_id "NM_131426"; transcript_id "NM_131426";
chr1 danRer7_refGene CDS 50393548 50393579 0.000000 + 2 gene_id "NM_131426"; transcript_id "NM_131426";
chr1 danRer7_refGene stop_codon 50393580 50393582 0.000000 + . gene_id "NM_131426"; transcript_id "NM_131426";
chr1 danRer7_refGene exon 50393548 50393588 0.000000 + . gene_id "NM_131426"; transcript_id "NM_131426";
chr1 danRer7_refGene exon 50409290 50410568 0.000000 + . gene_id "NM_131426"; transcript_id "NM_131426";
chr1 danRer7_refGene stop_codon 58701201 58701203 0.000000 - . gene_id "NM_001110522"; transcript_id "NM_001110522";
chr1 danRer7_refGene CDS 58701204 58701468 0.000000 - 1 gene_id "NM_001110522"; transcript_id "NM_001110522";
chr1 danRer7_refGene exon 58701201 58701468 0.000000 - . gene_id "NM_001110522"; transcript_id "NM_001110522";
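One thing that can be checked directly in the excerpt above is which attributes each record carries. The following is a minimal Python sketch for that check, not part of the pipeline itself. `parse_gtf_attributes` is a hypothetical helper name, and the whitespace fallback is an assumption made only because the copy pasted here is space-separated rather than tab-separated.

```python
import re

# Matches one `key "value";` pair in the GTF attribute column.
ATTR_RE = re.compile(r'(\S+) "([^"]*)";')

def parse_gtf_attributes(line):
    """Return the attribute column of one GTF record as a dict."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) < 9:
        # Assumption: the pasted copy uses spaces, so split on whitespace
        # and keep everything after the 8th field as the attribute column.
        fields = line.split(None, 8)
    return dict(ATTR_RE.findall(fields[8]))

# One exon record copied from the excerpt above.
record = ('chr1 danRer7_refGene exon 50321634 50322231 0.000000 + . '
          'gene_id "NM_131426"; transcript_id "NM_131426";')

attrs = parse_gtf_attributes(record)
print(attrs)
print("has gene_name:", "gene_name" in attrs)
```

For this record the only attributes are gene_id and transcript_id, both set to the NM accession, and there is no separate gene_name attribute.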
Output gene_exp.diff file:
Code:
CUFF.21460 CUFF.21460 - chr15:42401169-42414185 nacre tub OK 0.30098 0.192342 -0.645988 0.93529 0.349639 0.999981 no
CUFF.21461 CUFF.21461 - chr15:42517544-42517876 nacre tub OK 0.303951 0.0349624 -3.11996 0.710738 0.477247 0.999981 no
CUFF.21462 CUFF.21462 - chr15:42593781-42597957 nacre tub OK 1.06523 1.85185 0.797809 -1.28757 0.197895 0.999981 no
CUFF.21463 CUFF.21463 - chr15:42567449-42568700 nacre tub NOTEST 0.0441381 0.0433716 -0.0252743 0.0151731 0.987894 1 no
CUFF.21464 CUFF.21464 - chr15:42572428-42593418 nacre tub OK 2.26891 18.0882 2.99498 -6.08449 1.1686e-09 1.9611e-06 yes
CUFF.21465 CUFF.21465 - chr15:42624106-42624606 nacre tub OK 2.78658 2.24085 -0.314451 0.375988 0.706925 0.999981 no
CUFF.21466 CUFF.21466 - chr15:41251756-41266370 nacre tub OK 0.819343 1.03169 0.332465 -0.386342 0.699243 0.999981 no
CUFF.21467 CUFF.21467 - chr15:41999382-42013139 nacre tub OK 0.13403 0.484079 1.85268 -1.61461 0.106394 0.999981 no
CUFF.21468 CUFF.21468 - chr15:42636714-42637489 nacre tub OK 0.245696 0.00871635 -4.81701 1.12025 0.262609 0.999981 no
CUFF.21469 CUFF.21469 - chr15:41251756-41266370 nacre tub OK 0.120829 0.186014 0.622448 -0.179106 0.857854 0.999981 no
CUFF.2147 CUFF.2147 - 19:6835973-6925393 nacre tub NOTEST 0 0 0 0 1 1 no
CUFF.21470 CUFF.21470 - chr15:41999382-42013139 nacre tub NOTEST 0.0487298 0.0200532 -1.28098 0.244489 0.806852 1 no
CUFF.21471 CUFF.21471 - chr15:42663333-42663506 nacre tub OK 0.264006 23.4892 6.47528 -1.43214 0.152105 0.999981 no
CUFF.21472 CUFF.21472 - chr15:41478958-41496849 nacre tub OK 68.4197 60.2869 -0.182566 0.416749 0.676862 0.999981 no
Some NM IDs do show up in the file but, as I said, only about 2000 of about 13000. Some of the CUFF entries should actually be annotated, since the transcriptome contains them. For example, CUFF.21464 in the above file is the Tyr gene, which is very well annotated in UCSC, yet it shows up with a CUFF identifier. What am I doing wrong? How can I get this pipeline to include the gene names and other annotations?
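For anyone who wants to reproduce the CUFF versus NM counts, here is a minimal Python sketch of the tally. It assumes the usual tab-separated gene_exp.diff layout with a header row starting with `test_id` and the gene identifier in the second column. `tally_id_prefixes` is a hypothetical helper name, and the three sample rows are copied from the output above and re-joined with tabs.

```python
import csv
import io
import re
from collections import Counter

def tally_id_prefixes(handle):
    """Count gene_id values in a gene_exp.diff stream by prefix (CUFF, NM, ...)."""
    counts = Counter()
    for row in csv.reader(handle, delimiter="\t"):
        # Skip blank lines and the header row (assumed to start with "test_id").
        if len(row) < 2 or row[0] == "test_id":
            continue
        # "CUFF.21464" -> "CUFF", "NM_131426" -> "NM"
        prefix = re.split(r"[._]", row[1], maxsplit=1)[0]
        counts[prefix] += 1
    return counts

# Three rows taken from the output above (space-separated in the post).
rows = [
    "CUFF.21460 CUFF.21460 - chr15:42401169-42414185 nacre tub OK 0.30098 0.192342 -0.645988 0.93529 0.349639 0.999981 no",
    "CUFF.21463 CUFF.21463 - chr15:42567449-42568700 nacre tub NOTEST 0.0441381 0.0433716 -0.0252743 0.0151731 0.987894 1 no",
    "CUFF.21464 CUFF.21464 - chr15:42572428-42593418 nacre tub OK 2.26891 18.0882 2.99498 -6.08449 1.1686e-09 1.9611e-06 yes",
]
sample = io.StringIO("\n".join("\t".join(r.split()) for r in rows) + "\n")

counts = tally_id_prefixes(sample)
print(counts)
```

On the real file the same function can be given an open file handle, for example `open("gene_exp.diff", newline="")`, in place of the in-memory sample.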
Please also feel free to comment on the pipeline. This is for zebrafish reads.
Thank you in advance.