SEQanswers

Old 08-02-2012, 08:03 AM   #1
sagarc88
Junior Member
 
Location: East Coast

Join Date: Jul 2012
Posts: 2
tophat/cufflinks no gene names or annotations showing up

Hi everyone,

I am working on a TopHat/Cufflinks differential expression pipeline. After running the whole pipeline, the resulting gene_exp.diff file does not contain any gene names. Also, the transcript file has about 13000 records, but only about 2000 of them show up in the diff file; the rest of the entries are all CUFF identifiers. My pipeline, transcript file, and diff output follow. Any help is appreciated.

Tophat:
Code:
tophat -p 16 -r 175 --no-coverage-search -o $Path/run1/nacre/ --transcriptome-index=/transcriptome/ucsc/zv9_transcriptome /genomes/bwt2/danRer7 /fastq_files/nacre_R1_filtered.fastq /fastq_files/nacre_R2_filtered.fastq

tophat -p 16 -r 175 --no-coverage-search -o $Path/run1/tub/ --transcriptome-index=/transcriptome/ucsc/zv9_transcriptome /genomes/bwt2/danRer7 /fastq_files/tub_R1_filtered.fastq /fastq_files/tub_R2_filtered.fastq
Cufflinks:
Code:
nohup cufflinks -o $Path/run1/nacre/cuff1 -g /transcriptome/ucsc/zv9_transcriptome.gtf -p 16 $Path/run1/nacre/accepted_hits.bam

nohup cufflinks -o $Path/run1/tub/cuff1 -g /transcriptome/ucsc/zv9_transcriptome.gtf -p 16 $Path/run1/tub/accepted_hits.bam
Assembly1.txt file:
Code:
$path/tophat_run/full_test_runs/run1/nacre/cuff1/transcripts.gtf
$path/tophat_run/full_test_runs/run1/tub/cuff1/transcripts.gtf
Cuffmerge:
Code:
cuffmerge -o $path/run1/cuff_merge/cuff1 -g /scratchLocal/sac2026/transcriptome/ucsc/zv9_transcriptome.gtf -p 16 -s /genomes/bwt2/danRer7.fa $path/run1/assembly1.txt &
CuffDiff:
Code:
cuffdiff -o $path/run1/cuff_diff/cuff1/ -L nacre,tub -p 8 $path/run1/cuff_merge/cuff1/transcripts.gtf $path/run1/nacre/accepted_hits.bam $path/run1/tub/accepted_hits.bam
Transcript.gtf file downloaded from UCSC:
Code:
chr1    danRer7_refGene start_codon     50322025        50322027        0.000000        +       .       gene_id "NM_131426"; transcript_id "NM_131426"; 
chr1    danRer7_refGene CDS     50322025        50322231        0.000000        +       0       gene_id "NM_131426"; transcript_id "NM_131426"; 
chr1    danRer7_refGene exon    50321634        50322231        0.000000        +       .       gene_id "NM_131426"; transcript_id "NM_131426"; 
chr1    danRer7_refGene CDS     50323685        50323751        0.000000        +       0       gene_id "NM_131426"; transcript_id "NM_131426"; 
chr1    danRer7_refGene exon    50323685        50323751        0.000000        +       .       gene_id "NM_131426"; transcript_id "NM_131426"; 
chr1    danRer7_refGene CDS     50327723        50327850        0.000000        +       2       gene_id "NM_131426"; transcript_id "NM_131426"; 
chr1    danRer7_refGene exon    50327723        50327850        0.000000        +       .       gene_id "NM_131426"; transcript_id "NM_131426"; 
chr1    danRer7_refGene CDS     50376642        50376774        0.000000        +       0       gene_id "NM_131426"; transcript_id "NM_131426"; 
chr1    danRer7_refGene exon    50376642        50376774        0.000000        +       .       gene_id "NM_131426"; transcript_id "NM_131426"; 
chr1    danRer7_refGene CDS     50384689        50384782        0.000000        +       2       gene_id "NM_131426"; transcript_id "NM_131426"; 
chr1    danRer7_refGene exon    50384689        50384782        0.000000        +       .       gene_id "NM_131426"; transcript_id "NM_131426"; 
chr1    danRer7_refGene CDS     50384996        50385109        0.000000        +       1       gene_id "NM_131426"; transcript_id "NM_131426"; 
chr1    danRer7_refGene exon    50384996        50385109        0.000000        +       .       gene_id "NM_131426"; transcript_id "NM_131426"; 
chr1    danRer7_refGene CDS     50387282        50387444        0.000000        +       1       gene_id "NM_131426"; transcript_id "NM_131426"; 
chr1    danRer7_refGene exon    50387282        50387444        0.000000        +       .       gene_id "NM_131426"; transcript_id "NM_131426"; 
chr1    danRer7_refGene CDS     50388022        50388129        0.000000        +       0       gene_id "NM_131426"; transcript_id "NM_131426"; 
chr1    danRer7_refGene exon    50388022        50388129        0.000000        +       .       gene_id "NM_131426"; transcript_id "NM_131426"; 
chr1    danRer7_refGene CDS     50392531        50392579        0.000000        +       0       gene_id "NM_131426"; transcript_id "NM_131426"; 
chr1    danRer7_refGene exon    50392531        50392579        0.000000        +       .       gene_id "NM_131426"; transcript_id "NM_131426"; 
chr1    danRer7_refGene CDS     50393548        50393579        0.000000        +       2       gene_id "NM_131426"; transcript_id "NM_131426"; 
chr1    danRer7_refGene stop_codon      50393580        50393582        0.000000        +       .       gene_id "NM_131426"; transcript_id "NM_131426"; 
chr1    danRer7_refGene exon    50393548        50393588        0.000000        +       .       gene_id "NM_131426"; transcript_id "NM_131426"; 
chr1    danRer7_refGene exon    50409290        50410568        0.000000        +       .       gene_id "NM_131426"; transcript_id "NM_131426"; 
chr1    danRer7_refGene stop_codon      58701201        58701203        0.000000        -       .       gene_id "NM_001110522"; transcript_id "NM_001110522"; 
chr1    danRer7_refGene CDS     58701204        58701468        0.000000        -       1       gene_id "NM_001110522"; transcript_id "NM_001110522"; 
chr1    danRer7_refGene exon    58701201        58701468        0.000000        -       .       gene_id "NM_001110522"; transcript_id "NM_001110522";
output gene_exp.diff file:
Code:
CUFF.21460      CUFF.21460      -       chr15:42401169-42414185 nacre   tub     OK      0.30098 0.192342        -0.645988       0.93529 0.349639        0.999981        no
CUFF.21461      CUFF.21461      -       chr15:42517544-42517876 nacre   tub     OK      0.303951        0.0349624       -3.11996        0.710738        0.477247        0.999981        no
CUFF.21462      CUFF.21462      -       chr15:42593781-42597957 nacre   tub     OK      1.06523 1.85185 0.797809        -1.28757        0.197895        0.999981        no
CUFF.21463      CUFF.21463      -       chr15:42567449-42568700 nacre   tub     NOTEST  0.0441381       0.0433716       -0.0252743      0.0151731       0.987894        1       no
CUFF.21464      CUFF.21464      -       chr15:42572428-42593418 nacre   tub     OK      2.26891 18.0882 2.99498 -6.08449        1.1686e-09      1.9611e-06      yes
CUFF.21465      CUFF.21465      -       chr15:42624106-42624606 nacre   tub     OK      2.78658 2.24085 -0.314451       0.375988        0.706925        0.999981        no
CUFF.21466      CUFF.21466      -       chr15:41251756-41266370 nacre   tub     OK      0.819343        1.03169 0.332465        -0.386342       0.699243        0.999981        no
CUFF.21467      CUFF.21467      -       chr15:41999382-42013139 nacre   tub     OK      0.13403 0.484079        1.85268 -1.61461        0.106394        0.999981        no
CUFF.21468      CUFF.21468      -       chr15:42636714-42637489 nacre   tub     OK      0.245696        0.00871635      -4.81701        1.12025 0.262609        0.999981        no
CUFF.21469      CUFF.21469      -       chr15:41251756-41266370 nacre   tub     OK      0.120829        0.186014        0.622448        -0.179106       0.857854        0.999981        no
CUFF.2147       CUFF.2147       -       19:6835973-6925393      nacre   tub     NOTEST  0       0       0       0       1       1       no
CUFF.21470      CUFF.21470      -       chr15:41999382-42013139 nacre   tub     NOTEST  0.0487298       0.0200532       -1.28098        0.244489        0.806852        1       no
CUFF.21471      CUFF.21471      -       chr15:42663333-42663506 nacre   tub     OK      0.264006        23.4892 6.47528 -1.43214        0.152105        0.999981        no
CUFF.21472      CUFF.21472      -       chr15:41478958-41496849 nacre   tub     OK      68.4197 60.2869 -0.182566       0.416749        0.676862        0.999981        no

Some NM IDs do show up in the file but, as I said, only about 2000 of roughly 13000. Some of the CUFF entries should actually be annotated, since the transcriptome contains them. For example, CUFF.21464 in the file above is the Tyr gene, which is very well annotated in UCSC, yet it shows up with a CUFF identifier. What am I doing wrong? How can I get this pipeline to include the gene names and other annotations?

Please also feel free to comment on the pipeline. This is for zebrafish reads.

Thank you in advance.
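
One way to put numbers on the symptoms described in this post is to count how many rows of gene_exp.diff carry a CUFF identifier or an empty gene column, and to check whether the reference GTF contains any gene_name attributes at all. The UCSC excerpt in the post shows only gene_id and transcript_id, and the Cufflinks documentation describes the gene column as being filled from gene_name, so a count of zero would be worth investigating. This is a minimal sketch, assuming tab-delimited files and the column order shown in the post; the function names are made up for illustration.

```shell
# Hypothetical helpers (not part of the Cufflinks suite).

# Summarize the identifiers in a cuffdiff gene_exp.diff file.
# Assumed columns: 1 = test_id, 2 = gene_id, 3 = gene, 4 = locus, ...
summarize_diff_ids() {
    awk -F '\t' '
        NR == 1 && $1 == "test_id" { next }        # skip the header row if present
        {
            total++
            if ($1 ~ /^CUFF\./) cuff++             # identifier assigned by Cufflinks
            if ($3 == "-" || $3 == "") unnamed++   # no gene short name
        }
        END {
            printf "total=%d cuff_ids=%d unnamed=%d\n", total, cuff, unnamed
        }
    ' "$1"
}

# Count GTF lines whose attribute column (9) has a gene_name attribute.
count_gene_name_lines() {
    awk -F '\t' '$9 ~ /gene_name "/ { n++ } END { printf "%d\n", n }' "$1"
}
```

Calling `summarize_diff_ids gene_exp.diff` prints one summary line, and `count_gene_name_lines zv9_transcriptome.gtf` prints a single number; if that number is 0, the reference annotation itself has no gene names for the pipeline to carry through.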
Old 10-01-2015, 07:03 AM   #2
scatteredStorms
Junior Member
 
Location: Bethesda, MD

Join Date: Oct 2015
Posts: 1
Me too

I'm having this same problem and am somewhat surprised that a solution seems so hard to find. I was thinking one option might be to look the records up by something else, such as chromosome location, to salvage the data. This is especially painful because Cufflinks takes so long to run, much longer than STAR.
Old 10-05-2015, 11:27 AM   #3
dblyons
Junior Member
 
Location: ca

Join Date: Apr 2011
Posts: 5

Have you tried running a few of your BAM files from TopHat directly through Cuffdiff with no local transcriptome assembly (i.e. skipping Cufflinks and Cuffmerge)? This might at least get you some data to look at while you sort out the Cufflinks problem. Also, what does your cuffmerge'd transcripts file look like? How many NM records are there?
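
Both suggestions can be tried with very little code. The sketch below counts the distinct RefSeq-style NM_ accessions that appear anywhere in a GTF, which works whether the merged file keeps them in transcript_id or in another attribute (Cuffmerge typically records reference IDs in attributes such as oId or nearest_ref, but that is an assumption worth checking against the actual file). The function name is hypothetical, and the commented cuffdiff call reuses the flags from the original post with a made-up output directory.

```shell
# Hypothetical helper: count distinct NM_ accessions mentioned in a file.
count_refseq_ids() {
    grep -o 'NM_[0-9][0-9]*' "$1" | sort -u | awk 'END { print NR }'
}

# Running Cuffdiff straight on the reference annotation, skipping Cufflinks
# and Cuffmerge, could look like this (output directory is an assumption):
#
#   cuffdiff -o $path/run1/cuff_diff/direct/ -L nacre,tub -p 8 \
#       /transcriptome/ucsc/zv9_transcriptome.gtf \
#       $path/run1/nacre/accepted_hits.bam $path/run1/tub/accepted_hits.bam
```

Comparing the count for the reference GTF with the count for the merged GTF shows whether the NM records are lost during merging or only at the Cuffdiff step.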

I think if the goal is to get an idea of expression from known loci, you may be able to skip the de novo transcriptome assembly. The CUFF identifiers are generated wherever new transcripts are found, even though these may (as you say) be very similar to existing transcripts in the zebrafish GTF. You could use bedtools to rename your CUFF transcripts with the original name, based on a percentage of overlap, shared strand, etc.
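
The renaming idea can be sketched without bedtools to show the logic. The function below is a simplified stand-in, not the bedtools approach itself: it collapses each transcript to its overall span (first exon start to last exon end), then reports every assembled transcript whose span overlaps a reference transcript on the same chromosome and strand by at least a given fraction of the assembled span. The function name, the 0.5 default, and the span-level simplification are all assumptions; `bedtools intersect` with `-s` and `-f` would do the comparison at the feature level.

```shell
# Hypothetical helper: map assembled transcripts to reference transcripts.
#   $1 = reference GTF, $2 = assembled GTF, $3 = minimum overlap fraction
# Prints: assembled_id <TAB> reference_id <TAB> fraction of assembled span covered
map_by_overlap() {
    awk -F '\t' -v min_frac="${3:-0.5}" '
        # Extract the transcript_id value from the attribute column.
        function tid(attrs) {
            if (match(attrs, /transcript_id "[^"]+"/))
                return substr(attrs, RSTART + 15, RLENGTH - 16)
            return ""
        }
        $3 != "exon" { next }                  # only exon records define the span
        {
            id = tid($9)
            if (id == "") next
            k = src SUBSEP id                  # src is "R" (reference) or "A" (assembled)
            if (!(k in lo)) {
                lo[k] = $4 + 0; hi[k] = $5 + 0
                grp[k] = $1 SUBSEP $7          # chromosome + strand must both match
                ids[src, ++n[src]] = id
            } else {
                if ($4 + 0 < lo[k]) lo[k] = $4 + 0
                if ($5 + 0 > hi[k]) hi[k] = $5 + 0
            }
        }
        END {
            # Naive all-against-all comparison; fine for a sketch, slow on big files.
            for (i = 1; i <= n["A"]; i++) {
                a = "A" SUBSEP ids["A", i]
                for (j = 1; j <= n["R"]; j++) {
                    r = "R" SUBSEP ids["R", j]
                    if (grp[a] != grp[r]) continue
                    left  = (lo[a] > lo[r]) ? lo[a] : lo[r]
                    right = (hi[a] < hi[r]) ? hi[a] : hi[r]
                    ov = right - left + 1
                    if (ov <= 0) continue
                    frac = ov / (hi[a] - lo[a] + 1)
                    if (frac >= min_frac + 0)
                        printf "%s\t%s\t%.2f\n", ids["A", i], ids["R", j], frac
                }
            }
        }
    ' src=R "$1" src=A "$2"
}
```

Note that transcripts Cufflinks reports with strand "." will never match a stranded reference record under this rule, so unstranded single-exon transcripts would need separate handling.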

Tags
annotation, cufflinks, gene names, pipeline, tophat
