  • sagarc88
    Junior Member
    • Jul 2012
    • 2

    TopHat/Cufflinks: no gene names or annotations showing up

    Hi everyone,

    I am working on a TopHat/Cufflinks differential expression pipeline, and after running the whole pipeline, the resulting gene_exp.diff file does not contain any gene names. Also, the transcript file has about 13000 records, but only about 2000 of them appear in the resulting diff file. The rest of the entries in the diff file all carry CUFF identifiers. My pipeline, transcript file, and diff output follow. Any help is appreciated.

    Tophat:
    Code:
    tophat -p 16 -r 175 --no-coverage-search -o $Path/run1/nacre/ --transcriptome-index=/transcriptome/ucsc/zv9_transcriptome /genomes/bwt2/danRer7 /fastq_files/nacre_R1_filtered.fastq /fastq_files/nacre_R2_filtered.fastq
    
    tophat -p 16 -r 175 --no-coverage-search -o $Path/run1/tub/ --transcriptome-index=/transcriptome/ucsc/zv9_transcriptome /genomes/bwt2/danRer7 /fastq_files/tub_R1_filtered.fastq /fastq_files/tub_R2_filtered.fastq
    Cufflinks:
    Code:
    nohup cufflinks -o $Path/run1/nacre/cuff1 -g /transcriptome/ucsc/zv9_transcriptome.gtf -p 16 $Path/run1/nacre/accepted_hits.bam
    
    nohup cufflinks -o $Path/run1/tub/cuff1 -g /transcriptome/ucsc/zv9_transcriptome.gtf -p 16 $Path/run1/tub/accepted_hits.bam
    Assembly1.txt file:
    Code:
    $path/tophat_run/full_test_runs/run1/nacre/cuff1/transcripts.gtf
    $path/tophat_run/full_test_runs/run1/tub/cuff1/transcripts.gtf
    Cuffmerge:
    Code:
    cuffmerge -o $path/run1/cuff_merge/cuff1 -g /scratchLocal/sac2026/transcriptome/ucsc/zv9_transcriptome.gtf -p 16 -s /genomes/bwt2/danRer7.fa $path/run1/assembly1.txt &
    CuffDiff:
    Code:
    cuffdiff -o $path/run1/cuff_diff/cuff1/ -L nacre,tub -p 8 $path/run1/cuff_merge/cuff1/transcripts.gtf $path/run1/nacre/accepted_hits.bam $path/run1/tub/accepted_hits.bam
    Transcript.gtf file downloaded from UCSC:
    Code:
    chr1    danRer7_refGene start_codon     50322025        50322027        0.000000        +       .       gene_id "NM_131426"; transcript_id "NM_131426"; 
    chr1    danRer7_refGene CDS     50322025        50322231        0.000000        +       0       gene_id "NM_131426"; transcript_id "NM_131426"; 
    chr1    danRer7_refGene exon    50321634        50322231        0.000000        +       .       gene_id "NM_131426"; transcript_id "NM_131426"; 
    chr1    danRer7_refGene CDS     50323685        50323751        0.000000        +       0       gene_id "NM_131426"; transcript_id "NM_131426"; 
    chr1    danRer7_refGene exon    50323685        50323751        0.000000        +       .       gene_id "NM_131426"; transcript_id "NM_131426"; 
    chr1    danRer7_refGene CDS     50327723        50327850        0.000000        +       2       gene_id "NM_131426"; transcript_id "NM_131426"; 
    chr1    danRer7_refGene exon    50327723        50327850        0.000000        +       .       gene_id "NM_131426"; transcript_id "NM_131426"; 
    chr1    danRer7_refGene CDS     50376642        50376774        0.000000        +       0       gene_id "NM_131426"; transcript_id "NM_131426"; 
    chr1    danRer7_refGene exon    50376642        50376774        0.000000        +       .       gene_id "NM_131426"; transcript_id "NM_131426"; 
    chr1    danRer7_refGene CDS     50384689        50384782        0.000000        +       2       gene_id "NM_131426"; transcript_id "NM_131426"; 
    chr1    danRer7_refGene exon    50384689        50384782        0.000000        +       .       gene_id "NM_131426"; transcript_id "NM_131426"; 
    chr1    danRer7_refGene CDS     50384996        50385109        0.000000        +       1       gene_id "NM_131426"; transcript_id "NM_131426"; 
    chr1    danRer7_refGene exon    50384996        50385109        0.000000        +       .       gene_id "NM_131426"; transcript_id "NM_131426"; 
    chr1    danRer7_refGene CDS     50387282        50387444        0.000000        +       1       gene_id "NM_131426"; transcript_id "NM_131426"; 
    chr1    danRer7_refGene exon    50387282        50387444        0.000000        +       .       gene_id "NM_131426"; transcript_id "NM_131426"; 
    chr1    danRer7_refGene CDS     50388022        50388129        0.000000        +       0       gene_id "NM_131426"; transcript_id "NM_131426"; 
    chr1    danRer7_refGene exon    50388022        50388129        0.000000        +       .       gene_id "NM_131426"; transcript_id "NM_131426"; 
    chr1    danRer7_refGene CDS     50392531        50392579        0.000000        +       0       gene_id "NM_131426"; transcript_id "NM_131426"; 
    chr1    danRer7_refGene exon    50392531        50392579        0.000000        +       .       gene_id "NM_131426"; transcript_id "NM_131426"; 
    chr1    danRer7_refGene CDS     50393548        50393579        0.000000        +       2       gene_id "NM_131426"; transcript_id "NM_131426"; 
    chr1    danRer7_refGene stop_codon      50393580        50393582        0.000000        +       .       gene_id "NM_131426"; transcript_id "NM_131426"; 
    chr1    danRer7_refGene exon    50393548        50393588        0.000000        +       .       gene_id "NM_131426"; transcript_id "NM_131426"; 
    chr1    danRer7_refGene exon    50409290        50410568        0.000000        +       .       gene_id "NM_131426"; transcript_id "NM_131426"; 
    chr1    danRer7_refGene stop_codon      58701201        58701203        0.000000        -       .       gene_id "NM_001110522"; transcript_id "NM_001110522"; 
    chr1    danRer7_refGene CDS     58701204        58701468        0.000000        -       1       gene_id "NM_001110522"; transcript_id "NM_001110522"; 
    chr1    danRer7_refGene exon    58701201        58701468        0.000000        -       .       gene_id "NM_001110522"; transcript_id "NM_001110522";
    Output gene_exp.diff file:
    Code:
    CUFF.21460      CUFF.21460      -       chr15:42401169-42414185 nacre   tub     OK      0.30098 0.192342        -0.645988       0.93529 0.349639        0.999981        no
    CUFF.21461      CUFF.21461      -       chr15:42517544-42517876 nacre   tub     OK      0.303951        0.0349624       -3.11996        0.710738        0.477247        0.999981        no
    CUFF.21462      CUFF.21462      -       chr15:42593781-42597957 nacre   tub     OK      1.06523 1.85185 0.797809        -1.28757        0.197895        0.999981        no
    CUFF.21463      CUFF.21463      -       chr15:42567449-42568700 nacre   tub     NOTEST  0.0441381       0.0433716       -0.0252743      0.0151731       0.987894        1       no
    CUFF.21464      CUFF.21464      -       chr15:42572428-42593418 nacre   tub     OK      2.26891 18.0882 2.99498 -6.08449        1.1686e-09      1.9611e-06      yes
    CUFF.21465      CUFF.21465      -       chr15:42624106-42624606 nacre   tub     OK      2.78658 2.24085 -0.314451       0.375988        0.706925        0.999981        no
    CUFF.21466      CUFF.21466      -       chr15:41251756-41266370 nacre   tub     OK      0.819343        1.03169 0.332465        -0.386342       0.699243        0.999981        no
    CUFF.21467      CUFF.21467      -       chr15:41999382-42013139 nacre   tub     OK      0.13403 0.484079        1.85268 -1.61461        0.106394        0.999981        no
    CUFF.21468      CUFF.21468      -       chr15:42636714-42637489 nacre   tub     OK      0.245696        0.00871635      -4.81701        1.12025 0.262609        0.999981        no
    CUFF.21469      CUFF.21469      -       chr15:41251756-41266370 nacre   tub     OK      0.120829        0.186014        0.622448        -0.179106       0.857854        0.999981        no
    CUFF.2147       CUFF.2147       -       19:6835973-6925393      nacre   tub     NOTEST  0       0       0       0       1       1       no
    CUFF.21470      CUFF.21470      -       chr15:41999382-42013139 nacre   tub     NOTEST  0.0487298       0.0200532       -1.28098        0.244489        0.806852        1       no
    CUFF.21471      CUFF.21471      -       chr15:42663333-42663506 nacre   tub     OK      0.264006        23.4892 6.47528 -1.43214        0.152105        0.999981        no
    CUFF.21472      CUFF.21472      -       chr15:41478958-41496849 nacre   tub     OK      68.4197 60.2869 -0.182566       0.416749        0.676862        0.999981        no

    Some NM IDs do show up in the file but, as I said, only about 2000 of them out of about 13000. Some of the CUFF entries should actually be annotated, since the transcriptome contains them. For example, CUFF.21464 in the file above is the Tyr gene, which is very well annotated in UCSC, yet it shows up with a CUFF identifier. What am I doing wrong? How can I get this pipeline to include the gene names and other annotations?

    Please also feel free to comment on the pipeline. This is for zebrafish reads.

    Thank you in advance.
  • scatteredStorms
    Junior Member
    • Oct 2015
    • 1

    #2
    Me too

    I'm having the same problem and am somewhat surprised that a solution seems so hard to find. I was thinking one option might be to search by something else, such as the chromosome location, to salvage the data. This is especially painful because Cufflinks takes so long to run! It takes much longer than STAR.
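
    A rough Python sketch of the location-based lookup mentioned above. It assumes gene_exp.diff rows are whitespace-separated with the column order seen in the excerpt earlier in the thread (test_id, gene_id, gene, locus, ..., significant) and that gene names contain no spaces. The names DiffRow, parse_locus, parse_diff_line, read_diff, and rows_in_region are made up for illustration and are not part of Cufflinks.

```python
from typing import Iterable, List, NamedTuple, Tuple


class DiffRow(NamedTuple):
    test_id: str
    gene: str          # "-" when cuffdiff has no gene name for the row
    chrom: str
    start: int
    end: int
    significant: bool


def parse_locus(locus: str) -> Tuple[str, int, int]:
    """Split a locus such as 'chr15:42572428-42593418' into its parts."""
    chrom, _, span = locus.rpartition(":")
    start, _, end = span.partition("-")
    return chrom, int(start), int(end)


def parse_diff_line(line: str) -> DiffRow:
    """Parse one data row of gene_exp.diff (assumed column order, see above)."""
    fields = line.split()
    test_id, _gene_id, gene, locus = fields[:4]
    chrom, start, end = parse_locus(locus)
    return DiffRow(test_id, gene, chrom, start, end, fields[-1] == "yes")


def read_diff(lines: Iterable[str]) -> List[DiffRow]:
    """Parse all rows, skipping blank lines and a header starting with 'test_id'."""
    rows = []
    for line in lines:
        if not line.strip() or line.startswith("test_id"):
            continue
        rows.append(parse_diff_line(line))
    return rows


def rows_in_region(rows: Iterable[DiffRow], chrom: str,
                   start: int, end: int) -> List[DiffRow]:
    """Return rows whose locus overlaps chrom:start-end (inclusive)."""
    return [r for r in rows
            if r.chrom == chrom and r.start <= end and r.end >= start]
```

    The chromosome name is kept exactly as written, so a locus such as 19:6835973-6925393 from the excerpt will not match a query for chr19. That is deliberate here: it keeps any naming mismatch between files visible instead of hiding it.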


    • dblyons
      Junior Member
      • Apr 2011
      • 5

      #3
      Have you tried running a few of your BAM files from TopHat directly through cuffdiff with no local transcriptome assembly (i.e. skipping cufflinks and cuffmerge)? This might at least get you some data to look at while you sort out the cufflinks problem. Also, what does your cuffmerge'd transcripts file look like? How many NM records are there?
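
      One way to answer the "how many NM records" question is a small Python tally of distinct transcript_id values by prefix. This is only a sketch: the function name count_transcripts_by_prefix is hypothetical, and the "NM_" and "CUFF." prefixes are simply the ones visible in the files posted above.

```python
import re
from collections import Counter
from typing import Iterable, Sequence

# Matches the transcript_id attribute in the last GTF column.
_TRANSCRIPT_ID = re.compile(r'transcript_id "([^"]+)"')


def count_transcripts_by_prefix(gtf_lines: Iterable[str],
                                prefixes: Sequence[str] = ("NM_", "CUFF.")
                                ) -> Counter:
    """Count distinct transcript IDs in a GTF, grouped by ID prefix.

    IDs matching none of the prefixes are counted under "other".
    """
    seen = set()
    for line in gtf_lines:
        if line.startswith("#"):
            continue  # skip comment lines
        match = _TRANSCRIPT_ID.search(line)
        if match:
            seen.add(match.group(1))

    counts = Counter()
    for transcript_id in seen:
        label = next((p for p in prefixes if transcript_id.startswith(p)),
                     "other")
        counts[label] += 1
    return counts
```

      Passing an open file handle works, e.g. count_transcripts_by_prefix(open(gtf_path)), where gtf_path stands for whichever GTF is handed to cuffdiff. Each transcript is counted once no matter how many exon or CDS lines it has.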

      I think if the goal is to get an idea of expression from known loci, you may be able to skip the de novo transcriptome assembly. The CUFF annotations are generated wherever new transcripts are found, even though these may (as you say) be very similar to existing transcripts in the zebrafish GTF. You could use bedtools to rename your CUFF transcripts with the original name, based on a percentage of overlap, shared strand, etc.
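
      For anyone who prefers to prototype the overlap idea before reaching for bedtools, here is a minimal Python sketch. It is not bedtools and not part of Cufflinks: it compares whole-transcript spans only (not individual exons), the 0.5 overlap threshold is an arbitrary assumption, and the names transcript_spans, overlap_fraction, and best_reference are hypothetical.

```python
import re
from typing import Dict, Iterable, Optional, Tuple

Span = Tuple[str, int, int, str]  # (chrom, start, end, strand)

_TRANSCRIPT_ID = re.compile(r'transcript_id "([^"]+)"')


def transcript_spans(gtf_lines: Iterable[str]) -> Dict[str, Span]:
    """Collapse the exon lines of each transcript into one overall span."""
    spans: Dict[str, Span] = {}
    for line in gtf_lines:
        cols = line.split(None, 8)  # tolerate tabs or runs of spaces
        if len(cols) < 9 or cols[2] != "exon":
            continue
        match = _TRANSCRIPT_ID.search(cols[8])
        if not match:
            continue
        tid = match.group(1)
        chrom, start, end, strand = cols[0], int(cols[3]), int(cols[4]), cols[6]
        if tid in spans:
            _, old_start, old_end, _ = spans[tid]
            start, end = min(start, old_start), max(end, old_end)
        spans[tid] = (chrom, start, end, strand)
    return spans


def overlap_fraction(a_start: int, a_end: int, b_start: int, b_end: int) -> float:
    """Fraction of interval a covered by interval b (1-based, inclusive)."""
    shared = min(a_end, b_end) - max(a_start, b_start) + 1
    return max(shared, 0) / (a_end - a_start + 1)


def best_reference(query: Span, refs: Dict[str, Span],
                   min_fraction: float = 0.5) -> Optional[str]:
    """Return the reference transcript covering most of the query, if any.

    A strand of "." on either side is treated as unknown and not compared.
    """
    q_chrom, q_start, q_end, q_strand = query
    best_id, best_frac = None, 0.0
    for tid, (chrom, start, end, strand) in refs.items():
        if chrom != q_chrom:
            continue
        if "." not in (strand, q_strand) and strand != q_strand:
            continue
        frac = overlap_fraction(q_start, q_end, start, end)
        if frac >= min_fraction and frac > best_frac:
            best_id, best_frac = tid, frac
    return best_id
```

      A query built from a CUFF locus, e.g. ("chr15", 42572428, 42593418, "."), can then be checked against the spans from the reference GTF. Because introns count toward the span, this is coarser than an exon-level intersection, so treat any hit as a candidate name to verify rather than a confirmed match.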
