  • no cuffcompare .stats with more than 6 samples

    Hi All,

    I'm having a weird issue with the output from cuffcompare, and I wondered if anyone had seen the same.

    We are using cuffcompare to compare and track transcripts built from an RNAseq project. We use the following command line to run cuffcompare:

    Code:
      cuffcompare -o example.cuffcompare \
                  -s /ifs/mirror/genomes/bowtie/hg19.fa \
                  -r <( gunzip < refcoding.gtf.gz ) \
                  ./ctmpL7zs9K/example1.gtf.gz \
                  ./ctmpL7zs9K/example2.gtf.gz &> example.cuffcompare.log
    According to the cuffcompare manual, cuffcompare is supposed to output a <outprefix>.stats file. We never get this file, although we usually do get the stats in a file named just <outprefix> (e.g. in this case example.cuffcompare). However, this only happens when there are 6 samples or fewer. If we use 7 or more samples, no stats are output anywhere that we can find.

    I'm running cuffcompare 2.0.2 using the precompiled binaries on 64-bit Linux, but I've also tried 2.0.0, 1.4.0 and 1.3.0, with the same result each time.

    Anyone else seen this? We'd really quite like the stats as our pipeline uses them for quality control.

    Cheers,

    Ian
    ---

  • #2
    Are you talking about something that looks like the following output file? I deleted part of it because the project is still ongoing, but you should be able to see that I have more than 6 samples.

    Code:
    # Cuffcompare v2.0.2 | Command line was:
    #cuffcompare -o proj123 -r westerm/(deleted)/References/genes.gtf ../001664/tophat_with_gtf/transcripts.gtf ../001665/tophat_with_gtf/transcripts.gtf ../001666/tophat_with_gtf/transcripts.gtf ../001667/tophat_with_gtf/transcripts.gtf ../001668/tophat_with_gtf/transcripts.gtf ../001669/tophat_with_gtf/transcripts.gtf ../001670/tophat_with_gtf/transcripts.gtf ../001671/tophat_with_gtf/transcripts.gtf ../001672/tophat_with_gtf/transcripts.gtf ../001673/tophat_with_gtf/transcripts.gtf 
    #
    
     Total union super-loci across all input datasets: 109817 
      (12469 multi-transcript, ~1.2 transcripts per locus)



    • #3
      Yes, exactly.

      One would normally expect to see a whole load of stats about the run, something like:

      Code:
      # Cuffcompare v2.0.2 | Command line was:
      #cuffcompare -o tax-Pre-agg.cuffcompare -s /ifs/mirror/genomes/bowtie/hg19.fa -r /dev/fd/63 ./ctmpL7zs9K/tax-Pre-R1.gtf.gz ./ctmpL7zs9K/tax-Pre-R2.gtf.gz ./ctmpL7zs9K/tax-Pre-R3.gtf.gz ./ctmpL7zs9K/tax-Pre-R4.gtf.gz ./ctmpL7zs9K/tax-Pre-R5.gtf.gz ./ctmpL7zs9K/tax-Pre-R6.gtf.gz
      #
      
      #= Summary for dataset: ./ctmpL7zs9K/tax-Pre-R1.gtf.gz :
      #     Query mRNAs :  116810 in  110584 loci  (21299 multi-exon transcripts)
      #            (5204 multi-transcript loci, ~1.1 transcripts per locus)
      # Reference mRNAs :   76280 in   19285 loci  (74791 multi-exon)
      # Corresponding super-loci:          13596
      #--------------------|   Sn   |  Sp   |  fSn |  fSp    
              Base level: 	 43.3	 24.2	  - 	  -                    
              Exon level: 	 24.1	 36.5	 26.6	 40.3                 
            Intron level: 	 37.5	 95.3	 38.1	 96.7      
      Intron chain level: 	  6.8	 23.8	 12.7	 44.6
        Transcript level: 	  0.0	  0.0	  0.1	  0.0
             Locus level: 	 23.8	  4.2	 29.7	  5.3
      
      Matching intron chains:    5073
               Matching loci:    4584
      
                Missed exons:  152593/320767	( 47.6%)
                 Novel exons:   90245/211449	( 42.7%)
              Missed introns:  122705/246111	( 49.9%)
               Novel introns:    2149/96936	(  2.2%)
                 Missed loci:    5528/19285	( 28.7%)
                  Novel loci:   60885/110584	( 55.1%)
      
      ############ <SNIP> one table like this for each sample </SNIP> ###########
      
       Total union super-loci across all input datasets: 191274 
        (11845 multi-transcript, ~2.7 transcripts per locus)
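
For QC purposes, tables in this layout could be pulled into a pipeline with something like the following. This is a rough sketch only: `parse_stats` is a hypothetical helper (not part of Cufflinks), and it assumes the layout shown above, i.e. a `#= Summary for dataset:` line followed by `<name> level:` rows, with `-` standing in for missing values.

```python
import re
from typing import Dict, List, Optional

# Row such as "   Intron chain level:    6.8   23.8   12.7   44.6"
# "Intron chain" must come before "Intron" in the alternation.
LEVEL_ROW = re.compile(
    r"^\s*(Base|Exon|Intron chain|Intron|Transcript|Locus) level:\s*(.*)$"
)
# Header such as "#= Summary for dataset: ./sample1.gtf.gz :"
DATASET_ROW = re.compile(r"^#= Summary for dataset:\s*(\S+)")


def parse_stats(text: str) -> Dict[str, Dict[str, List[Optional[float]]]]:
    """Return {dataset: {level: [Sn, Sp, fSn, fSp]}}; '-' becomes None."""
    results: Dict[str, Dict[str, List[Optional[float]]]] = {}
    current = None
    for line in text.splitlines():
        header = DATASET_ROW.match(line.strip())
        if header:
            current = header.group(1)
            results[current] = {}
            continue
        row = LEVEL_ROW.match(line)
        if row and current is not None:
            values = [
                None if tok == "-" else float(tok)
                for tok in row.group(2).split()
            ]
            results[current][row.group(1)] = values
    return results
```

If no per-sample tables were written (as in the 7+ sample runs described in this thread), the returned dictionary is simply empty, which a pipeline can treat as a failed QC step.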



      • #4
        I generally do not look at the stats file, although you've convinced me that I should do so more often. Looking at my projects over the past several months reveals the following. For each project I list the project name, the cuffcompare version, the number of samples in the project and the number of 'Base level' lines in the stats file. If this last number is the same as the number of samples, then a full report was generated.

        Project   Version   Samples   'Base level' lines
        #1        v1.3.0    9         9
        #2        v2.0.0    5         5
        #3        v1.3.0    6         6
        #4        v2.0.2    12        0
        #5        v2.0.2    12        0
        #6        v2.0.2    6         6
        #7        v2.0.2    12        0
        #8        v2.0.2    8         0
        #9        v2.0.2    12        0
        #10       v2.0.2    14        0

        So, to me, it does look like a change in cuffcompare v2.0.2. However, I only have one pre-2.0.2 project with more than 6 samples to back up this statement. Also, you said you have tested this with other versions, so I don't know what is happening. I do see the same effect as you do, at least for v2.0.2.
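
The check described above (count the 'Base level' lines and compare with the number of samples) could be scripted roughly like this. It is only a sketch: both function names are made up for illustration, and it assumes the stats text uses the layout posted earlier in this thread.

```python
def count_base_level_lines(stats_text: str) -> int:
    """Count the per-sample 'Base level:' rows in cuffcompare stats output."""
    return sum(
        1
        for line in stats_text.splitlines()
        if line.strip().startswith("Base level:")
    )


def has_full_report(stats_text: str, n_samples: int) -> bool:
    """True if there is one 'Base level:' row for every input sample."""
    return count_base_level_lines(stats_text) == n_samples
```

For example, reading the stats output into a string and calling `has_full_report(text, 12)` would return False for the 12-sample v2.0.2 projects listed above, where no 'Base level' lines were found.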



        • #5
          Originally posted by sudders

          If we use 7 or more samples, no stats are output anywhere that we can find.

          ---
          Did you ever find the error/solution?



          • #6
            No, sorry. I've just given up on using the provided stats for such large projects.
