  • Cufflinks 1.0.0: Major new features in assembly and differential expression

    Hi all,

    I'm extremely pleased to announce the release of Cufflinks 1.0.0. We've incorporated feedback from users here and elsewhere to make Cufflinks much more powerful and accessible. Please don't hesitate to try it out and post questions and feedback here. Highlights of the release are listed below, and further information can be found at:

    http://cufflinks.cbcb.umd.edu

    Thanks,

    --The Cufflinks Team

    ***********************

    1.0.0 release - 5/5/2011

    This release represents a huge leap for Cufflinks in terms of performance and features. It is highly recommended that all users upgrade to this version of Cufflinks. Updates and improvements include:

    * A new Reference Annotation Based Transcript (RABT) assembly mode has been added. More details can be found in the How Cufflinks Works section.
    * Major improvements to Cuffdiff. Handling of replicates in Cuffdiff has been dramatically overhauled. Cuffdiff now models fragment count overdispersion with a beta negative binomial distribution in each condition prior to testing. See the substantially updated page on How Cufflinks Works for more details.
    * Bias correction described here is now enabled with the -b/--frag-bias-correct option (-r/--reference-seq is no longer in use). A path to the reference multi-fasta used in mapping must be supplied following the option.
    * Added support for improved handling of multi-mapping reads. Enable with the -u/--multi-read-correct option. See How Cufflinks Works for more details.
    * Trimming has been instituted to more accurately locate the 3' ends of transcripts during assembly based on coverage.
    * Cufflinks now includes a new tool called Cuffmerge to help merge assemblies from multiple samples into a single GTF for use with Cuffdiff. The tool also helps integrate a reference annotation file. See the Getting Started page for more details.
    * Output file formats have been made consistent between Cufflinks and Cuffdiff. See the Manual for more details on the new formats.
    * Both GFF3 and GTF2.2 annotations are now fully supported as input to all programs (see here).
    * Improved reporting of map properties.
    * The programs now check for available updates automatically on launch.
    * Upper-quartile normalization has been fixed to be consistent with published literature (enable with -N/--upper-quartile-norm).
    * Fixed a bug where some splice-junction reads were lost in quantitation.
    * Fixed a bug where reads landing in introns were over-filtered in assembly.
    * Numerous improvements in speed for both assembly and quantitation.
    * Cuffdiff now uses dramatically less memory. Cufflinks' memory footprint has also shrunk.
    * Numerous minor bug fixes.
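
As a rough illustration of how the new options fit together, the sketch below assembles (but does not run) a command line using only flags named in the list above. The helper `build_cufflinks_command` and the file names `genome.fa` and `accepted_hits.bam` are hypothetical, and the Manual remains the authority on the full option list.

```python
import shlex


def build_cufflinks_command(alignments, genome_fasta=None,
                            multi_read_correct=False,
                            upper_quartile_norm=False):
    """Return an argument list for a Cufflinks run; nothing is executed."""
    cmd = ["cufflinks"]
    if genome_fasta is not None:
        # -b/--frag-bias-correct must be followed by the path to the
        # reference multi-FASTA that was used in mapping.
        cmd += ["-b", genome_fasta]
    if multi_read_correct:
        # -u/--multi-read-correct is a plain switch with no argument.
        cmd.append("-u")
    if upper_quartile_norm:
        # -N/--upper-quartile-norm is also a plain switch.
        cmd.append("-N")
    cmd.append(alignments)  # hypothetical input alignment file
    return cmd


cmd = build_cufflinks_command("accepted_hits.bam",
                              genome_fasta="genome.fa",
                              multi_read_correct=True)
print(" ".join(shlex.quote(part) for part in cmd))
# prints: cufflinks -b genome.fa -u accepted_hits.bam
```

Building the list first and printing it makes it easy to review the exact command before passing it to something like `subprocess.run`.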

  • #2
    Awesome!

    This is great for many reasons. I think the new option to have Reference Annotation Based Transcript (RABT) assembly will be a really nice feature.

    Furthermore, I'm excited to see how Cuffdiff now handles biological replicates when it comes to differential splicing and promoter usage. We have a lot of samples (19 vs. 20 in a case-control experiment), and what Cuffdiff previously identified as significant didn't really represent the variation in our RNA-seq samples when we plotted the individual samples run through Cufflinks (it seemed like outliers could really throw the results off).

    Your use of the JS divergence metric is such a cool idea for a way to find differences when doing comparisons, so we were working on our own way of utilizing that while still taking into account the variability of our biological replicates, but if the new version of CuffDiff does that better, we're definitely excited to utilize it instead.

    • #3
      Dear all,
      FPKM calculation with 0.9.3 and 1.0.0 gives significantly different results. The top genes in our cancer samples had an FPKM of approx. 15000; now these values have changed to ~1.5M. What's more, when you sort genes.fpkm_tracking you get many different regions with the same FPKM value.

      Btw, great update Cole.
      Tomasz Stokowy
      www.sequencing.io.gliwice.pl
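
For anyone who wants to check the repeated-FPKM observation on their own output, here is a minimal sketch. It assumes genes.fpkm_tracking is tab-delimited with a header row that includes `locus` and `FPKM` columns (worth verifying against your own file); the function name and the toy rows are made up for illustration.

```python
import csv
from collections import defaultdict


def shared_fpkm_values(lines, min_rows=2):
    """Map each non-zero FPKM string to the loci that report exactly that value."""
    loci_by_fpkm = defaultdict(list)
    for row in csv.DictReader(lines, delimiter="\t"):
        value = row["FPKM"]          # assumed column name
        if float(value) == 0.0:
            continue                 # unexpressed loci trivially share FPKM 0
        loci_by_fpkm[value].append(row["locus"])  # assumed column name
    return {v: loci for v, loci in loci_by_fpkm.items() if len(loci) >= min_rows}


# Toy rows (not real data) standing in for a genes.fpkm_tracking file.
toy_rows = [
    "tracking_id\tlocus\tFPKM\n",
    "geneA\tchr1:100-900\t12.5\n",
    "geneB\tchr1:2000-2600\t0\n",
    "geneC\tchr2:50-400\t12.5\n",
    "geneD\tchr3:10-80\t0\n",
    "geneE\tchr3:500-990\t3.1\n",
]
print(shared_fpkm_values(toy_rows))
```

On a real file, pass the open file handle instead of the list, for example `with open("genes.fpkm_tracking", newline="") as fh: shared_fpkm_values(fh)`.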

      • #4
        Thanks!

        I'm very curious to see how the new cuffdiff goes for you. We're doing similar analyses, but on a more diverse collection of tissues and conditions and with fewer replicates in each. I'd love to hear how cuffdiff performs in designs like yours, so please don't hesitate to contact us with questions, suggestions, or problems.

        • #5
          I am looking forward to testing this version, thanks for the news!

          One thing I noticed: the gffread program mentioned on the website for validating GFF3 files is not present in the Mac OS or Linux binaries. Any comments on that?

          • #6
            Originally posted by berath
            I am looking forward to testing this version, thanks for the news!

            One thing I noticed: the gffread program mentioned on the website for validating GFF3 files is not present in the Mac OS or Linux binaries. Any comments on that?
            Sure - my comment is that I'm a bozo for not including them in the binary packages. The script that builds those packages hadn't been correctly updated. I just posted a microrelease (v1.0.1) that fixes this and several other issues.

            • #7
              Thanks for all your awesome work Cole and team!

              cuffmerge is an awesome new feature, particularly when cuffdiff is on tap. Thanks for adding this. I found that using the stats.combined.gtf GTFs was the way to go rather than the transcripts.gtf for cuffmerge. Using the transcripts.gtf will give these errors.

              Code:
              Error: duplicate GFF ID 'SAMPLE1-T24-MEDIA.ENSG00000105855.2' encountered!
                      [FAILED]
              Error: could not execute gtf_to_sam
              Traceback (most recent call last):
                File "/packages/cufflinks/cufflinks-1.0.1/bin/cuffmerge", line 513, in ?
                  sys.exit(main())
                File "/packages/cufflinks/cufflinks-1.0.1/bin/cuffmerge", line 492, in main
                  sam_input_files = convert_gtf_to_sam(gtf_input_files)
                File "/packages/cufflinks/cufflinks-1.0.1/bin/cuffmerge", line 287, in convert_gtf_to_sam
                  sam_out = gtf_to_sam(line)
                File "/packages/cufflinks/cufflinks-1.0.1/bin/cuffmerge", line 247, in gtf_to_sam
                  exit(1)
              TypeError: 'str' object is not callable
              I assume this is as expected?
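
A note on the final `TypeError` in the traceback above: it is probably a secondary failure that hides the real one (the duplicate GFF ID). The `in ?` frames suggest an older Python 2 interpreter, and on Python 2.4 and earlier the built-in name `exit` was bound to a help string, so `exit(1)` cannot be called. The sketch below imitates that situation on a current interpreter with a stand-in string; `fail_with_builtin_name` and `old_style_exit` are illustrative names, not code from cuffmerge.

```python
import sys


def fail_with_builtin_name(exit_name):
    """Call whatever object is bound to `exit`, as the failing line does."""
    return exit_name(1)


def fail_with_sys_exit():
    """Portable alternative: sys.exit is a real function in every version."""
    sys.exit(1)


# Stand-in for the old interpreter, where `exit` was just a help string.
old_style_exit = "Use Ctrl-D (i.e. EOF) to exit."

try:
    fail_with_builtin_name(old_style_exit)
except TypeError as err:
    print("TypeError:", err)        # TypeError: 'str' object is not callable

try:
    fail_with_sys_exit()
except SystemExit as err:
    print("exit status:", err.code)  # exit status: 1
```

Either way the script stops; the difference is only whether it exits cleanly with status 1 or with a confusing traceback on top of the original error message.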

              • #8
                Hi Cole,

                In the Cufflinks manual, one option of cuffmerge should be -g/--ref-gtf, not -r/--ref-gtf as shown on the current website, right?

                Thank you,

                • #9
                  Hi Cole,

                  I am trying to run cuffmerge on my data - 3 x gtf-files created by cufflinks (galaxy).
                  Now running cuffmerge on the commandline I'm using a reference gtf and fasta file + the manifest file.

                  Everything works until I reach the stage where it compares against the reference file (gtf). Then I get the error below:

                  [Wed May 11 15:09:34 2011] Comparing against reference file /Users/aneldavanderwalt/data/maize/ZmB73_5a_WGS.gtf
                  You are using Cufflinks v1.0.1, which is the most recent release.
                  No fasta index found for /Users/aneldavanderwalt/Desktop/ZmB73_RefGen_v2.fasta. Rebuilding, please wait..
                  Fasta index rebuilt.
                  Error: duplicate GFF ID 'CUFF.AC155624.2' encountered!
                  [FAILED]
                  Error: could not execute cuffcompare


                  I looked at the run log and saw this:

                  cuffcompare -o tmp_meta_asm -r /Users/aneldavanderwalt/data/maize/ZmB73_5a_WGS.gtf -s /Users/aneldavanderwalt/Desktop/ZmB73_RefGen_v2.fasta R.cuffmerge//transcripts.gtf R.cuffmerge//transcripts.gtf


                  It seems like cuffmerge is comparing the same file twice? I'm not sure whether that assumption is right, or how to solve it if it is.

                  Any suggestions?

                  Thanks!

                  Anelda

                  • #10
                    Originally posted by Anelda
                    Hi Cole,

                    I am trying to run cuffmerge on my data - 3 x gtf-files created by cufflinks (galaxy).
                    Now running cuffmerge on the commandline I'm using a reference gtf and fasta file + the manifest file.

                    Everything works until I reach the stage where it compares against the reference file (gtf). Then I get the error below:

                    [Wed May 11 15:09:34 2011] Comparing against reference file /Users/aneldavanderwalt/data/maize/ZmB73_5a_WGS.gtf
                    You are using Cufflinks v1.0.1, which is the most recent release.
                    No fasta index found for /Users/aneldavanderwalt/Desktop/ZmB73_RefGen_v2.fasta. Rebuilding, please wait..
                    Fasta index rebuilt.
                    Error: duplicate GFF ID 'CUFF.AC155624.2' encountered!
                    [FAILED]
                    Error: could not execute cuffcompare


                    I looked at the run log and saw this:

                    cuffcompare -o tmp_meta_asm -r /Users/aneldavanderwalt/data/maize/ZmB73_5a_WGS.gtf -s /Users/aneldavanderwalt/Desktop/ZmB73_RefGen_v2.fasta R.cuffmerge//transcripts.gtf R.cuffmerge//transcripts.gtf


                    It seems like cuffmerge is comparing the same file twice? I'm not sure whether that assumption is right, or how to solve it if it is.

                    Any suggestions?

                    Thanks!

                    Anelda
                    What's in the assembly manifest that you provide as input to cuffmerge?

                    • #11
                      Cole, I am having the same problem with cuffcompare. I wonder if this is related to my previous post regarding problems with cuffmerge?

                      This is using ensembl Homo_sapiens.GRCh37.61.

                      Code:
                      ~/bin/cuffcompare \
                              -r ${REFPATH}/Homo_sapiens.GRCh37.61.gtf \
                              -s ${REFPATH}/Homo_sapiens.GRCh37.61.ensembl.fa \
                              -p SAMPLE1-T24-MEDIA \
                              -o SAMPLE1-T24-MEDIA.stats \
                              SAMPLE1-T24-MEDIA.ensembl.transcripts.gtf
                      Dies:
                      Code:
                      Error: duplicate GFF ID 'SAMPLE1-T24-MEDIA.ENSG00000105855.2' encountered!
                      I have 56 samples in this experiment -- about half work fine, the other half throw these errors. Using cufflinks 1.0.1 and the patched version of cuffcompare that fixed the GFF code changes regarding the treatment of gene_id vs gene_name. I had this same problem before the patch FWIW.
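
The thread does not establish what triggers the duplicate-ID error, so the following is only one thing worth ruling out: the same `transcript_id` appearing on more than one chromosome or strand within a single GTF. The sketch assumes standard nine-column, tab-delimited GTF with `transcript_id "..."` in the attribute column; the function names and the toy records (including their IDs) are hypothetical.

```python
import re
from collections import defaultdict

TRANSCRIPT_ID = re.compile(r'transcript_id "([^"]+)"')


def transcript_locations(gtf_lines):
    """Collect the set of (seqname, strand) pairs seen for each transcript_id."""
    locations = defaultdict(set)
    for line in gtf_lines:
        if not line.strip() or line.startswith("#"):
            continue                      # skip blank lines and comments
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 9:
            continue                      # not a complete GTF record
        match = TRANSCRIPT_ID.search(fields[8])
        if match:
            locations[match.group(1)].add((fields[0], fields[6]))
    return locations


def ids_on_multiple_locations(gtf_lines):
    """Return transcript IDs that occur on more than one seqname/strand pair."""
    return {tid: sorted(locs)
            for tid, locs in transcript_locations(gtf_lines).items()
            if len(locs) > 1}


# Toy records (not real data): TX.1 shows up on two chromosomes.
toy_gtf = [
    'chr7\tCufflinks\texon\t100\t200\t.\t+\t.\tgene_id "G1"; transcript_id "TX.1";\n',
    'chrX\tCufflinks\texon\t900\t950\t.\t+\t.\tgene_id "G1"; transcript_id "TX.1";\n',
    'chr7\tCufflinks\texon\t500\t650\t.\t-\t.\tgene_id "G2"; transcript_id "TX.2";\n',
]
print(ids_on_multiple_locations(toy_gtf))
```

An empty result does not prove the file is clean; it only rules out this particular pattern.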

                      • #12
                        Hi,

                        I notice that the manual states that cufflinks does not currently support SAM files with CIGAR strings using operators other than 'M' and 'N'. Is this still accurate for cufflinks 1.0.x? I've just run cufflinks on a SAM produced by tophat with --allow-indels and it completed without any errors. Should I trust this output?
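
One way to see whether a SAM file actually contains operators other than 'M' and 'N' is to tally them directly. This is a minimal sketch that assumes plain-text SAM with the CIGAR string in the sixth tab-delimited column; the function names and the toy records are invented for illustration. It only reports what is in the file and says nothing about how Cufflinks treats those alignments.

```python
import re
from collections import Counter

# One CIGAR element is a length followed by a single operator character.
CIGAR_ELEMENT = re.compile(r"\d+([MIDNSHP=X])")


def cigar_operator_counts(sam_lines):
    """Count how often each CIGAR operator occurs across all alignment lines."""
    counts = Counter()
    for line in sam_lines:
        if line.startswith("@"):
            continue                       # header line
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 6 or fields[5] == "*":
            continue                       # malformed or no CIGAR available
        counts.update(CIGAR_ELEMENT.findall(fields[5]))
    return counts


def operators_outside(counts, supported=("M", "N")):
    """Keep only operators that are not in the supported set."""
    return {op: n for op, n in counts.items() if op not in supported}


# Toy alignments (not real data): one spliced read, one read with an insertion.
toy_sam = [
    "@HD\tVN:1.0\tSO:coordinate\n",
    "r1\t0\tchr1\t100\t50\t20M500N16M\t*\t0\t0\t*\t*\n",
    "r2\t0\tchr1\t300\t50\t10M2I24M\t*\t0\t0\t*\t*\n",
]
counts = cigar_operator_counts(toy_sam)
print(dict(counts))
print(operators_outside(counts))
```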

                        • #13
                          Originally posted by Cole Trapnell
                          What's in the assembly manifest that you provide as input to cuffmerge?
                          The paths to the three cufflinks gtf files

                          /Users/aneldavanderwalt/Downloads/R1.gtf
                          /Users/aneldavanderwalt/Downloads/R2.gtf
                          /Users/aneldavanderwalt/Downloads/R3.gtf


                          With a newline after the last file - each one on its own line.

                          Let me know if you need more info?
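
Since the question is about what the manifest contains, a quick sanity check can rule out the simple cases: blank lines, a path listed twice, or a path that does not exist. The sketch below is an assumption-laden helper, not part of cuffmerge; `check_manifest` is a hypothetical name, and the demo passes a stand-in `exists` function so it runs without the files being present. It would not explain why the run log shows the same merged file twice.

```python
import os
from collections import Counter


def check_manifest(manifest_lines, exists=os.path.exists):
    """Summarize a one-path-per-line manifest: paths, duplicates, missing files."""
    paths = [line.strip() for line in manifest_lines if line.strip()]
    counts = Counter(paths)
    return {
        "paths": paths,
        "duplicates": sorted(p for p, n in counts.items() if n > 1),
        "missing": sorted(p for p in counts if not exists(p)),
    }


# The three paths quoted in the post, with a trailing newline after the last.
manifest = [
    "/Users/aneldavanderwalt/Downloads/R1.gtf\n",
    "/Users/aneldavanderwalt/Downloads/R2.gtf\n",
    "/Users/aneldavanderwalt/Downloads/R3.gtf\n",
]
# Stand-in for os.path.exists so the demo does not touch the filesystem.
report = check_manifest(manifest, exists=lambda path: True)
print(report["duplicates"], report["missing"])
```

With a real manifest, call `check_manifest(open("manifest.txt"))` (hypothetical file name) and let the default `os.path.exists` do the lookup.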

                          • #14
                            Originally posted by Anelda
                            The paths to the three cufflinks gtf files

                            /Users/aneldavanderwalt/Downloads/R1.gtf
                            /Users/aneldavanderwalt/Downloads/R2.gtf
                            /Users/aneldavanderwalt/Downloads/R3.gtf


                            With a newline after the last file - each one on its own line.

                            Let me know if you need more info?
                            Oh yes, running on Mac OS X 10.6.7 - version 1.0.1 (cuffmerge)

                            Thanks,
                            Anelda

                            • #15
                              Hm. Very strange. Can you send us a small amount of GTF from one or more of your assemblies along with the reference GTF so that we can reproduce this? It's hard to say what's happening here, but the log looks very odd. We'll keep the data to ourselves of course, and chuck it once the bug is fixed.
