  • How to speed up Cuffdiff? It is taking forever!

    I have a machine with 16 cores, 64GB memory and 16TB hard drive.

    I have a very large RNA-Seq data set to analyse:
    25 cases (100 million reads in each file).
    25 controls (100 million reads in each file).

    The cufflinks and cuffmerge steps are all done.

    Cuffdiff is giving me problems: when I use more than one thread (i.e., -p > 1), I run out of memory after 1 day.

    I am now running cuffdiff with -p 1; the job started 3 weeks ago and is still running.

    How can I speed up the process? Or what other solutions are there?

    Can I split my files by chromosome and run separate analyses? If so, will my results be usable?

    Can I ask cuffdiff to write some partial results?

    Can I ask cuffdiff to compute just the gene expression levels and discard the other results?


    Thanks for any advice.
    Alpha

  • #2
    Supposedly, Cufflinks 2.2.0 introduced a new workflow: you can now run cuffquant to estimate transcript abundance for each sample before running cuffdiff, which speeds up the process and solves some runtime issues. However, I have encountered some minor issues with the output of cuffnorm. You can check one of my posts about it in this forum (I also posted the problem on the Google Group), but so far there has been no feedback from other users.
    If you only care about examining differences between your two groups, then it shouldn't be much of a problem.


    • #3
      Thanks for the quick reply. I will run cuffquant/cuffnorm and cuffquant/cuffdiff and let you know if everything goes well.

      Cheers,
      Alpha


      • #4
        In terms of speed, cuffquant made the difference between me being able to use Cufflinks or not. I tried to use cuffdiff a while back on my data, and it looked like it would take around a month or so on 12 cores. Now, with cuffquant, it's more like overnight. And once you’ve run cuffquant, you can rerun cuffdiff very quickly, since you only have to generate those CXB files once.
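
A minimal sketch of the two-step workflow described here, in case it helps. The file names (merged.gtf, the BAM names), the quant/ and diff_out directories, and the thread count are placeholders of mine, not from this thread. It is a dry run: it only prints the commands so they can be checked before launching anything. cuffquant writes a file called abundances.cxb into each output directory, and those files are then given to cuffdiff in place of the BAMs.

```shell
# Dry run: print one cuffquant command per BAM, then a single cuffdiff
# command that consumes the resulting CXB files.
# All paths and sample names below are placeholders (assumptions).
GTF=merged.gtf
THREADS=8

# Print the cuffquant command for one BAM; output would go to quant/<sample>/.
quant_cmd() {
    sample=$(basename "$1" .bam)
    echo "cuffquant -p $THREADS -o quant/$sample $GTF $1"
}

# Build the comma-separated CXB list for one condition from its BAM names.
cxb_list() {
    list=""
    for bam in "$@"; do
        sample=$(basename "$bam" .bam)
        list="${list:+$list,}quant/$sample/abundances.cxb"
    done
    echo "$list"
}

# Step 1: quantify every sample once (this is the slow part).
for bam in case1.bam case2.bam ctrl1.bam ctrl2.bam; do
    quant_cmd "$bam"
done

# Step 2: cuffdiff on the CXB files, one comma-separated list per condition.
echo "cuffdiff -p $THREADS -o diff_out -L CASE,CONTROL $GTF $(cxb_list case1.bam case2.bam) $(cxb_list ctrl1.bam ctrl2.bam)"
```

Since the CXB files are generated only once, changing labels or cuffdiff options later means rerunning only the last command.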


        • #5
          Try the --no-diff argument to cuffdiff:
          http://cufflinks.cbcb.umd.edu/manual.html#cuffdiff
          I can't see your original command line,
          but if you didn't specify labels with -L or comma-delimit your lists of case and control BAMs,
          then it will be doing pairwise DE tests for all samples against all samples,
          and this is likely to be the slowest step.
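
To illustrate the comma-delimiting point, here is a small sketch with made-up file names (the BAM names, merged.gtf and diff_out are assumptions). Each condition must reach cuffdiff as ONE comma-separated argument; the two conditions are separated by a space. The command is only printed, not executed.

```shell
# Join all arguments with commas so a whole condition is a single argument.
join_commas() {
    out=""
    for item in "$@"; do
        out="${out:+$out,}$item"
    done
    echo "$out"
}

# Placeholder sample names (assumptions, not from the thread).
CASES=$(join_commas case1.bam case2.bam case3.bam)
CONTROLS=$(join_commas ctrl1.bam ctrl2.bam ctrl3.bam)

# Two comma-delimited groups = two conditions, labelled with -L.
# --no-diff asks cuffdiff not to generate the differential analysis files.
echo "cuffdiff --no-diff -L CASE,CONTROL -o diff_out merged.gtf $CASES $CONTROLS"
```

If the BAMs were separated by spaces instead, each file would be treated as its own condition, which is the all-against-all situation described above.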


          • #6
            Hello,
            Here is an example of my command line (I use a lot of bash variables):
            $cuffdiff -o ${output_path_diff} -b ${genomeIndex} -p 1 -L TEST_ALL,CONTROL_ALL -u ${merged_gtf} $bam14,$bam16,$bam26,$bam28,$bam30,$bam34,$bam36,$bam40,$bam42,$bam44,$bam46,$bam48,$bam50,$bam52,$bam54,$bam56,$bam58,$bam60,$bam64,$bam66,$bam68,$bam117,$bam118,$bam119,$bam32 $bam2,$bam4,$bam6,$bam70,$bam72,$bam74,$bam76,$bam78,$bam80,$bam82,$bam84,$bam86,$bam90,$bam92,$bam94,$bam96,$bam98,$bam100,$bam102,$bam104,$bam108,$bam110,$bam112,$bam114,$bam116

            Cheers,
            Alpha


            • #7
              That looks OK to me,
              as long as the line breaks are actually spaces.

              I'd definitely try the new cufflinks workflow to see if it reduces the RAM usage by splitting up the tasks,
              i.e. tophat > cufflinks > cuffmerge > cuffquant > cuffdiff,
              but you are supplying a huge amount of data, so it's going to need a lot of memory.

              As a comparator, I've got a cuffdiff running with 32 threads on 72 BAMs (average size 5GB), and that is using 90GB of RAM.

              I predict, based on progress from the verbose (-v) output, that it'll take 6 days for my job to finish. That doesn't bode well for your analysis runtime.


              • #8
                Also, if you just want expression values per sample, then omit the cuffdiff step.
                You can always do your own DE testing in R etc.


                • #9
                  New cufflinks workflow compared to the old one:
                  cuffnorm outputs expression values from the CXB files generated by cuffquant,
                  and you could then do your own testing on that output.

                  http://cufflinks.cbcb.umd.edu/
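
As a rough sketch of the cuffnorm step (paths, labels and thread count are placeholders I made up), the command takes the merged GTF followed by one comma-separated list of CXB files per condition, like cuffdiff. Again, this only prints the command.

```shell
# Placeholder inputs (assumptions): the merged annotation and the
# abundances.cxb files written earlier by cuffquant.
GTF=merged.gtf
CASE_CXB=quant/case1/abundances.cxb,quant/case2/abundances.cxb
CTRL_CXB=quant/ctrl1/abundances.cxb,quant/ctrl2/abundances.cxb

# Dry run: build and print the cuffnorm command instead of executing it.
NORM_CMD="cuffnorm -p 4 -o norm_out -L CASE,CONTROL $GTF $CASE_CXB $CTRL_CXB"
echo "$NORM_CMD"

# After a real run, norm_out/ would hold normalized tables such as
# genes.fpkm_table and genes.count_table for downstream testing.
```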


                  • #10
                    Thanks jeales,
                    I am using the new version of cufflinks, and the cuffquant step is done. I am now running the cuffdiff part.
                    I am testing on the different servers I have access to, to speed up the process.
                    I will let you know the results, computation time and resources soon.

                    Cheers,
                    Alpha


                    • #11
                      That's great news from Alpha.

                      I do have a suggestion. Since the process runs out of memory and your RAM is limited (64GB), try creating a tmp folder on your server's hard drive and passing that tmp folder to the command when running the analysis.


                      • #12
                        Thanks vishnuamaram,
                        I will try this solution. I was still trying to run cuffdiff with all my datasets; I can only run it with 1 CPU, and it's a very long process.
                        Since I have 100 samples in 4 conditions (25 samples per condition) and the samples in a condition are not replicates, cuffdiff is not the best tool!
                        Do you have any suggestions for that?
                        For now I am exploring another idea: writing an R script with DESeq and using the cuffnorm results to do my differential expression analysis.

                        Alpha


                        • #13
                          Hello vishnuamaram,
                          I realize that the cufflinks programs don't have a parameter for a tmp folder!
                          How can I make it work?

                          Alpha


                          • #14
                            Cuffquant takes a long time

                            Hi all,
                            I have a problem running cuffquant. When I don't use the option '-b/--frag-bias-correct <genome.fa>', I get results fast. However, if I add that option, it always gets stuck at some processing percentage and seems to take forever.

                            I also tried the old pipeline, and running cuffdiff there also takes forever. I searched online and found that removing the lines in the annotation file whose 3rd column (feature) is 'gene' can increase the speed. I did that, but the speed didn't increase much. Does anyone know what the possible issue is? Thanks.


                            • #15
                              Hello shangzhong0619,

                              Here are the parameters of cuffquant:

                              General Options:
                              -o/--output-dir                   write all output files to this directory [ default: ./ ]
                              -M/--mask-file                    ignore all alignment within transcripts in this file [ default: NULL ]
                              -b/--frag-bias-correct            use bias correction - reference fasta required [ default: NULL ]
                              -u/--multi-read-correct           use 'rescue method' for multi-reads [ default: FALSE ]
                              -p/--num-threads                  number of threads used during quantification [ default: 1 ]
                              --library-type                    Library prep used for input reads [ default: below ]

                              Advanced Options:
                              -m/--frag-len-mean                average fragment length (unpaired reads only) [ default: 200 ]
                              -s/--frag-len-std-dev             fragment length std deviation (unpaired reads only) [ default: 80 ]
                              -c/--min-alignment-count          minimum number of alignments in a locus for testing [ default: 10 ]
                              --max-mle-iterations              maximum iterations allowed for MLE calculation [ default: 5000 ]
                              -v/--verbose                      log-friendly verbose processing (no progress bar) [ default: FALSE ]
                              -q/--quiet                        log-friendly quiet processing (no progress bar) [ default: FALSE ]
                              --seed                            value of random number generator seed [ default: 0 ]
                              --no-update-check                 do not contact server to check for update availability [ default: FALSE ]
                              --max-bundle-frags                maximum fragments allowed in a bundle before skipping [ default: 500000 ]
                              --max-frag-multihits              Maximum number of alignments allowed per fragment [ default: unlim ]
                              --no-effective-length-correction  No effective length correction [ default: FALSE ]
                              --no-length-correction            No length correction [ default: FALSE ]


                              I suggest changing some of the default parameters, for example setting --max-bundle-frags to 50000.
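
For example, something along these lines (sample1.bam, genome.fa, merged.gtf, the output directory and the thread count are placeholder names, not real files from this thread). Going by the option description above, bundles with more fragments than the limit are skipped, so a lower value should trade some loci for speed. The sketch only prints the command.

```shell
# Lowered limit as suggested; the cuffquant default listed above is 500000.
MAX_BUNDLE_FRAGS=50000

# Dry run: build and print the command. All file names are placeholders.
QUANT_CMD="cuffquant -p 8 --max-bundle-frags $MAX_BUNDLE_FRAGS -b genome.fa -o quant/sample1 merged.gtf sample1.bam"
echo "$QUANT_CMD"
```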

                              Cheers,
                              Alpha
