  • speed up the cuffdiff calculation

    Dear Community,

    I have 12 RNA-seq datasets (4 groups) that I mapped with STAR. The resulting BAM files range from 4 GB to 13 GB. I then ran cuffdiff on those BAM files to get FPKMs and differentially expressed genes. The process has been running for over a day, and the log file is still stuck at "Inspecting maps and determining fragment length distributions". Does cuffdiff have a limit on sample size? Is it normal for it to be this slow?

    Thanks a lot for any input!

    C.

  • #2
    Hi Capricy,

    I recently used cuffdiff on 12 samples (BAM sizes ~2 GB each) and it took 5-6 hours. This was on a server with 128 GB of RAM, using 20 threads.

    I'd say your run does seem a bit slow, although your BAM files are larger than mine. How much memory and how many threads are you using? You could use the 'top' command to check whether the program is actually still running.

    Cheers,

    Matt.
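
A minimal sketch of the check suggested above, done from Python instead of watching 'top' interactively. It assumes a Linux machine where `ps -eo pid,pcpu,pmem,etime,comm` is available and that the process is named `cuffdiff`; the helper names `parse_ps` and `running_jobs` are made up for this example.

```python
import subprocess


def parse_ps(text, name="cuffdiff"):
    """Return (pid, cpu_percent, mem_percent, elapsed) for processes named `name`.

    `text` is expected to be the output of `ps -eo pid,pcpu,pmem,etime,comm`,
    i.e. one header row followed by one row per process.
    """
    rows = []
    for line in text.strip().splitlines()[1:]:  # skip the header row
        fields = line.split(None, 4)
        if len(fields) == 5 and fields[4].strip() == name:
            pid, cpu, mem, etime, _ = fields
            rows.append((int(pid), float(cpu), float(mem), etime))
    return rows


def running_jobs(name="cuffdiff"):
    """Ask ps for all processes and keep the ones named `name`."""
    out = subprocess.run(
        ["ps", "-eo", "pid,pcpu,pmem,etime,comm"],
        capture_output=True, text=True, check=True,
    ).stdout
    return parse_ps(out, name)
```

Calling `running_jobs()` on the compute node returns an empty list if no cuffdiff process exists. Note that ps reports CPU usage averaged over the lifetime of the process, so 'top' is still the better tool for seeing what the job is doing right now.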


    • #3
      Hi, Matt,

      Thank you very much for the reply.

      I am running on an HPC cluster with 96 GB of memory and 40 processors. The jobs are still running, and last night I started to see output in the file var_model.info.

      I wonder if the uneven file sizes could be the issue.

      I'm not sure how long it will take to actually finish.

      C.
      Last edited by capricy; 11-29-2017, 03:55 AM.


      • #4
        Hmm, well, at least it's not just hanging!

        I'm not sure why it's taking that long. If you've given cuffdiff all 40 threads ('-p 40'), I'd have thought that would be plenty. Maybe someone else has a better idea?

        Matt.


        • #5
          After 8 threads the speedup for cufflinks/cuffdiff is marginal...

          In my experience, cufflinks/cuffdiff gains very little from more than 8 threads.

          In some cases the runtime with 32-48 threads may be much longer than with 8-16, especially on systems with 4+ CPU sockets, because the memory interconnects become saturated or add latency.

          Also make sure the system and the program are using NUMA properly and that CPU interleaving is not enabled in the BIOS setup.

          For tophat/cufflinks I would rather run several jobs in parallel with 1-8 threads each than run one job at a time with 40 threads (provided enough RAM is available).

          PS: And be patient... leave the job running overnight, over the weekend, or over the Christmas holiday :-)
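
The "several small jobs instead of one 40-thread job" idea can be sketched roughly as follows. This is only an illustration under assumptions: the BAM, GTF, and output names are placeholders, the helper names are invented for the example, and it applies to per-sample steps such as cufflinks (a single cuffdiff comparison is one job, so there the practical change is simply a lower '-p'). The cufflinks options used, `-p`, `-o`, and `-G`, are the thread count, output directory, and reference annotation.

```python
import shlex
import subprocess
from concurrent.futures import ThreadPoolExecutor


def plan(total_cores, threads_per_job=8):
    """Number of jobs that fit side by side at `threads_per_job` threads each."""
    return max(1, total_cores // threads_per_job)


def cufflinks_cmd(bam, gtf, out_dir, threads=8):
    """Build one cufflinks command line; all file names are placeholders."""
    return ["cufflinks", "-p", str(threads), "-o", out_dir, "-G", gtf, bam]


def run_all(commands, max_jobs, dry_run=True):
    """Run `commands` with at most `max_jobs` running at the same time.

    With dry_run=True nothing is executed; each command is returned as a
    shell-quoted string so the plan can be inspected first.
    """
    def run(cmd):
        if dry_run:
            return shlex.join(cmd)
        return subprocess.run(cmd, check=True).returncode

    with ThreadPoolExecutor(max_workers=max_jobs) as pool:
        return list(pool.map(run, commands))
```

For example, on a 40-core node `plan(40)` gives 5 concurrent jobs of 8 threads each. Every job loads its own data, so the memory caveat from the post applies: check that the combined RAM use fits before setting `dry_run=False`.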


          • #6
            Thank you very much for the advice about lowering the value of -p.

            I will try that with more memory.

            Actually, all my jobs are hanging at:

            ChkbCpt1b
            > Processing Locus chr15:100479569-100495239 [******************** ] 81%Methig1
            Mettl7a2
            Methig1
            Mettl7a2
            > Processing Locus chr15:100469033-100479252 [******************** ] 81%Methig1
            Mettl7a2
            Methig1
            Mettl7a2
            > Processing Locus chr15:103562759-103565081 [******************** ] 81%

            I am working with mouse data and use the mm10 GTF as the reference.

            C.
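
To compare where several jobs stall, the loci can be pulled out of the progress output shown above. This is a small sketch that assumes the cuffdiff progress messages were saved to a text file (the name `cuffdiff.log` in the usage note is hypothetical) and that the lines look like the ones pasted in this post; `loci_seen` is a name made up for the example.

```python
import re

# Matches progress lines such as:
# "> Processing Locus chr15:100479569-100495239 [******** ] 81%"
LOCUS = re.compile(r"Processing Locus ([^\s:]+):(\d+)-(\d+)")


def loci_seen(log_text):
    """Return (chrom, start, end) tuples in order of first appearance."""
    seen = []
    for chrom, start, end in LOCUS.findall(log_text):
        locus = (chrom, int(start), int(end))
        if locus not in seen:
            seen.append(locus)
    return seen
```

Usage would be along the lines of `loci_seen(open("cuffdiff.log").read())[-1]`, which gives the last locus reported before the job stopped making progress. If every job ends on the same region, that region is a reasonable place to start looking.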
