  • speed up the cuffdiff calculation

    Dear Community,

    I have 12 RNA-seq samples (4 groups) and mapped them using STAR. The resulting BAM files range from 4 GB to 13 GB. I then ran cuffdiff on those BAM files to get FPKMs and differentially expressed genes. The process has been running for over a day and the log file is still stuck at "Inspecting maps and determining fragment length distributions". Does cuffdiff have a limit on sample size? Is it normal for it to be this slow?

    Thanks a lot for any input!

    C.

  • #2
    Hi Capricy,

    I recently used cuffdiff on 12 samples (BAM sizes ~2 GB each) and it took 5-6 hours. This was on a server with 128 GB of RAM and 20 threads.

    I'd say your process does seem a bit slow, although your bam files are larger than mine. How much memory / threads are you using? I wonder if you could use the 'top' command to check if the program is still actually running?

    Cheers,

    Matt.
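As a rough illustration of the "is it still actually running?" check suggested above, here is a minimal Python sketch that reads a snapshot from `ps` instead of watching `top` interactively. It assumes a `ps` that accepts `-eo pid,pcpu,pmem,etime,comm` (as on typical Linux systems); the helper names `parse_ps` and `find_process` are made up for this example.

```python
import subprocess

# Columns requested from ps: PID, CPU %, memory %, elapsed time, command name.
PS_COLUMNS = "pid,pcpu,pmem,etime,comm"


def parse_ps(output):
    """Parse the text printed by `ps -eo pid,pcpu,pmem,etime,comm`."""
    rows = []
    for line in output.strip().splitlines()[1:]:  # skip the header row
        parts = line.strip().split(None, 4)  # cap the split: comm may hold spaces
        if len(parts) < 5:
            continue
        pid, pcpu, pmem, etime, comm = parts
        rows.append({
            "pid": int(pid),
            "cpu_percent": float(pcpu),
            "mem_percent": float(pmem),
            "elapsed": etime,
            "command": comm,
        })
    return rows


def find_process(name, ps_output=None):
    """Return ps rows whose command name contains `name` (e.g. "cuffdiff")."""
    if ps_output is None:
        # Hypothetical live call; pass ps_output explicitly to work from saved text.
        ps_output = subprocess.run(
            ["ps", "-eo", PS_COLUMNS],
            capture_output=True, text=True, check=True,
        ).stdout
    return [row for row in parse_ps(ps_output) if name in row["command"]]
```

Calling `find_process("cuffdiff")` on the compute node would list any matching processes. Note that `ps` typically reports CPU usage averaged over the lifetime of the process, so `top` is still the better tool for seeing the load at this very moment.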



    • #3
      Hi, Matt,

      Thank you very much for the reply.

      I am running on an HPC cluster with 96 GB of memory and 40 processors. The jobs are still running, and last night I started to see output for the file var_model.info.

      I wonder if the uneven file sizes could be the issue.

      Not sure how long it would take to actually finish.

      C.
      Last edited by capricy; 11-29-2017, 03:55 AM.



      • #4
        hmm, well at least it's not just hanging!

        Not sure why it's taking that long. If you've given cuffdiff all the threads '-p 40', I'd have thought that would be plenty. Maybe someone else has a better idea?

        Matt.



        • #5
          In my experience, the speedup for cufflinks/cuffdiff is marginal beyond 8 threads.

          In some cases the runtime with 32-48 threads may be much longer than with 8-16, especially on systems with 4+ CPU sockets, because the memory interconnects become saturated and add latency.

          Also make sure the system/program is using NUMA properly and that CPU interleaving is not enabled in the BIOS setup.

          For tophat/cufflinks I would rather run several jobs in parallel, each using 1-8 threads, than run one job at a time in series with 40 threads (provided enough RAM is available).

          PS: And be patient... - leave job running overnight/weekend/Christmas Holiday :-)
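One way to picture the "several small jobs in parallel" advice is as a small scheduling problem. The following is a minimal Python sketch under stated assumptions: one cufflinks run per BAM file, a guessed memory need per job, and hypothetical helper names (`plan_jobs`, `cufflinks_command`, `run_all`) and output directory naming. The cufflinks options used are `-p` (threads), `-G` (reference annotation) and `-o` (output directory).

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor


def plan_jobs(total_cores, threads_per_job, total_mem_gb, mem_per_job_gb):
    """Number of jobs that fit side by side without oversubscribing cores or RAM."""
    by_cores = total_cores // threads_per_job
    by_mem = int(total_mem_gb // mem_per_job_gb)
    return max(1, min(by_cores, by_mem))


def cufflinks_command(bam, gtf, out_dir, threads):
    """Build one cufflinks invocation as an argument list."""
    return ["cufflinks", "-p", str(threads), "-G", gtf, "-o", out_dir, bam]


def run_all(bams, gtf, threads_per_job, max_parallel):
    """Run one cufflinks job per BAM, at most `max_parallel` at a time."""
    commands = [
        # Hypothetical naming scheme: sample.bam -> sample_cufflinks/
        cufflinks_command(bam, gtf, bam.rsplit(".", 1)[0] + "_cufflinks", threads_per_job)
        for bam in bams
    ]
    # The Python threads only wait on external processes, so they add no real load.
    with ThreadPoolExecutor(max_workers=max_parallel) as pool:
        return list(pool.map(lambda cmd: subprocess.run(cmd).returncode, commands))
```

With the 40 cores and 96 GB mentioned earlier in the thread, and assuming (purely for illustration) 16 GB per job, `plan_jobs(40, 8, 96, 16)` gives 5 concurrent jobs of 8 threads each. The real memory footprint per job has to be measured on your own data.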



          • #6
            Thank you very much for the advice about lowering the value of -p.

            I will try that with larger memory.

            Actually all my jobs are hanging at:

            ChkbCpt1b
            > Processing Locus chr15:100479569-100495239 [******************** ] 81%Methig1
            Mettl7a2
            Methig1
            Mettl7a2
            > Processing Locus chr15:100469033-100479252 [******************** ] 81%Methig1
            Mettl7a2
            Methig1
            Mettl7a2
            > Processing Locus chr15:103562759-103565081 [******************** ] 81%

            I am working on mouse data and use the mm10 GTF as the reference.

            C.
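For the rerun with a lower `-p`, a minimal Python sketch of how the cuffdiff command line could be assembled is shown below. It is not the command used in this thread: the group labels, BAM file names, and the split of 12 samples into 4 groups of 3 replicates are assumptions for illustration, and `cuffdiff_command` is a hypothetical helper. The options used are `-p` (threads), `-o` (output directory) and `-L` (condition labels); replicates of one condition are comma-separated, and conditions are separate arguments.

```python
def cuffdiff_command(gtf, groups, out_dir, threads=8):
    """Build a cuffdiff command line as an argument list.

    `groups` maps a condition label to the list of its replicate BAM paths.
    """
    labels = ",".join(groups)  # dicts keep insertion order, so labels match the BAM order
    replicate_args = [",".join(bams) for bams in groups.values()]
    return ["cuffdiff", "-p", str(threads), "-o", out_dir, "-L", labels, gtf] + replicate_args


# Hypothetical layout: 4 groups x 3 replicates, file names invented for the example.
example_groups = {
    label: [f"{label}_rep{i}.bam" for i in (1, 2, 3)]
    for label in ("A", "B", "C", "D")
}
example_cmd = cuffdiff_command("mm10.gtf", example_groups, "cuffdiff_out", threads=8)
```

The resulting list can be passed to `subprocess.run(example_cmd, check=True)` or joined with spaces for a job script.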
