  • speed up the cuffdiff calculation

    Dear Community,

    I have 12 sets of RNA-seq data (4 groups) and mapped them using STAR. The resulting BAM files range from 4 GB to 13 GB. I then ran cuffdiff on those BAM files to get FPKMs and differentially expressed genes. The process has been running for over a day and the log file is still stuck at "Inspecting maps and determining fragment length distributions". Does cuffdiff have a limit on sample size? Is it normal for it to be this slow?

    Thanks a lot for any input!

    C.

  • #2
    Hi Capricy,

    I recently used cuffdiff on 12 samples (BAM sizes ~2 GB each) and it took 5-6 hours. This was on a server with 128 GB of memory and 20 threads.

    I'd say your process does seem a bit slow, although your BAM files are larger than mine. How much memory and how many threads are you using? You could use the 'top' command to check whether the program is still actually running.
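If you would rather script the check than watch 'top', here is a minimal Python sketch, assuming a POSIX system and that you know the job's PID and the path of its log file (both are placeholders you would supply). It only tells you whether the process exists and how recently the log was written; unlike 'top', it does not report CPU usage.

```python
import os
import time


def pid_is_alive(pid: int) -> bool:
    """Return True if a process with this PID exists (POSIX only)."""
    try:
        os.kill(pid, 0)  # signal 0 tests for existence; nothing is delivered
    except ProcessLookupError:
        return False
    except PermissionError:
        return True  # the process exists but belongs to another user
    return True


def seconds_since_modified(path: str) -> float:
    """Seconds elapsed since the file at `path` was last written."""
    return time.time() - os.path.getmtime(path)
```

A process that is alive but whose log has not changed for many hours may be stalled, or it may simply be in a long step that prints nothing, so treat the result as a hint only.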

    Cheers,

    Matt.

    • #3
      Hi, Matt,

      Thank you very much for the reply.

      I am running on an HPC cluster with 96 GB of memory and 40 processors. The jobs are still running, and last night I started to see output in the file var_model.info.

      I wonder if the uneven file sizes could be the issue.

      I'm not sure how long it will take to actually finish.

      C.
      Last edited by capricy; 11-29-2017, 03:55 AM.

      • #4
        Hmm, well at least it's not just hanging!

        Not sure why it's taking that long. If you've given cuffdiff all the threads ('-p 40'), I'd have thought that would be plenty. Maybe someone else has a better idea?

        Matt.

        • #5
          In my experience, the speedup of cufflinks/cuffdiff is marginal beyond 8 threads.

          In some cases the runtime with 32-48 threads may be much longer than with 8-16, especially on systems with 4+ CPU sockets, because of bottlenecks caused by memory interconnect saturation and latency.

          Also make sure the system/program is using NUMA properly and cpu interleaving is not set in the BIOS setup.

          For tophat/cufflinks I would rather run several jobs in parallel using 1-8 threads each than one job at a time using 40 threads in series (provided enough RAM is available).
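As a rough illustration of that idea, the Python sketch below works out how many jobs fit on a node and runs per-sample cufflinks commands side by side. The per-job memory figure, the file names, and the helper names are all hypothetical. The only cufflinks options used are `-p` (threads) and `-o` (output directory). This applies to per-sample tools such as cufflinks; a single cuffdiff comparison still runs as one job.

```python
from concurrent.futures import ThreadPoolExecutor
import subprocess


def max_parallel_jobs(total_cores, threads_per_job, total_mem_gb, mem_per_job_gb):
    """How many jobs fit at once, limited by both cores and memory."""
    by_cpu = total_cores // threads_per_job
    by_mem = int(total_mem_gb // mem_per_job_gb)
    # Never return 0; whether a single job actually fits is up to the caller.
    return max(1, min(by_cpu, by_mem))


def cufflinks_command(bam, out_dir, threads):
    """Build one per-sample cufflinks command line as an argument list."""
    return ["cufflinks", "-p", str(threads), "-o", out_dir, bam]


def run_all(commands, n_parallel):
    """Run the commands with at most n_parallel at a time; return exit codes."""
    with ThreadPoolExecutor(max_workers=n_parallel) as pool:
        return list(pool.map(lambda cmd: subprocess.run(cmd).returncode, commands))
```

For example, with 40 cores, 96 GB of RAM, 8 threads per job and an assumed 20 GB per job, `max_parallel_jobs(40, 8, 96, 20)` gives 4, because memory rather than core count is the limit.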

          PS: And be patient... - leave job running overnight/weekend/Christmas Holiday :-)

          • #6
            Thank you very much for the advice about lowering the value of -p.

            I will try that with more memory.

            Actually, all my jobs are hanging at:

            ChkbCpt1b
            > Processing Locus chr15:100479569-100495239 [******************** ] 81%Methig1
            Mettl7a2
            Methig1
            Mettl7a2
            > Processing Locus chr15:100469033-100479252 [******************** ] 81%Methig1
            Mettl7a2
            Methig1
            Mettl7a2
            > Processing Locus chr15:103562759-103565081 [******************** ] 81%

            I am working on mouse data and use the mm10 GTF as the reference.
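To compare where each job stops, the last locus named in a saved copy of the progress output can be pulled out with a short Python sketch like the one below. It assumes the progress lines look like the "Processing Locus chr:start-end" lines pasted above; the function name is made up for illustration.

```python
import re

# Matches e.g. "Processing Locus chr15:100479569-100495239"
LOCUS_RE = re.compile(r"Processing Locus ([^\s:]+):(\d+)-(\d+)")


def last_locus(log_text):
    """Return (chrom, start, end) of the last locus in the log, or None."""
    matches = LOCUS_RE.findall(log_text)
    if not matches:
        return None
    chrom, start, end = matches[-1]
    return chrom, int(start), int(end)
```

If every job reports the same final locus, that region is worth inspecting in the BAM files and the GTF; if the loci differ, the slowdown is more likely a resource issue than a problem with one locus.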

            C.
