  • Cufflinks memory usage

    Hi all,
    I am having trouble running Cufflinks on PE RNA-Seq libraries generated on a HiSeq machine. I used TopHat to successfully map those PE reads (about 160 million reads), which gave me a BAM file of about 6 GB for each library. I then fed the BAM file to Cufflinks running with 20 cores. The problem is that Cufflinks seems to be using more than 120 GB of RAM and is taking very long (about a week) to run one library. Have any of you had a similar experience? Am I doing something wrong? Any suggestions? Thanks!

  • #2
    Which reference annotations are you using? I had a similar experience with the Gencode annotations (>2.5 million annotations). I then switched to the RefSeq annotations, and the Cufflinks runs are now much shorter. Of course, it quantifies far fewer potential transcripts, but for most applications that can be sufficient.
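To see how much larger one annotation is than another before committing to a long run, the records in each GTF file can be counted by feature type. A minimal Python sketch, assuming standard tab-separated, nine-column GTF files; the function name and the file names in the usage note are made up for illustration:

```python
import gzip
from collections import Counter


def count_gtf_features(path):
    """Count records per feature type (column 3) in a GTF file.

    Accepts plain-text or gzip-compressed files (chosen by the .gz suffix).
    """
    opener = gzip.open if path.endswith(".gz") else open
    counts = Counter()
    with opener(path, "rt") as handle:
        for line in handle:
            # Skip header/comment lines and blank lines.
            if line.startswith("#") or not line.strip():
                continue
            fields = line.rstrip("\n").split("\t")
            # A well-formed GTF record has nine tab-separated columns.
            if len(fields) >= 9:
                counts[fields[2]] += 1
    return counts
```

For example, comparing `count_gtf_features("gencode.gtf")` with `count_gtf_features("refseq.gtf")` (hypothetical file names) gives a rough idea of the difference in size. Some GTF files contain only exon and CDS rows, so compare the same feature type in both files.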

    • #3
      In addition to posting your reference, you may want to post which options you are using in Cufflinks.

      • #4
        Right. I am also using the Gencode annotation. I think I will experiment with other annotation files to see how it goes. The only option I am setting in Cufflinks specifies that the reads come from PE reads (i.e. --fr-unstranded).
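For reference, the setup described so far (20 threads, a reference annotation, an unstranded library) could be written out roughly as below. This is a sketch, not the poster's actual command: it assumes the documented Cufflinks options `-p` (threads), `-G` (quantify against a reference GTF only), `--library-type` (where `fr-unstranded` is a value rather than a flag of its own) and `-o` (output directory). The thread does not say whether `-G` or the guided-assembly option `-g` was used, and the file and directory names are placeholders.

```python
import shlex


def build_cufflinks_command(bam, out_dir, threads=20, annotation_gtf=None,
                            library_type="fr-unstranded"):
    """Return a Cufflinks command line as a list of arguments.

    The command is only built here, not executed.
    """
    cmd = [
        "cufflinks",
        "-p", str(threads),              # number of threads
        "--library-type", library_type,  # library type, e.g. fr-unstranded
        "-o", out_dir,                   # output directory
    ]
    if annotation_gtf is not None:
        # Assumption: -G restricts the run to the supplied annotation.
        cmd += ["-G", annotation_gtf]
    cmd.append(bam)                      # input alignments go last
    return cmd


# Placeholder file names, for illustration only.
example = build_cufflinks_command("accepted_hits.bam", "cufflinks_out",
                                  annotation_gtf="refseq.gtf")
print(shlex.join(example))
```

Building the command as a list keeps it easy to swap the annotation file when comparing run time and memory between annotations.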

        • #5
          Oscar,

          Did you ever get an answer or a workaround? I am running Cufflinks on a similarly sized BAM file and am also running out of memory.

          Cheers,

          • #6
            I couldn't get the job done until I upgraded to the latest version of Cufflinks, which seems to use less memory. Good luck!

            • #7
              Hey Oscar,

              I am running the newest version of Cufflinks (I literally downloaded it this week). I got Cufflinks to run on a small subset of reads (about 4 million paired-end reads, 100 nt with a ~200 nt inner gap), but the whole dataset is ~50X bigger. How many reads did you use? And what (approximately) was the memory usage for your file size (RAM per GB of the BAM/SAM file)?

              Cheers,

              • #8
                Hi,
                I don't remember the exact numbers, as it was about a year ago. One thing I do remember is that I was lucky enough to use a machine with 1 TB of RAM, and the run used about 500 GB for about 160 million reads. I hope this helps. Good luck!
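As a back-of-the-envelope check, the figures quoted in this thread (about 500 GB of RAM for about 160 million reads, a BAM file of about 6 GB, and a dataset roughly 50 times a 4-million-read subset) can be turned into a rough estimate. The sketch below assumes that memory grows linearly with read count and that the 500 GB and 6 GB figures describe the same library; neither is confirmed here, and real usage may also depend on the annotation and the Cufflinks version. The function names are made up for illustration.

```python
def ram_per_million_reads(ram_gb, reads_millions):
    """GB of RAM per million reads, from one observed run."""
    return ram_gb / reads_millions


def extrapolate_ram_gb(reads_millions, gb_per_million):
    """Linear extrapolation of RAM use to another read count (an assumption)."""
    return reads_millions * gb_per_million


# Figures quoted in the thread (all approximate).
observed_ram_gb = 500
observed_reads_millions = 160
observed_bam_gb = 6

rate = ram_per_million_reads(observed_ram_gb, observed_reads_millions)
ram_per_gb_of_bam = observed_ram_gb / observed_bam_gb

# A dataset about 50 times larger than a 4-million-read subset.
target_reads_millions = 4 * 50
estimate = extrapolate_ram_gb(target_reads_millions, rate)

print(f"{rate:.3f} GB RAM per million reads")
print(f"{ram_per_gb_of_bam:.1f} GB RAM per GB of BAM")
print(f"~{estimate:.0f} GB RAM for {target_reads_millions} million reads")
```

Under these assumptions the estimate comes out at a few GB of RAM per million reads, so a dataset of that size would need a large-memory node; treat it as an order-of-magnitude guide only.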

                • #9
                  We only have one compute node with that much memory on our cluster and I didn't want to usurp it if I didn't have to. But I guess that's what the resources are for. Thanks Oscar. (Also, about how long did it take to run?)

                  Cheers,

                  • #10
                    Expect it to run longer than a week.
