
  • cufflinks segmentation fault and taking very long to run

    I ran cuffmerge with Cufflinks 2.2.0 and used my merged.gtf file to run cuffdiff (you can see the command I used at the end of the email). However, cuffdiff seemed to take an extremely long time after the "Testing for differential expression and regulation in locus" step and then came up with the following error:

    [18:10:49] Testing for differential expression and regulation in locus.
    > Processing Locus chr1:68024242-68025896 [ ] 3%Segmentation fault (core dumped)

    To get to this point it ran for over a week.

    Are you familiar with this error, or is there an alternative I could try? I have run it twice, and both times it failed with this error.
    Your input is very much appreciated!

    Thank you!
    Dessi

  • #2
    When faced with a mysterious bug such as this, I always recommend starting by updating the software.

    If you're still experiencing the same problem with Cufflinks 2.2.1, post again.

    There are several other possible causes. If Cufflinks is stuck at a specific locus again, I would check the alignment file in the vicinity of that locus. I've had one case where an extremely high number of reads at a given locus stumped Cufflinks. I resolved the problem by identifying the problematic region in IGV, and masking that region from Cufflinks.
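    A minimal sketch of that masking step, in Python: it turns a locus string copied from IGV or from the Cufflinks progress line into a one-record GTF that can be passed with -M/--mask-file. The helper name, the "mask" source column, the "+" strand and the placeholder gene_id/transcript_id values are assumptions for illustration, not values prescribed by Cufflinks.

```python
import re

# Hypothetical helper: parse "chrom:start-end" (IGV / Cufflinks style locus).
LOCUS_RE = re.compile(r"^(?P<chrom>[^:]+):(?P<start>\d+)-(?P<end>\d+)$")


def locus_to_mask_gtf(locus: str, name: str = "masked_region") -> str:
    """Return one tab-separated GTF 'exon' record covering the locus.

    GTF coordinates are 1-based and inclusive, like the loci IGV and
    Cufflinks print. Source, strand and IDs are placeholder assumptions.
    """
    m = LOCUS_RE.match(locus.replace(",", ""))  # IGV may add thousands separators
    if m is None:
        raise ValueError(f"not a chrom:start-end locus: {locus!r}")
    chrom, start, end = m["chrom"], int(m["start"]), int(m["end"])
    if start > end:
        raise ValueError("start must not exceed end")
    attrs = f'gene_id "{name}"; transcript_id "{name}";'
    fields = [chrom, "mask", "exon", str(start), str(end), ".", "+", ".", attrs]
    return "\t".join(fields)


if __name__ == "__main__":
    # Print the record; redirect it into a mask file for -M/--mask-file.
    print(locus_to_mask_gtf("chr1:68024242-68025896"))
```

    Redirect the printed line to a file such as mask.gtf (a hypothetical name) and add -M mask.gtf to the Cufflinks command.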

    Also, you did not post your command.

    2.2.1 release - 5/5/2014
    This release fixes several bugs:
    Cuffnorm was not sometimes permuting replicate numbering, leading to inconsistent expression calls between Cuffnorm and Cuffdiff.
    The contrast file parser had a problem that could crash Cuffdiff
    Several Cuffnorm output files had minor output formatting issues

    Last edited by blancha; 06-26-2014, 03:48 PM.



    • #3
      Thank you, blancha!

      I am re-running the samples with Cufflinks 2.2.1 now. It is again taking forever on the "Testing for differential expression and regulation in locus" step. It has run for 2 days now and is still on chromosome 1, at 0% complete.
      > Processing Locus chr1:16105773-16133734 [ ] 0%-

      Should I run cuffquant first? Some people on the forum think that would speed up the process.
      I also forgot to mention that I ran cuffmerge with a doctored GTF file. Would that make cuffdiff slower?
      My command is:

      qsub -b y -cwd -j y -N Mousecuffdiff -pe smp 8 -V \
        cuffdiff -p 16 -L YN,YL,ON,OL \
        -b /home/desmla/Mousegenome/GRCm38.p2.genome.fa -u \
        /home/desmla/140416_analysis_PFC_RNAseq/Cuffmerge/merged.gtf \
        /home/desmla/140416_analysis_PFC_RNAseq/YN_trimgalore/tophat/1.asd.bam,/home/desmla/140416_analysis_PFC_RNAseq/YN_trimgalore/tophat/5.asd.bam,/home/desmla/140416_analysis_PFC_RNAseq/YN_trimgalore/tophat/12.asd.bam \
        /home/desmla/140416_analysis_PFC_RNAseq/YL_trimgalore/tophat/10.asd.bam,/home/desmla/140416_analysis_PFC_RNAseq/YL_trimgalore/tophat/4.asd.bam,/home/desmla/140416_analysis_PFC_RNAseq/YL_trimgalore/tophat/7.asd.bam \
        /home/desmla/140416_analysis_PFC_RNAseq/ON_trimgalore/tophat/11.asd.bam,/home/desmla/140416_analysis_PFC_RNAseq/ON_trimgalore/tophat/2.asd.bam,/home/desmla/140416_analysis_PFC_RNAseq/ON_trimgalore/tophat/6.asd.bam \
        /home/desmla/140416_analysis_PFC_RNAseq/OL_trimgalore/tophat/3.asd.bam,/home/desmla/140416_analysis_PFC_RNAseq/OL_trimgalore/tophat/8.asd.bam,/home/desmla/140416_analysis_PFC_RNAseq/OL_trimgalore/tophat/9.asd.bam


      Thank you!



      • #4
        Sorry. I don't have any brilliant insights.

        The new Cuffdiff workflow, running Cuffquant first, is definitely much faster, and preferable when analyzing more than a few samples. Still, Cuffdiff should not hang indefinitely, even when using the old workflow.
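        A rough sketch of the two-step workflow, assuming the documented Cufflinks 2.2 behaviour: cuffquant writes an abundances.cxb file into its -o directory, and cuffdiff accepts these .cxb files in place of BAM files. The Python helpers below only build the command lines. The quant/<sample> directory layout and the short BAM names are hypothetical placeholders.

```python
import shlex


def cuffquant_cmd(gtf, bam, outdir, threads, genome=None):
    """Build one cuffquant argv; -b (bias correction) only if a genome is given."""
    cmd = ["cuffquant", "-p", str(threads), "-o", outdir, "-u"]
    if genome:
        cmd += ["-b", genome]
    return cmd + [gtf, bam]


def cuffdiff_cmd(gtf, groups, threads, outdir="cuffdiff_out"):
    """groups: dict of condition label -> list of abundances.cxb paths.

    Replicates of one condition are comma-joined; conditions are separate
    positional arguments, in the same order as the -L labels.
    """
    labels = ",".join(groups)
    cmd = ["cuffdiff", "-p", str(threads), "-o", outdir, "-L", labels, gtf]
    cmd += [",".join(cxbs) for cxbs in groups.values()]
    return cmd


if __name__ == "__main__":
    # Hypothetical layout: <sample>.bam in the working directory.
    samples = {"YN": ["1", "5", "12"], "YL": ["10", "4", "7"]}
    groups = {}
    for cond, ids in samples.items():
        groups[cond] = []
        for sid in ids:
            out = f"quant/{sid}"
            print(shlex.join(cuffquant_cmd("merged.gtf", f"{sid}.bam", out, 8)))
            groups[cond].append(f"{out}/abundances.cxb")
    print(shlex.join(cuffdiff_cmd("merged.gtf", groups, 8)))
```

        Each cuffquant job can then be submitted on its own, so a sample that crashes is identified directly instead of after days of a combined cuffdiff run.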

        You're running Cuffdiff with 16 processors. Are you really getting 16 processors from the queue manager? It's not clear to me from your job submission command that you are, but the queue on your server may be set up differently from mine.

        I don't know what you mean by doctored GTF, so I can't comment on whether that could be the cause of the problem.

        Others on the forum may have more helpful comments.



        • #5
          Sorry to barge in on a post, but I am having real difficulty running cuffquant. I have 100 RNA-seq samples with ~60M aligned PE reads. Alignment was done with STAR (in Cufflinks-compatible mode), and then I ran cufflinks and cuffmerge, all without a glitch. Now I am trying to run cuffquant with the merged.gtf file. Cuffquant takes forever (>24 hours for some samples, on 8 4G cores). It usually gets to 99% without trouble and then gets stuck for a very long time. I would really appreciate any advice.
          The command line is:
          cuffquant -p 8 -o output -u -b path/to/genome/seq path/to/merged.gtf path/to/bam

          Yehudit
          Yu



          • #6
            Hi all,
            Thanks for the replies. I haven't tried cuffquant yet, but a bioinformatician at my institute suggested that for cuffdiff I should use the accepted_hits.bam files from the TopHat output instead of the asd.bam files, which contain all mapped reads from accepted_hits.bam plus properly tagged unmapped reads, and are sorted and duplicate-tagged. What is the opinion in this forum?
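            One quick way to see what such a file contains is to tally the SAM FLAG bits, for instance on the text output of samtools view. The sketch below counts unmapped (0x4), secondary (0x100) and duplicate (0x400) records. The function name is hypothetical, and because it only reads SAM text it makes no assumption about how the asd.bam files were produced.

```python
from collections import Counter

# SAM FLAG bits, as defined in the SAM specification.
UNMAPPED = 0x4
SECONDARY = 0x100
DUPLICATE = 0x400


def flag_summary(sam_lines):
    """Count total, unmapped, secondary and duplicate-flagged SAM records."""
    counts = Counter()
    for line in sam_lines:
        if not line.strip() or line.startswith("@"):  # skip header and blank lines
            continue
        flag = int(line.split("\t", 2)[1])  # FLAG is the second column
        counts["total"] += 1
        if flag & UNMAPPED:
            counts["unmapped"] += 1
        if flag & SECONDARY:
            counts["secondary"] += 1
        if flag & DUPLICATE:
            counts["duplicate"] += 1
    return counts
```

            Any iterable of SAM lines works, e.g. an open .sam file or the stdout of samtools view captured with subprocess.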



            • #7
              Hi Dessi, from experience, Cufflinks is very picky about the tags in the BAM file. I have only seen it work with TopHat output, or with STAR output when STAR was run in Cufflinks-compatible mode. I think both of these contain only aligned reads in the BAM file.
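              One tag that matters here is the XS strand attribute. For unstranded libraries, Cufflinks expects spliced alignments to carry XS:A:+ or XS:A:-, which TopHat writes and STAR adds when run with --outSAMstrandField intronMotif. A small Python sketch (function name hypothetical) that lists spliced, mapped reads lacking that tag in SAM text:

```python
import re

# A CIGAR "N" operation marks a skipped region (an intron) in a spliced read.
SPLICED = re.compile(r"\d+N")


def spliced_without_xs(sam_lines):
    """Return names of mapped, spliced reads that have no XS:A strand tag."""
    missing = []
    for line in sam_lines:
        if not line.strip() or line.startswith("@"):
            continue
        fields = line.rstrip("\n").split("\t")
        qname, flag, cigar, tags = fields[0], int(fields[1]), fields[5], fields[11:]
        if flag & 0x4 or not SPLICED.search(cigar):
            continue  # unmapped or unspliced: no XS needed
        # Only XS of type A counts; aligners such as Bowtie2 use XS:i for scores.
        if not any(t.startswith("XS:A:") for t in tags):
            missing.append(qname)
    return missing
```

            An empty list on a sample of reads suggests the strand tags are in place; a long list points at the BAM rather than at Cufflinks.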

              Yu



              • #8
                Hi Blancha,

                I re-ran cuffdiff with the Cufflinks 2.2.1 release, and it came up with a segmentation fault again, once more at 3% but at a different chromosome 1 locus. As it takes several days to get to this point, I wonder whether it is a memory problem. I am using my institute's cluster and I honestly don't understand much about it, but is it possible that the submitted job has a fixed memory allocation and runs out of memory?

                I don't know what I can do, it is really frustrating.

                Regards,
                Dessi



                • #9
                  Hi again,

                  Sorry, I was wrong before: the job always terminates at "Processing Locus chr1:68024242-68025896".

                  So could it be something at this locus? How did you mask a problematic region from Cufflinks?
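                  One way to look at the annotation side of that locus is to list the merged.gtf records that overlap it. The sketch below assumes standard 9-column, tab-separated GTF with 1-based inclusive coordinates; the function name is hypothetical.

```python
import re

LOCUS_RE = re.compile(r"^([^:]+):(\d+)-(\d+)$")


def gtf_records_in_locus(gtf_lines, locus):
    """Yield GTF records (as lists of 9 fields) overlapping chrom:start-end.

    GTF records and the locus are both treated as 1-based inclusive intervals.
    """
    m = LOCUS_RE.match(locus.replace(",", ""))
    if m is None:
        raise ValueError(f"bad locus: {locus!r}")
    chrom, start, end = m.group(1), int(m.group(2)), int(m.group(3))
    for line in gtf_lines:
        if not line.strip() or line.startswith("#"):
            continue
        f = line.rstrip("\n").split("\t")
        if len(f) < 9 or f[0] != chrom:
            continue
        if int(f[3]) <= end and int(f[4]) >= start:  # the two intervals overlap
            yield f
```

                  For example, iterate over open("merged.gtf") with the locus "chr1:68024242-68025896" and print columns 3 to 5 and 9 of each hit, to see which transcripts Cufflinks is trying to quantify there.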

                  Thanks!



                  • #10
                    I've posted the option for masking certain regions with a GTF file at the end of this post.
                    It's not the first troubleshooting step I would take though.

                    First, for each job you submit to the queue, you should be getting a log from the queue manager specifying which resources you requested, and which resources you used. This would be the first thing I would check. Are you really getting 16 processors? Are you exceeding the memory limit?
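                    If the cluster runs Grid Engine (the qsub -pe smp syntax earlier in this thread suggests it), qacct -j <jobid> reports accounting data for a finished job. The sketch below assumes that output is one "key value" pair per line, with slots and maxvmem fields and K/M/G/T size suffixes treated as binary units. Check these assumptions against your site's actual output, since configurations differ; all function names are hypothetical.

```python
# Size multipliers; binary units are an assumption (lowercase treated the same).
UNITS = {"": 1, "K": 1024, "M": 1024**2, "G": 1024**3, "T": 1024**4}


def parse_size(text):
    """Convert strings like '12.5G' or '800M' to bytes."""
    text = text.strip().upper()
    suffix = text[-1] if text and text[-1] in "KMGT" else ""
    number = text[:-1] if suffix else text
    return float(number) * UNITS[suffix]


def parse_qacct(text):
    """Parse assumed 'key   value' lines into a dict; other lines are skipped."""
    record = {}
    for line in text.splitlines():
        parts = line.split(None, 1)
        if len(parts) == 2:
            record[parts[0]] = parts[1].strip()
    return record


def check_job(text, requested_slots, mem_limit):
    """Compare granted slots and peak memory against what the job needed."""
    rec = parse_qacct(text)
    peak = parse_size(rec.get("maxvmem", "0"))
    return {
        "slots_ok": int(rec.get("slots", 0)) == requested_slots,
        "peak_mem_bytes": peak,
        "over_memory": peak > parse_size(mem_limit),
    }
```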

                    Second, I would open the BAM files with IGV and check visually whether there are any problems in the regions where Cufflinks stalls. You'll need to index the BAM files with samtools index first to be able to do this. If you have extremely abundant reads aligning to that region, I would mask it.
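                    Before opening IGV, a coarse screen for pile-ups can be run on the output of samtools depth, which prints chromosome, position and depth as tab-separated columns and by default skips positions with zero coverage. In this sketch the window size and the depth threshold are arbitrary assumptions to tune, and the function name is hypothetical.

```python
from collections import defaultdict


def high_depth_windows(depth_lines, window=1000, threshold=10000):
    """Return (chrom, window_start, mean_depth) for windows above threshold.

    The mean is taken over the positions samtools depth reported, which by
    default excludes zero-coverage positions. Results are sorted by depth.
    """
    sums = defaultdict(int)
    counts = defaultdict(int)
    for line in depth_lines:
        if not line.strip():
            continue
        chrom, pos, depth = line.split("\t")[:3]
        key = (chrom, (int(pos) - 1) // window * window + 1)  # 1-based window start
        sums[key] += int(depth)
        counts[key] += 1
    hits = [(c, s, sums[(c, s)] / counts[(c, s)]) for (c, s) in sums
            if sums[(c, s)] / counts[(c, s)] > threshold]
    return sorted(hits, key=lambda h: h[2], reverse=True)
```

                    The top hits are the regions worth inspecting in IGV first, and candidates for the mask file if the pile-up looks artefactual.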

                    These are just general tips which you may or may not find helpful. I'm not yet able to pinpoint exactly the cause of the issues you are experiencing.

                    ------

                    -M/--mask-file <mask.(gtf/gff)>

                    Tells Cufflinks to ignore all reads that could have come from transcripts in this GTF file. We recommend including any annotated rRNA, mitochondrial transcripts, and other abundant transcripts you wish to ignore in your analysis in this file. Due to variable efficiency of mRNA enrichment methods and rRNA depletion kits, masking these transcripts often improves the overall robustness of transcript abundance estimates.
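                    A sketch of assembling such a mask from a reference annotation, assuming the mitochondrial contig is named chrM or MT and that rRNA genes carry an Ensembl-style gene_biotype or GENCODE-style gene_type attribute. A merged.gtf produced by cuffmerge may lack these attributes, so the reference GTF is the more likely input; the biotype set and function name are assumptions to adjust.

```python
import re

# Assumed naming conventions: chrM (UCSC) or MT (Ensembl) for mitochondria.
MITO_NAMES = {"chrM", "MT"}
# Matches gene_biotype "..." (Ensembl) or gene_type "..." (GENCODE).
BIOTYPE_RE = re.compile(r'gene_(?:bio)?type "([^"]+)"')
MASK_BIOTYPES = {"rRNA", "Mt_rRNA"}


def mask_lines(gtf_lines):
    """Yield GTF lines for mitochondrial or rRNA features, for use with -M."""
    for line in gtf_lines:
        if line.startswith("#") or not line.strip():
            continue
        fields = line.split("\t")
        if len(fields) < 9:
            continue
        m = BIOTYPE_RE.search(fields[8])
        if fields[0] in MITO_NAMES or (m and m.group(1) in MASK_BIOTYPES):
            yield line
```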



                    • #11
                      Hi blancha,

                      Just an update on my efforts, I haven't solved the mysterious problem yet.

                      First, I did check with the administrator and I am getting 8 processors and I am not exceeding the memory limit.

                      Second, I checked several of my 12 samples in IGV. I looked at the BAM files (the asd.bam files, which contain all mapped reads from TopHat's accepted_hits.bam plus properly tagged unmapped reads, and are sorted and duplicate-tagged). At the problematic chr1:68024242-68025896 locus I see no aligned reads whatsoever. IGV does not list a RefSeq gene at this locus, and the nearest gene, Erbb4, has many reads aligned to it. The UCSC Genome Browser lists a predicted gene from Ensembl 75.

                      Third, I tried running cuffquant on one of my samples with Cufflinks 2.2.1. It ran faster than cuffdiff but stopped at exactly the same locus with the same "Segmentation fault (core dumped)" error message.

                      I have also tried re-running cuffdiff with the accepted_hits.bam output from tophat and again the same error came up.

                      To be honest I am really desperate now and I just wish I could get my differentially expressed genes.

                      Any ideas?



                      • #12
                        I find it odd to request 8 processors from the queue manager, and then run Cufflinks with 16 threads, -p 16.

                        Couldn't this cause a problem, or am I missing something here?
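                        A sketch of keeping the two numbers in step: Grid Engine sets NSLOTS in the job environment to the number of slots actually granted (8 for -pe smp 8), so the -p value can be derived from it instead of being typed separately. This only works when the code runs inside the job (for example from a job script), not on the submit host. The helper names are hypothetical, and falling back to one thread when NSLOTS is absent is an assumption.

```python
import os


def threads_from_env(env=None, default=1):
    """Read the granted slot count from NSLOTS, falling back to `default`."""
    env = os.environ if env is None else env
    try:
        n = int(env.get("NSLOTS", default))
    except ValueError:
        n = default
    return max(1, n)


def with_thread_count(cmd, env=None):
    """Return a copy of a cuffdiff/cuffquant argv with -p set from NSLOTS."""
    n = str(threads_from_env(env))
    cmd = list(cmd)
    if "-p" in cmd:
        cmd[cmd.index("-p") + 1] = n  # overwrite the existing thread count
    else:
        cmd[1:1] = ["-p", n]  # insert right after the program name
    return cmd
```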



                        • #13
                          Hi blancha, that's because I am very new to bioinformatics and I don't always know what I am doing. I have also run the command requesting 8 processors and running Cufflinks with 8 threads. I always get the error...



                          • #14
                            I hope I am not offending anyone by posting on this forum as a total beginner, but I am trying really hard to learn, and it is not easy without the basics. I opened my first terminal window 3 months ago. There isn't a single source that teaches you how to do RNA-seq analysis, so I am doing a lot of web searching and asking lots of questions. If anyone knows useful websites for beginners, I would appreciate it if you let me know.
                            Thank you,
                            Dessi



                            • #15
                              Hi, have you tried running cuffdiff or cuffquant with "accepted_hits.bam" files?

                              And how did you generate the "asd.bam" files anyway?
