  • gffread segmentation fault

    Hello All,

    I don't know what is going wrong with the gffread utility from Cufflinks.

    This is the command I have been using

    gffread -w /path/test.fa -g /path/hg19_ucsc.fa /path/test.gtf
    My GTF file is only 4 MB, with around 48K records. gffread writes only around 42K records to the output FASTA file (about 480 MB), and then I see this error:

    /path: line 16: 28068 Segmentation fault (core dumped) gffread -w /path/test.fa -g /path/hg19_ucsc.fa /path/test.gtf
    I have another GTF file, 12 MB in size, for which gffread writes a 3.7 GB FASTA file and then fails with the same segmentation fault.

    Any help is appreciated.

  • #2
    Split your GTF file into five separate chunks and run gffread on each. It may help you understand the core of the fault.
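One caveat when chunking: a plain line-count split can cut a transcript in half, leaving its exons in two different files. A minimal sketch of a transcript-aware split follows, assuming GTF-style `transcript_id "..."` attributes and that all lines of a transcript are contiguous in the file. The function names and the default chunk size are illustrative, not part of gffread.

```python
import re
from itertools import groupby

# GTF attribute of the form: transcript_id "XYZ";
TRANSCRIPT_ID = re.compile(r'transcript_id "([^"]+)"')


def transcript_key(line):
    """Return the transcript_id of a GTF line, or None if it has none."""
    match = TRANSCRIPT_ID.search(line)
    return match.group(1) if match else None


def chunk_gtf_lines(lines, max_lines=10000):
    """Yield lists of GTF lines, closing a chunk only between transcripts.

    Assumes the lines of one transcript are adjacent in the input.
    A single transcript longer than max_lines still stays in one chunk.
    """
    chunk = []
    for _, group in groupby(lines, key=transcript_key):
        group = list(group)
        if chunk and len(chunk) + len(group) > max_lines:
            yield chunk
            chunk = []
        chunk.extend(group)
    if chunk:
        yield chunk
```

Each yielded chunk can be written to its own file and passed to gffread separately; concatenating the chunks in order reproduces the original file.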


    • #3
      I am getting the segmentation fault error with the third file. Could you please help me understand what it means?


      • #4
        Originally posted by gokhulkrishnakilaru
        I am getting the segmentation fault error with the third file. Could you please help me understand what it means?
        A program is represented as segments in a computer's memory. A segmentation fault means that the program tried to access an address that is not located in one of its segments.

        Usually, this means that there is a bug in the software.
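When running a tool on many chunks, it helps to tell a crash apart from an ordinary error exit. The following is a rough sketch, assuming a POSIX system, where Python's `subprocess` reports a process killed by a signal as a negative return code. The helper names `describe_exit` and `run_on_chunk` are made up for illustration.

```python
import signal
import subprocess


def describe_exit(returncode):
    """Turn a subprocess return code into a short human-readable label."""
    if returncode == 0:
        return "ok"
    if returncode < 0:
        # On POSIX, a negative value -N means the process was killed by signal N.
        try:
            name = signal.Signals(-returncode).name
        except ValueError:
            name = f"signal {-returncode}"
        return f"killed by {name}"
    return f"exited with status {returncode}"


def run_on_chunk(command):
    """Run a command (a list of arguments) and describe how it ended."""
    completed = subprocess.run(command)
    return describe_exit(completed.returncode)
```

For example, `run_on_chunk(["gffread", "-w", "chunk.fa", "-g", "hg19_ucsc.fa", "chunk.gtf"])` would return `"killed by SIGSEGV"` for a chunk that triggers the crash; the chunk file names here are placeholders.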


        • #5
          Originally posted by seb567
          A program is represented as segments in a computer's memory. A segmentation fault means that the program tried to access an address that is not located in one of its segments.

          Usually, this means that there is a bug in the software.
          I broke my large file into chunks of 10000 lines each which left me with 10 files. The program runs fine for 9 files and throws this error for the last file.

          Any thoughts?


          • #6
            Originally posted by gokhulkrishnakilaru
            I broke my large file into chunks of 10000 lines each which left me with 10 files. The program runs fine for 9 files and throws this error for the last file.

            Any thoughts?
            Is there something unusual in the last file compared to the nine others?


            • #7
              Originally posted by seb567
              Is there something unusual in the last file compared to the nine others?
              I am trying to figure that out. I have now broken the last file into 1000-line chunks, and all of them give a segmentation fault. I cross-checked the end coordinates against the chrom sizes file, and all of them seem to be within the limits.

              I don't know what's wrong with it.
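The coordinate cross-check described above can be automated. This is a sketch only, assuming a two-column chrom sizes file (sequence name, then length, whitespace-separated) and the standard nine-column, tab-separated GTF layout with 1-based inclusive coordinates. The function names are illustrative. It also flags sequence names that are missing from the sizes file, since a naming mismatch such as `chr1` versus `1` is easy to overlook.

```python
def load_chrom_sizes(lines):
    """Parse 'name<whitespace>length' lines into a {name: length} dict."""
    sizes = {}
    for line in lines:
        fields = line.split()
        if len(fields) >= 2:
            sizes[fields[0]] = int(fields[1])
    return sizes


def out_of_bounds_records(gtf_lines, chrom_sizes):
    """Return (line_number, reason) for every suspicious GTF record."""
    problems = []
    for number, line in enumerate(gtf_lines, start=1):
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comments
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 9:
            problems.append((number, "fewer than 9 tab-separated fields"))
            continue
        chrom = fields[0]
        try:
            start, end = int(fields[3]), int(fields[4])
        except ValueError:
            problems.append((number, "non-numeric start or end"))
            continue
        if chrom not in chrom_sizes:
            problems.append((number, f"sequence {chrom} not in chrom sizes"))
        elif start < 1 or end < start:
            problems.append((number, f"invalid interval {start}-{end}"))
        elif end > chrom_sizes[chrom]:
            problems.append((number, f"end {end} exceeds length of {chrom}"))
    return problems
```

An empty result only rules out these particular problems; it does not show that the file is otherwise acceptable to gffread.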


              • #8
                I have had the same problem with the filtering parameters of gffread (-J, -V, -H, etc.). Can anyone suggest another program that does a similar job to gffread, that is, filters transcripts based on CDS features and produces a multi-FASTA sequence file at the end?


                • #9
                  Hello all,

                  I had the same problem a while ago. I split my file into 10 pieces and got errors in two of them. In both cases I broke the files into smaller ones and, after more splitting, traced the problem to a gene with an alternative splicing event. In my case the secondary transcript extended beyond the annotated gene (the last exon's coordinates were placed outside the gene).

                  I just erased the two features from my original file (I considered that it was not a great loss for my purposes) and it works fine now.

                  Hope it helps,

                  Pablo
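The situation Pablo describes, an exon lying outside its annotated gene, can be searched for directly instead of by repeated splitting. Below is a minimal sketch, assuming a GTF file that contains explicit `gene` feature rows and `gene_id "..."` attributes on both gene and exon rows (many Cufflinks-generated GTF files have no `gene` rows, in which case nothing is reported). The function name is illustrative.

```python
import re

# GTF attribute of the form: gene_id "XYZ";
GENE_ID = re.compile(r'gene_id "([^"]+)"')


def exons_outside_gene(gtf_lines):
    """Return (line_number, gene_id) for exons extending past their gene."""
    genes = {}   # gene_id -> (start, end) taken from 'gene' rows
    exons = []   # (line_number, gene_id, start, end) taken from 'exon' rows
    for number, line in enumerate(gtf_lines, start=1):
        if line.startswith("#"):
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 9:
            continue
        match = GENE_ID.search(fields[8])
        if not match:
            continue
        start, end = int(fields[3]), int(fields[4])
        if fields[2] == "gene":
            genes[match.group(1)] = (start, end)
        elif fields[2] == "exon":
            exons.append((number, match.group(1), start, end))
    # Compare only after reading everything, so row order does not matter.
    return [
        (number, gene_id)
        for number, gene_id, start, end in exons
        if gene_id in genes
        and (start < genes[gene_id][0] or end > genes[gene_id][1])
    ]
```

The reported line numbers point at the exon rows to inspect or remove, which is what was done by hand in the post above.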


                  • #10
                    Hi,
                    Thanks for your reply, Pablo. I tried what you suggested and broke down my files, but I kept getting the segfault even when the file was only 100 lines long(!). I think maybe there is a lot of alternative splicing in my organism (a basidiomycete fungus). Does anyone know of a program like gffread that can handle alternative splicing? Or another way I could get around this problem?

                    Thanks very much

                    Will


                    • #11
                      I'm having the same problem as Will. I've run the tophat2-cufflinks-cuffmerge pipeline to generate a merged GTF file. However, my organism has fairly high gene density, so cufflinks is predicting very long transcripts, which are not correct. I wanted to filter the merged GTF file using gffread to discard any transcripts that have internal stops (using either the -V or the -J parameter). However, I keep getting the 'segmentation_fault' error. I have tried breaking the merged GTF file into smaller pieces (such as 1000 lines), but the segmentation fault persists. Does anybody know a solution to this problem?
