  • Negative interval length in GFF file

    I have been given some GFF files to process, and one of these is throwing an error when I attempt to run it through HTSeq-count:

    Error occured when processing GFF file (line 26 of file ./features/promoter.gff):
    Cannot subset to zero-length interval.
    [Exception type: IndexError, raised in _HTSeq.pyx:348]

    Indeed, on line 26, the start value is higher than the end value for the interval.

    Code:
    C20524	flankbed	promoter	1101	1100	.	-	.	ID=CGI_10000033;
    Has anyone encountered this before?
    Should I trust the quality of this GFF file or is something terribly wrong?
    How would you deal with this problem?

    Thanks in advance...
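A quick way to see how widespread the problem is: scan the file for feature rows whose start (column 4) is greater than their end (column 5). This is a minimal sketch. It assumes a plain tab-separated, nine-column GFF and treats lines starting with `#` as comments. `find_bad_intervals` is a hypothetical helper, not part of HTSeq.

```python
def find_bad_intervals(lines):
    """Yield (line_number, seqid, start, end) for GFF rows where start > end.

    GFF coordinates are 1-based and closed, so a valid feature has start <= end.
    Blank lines, '#' comment/header lines and lines with < 9 columns are skipped.
    """
    for lineno, line in enumerate(lines, start=1):
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 9:
            continue  # not a complete feature line
        start, end = int(fields[3]), int(fields[4])
        if start > end:
            yield lineno, fields[0], start, end


# The row quoted in the question, treated as a one-line "file":
sample = ["C20524\tflankbed\tpromoter\t1101\t1100\t.\t-\t.\tID=CGI_10000033;\n"]
for hit in find_bad_intervals(sample):
    print(hit)  # (1, 'C20524', 1101, 1100)
```

Passing an open file handle, e.g. `list(find_bad_intervals(open("./features/promoter.gff")))`, would list every offending row with its line number. You can then compare those line numbers against the one HTSeq-count reported.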

  • #2
    See whether it occurs more often and whether it is always on the same strand; if so, just swap the start and end values.
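Here is a minimal sketch of that suggestion, assuming the same nine-column, tab-separated layout. It first tallies the inverted rows by strand (column 7). If they all fall on one strand, it then swaps start and end. Both helper names are hypothetical. Swapping only makes the file parseable; it does not make a 1101/1100 row a plausible promoter.

```python
def inverted_by_strand(lines):
    """Count GFF feature rows with start > end, keyed by strand (column 7)."""
    counts = {}
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) >= 9 and int(fields[3]) > int(fields[4]):
            counts[fields[6]] = counts.get(fields[6], 0) + 1
    return counts


def swap_if_inverted(line):
    """Return the line with columns 4 and 5 swapped when start > end; else unchanged."""
    fields = line.rstrip("\n").split("\t")
    if line.startswith("#") or len(fields) < 9 or int(fields[3]) <= int(fields[4]):
        return line  # comments, short lines and valid rows pass through untouched
    fields[3], fields[4] = fields[4], fields[3]
    return "\t".join(fields) + "\n"
```

If `inverted_by_strand` reports a single strand, a corrected copy could be written with `dst.writelines(swap_if_inverted(l) for l in src)`.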

    • #3
      Be careful with that GFF; it is weird to have a promoter region only a base or two long (1100-1101).
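Part of the confusion comes from coordinate conventions. GFF coordinates are 1-based and closed, so a feature's length is `end - start + 1`. Assume HTSeq converts them to 0-based, half-open intervals by subtracting 1 from the start, which is consistent with the "zero-length interval" error in the question. Under that assumption, the row as written covers no bases at all, and even a swapped 1100..1101 feature covers only two. The function names below are illustrative.

```python
def gff_length(start, end):
    """Length of a GFF feature; GFF coordinates are 1-based and closed (both ends included)."""
    return end - start + 1


def to_half_open(start, end):
    """Convert 1-based closed GFF coordinates to a 0-based half-open (start, end) pair."""
    return start - 1, end


print(gff_length(1100, 1101))    # 2: a swapped 1100..1101 feature spans two bases
print(gff_length(1101, 1100))    # 0: the row as written spans no bases
print(to_half_open(1101, 1100))  # (1100, 1100): an empty half-open interval
```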

      • #4
        There are many such lines, with very short intervals! Yes, I'm very worried.
        Unfortunately I do not know how to make GFF files myself. I obtained them from the public resource at this address, where it is explained how they were made:


        Since I am completely unfamiliar with Python, I do not understand how the files were generated.
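To put a number on "very short", you can tabulate feature lengths. This sketch counts rows with no valid length (start > end) and rows below a cut-off. The 50 bp default and the `length_summary` name are arbitrary illustrative assumptions, not values taken from the source files.

```python
def length_summary(lines, short=50):
    """Return (total, n_short, n_invalid) for the feature rows of a GFF file.

    Length is end - start + 1 (1-based closed coordinates). Rows with
    length < 1 (i.e. start > end) count as invalid; rows with
    1 <= length < short count as short. The cut-off is arbitrary.
    """
    total = n_short = n_invalid = 0
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 9:
            continue  # not a complete feature line
        length = int(fields[4]) - int(fields[3]) + 1
        total += 1
        if length < 1:
            n_invalid += 1
        elif length < short:
            n_short += 1
    return total, n_short, n_invalid
```

Calling it on an open file handle gives a quick overview of how many rows would trip HTSeq-count and how many are suspiciously short.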

        • #5
          But what do you need to do? Maybe there is an easier way, for instance using IGV instead.

          Do you have to analyze an unannotated genome?

          • #6
            The Integrative Genomics Viewer will not be sufficient for me; I have to perform calculations. The genome has been annotated, and many resources are already available.
            If you think there is something strange in the GFF file, perhaps there is a way to clean it up? I have very little experience with sequence data, so I need some expert advice.
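If the goal is simply to get HTSeq-count to run, one crude way to clean the file is to drop feature rows shorter than some minimum length. With the default `min_length=1`, that removes exactly the rows with start > end. The helpers below are hypothetical, not a standard tool. They discard data rather than fix the annotation, so the dropped rows are worth reviewing.

```python
def clean_gff_lines(lines, min_length=1):
    """Yield GFF lines, skipping feature rows shorter than min_length bp.

    Length uses 1-based closed coordinates (end - start + 1), so with the
    default min_length=1 exactly the rows with start > end are removed.
    Comment/header lines and lines with fewer than 9 columns pass through.
    """
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        is_feature = not line.startswith("#") and len(fields) >= 9
        if is_feature and int(fields[4]) - int(fields[3]) + 1 < min_length:
            continue  # drop the inverted or too-short feature
        yield line


def clean_gff(in_path, out_path, min_length=1):
    """Write a cleaned copy of in_path to out_path (both paths are placeholders)."""
    with open(in_path) as src, open(out_path, "w") as dst:
        dst.writelines(clean_gff_lines(src, min_length))
```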

            • #7
              If the genome is annotated, then it is probably available in UCSC or Ensembl. In these repositories you can get many curated genomic features, such as repeat regions, TSSs, coding regions, introns, exons and promoters, in many formats.

              Ensembl BioMart website


              UCSC tables

              • #8
                There is a big difference between predicted annotations and ones that have been vetted by a human (which is why finishing a genome is so expensive).

                You may want to contact the lab that generated the files (http://faculty.washington.edu/sr320/?page_id=551) and ask them about this discrepancy.

                • #9
                  Aha! I found the most recent versions on Ensembl. Thank you for pointing me in the right direction.
