
  • Problems Counting SAM files with dexseq_count.py

    Hi all,

    I'm having problems counting my SAM files with "dexseq_count.py". I have counted them before with a custom gtf file (generated with Stringtie) and it worked fine. Now I'm trying to use a new version of the gff file, which I cleaned up by removing single-exon transcripts. My problem is that when I try to count my files with the new gff file I get the following error:

    Traceback (most recent call last):
    File "dexseq_count.py", line 97, in <module>
    features[f.iv] += f
    File "_HTSeq.pyx", line 524, in HTSeq._HTSeq.GenomicArray.__getitem__ (src/_HTSeq.c:10761)
    File "_HTSeq.pyx", line 384, in HTSeq._HTSeq.ChromVector.__getitem__ (src/_HTSeq.c:7809)
    File "_HTSeq.pyx", line 348, in HTSeq._HTSeq.ChromVector._create_view (src/_HTSeq.c:7382)
    IndexError: Cannot subset to zero-length interval.


    This is my command:
    python dexseq_count.py GFF_test3.gff -p yes -r name -s reverse ../../trimmed/alignment/P0_1/P0_1_sort_name.sam P0_1_ExonCounts

    And this is what my gff file looks like:

    1 dexseq_prepare_annotation.py exonic_part 312155 312383 . + . transcripts TCONS_00000001+TCONS_00000002+TCONS_00000003; exonic_part_number 001; gene_id XLOC_000001
    1 dexseq_prepare_annotation.py exonic_part 312361 312383 . + . transcripts TCONS_00000002; exonic_part_number 002; gene_id XLOC_000001
    1 dexseq_prepare_annotation.py exonic_part 315197 315479 . + . transcripts TCONS_00000001; exonic_part_number 003; gene_id XLOC_000001
    1 dexseq_prepare_annotation.py exonic_part 315933 316737 . + . transcripts TCONS_00000001+TCONS_00000004; exonic_part_number 004; gene_id XLOC_000001
    1 dexseq_prepare_annotation.py exonic_part 316355 316737 . + . transcripts TCONS_00000001; exonic_part_number 005; gene_id XLOC_000001
    1 dexseq_prepare_annotation.py exonic_part 317591 317815 . + . transcripts TCONS_00000001; exonic_part_number 006; gene_id XLOC_000001
    1 dexseq_prepare_annotation.py exonic_part 321633 321756 . + . transcripts TCONS_00000001; exonic_part_number 007; gene_id XLOC_000001
    1 dexseq_prepare_annotation.py exonic_part 322916 323800 . + . transcripts TCONS_00000001; exonic_part_number 008; gene_id XLOC_000001
    1 dexseq_prepare_annotation.py exonic_part 382921 383126 . + . transcripts TCONS_00000005+TCONS_00000006; exonic_part_number 009; gene_id XLOC_000001
    1 dexseq_prepare_annotation.py exonic_part 383127 383127 . + . transcripts TCONS_00000005; exonic_part_number 010; gene_id XLOC_000001

    Since I haven't changed my SAM files or anything else on my computer, I'm guessing the problem is in the gff. I thought it could be due to exons of just 1 nt, but the original file had even more of those and they never caused a problem. Does anyone have an idea what could be causing the error? I'd appreciate any help I can get. I'm very lost right now.
    Thank you very much,

    Miriam

  • #2
    I would actually guess that you have a 0-length entry. A simple way to debug things like this is to subset the gff file until the problem goes away. You can then quickly find the region that's causing this and manually have a look.
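
If it helps, here is a rough sketch of a direct check to run alongside the subsetting. GFF coordinates are 1-based and inclusive, so a 1 nt exon (start equal to end) still has length 1, while a record whose end is smaller than its start has zero or negative length. The function name `find_suspicious_intervals` is made up for this example, and it assumes tab-separated columns with start and end in columns 4 and 5. It does not use HTSeq, and it is only a guess that such a record is what triggers the error here.

```python
import sys


def find_suspicious_intervals(lines):
    """Yield (line_number, start, end) for records whose end is before their start.

    Hypothetical helper: assumes tab-separated GFF lines with 1-based,
    inclusive start/end coordinates in columns 4 and 5.
    """
    for lineno, line in enumerate(lines, start=1):
        line = line.rstrip("\n")
        # Skip blank lines and comment/header lines.
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.split("\t")
        if len(fields) < 5:
            continue  # not a regular feature line; ignored in this sketch
        try:
            start, end = int(fields[3]), int(fields[4])
        except ValueError:
            continue  # non-numeric coordinates; ignored in this sketch
        # With inclusive coordinates the length is end - start + 1,
        # so end < start means a length of zero or less.
        if end < start:
            yield lineno, start, end


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python check_gff.py GFF_test3.gff
    with open(sys.argv[1]) as handle:
        for lineno, start, end in find_suspicious_intervals(handle):
            print(f"line {lineno}: start={start} end={end}")
```

If this prints nothing, the subsetting approach is still the more general way to narrow down the offending region.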
