Problems Counting SAM files with dexseq_count.py

ml439

Junior Member

Join Date: May 2015

Posts: 9
#1

09-17-2015, 08:43 AM

Hi all,

I'm having problems counting my SAM files with "dexseq_count.py". I have counted them before with a custom-generated GTF file (I used StringTie to generate it) and it was fine. Now I'm trying to use a new version of the GFF file, where I have done some cleanup, removing single-exon transcripts. My problem is that when I try to count my files with the new GFF file, I get the following error:

Traceback (most recent call last):
  File "dexseq_count.py", line 97, in <module>
    features[f.iv] += f
  File "_HTSeq.pyx", line 524, in HTSeq._HTSeq.GenomicArray.__getitem__ (src/_HTSeq.c:10761)
  File "_HTSeq.pyx", line 384, in HTSeq._HTSeq.ChromVector.__getitem__ (src/_HTSeq.c:7809)
  File "_HTSeq.pyx", line 348, in HTSeq._HTSeq.ChromVector._create_view (src/_HTSeq.c:7382)
IndexError: Cannot subset to zero-length interval.

This is my command:
python dexseq_count.py GFF_test3.gff -p yes -r name -s reverse ../../trimmed/alignment/P0_1/P0_1_sort_name.sam P0_1_ExonCounts

And this is what my GFF file looks like:

1 dexseq_prepare_annotation.py exonic_part 312155 312383 . + . transcripts TCONS_00000001+TCONS_00000002+TCONS_00000003; exonic_part_number 001; gene_id XLOC_000001
1 dexseq_prepare_annotation.py exonic_part 312361 312383 . + . transcripts TCONS_00000002; exonic_part_number 002; gene_id XLOC_000001
1 dexseq_prepare_annotation.py exonic_part 315197 315479 . + . transcripts TCONS_00000001; exonic_part_number 003; gene_id XLOC_000001
1 dexseq_prepare_annotation.py exonic_part 315933 316737 . + . transcripts TCONS_00000001+TCONS_00000004; exonic_part_number 004; gene_id XLOC_000001
1 dexseq_prepare_annotation.py exonic_part 316355 316737 . + . transcripts TCONS_00000001; exonic_part_number 005; gene_id XLOC_000001
1 dexseq_prepare_annotation.py exonic_part 317591 317815 . + . transcripts TCONS_00000001; exonic_part_number 006; gene_id XLOC_000001
1 dexseq_prepare_annotation.py exonic_part 321633 321756 . + . transcripts TCONS_00000001; exonic_part_number 007; gene_id XLOC_000001
1 dexseq_prepare_annotation.py exonic_part 322916 323800 . + . transcripts TCONS_00000001; exonic_part_number 008; gene_id XLOC_000001
1 dexseq_prepare_annotation.py exonic_part 382921 383126 . + . transcripts TCONS_00000005+TCONS_00000006; exonic_part_number 009; gene_id XLOC_000001
1 dexseq_prepare_annotation.py exonic_part 383127 383127 . + . transcripts TCONS_00000005; exonic_part_number 010; gene_id XLOC_000001

Since I haven't changed my SAM files or anything on my computer, I'm guessing my problem is in the GFF. I thought it could be due to exons with just 1 nt, but I had more of those in the original file and that wasn't a problem then. Does anyone have an idea what could be causing the error? I'd appreciate any help I can get. I'm very lost right now.
Thank you very much,

Miriam
Tags: custom gtf, dexseq_count.py, error
dpryan

Devon Ryan

Join Date: Jul 2011

Posts: 3478
#2

09-17-2015, 10:43 AM

I would actually guess that you have a 0-length entry. A simple way to debug things like this is to subset the gff file until the problem goes away. You can then quickly find the region that's causing this and manually have a look.
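
If subsetting by hand is tedious, a quick scan of the coordinates may narrow things down first. The following is only a rough sketch, not part of DEXSeq or HTSeq: the function name `find_nonpositive_length` is made up for illustration, and it assumes the usual GFF convention that start and end (columns 4 and 5) are 1-based and inclusive, so an entry's length is end - start + 1. Under that assumption a 1-nt exon such as 383127 383127 has length 1 and is not flagged; only entries whose end is smaller than their start are reported.

```python
import sys


def find_nonpositive_length(gff_lines):
    """Yield (line_number, start, end) for entries with length <= 0.

    Assumes 1-based inclusive GFF coordinates in columns 4 and 5,
    so length = end - start + 1.
    """
    for lineno, line in enumerate(gff_lines, start=1):
        line = line.rstrip("\n")
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        fields = line.split("\t")
        if len(fields) < 5:
            # Fall back to whitespace splitting in case tabs were lost.
            fields = line.split()
        if len(fields) < 5:
            continue  # not a feature line
        try:
            start, end = int(fields[3]), int(fields[4])
        except ValueError:
            continue  # coordinates are not integers; ignore here
        if end - start + 1 <= 0:
            yield lineno, start, end


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python check_gff.py GFF_test3.gff
    with open(sys.argv[1]) as handle:
        for lineno, start, end in find_nonpositive_length(handle):
            print(f"line {lineno}: start={start} end={end}")
```

If this prints nothing, the cause is probably something else, and subsetting the file as described above is still the more general way to find the offending region.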