Unconfigured Ad

**Simon Anders** · 09-06-2011, 06:45 AM

It might be because this is not a GTF file; it does not have 'gene_ID' attribute -- or rather, it has them, but they are named 'Parent'. Hence, you need the option '-i Parent'.

**Simon Anders** · 09-06-2011, 06:47 AM

Sorry, that was wrong. The GFF file contains a transcript ID in the 'Parent' attribute of 'exon' lines, but we would need the gene ID. You will need to fix this manually.

**sofia17** · 09-06-2011, 07:21 AM

so in the gff3 file I should replace entries such as

"Parent=mRNA:Solyc00g005000.2.1"

with

"ID=gene:Solyc00g005000.2.1"???

**Simon Anders** · 09-06-2011, 07:28 AM

Nearly. It should be "ID=gene:Solyc00g005000.2". Compare carefully the "mRNA" and the "gene" lines.

**sofia17** · 09-06-2011, 08:12 AM

Within a gene should I do the above replacement for ALL components, i.e. mRNA, exon, intron, CDS, three_prime_UTR, etc...? Or for mRNA lines ONLY?

**Simon Anders** · 09-06-2011, 10:23 PM

Only for the "exon" lines, because htseq-count ignores the others.

**sofia17** · 09-08-2011, 07:32 AM

I have done the replacement (please see below that all exons have the correct 'ID=gene:...' field). But it still gives same error:
Error occured in line 6 of file /Users/vanderlab1/Desktop/try/processed.gff.
Error: Feature exon:Solyc00g005000.2.1.1 does not contain a 'gene_id' attribute
Exception type: SystemExit, raised in count.py:55]

I tried also: htseq-count -i ID=gene <samFile> <gff3File>
and got the same error

Error occured in line 6 of file /Users/vanderlab1/Desktop/try/processed.gff.
Error: Feature exon:Solyc00g005000.2.1.1 does not contain a 'ID=gene' attribute
Exception type: SystemExit, raised in count.py:55]

Maybe I use wrongly the -i option?

##gff-version 3
##feature-ontology http://song.cvs.sourceforge.net/*che...?revision=1.93
##sequence-region SL2.40ch00 1 21805821
SL2.40ch00 ITAG_eugene gene 16437 18189 . + . Alias=Solyc00g005000;ID=gene:Solyc00g005000.2;Name=Solyc00g005000.2;from_BOGAS=1;length=1753
SL2.40ch00 ITAG_eugene mRNA 16437 18189 . + . ID=mRNA:Solyc00g005000.2.1;Name=Solyc00g005000.2.1;Note=Aspartic proteinase nepenthesin I (AHRD V1 **-- A9ZMF9_NEPAL)%3B contains Interpro domain(s) IPR001461 Peptidase A1 ;Ontology_term=GO:0006508;Parent=gene:Solyc00g005000.2;from_BOGAS=1;interpro2go_term=GO:0006508;length=1753;nb_exon=2
SL2.40ch00 ITAG_eugene exon 16437 17275 . + . ID=exon:Solyc00g005000.2.1.1;ID=gene:Solyc00g005000.2;from_BOGAS=1
SL2.40ch00 ITAG_eugene five_prime_UTR 16437 16479 . + . ID=five_prime_UTR:Solyc00g005000.2.1.0;Parent=mRNA:Solyc00g005000.2.1;from_BOGAS=1
SL2.40ch00 ITAG_eugene CDS 16480 17275 . + 0 ID=CDS:Solyc00g005000.2.1.1;Parent=mRNA:Solyc00g005000.2.1;from_BOGAS=1
SL2.40ch00 ITAG_eugene intron 17276 17335 . + . ID=intron:Solyc00g005000.2.1.1;Parent=mRNA:Solyc00g005000.2.1;from_BOGAS=1
SL2.40ch00 ITAG_eugene exon 17336 18189 . + 0 ID=exon:Solyc00g005000.2.1.2;ID=gene:Solyc00g005000.2;from_BOGAS=1
SL2.40ch00 ITAG_eugene CDS 17336 17940 . + 2 ID=CDS:Solyc00g005000.2.1.2;Parent=mRNA:Solyc00g005000.2.1;from_BOGAS=1
SL2.40ch00 ITAG_eugene three_prime_UTR 17941 18189 . + . ID=three_prime_UTR:Solyc00g005000.2.1.0;Parent=mRNA:Solyc00g005000.2.1;from_BOGAS=1

**Simon Anders** · 09-08-2011, 08:48 AM

Try '-i ID', not '-i ID=gene'. Only the part before the '=' is the attribute name.

**sofia17** · 09-09-2011, 06:40 AM

Thank you Simon for “holding hands”

. It worked with “–i ID”.

From your experience, is it normal that out of about 15 million Illumina reads 1.5 million go in the no_feature pile?

**Simon Anders** · 09-09-2011, 09:25 AM

Originally posted by sofia17 View Post

From your experience, is it normal that out of about 15 million Illumina reads 1.5 million go in the no_feature pile?

Yes, that does not sound unusual.

**sofia17** · 09-13-2011, 09:49 AM

Dear Simon,

I am back in business asking questions.

I ran at the R console

source("http://www.bioconductor.org/biocLite.R")
biocLite("DESeq")

to install DESeq but when I try to load the library with

library(DESeq)

it gives the following err:

Error in loadNamespace(i[[1L]], c(lib.loc, .libPaths())) :
there is no package called 'annotate'
Error: package/namespace load failed for 'DESeq'

I am running R 2.13.1 (latest version) on Mac OS 10.6.4. Any suggestions?

**sofia17** · 09-13-2011, 12:28 PM

Oooops. Is working now. Maybe the bioconductor site was down earlier....

**Dedeusan** · 12-09-2011, 06:30 AM

Problem with htseq-count

Dear SImon,
I am just starting using HTSeq and I found the same problem as Sofia, but anything of the solutions that you recommended works this time. CAn you help me? This is the command line I use:
sandra@sandra-VirtualBox:~/HTSeq-0.5.3p3$ htseq-count -s no s_1_sequence_clipped_CDS_tophat.bam TbruceiTreu927_TriTrypDB-3.32.gff

And what I get:
Error occured in line 263 of file TbruceiTreu927_TriTrypDB-3.3.2.gff.
Error: Feature gene|exon_TB927.5.300b-1 does not contain a 'gene_id' attribute
[Exception type: SystemExit, raised in count.py:55]

I'd tried to use -i ID but I only obtain this:

sandra@sandra-VirtualBox:~/HTSeq-0.5.3p3$ htseq-count -s no -i ID s_1_sequence_clipped_CDS_tophat.bam TbruceiTreu927_TriTrypDB-3.32.gff
Error occured in line 29875 of file TbruceiTreu927_TriTrypDB-3.32.gff.
Error: Failure parsing GFF attribute line
[Exception type: ValueError, raised in __init__.py:171]

I am working wih trypanosoma brucei from what I only have the gff data (there is no gtf data available) and when I look for the line this is the information I have in the gff starting the line 263 adn some more lines as example:

pidb|BAC26D11 ApiDB exon 53516 54478 . - . ID=apidb|exon_TB927.5.300b-1;Name=exon;description=exon;size=963;Parent=apidb|rna_TB927.5.300b-1
apidb|Tb927_04_v4 ApiDB gene 1458621 1459109 . - . ID=apidb|Tb04.24M18.150;Name=Tb04.24M18.150;description=hypothetical+protein%2C+conserved;size=489;web_id=Tb04.24M18.150;locus_tag=Tb04.24M18.150;size=489;Alias=20660476,Tb04.24M18.150
apidb|Tb927_04_v4 ApiDB mRNA 1458621 1459109 . - . ID=apidb|rna_Tb04.24M18.150-1;Name=Tb04.24M18.150-1;description=Tb04.24M18.150-1;size=489;Parent=apidb|Tb04.24M18.150;Ontology_term=GO:0044429;Dbxref=ApiDB:Tb04.24M18.150,taxon:185431
apidb|Tb927_04_v4 ApiDB CDS 1458621 1459109 . - 0 ID=apidb|cds_Tb04.24M18.150-1;Name=cds;description=.;size=489;Parent=apidb|rna_Tb04.24M18.150-1
apidb|Tb927_04_v4 ApiDB exon 1458621 1459109 . - . ID=apidb|exon_Tb04.24M18.150-1;Name=exon;description=exon;size=489;Parent=apidb|rna_Tb04.24M18.150-1
apidb|Tb927_04_v4 ApiDB gene 1312951 1313406 . - . ID=apidb|Tb04.3I12.100;Name=Tb04.3I12.100;description=hypothetical+protein;size=456;web_id=Tb04.3I12.100;locus_tag=Tb04.3I12.100;size=456;Alias=Tb04.3I12.100
apidb|Tb927_04_v4 ApiDB mRNA 1312951 1313406 . - . ID=apidb|rna_Tb04.3I12.100-1;Name=Tb04.3I12.100-1;description=Tb04.3I12.100-1;size=456;Parent=apidb|Tb04.3I12.100;Dbxref=ApiDB:Tb04.3I12.100,taxon:185431
apidb|Tb927_04_v4 ApiDB CDS 1312951 1313406 . - 0 ID=apidb|cds_Tb04.3I12.100-1;Name=cds;description=.;size=456;Parent=apidb|rna_Tb04.3I12.100-1
apidb|Tb927_04_v4 ApiDB exon 1312951 1313406 . - . ID=apidb|exon_Tb04.3I12.100-1;Name=exon;description=exon;size=456;Parent=apidb|rna_Tb04.3I12.100-1
apidb|Tb927_05_v4 ApiDB gene 345728 346237 . + . ID=apidb|Tb05.28F8.200;Name=Tb05.28F8.200;description=hypothetical+protein;size=510;web_id=Tb05.28F8.200;locus_tag=Tb05.28F8.200;size=510;Alias=Tb05.28F8.200
apidb|Tb927_05_v4 ApiDB mRNA 345728 346237 . + . ID=apidb|rna_Tb05.28F8.200-1;Name=Tb05.28F8.200-1;description=Tb05.28F8.200-1;size=510;Parent=apidb|Tb05.28F8.200;Dbxref=ApiDB:Tb05.28F8.200,taxon:185431
apidb|Tb927_05_v4 ApiDB CDS 345728 346237 . + 0 ID=apidb|cds_Tb05.28F8.200-1;Name=cds;description=.;size=510;Parent=apidb|rna_Tb05.28F8.200-1
apidb|Tb927_05_v4 ApiDB exon 345728 346237 . + . ID=apidb|exon_Tb05.28F8.200-1;Name=exon;description=exon;size=510;Parent=apidb|rna_Tb05.28F8.200-1
apidb|Tb927_05_v4 ApiDB gene 235483 235992 . - .

What may I change to make htseq-count work?

Thank you very much in advance!
Sandra

**Simon Anders** · 12-10-2011, 01:24 PM

Using 'i ID' did solve your first problem, because afterward, the script no longer complained about line 263 of the GFF file but about line 29875. So, have a look there, too.

Your next problem will be that htseq-count, at the moment, does not read bam file. You will need to do something like

Code:

samtools view myalignment.bam | htseq-count [options] - myannotation.gff

The lone minus sign tells htseq-count to read the sam file from standard input, i.e., through the pipe from 'samtools view'.

Topics	Statistics	Last Post
Whole-Genome Sequencing Traces Faroe Islands Ancestry to a North Atlantic Founder Population by SEQadmin2 Started by SEQadmin2, 06-17-2026, 06:09 AM	0 responses 38 views 0 reactions	Last Post by SEQadmin2 06-17-2026, 06:09 AM
Sequencing the Two-Toed Sloth Genome Reveals Jumping Genes Tied to Its Extreme Metabolism by SEQadmin2 Started by SEQadmin2, 06-09-2026, 11:58 AM	0 responses 101 views 0 reactions	Last Post by SEQadmin2 06-09-2026, 11:58 AM
A New Method Makes Hantavirus Genome Analysis Faster and More Accessible by SEQadmin2 Started by SEQadmin2, 06-05-2026, 10:09 AM	0 responses 123 views 0 reactions	Last Post by SEQadmin2 06-05-2026, 10:09 AM
A New Single-Cell Method Maps DNA-Protein Interactions by SEQadmin2 Started by SEQadmin2, 06-04-2026, 08:59 AM	0 responses 114 views 0 reactions	Last Post by SEQadmin2 06-04-2026, 08:59 AM

Unconfigured Ad

htseq-count for sam and gff3

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Latest Articles

ad_right_rmr

News