SEQanswers

Go Back   SEQanswers > Applications Forums > RNA Sequencing



Similar Threads
Thread Thread Starter Forum Replies Last Post
DEXSeq Using Counts File From htseq-count FuzzyCoder Bioinformatics 20 01-03-2016 11:18 PM
Error with GTF file when using htseq-count MDonlin Bioinformatics 13 01-13-2015 08:29 AM
Strange error when using htseq-count shhuang Bioinformatics 13 11-19-2012 12:40 AM
htseq count missing last mapped fragment mapardo RNA Sequencing 2 11-08-2011 05:27 AM
Htseq-count Vs CountOverlap function in IRanges biofreak General 5 06-29-2011 10:36 AM

Reply
 
Thread Tools
Old 09-02-2011, 12:10 PM   #1
sofia17
Junior Member
 
Location: Ohio

Join Date: Sep 2011
Posts: 7
Default htseq-count for sam and gff3

Hi everyone,

I need some help with running htseq-count.

I am running htseq-count for a sam file obtained as a result of Illumina reads (not paired) aligned to the genome. I am getting the following Warning for all my reads:

"Warning: Skipping read 'S5_057841196', because chromosome 'SL2.40ch00', to which it has been aligned, did not appear in the GFF file."

and of course, all of my reads go in the "alignment_not_unique" pile, so there is no gene count at the end.

I read a similar problem posted by hibachings2013 in 2010, but unlike his problem, my sam and gff3 files BOTH have the correct chromosome names.

What am I doing wrong?

Below are examples of entries in my sam and gff3 files:

@HD VN:1.0 SO:coordinate
@SQ SN:SL2.40ch00 LN:21805821
@SQ SN:SL2.40ch01 LN:90304244
@SQ SN:SL2.40ch02 LN:49918294
@SQ SN:SL2.40ch03 LN:64840714
@SQ SN:SL2.40ch04 LN:64064312
@SQ SN:SL2.40ch05 LN:65021438
@SQ SN:SL2.40ch06 LN:46041636
@SQ SN:SL2.40ch07 LN:65268621
@SQ SN:SL2.40ch08 LN:63032657
@SQ SN:SL2.40ch09 LN:67662091
@SQ SN:SL2.40ch10 LN:64834305
@SQ SN:SL2.40ch11 LN:53386025
@SQ SN:SL2.40ch12 LN:65486253
@PG ID:TopHat VN:1.3.1 CL:/usr/local/bin/tophat -I 5000 --segment-mismatches 1 -o ./Output/tophatSA1_S6/ -G ./Annotation/ITAG2.3_gene_models.gff3 ./Reference/S_lycopersicm_genome ./Input/SA1_S6
S6_021828409 272 SL2.40ch00 96314 0 49M * 0 0 GACTCTTGAATACAATCTTACAATTTTTCCTCACAAATTGCTACACCCA IIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIII NM:i:2 NH:i:7 CC:Z:SL2.40ch02 CP:i:13220492 HI:i:0
S6_028644958 256 SL2.40ch00 225372 0 49M * 0 0 CCGACACACTAAATAAAAGAACAATATCACATTGCATATCAAACTAATA IIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIII NM:i:1 NH:i:11 CC:Z:= CP:i:3434611 HI:i:0
S6_008579902 272 SL2.40ch00 251575 0 49M * 0 0 CTTATGATTTTAATATGAACACATTTATCACTTTCATCATTCTTCGATC IIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIII NM:i:2 NH:i:10 CC:Z:SL2.40ch01 CP:i:5753251 HI:i:0
S6_033284153 272 SL2.40ch00 251575 0 49M * 0 0 CTTATGATTTTAATATGAACACATTTATCACTTTCATCATTCTTCGATC IIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIII NM:i:2 NH:i:10 CC:Z:SL2.40ch01 CP:i:5753251 HI:i:0
S6_084652203 272 SL2.40ch00 251582 0 49M * 0 0 TTTTAATATGAACACATTTATCACTTTCATCATTCTTCGATCCATTTGC IIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIII NM:i:2 NH:i:8 CC:Z:SL2.40ch01 CP:i:5753258 HI:i:0
S6_015020076 272 SL2.40ch00 251583 0 49M * 0 0 TTTAATATGAACACATTTATCACTTTCATCATTCTTCGATCCATTTGCC IIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIII NM:i:2 NH:i:9 CC:Z:SL2.40ch01 CP:i:5753259 HI:i:0

##gff-version 3
##feature-ontology http://song.cvs.sourceforge.net/*che...?revision=1.93
##sequence-region SL2.40ch00 1 21805821
SL2.40ch00 ITAG_eugene gene 16437 18189 . + . Alias=Solyc00g005000;ID=gene:Solyc00g005000.2;Name=Solyc00g005000.2;from_BOGAS=1;length=1753
SL2.40ch00 ITAG_eugene mRNA 16437 18189 . + . ID=mRNA:Solyc00g005000.2.1;Name=Solyc00g005000.2.1;Note=Aspartic proteinase nepenthesin I (AHRD V1 **-- A9ZMF9_NEPAL)%3B contains Interpro domain(s) IPR001461 Peptidase A1 ;Ontology_term=GO:0006508;Parent=gene:Solyc00g005000.2;from_BOGAS=1;interpro2go_term=GO:0006508;length=1753;nb_exon=2
SL2.40ch00 ITAG_eugene exon 16437 17275 . + . ID=exon:Solyc00g005000.2.1.1;Parent=mRNA:Solyc00g005000.2.1;from_BOGAS=1
SL2.40ch00 ITAG_eugene five_prime_UTR 16437 16479 . + . ID=five_prime_UTR:Solyc00g005000.2.1.0;Parent=mRNA:Solyc00g005000.2.1;from_BOGAS=1
SL2.40ch00 ITAG_eugene CDS 16480 17275 . + 0 ID=CDS:Solyc00g005000.2.1.1;Parent=mRNA:Solyc00g005000.2.1;from_BOGAS=1
SL2.40ch00 ITAG_eugene intron 17276 17335 . + . ID=intron:Solyc00g005000.2.1.1;Parent=mRNA:Solyc00g005000.2.1;from_BOGAS=1
SL2.40ch00 ITAG_eugene exon 17336 18189 . + 0 ID=exon:Solyc00g005000.2.1.2;Parent=mRNA:Solyc00g005000.2.1;from_BOGAS=1
SL2.40ch00 ITAG_eugene CDS 17336 17940 . + 2 ID=CDS:Solyc00g005000.2.1.2;Parent=mRNA:Solyc00g005000.2.1;from_BOGAS=1
SL2.40ch00 ITAG_eugene three_prime_UTR 17941 18189 . + . ID=three_prime_UTR:Solyc00g005000.2.1.0;Parent=mRNA:Solyc00g005000.2.1;from_BOGAS=1
###
SL2.40ch00 ITAG_eugene gene 68062 68764 . + . Alias=Solyc00g005020;ID=gene:Solyc00g005020.1;Name=Solyc00g005020.1;from_BOGAS=1;length=703
SL2.40ch00 ITAG_eugene mRNA 68062 68764 . + . ID=mRNA:Solyc00g005020.1.1;Name=Solyc00g005020.1.1;Note=Unknown Protein (AHRD V1);Parent=gene:Solyc00g005020.1;from_BOGAS=1;length=703;nb_exon=3;eugene_evidence_code=10F0H0E0IEG
SL2.40ch00 ITAG_eugene exon 68062 68211 . + 0 ID=exon:Solyc00g005020.1.1.1;Parent=mRNA:Solyc00g005020.1.1;from_BOGAS=1
SL2.40ch00 ITAG_eugene CDS 68062 68211 . + 0 ID=CDS:Solyc00g005020.1.1.1;Parent=mRNA:Solyc00g005020.1.1;from_BOGAS=1
SL2.40ch00 ITAG_eugene intron 68212 68343 . + . ID=intron:Solyc00g005020.1.1.1;Parent=mRNA:Solyc00g005020.1.1;from_BOGAS=1
SL2.40ch00 ITAG_eugene exon 68344 68568 . + 0 ID=exon:Solyc00g005020.1.1.2;Parent=mRNA:Solyc00g005020.1.1;from_BOGAS=1
SL2.40ch00 ITAG_eugene CDS 68344 68568 . + 0 ID=CDS:Solyc00g005020.1.1.2;Parent=mRNA:Solyc00g005020.1.1;from_BOGAS=1
SL2.40ch00 ITAG_eugene intron 68569 68653 . + . ID=intron:Solyc00g005020.1.1.2;Parent=mRNA:Solyc00g005020.1.1;from_BOGAS=1
SL2.40ch00 ITAG_eugene exon 68654 68764 . + 0 ID=exon:Solyc00g005020.1.1.3;Parent=mRNA:Solyc00g005020.1.1;from_BOGAS=1
SL2.40ch00 ITAG_eugene CDS 68654 68764 . + 0 ID=CDS:Solyc00g005020.1.1.3;Parent=mRNA:Solyc00g005020.1.1;from_BOGAS=1

Last edited by sofia17; 09-02-2011 at 12:12 PM.
sofia17 is offline   Reply With Quote
Old 09-06-2011, 06:45 AM   #2
Simon Anders
Senior Member
 
Location: Heidelberg, Germany

Join Date: Feb 2010
Posts: 994
Default

It might be because this is not a GTF file; it does not have 'gene_ID' attribute -- or rather, it has them, but they are named 'Parent'. Hence, you need the option '-i Parent'.
Simon Anders is offline   Reply With Quote
Old 09-06-2011, 06:47 AM   #3
Simon Anders
Senior Member
 
Location: Heidelberg, Germany

Join Date: Feb 2010
Posts: 994
Default

Sorry, that was wrong. The GFF file contains a transcript ID in the 'Parent' attribute of 'exon' lines, but we would need the gene ID. You will need to fix this manually.
Simon Anders is offline   Reply With Quote
Old 09-06-2011, 07:21 AM   #4
sofia17
Junior Member
 
Location: Ohio

Join Date: Sep 2011
Posts: 7
Default

so in the gff3 file I should replace entries such as

"Parent=mRNA:Solyc00g005000.2.1"

with

"ID=gene:Solyc00g005000.2.1"???
sofia17 is offline   Reply With Quote
Old 09-06-2011, 07:28 AM   #5
Simon Anders
Senior Member
 
Location: Heidelberg, Germany

Join Date: Feb 2010
Posts: 994
Default

Nearly. It should be "ID=gene:Solyc00g005000.2". Compare carefully the "mRNA" and the "gene" lines.
Simon Anders is offline   Reply With Quote
Old 09-06-2011, 08:12 AM   #6
sofia17
Junior Member
 
Location: Ohio

Join Date: Sep 2011
Posts: 7
Default

Within a gene should I do the above replacement for ALL components, i.e. mRNA, exon, intron, CDS, three_prime_UTR, etc...? Or for mRNA lines ONLY?

Last edited by sofia17; 09-06-2011 at 08:29 AM.
sofia17 is offline   Reply With Quote
Old 09-06-2011, 10:23 PM   #7
Simon Anders
Senior Member
 
Location: Heidelberg, Germany

Join Date: Feb 2010
Posts: 994
Default

Only for the "exon" lines, because htseq-count ignores the others.
Simon Anders is offline   Reply With Quote
Old 09-08-2011, 07:32 AM   #8
sofia17
Junior Member
 
Location: Ohio

Join Date: Sep 2011
Posts: 7
Default

I have done the replacement (please see below that all exons have the correct 'ID=gene:...' field). But it still gives same error:
Error occured in line 6 of file /Users/vanderlab1/Desktop/try/processed.gff.
Error: Feature exon:Solyc00g005000.2.1.1 does not contain a 'gene_id' attribute
Exception type: SystemExit, raised in count.py:55]

I tried also: htseq-count -i ID=gene <samFile> <gff3File>
and got the same error

Error occured in line 6 of file /Users/vanderlab1/Desktop/try/processed.gff.
Error: Feature exon:Solyc00g005000.2.1.1 does not contain a 'ID=gene' attribute
Exception type: SystemExit, raised in count.py:55]

Maybe I use wrongly the -i option?

##gff-version 3
##feature-ontology http://song.cvs.sourceforge.net/*che...?revision=1.93
##sequence-region SL2.40ch00 1 21805821
SL2.40ch00 ITAG_eugene gene 16437 18189 . + . Alias=Solyc00g005000;ID=gene:Solyc00g005000.2;Name=Solyc00g005000.2;from_BOGAS=1;length=1753
SL2.40ch00 ITAG_eugene mRNA 16437 18189 . + . ID=mRNA:Solyc00g005000.2.1;Name=Solyc00g005000.2.1;Note=Aspartic proteinase nepenthesin I (AHRD V1 **-- A9ZMF9_NEPAL)%3B contains Interpro domain(s) IPR001461 Peptidase A1 ;Ontology_term=GO:0006508;Parent=gene:Solyc00g005000.2;from_BOGAS=1;interpro2go_term=GO:0006508;length=1753;nb_exon=2
SL2.40ch00 ITAG_eugene exon 16437 17275 . + . ID=exon:Solyc00g005000.2.1.1;ID=gene:Solyc00g005000.2;from_BOGAS=1
SL2.40ch00 ITAG_eugene five_prime_UTR 16437 16479 . + . ID=five_prime_UTR:Solyc00g005000.2.1.0;Parent=mRNA:Solyc00g005000.2.1;from_BOGAS=1
SL2.40ch00 ITAG_eugene CDS 16480 17275 . + 0 ID=CDS:Solyc00g005000.2.1.1;Parent=mRNA:Solyc00g005000.2.1;from_BOGAS=1
SL2.40ch00 ITAG_eugene intron 17276 17335 . + . ID=intron:Solyc00g005000.2.1.1;Parent=mRNA:Solyc00g005000.2.1;from_BOGAS=1
SL2.40ch00 ITAG_eugene exon 17336 18189 . + 0 ID=exon:Solyc00g005000.2.1.2;ID=gene:Solyc00g005000.2;from_BOGAS=1
SL2.40ch00 ITAG_eugene CDS 17336 17940 . + 2 ID=CDS:Solyc00g005000.2.1.2;Parent=mRNA:Solyc00g005000.2.1;from_BOGAS=1
SL2.40ch00 ITAG_eugene three_prime_UTR 17941 18189 . + . ID=three_prime_UTR:Solyc00g005000.2.1.0;Parent=mRNA:Solyc00g005000.2.1;from_BOGAS=1

Last edited by sofia17; 09-08-2011 at 07:34 AM.
sofia17 is offline   Reply With Quote
Old 09-08-2011, 08:48 AM   #9
Simon Anders
Senior Member
 
Location: Heidelberg, Germany

Join Date: Feb 2010
Posts: 994
Default

Try '-i ID', not '-i ID=gene'. Only the part before the '=' is the attribute name.
Simon Anders is offline   Reply With Quote
Old 09-09-2011, 06:40 AM   #10
sofia17
Junior Member
 
Location: Ohio

Join Date: Sep 2011
Posts: 7
Default

Thank you Simon for “holding hands” . It worked with “–i ID”.

From your experience, is it normal that out of about 15 million Illumina reads 1.5 million go in the no_feature pile?
sofia17 is offline   Reply With Quote
Old 09-09-2011, 09:25 AM   #11
Simon Anders
Senior Member
 
Location: Heidelberg, Germany

Join Date: Feb 2010
Posts: 994
Default

Quote:
Originally Posted by sofia17 View Post
From your experience, is it normal that out of about 15 million Illumina reads 1.5 million go in the no_feature pile?
Yes, that does not sound unusual.
Simon Anders is offline   Reply With Quote
Old 09-13-2011, 09:49 AM   #12
sofia17
Junior Member
 
Location: Ohio

Join Date: Sep 2011
Posts: 7
Default

Dear Simon,

I am back in business asking questions.

I ran at the R console

source("http://www.bioconductor.org/biocLite.R")
biocLite("DESeq")

to install DESeq but when I try to load the library with

library(DESeq)

it gives the following err:

Error in loadNamespace(i[[1L]], c(lib.loc, .libPaths())) :
there is no package called 'annotate'
Error: package/namespace load failed for 'DESeq'

I am running R 2.13.1 (latest version) on Mac OS 10.6.4. Any suggestions?
sofia17 is offline   Reply With Quote
Old 09-13-2011, 12:28 PM   #13
sofia17
Junior Member
 
Location: Ohio

Join Date: Sep 2011
Posts: 7
Default

Oooops. Is working now. Maybe the bioconductor site was down earlier....
sofia17 is offline   Reply With Quote
Old 12-09-2011, 05:30 AM   #14
Dedeusan
Junior Member
 
Location: spain

Join Date: Nov 2011
Posts: 7
Default Problem with htseq-count

Dear SImon,
I am just starting using HTSeq and I found the same problem as Sofia, but anything of the solutions that you recommended works this time. CAn you help me? This is the command line I use:
sandra@sandra-VirtualBox:~/HTSeq-0.5.3p3$ htseq-count -s no s_1_sequence_clipped_CDS_tophat.bam TbruceiTreu927_TriTrypDB-3.32.gff

And what I get:
Error occured in line 263 of file TbruceiTreu927_TriTrypDB-3.3.2.gff.
Error: Feature gene|exon_TB927.5.300b-1 does not contain a 'gene_id' attribute
[Exception type: SystemExit, raised in count.py:55]

I'd tried to use -i ID but I only obtain this:

sandra@sandra-VirtualBox:~/HTSeq-0.5.3p3$ htseq-count -s no -i ID s_1_sequence_clipped_CDS_tophat.bam TbruceiTreu927_TriTrypDB-3.32.gff
Error occured in line 29875 of file TbruceiTreu927_TriTrypDB-3.32.gff.
Error: Failure parsing GFF attribute line
[Exception type: ValueError, raised in __init__.py:171]

I am working wih trypanosoma brucei from what I only have the gff data (there is no gtf data available) and when I look for the line this is the information I have in the gff starting the line 263 adn some more lines as example:


pidb|BAC26D11 ApiDB exon 53516 54478 . - . ID=apidb|exon_TB927.5.300b-1;Name=exon;description=exon;size=963;Parent=apidb|rna_TB927.5.300b-1
apidb|Tb927_04_v4 ApiDB gene 1458621 1459109 . - . ID=apidb|Tb04.24M18.150;Name=Tb04.24M18.150;description=hypothetical+protein%2C+conserved;size=489;web_id=Tb04.24M18.150;locus_tag=Tb04.24M18.150;size=489;Alias=20660476,Tb04.24M18.150
apidb|Tb927_04_v4 ApiDB mRNA 1458621 1459109 . - . ID=apidb|rna_Tb04.24M18.150-1;Name=Tb04.24M18.150-1;description=Tb04.24M18.150-1;size=489;Parent=apidb|Tb04.24M18.150;Ontology_term=GO:0044429;Dbxref=ApiDB:Tb04.24M18.150,taxon:185431
apidb|Tb927_04_v4 ApiDB CDS 1458621 1459109 . - 0 ID=apidb|cds_Tb04.24M18.150-1;Name=cds;description=.;size=489;Parent=apidb|rna_Tb04.24M18.150-1
apidb|Tb927_04_v4 ApiDB exon 1458621 1459109 . - . ID=apidb|exon_Tb04.24M18.150-1;Name=exon;description=exon;size=489;Parent=apidb|rna_Tb04.24M18.150-1
apidb|Tb927_04_v4 ApiDB gene 1312951 1313406 . - . ID=apidb|Tb04.3I12.100;Name=Tb04.3I12.100;description=hypothetical+protein;size=456;web_id=Tb04.3I12.100;locus_tag=Tb04.3I12.100;size=456;Alias=Tb04.3I12.100
apidb|Tb927_04_v4 ApiDB mRNA 1312951 1313406 . - . ID=apidb|rna_Tb04.3I12.100-1;Name=Tb04.3I12.100-1;description=Tb04.3I12.100-1;size=456;Parent=apidb|Tb04.3I12.100;Dbxref=ApiDB:Tb04.3I12.100,taxon:185431
apidb|Tb927_04_v4 ApiDB CDS 1312951 1313406 . - 0 ID=apidb|cds_Tb04.3I12.100-1;Name=cds;description=.;size=456;Parent=apidb|rna_Tb04.3I12.100-1
apidb|Tb927_04_v4 ApiDB exon 1312951 1313406 . - . ID=apidb|exon_Tb04.3I12.100-1;Name=exon;description=exon;size=456;Parent=apidb|rna_Tb04.3I12.100-1
apidb|Tb927_05_v4 ApiDB gene 345728 346237 . + . ID=apidb|Tb05.28F8.200;Name=Tb05.28F8.200;description=hypothetical+protein;size=510;web_id=Tb05.28F8.200;locus_tag=Tb05.28F8.200;size=510;Alias=Tb05.28F8.200
apidb|Tb927_05_v4 ApiDB mRNA 345728 346237 . + . ID=apidb|rna_Tb05.28F8.200-1;Name=Tb05.28F8.200-1;description=Tb05.28F8.200-1;size=510;Parent=apidb|Tb05.28F8.200;Dbxref=ApiDB:Tb05.28F8.200,taxon:185431
apidb|Tb927_05_v4 ApiDB CDS 345728 346237 . + 0 ID=apidb|cds_Tb05.28F8.200-1;Name=cds;description=.;size=510;Parent=apidb|rna_Tb05.28F8.200-1
apidb|Tb927_05_v4 ApiDB exon 345728 346237 . + . ID=apidb|exon_Tb05.28F8.200-1;Name=exon;description=exon;size=510;Parent=apidb|rna_Tb05.28F8.200-1
apidb|Tb927_05_v4 ApiDB gene 235483 235992 . - .


What may I change to make htseq-count work?

Thank you very much in advance!
Sandra
Dedeusan is offline   Reply With Quote
Old 12-10-2011, 12:24 PM   #15
Simon Anders
Senior Member
 
Location: Heidelberg, Germany

Join Date: Feb 2010
Posts: 994
Default

Using 'i ID' did solve your first problem, because afterward, the script no longer complained about line 263 of the GFF file but about line 29875. So, have a look there, too.

Your next problem will be that htseq-count, at the moment, does not read bam file. You will need to do something like
Code:
samtools view myalignment.bam | htseq-count [options] - myannotation.gff
The lone minus sign tells htseq-count to read the sam file from standard input, i.e., through the pipe from 'samtools view'.
Simon Anders is offline   Reply With Quote
Old 12-12-2011, 01:01 AM   #16
Dedeusan
Junior Member
 
Location: spain

Join Date: Nov 2011
Posts: 7
Default Almost...

Hi Simon.
First of all, thank you very much!!!
Second:
Apparently, it worked. It was an ; extra in a pair of lines so it continues reading everything just until it arrives to the #fasta part:

htseq-count -s no -i ID s_1_sequence_clipped_tophat.bam TbruceiTreu927_TriTrypDB-3.3.3.gff
Error occured in line 46748 of file TbruceiTreu927_TriTrypDB-3.3.3.gff.
Error: need more than 1 value to unpack
[Exception type: ValueError, raised in __init__.py:214]

And when I look at this line this is what I can see:


tryp_XI-1036e06.p1k.snoRNA.0004;Dbxref=ApiDB:tryp_XI-1036e06.p1k.snoRNA.0004,taxon:185431
apidb|tryp_XI-1036e06.p1k ApiDB exon 22123 22200 . + . ID=apidb|exon_tryp_XI-1036e06.p1k.snoRNA.0004-1;Name=exon;description=exon;size=78;Parent=apidb|rna_tryp_XI-1036e06.p1k.snoRNA.0004-1
(This is line 46748)##FASTA
>apidb|TB927.5.300b
ATGGCTCACGGCTCGATTCCAGTTATTGATGTCGGCCCTCTGTTCTGTGATGGAGAAAAG
GGGATGATGGATGTTGCGAAACAGATTGATCATGCCTGTAGGACGTGGGGTGTTTTTCTT
GTTGTGGGTCATCCCATTCCCCGTGAGCGAACGGAAAAGTTGATGGAAATGGCCAAGGCT
TTTTTTTCGCTTCCATTGGAAGAGAAACTTAAGGTTGATATTCGAAAGAGCAAACATCAT
CGCGGTTACGGATGCCTCGATGCGGAGAATGTTGACCCAACGAAACCATTTGATTGTAAA
GAAACATTTAATATGGGCTGTCATCTCCCTGAGGATCACCCCGATGTTGCAGCTGGAAAG
CCATTGCGTGGACCGAACAATCACCCCACGCAAGTGAAAGGTTGGGTAGAGTTGATGAAC
AGACATTATCGCGAAATGCAGGAATTTGCCCTCGTTATTCTTCGTGCCCTCGCACTCGCT
ATTGGTTTAAAGAAAGACTTTTTCGATACCAAATTTGATGAACCTTTGAGTGTGTTCCGT
ATGCTACATTATCCTCCACAAAAGCAAGGGACCCGTTATCCCATCGTGTGTGGTGAGCAT
ACGGATTATGGTATTATTACATTACTCTACCAAGATTCGGTGGGAGGACTGCAGGTGCGC
AATCTGTCAGATGAGTGGGTGGATGTGGAACCCCTTGAAGGAAGTTTTGTTGTGAATATT
GGGGACATGATGAATATGTGGAGTAATGGCCGTTACCGCTCAACACCGCATCGCGTTCGC
TTAACCACAACTGATCGCTACTCCATGCCATTTTTCTGTCAGCCTAATCCTTATACTGTT
ATTAAATGCCTTGATCATTGCCATTCGCCAAGCAATCCCCCCAAATATCCACCAGTCCGT
GCTGTGGATTGGTTGCTGAAGCGTTTCGCGGAAACATATGCCCATCGCAAAACAAAGATG
TGA
>apidb|cds_TB927.5.300b-1
MAHGSIPVIDVGPLFCDGEKGMMDVAKQIDHACRTWGVFLVVGHPIPRERTEKLMEMAKA
FFSLPLEEKLKVDIRKSKHHRGYGCLDAENVDPTKPFDCKETFNMGCHLPEDHPDVAAGK
PLRGPNNHPTQVKGWVELMNRHYREMQEFALVILRALALAIGLKKDFFDTKFDEPLSVFR
MLHYPPQKQGTRYPIVCGEHTDYGIITLLYQDSVGGLQVRNLSDEWVDVEPLEGSFVVNI
GDMMNMWSNGRYRSTPHRVRLTTTDRYSMPFFCQPNPYTVIKCLDHCHSPSNPPKYPPVR
AVDWLLKRFAETYAHRKTKM

Why does htseq-count gives now an error?
Again, thanks for all!
Sandra
Dedeusan is offline   Reply With Quote
Old 12-12-2011, 01:33 AM   #17
Simon Anders
Senior Member
 
Location: Heidelberg, Germany

Join Date: Feb 2010
Posts: 994
Default

In my understanding of the GFF standard, a GFF file is supposed to contain GFF lines and not FASTA lines. I've seen this before that a full FASTA file is concatenated to the end of the GFF file but this is really confusing for any software trying to parse the file. Please remove everything that is not GFF from your GFF file.
Simon Anders is offline   Reply With Quote
Old 12-12-2011, 02:09 AM   #18
Dedeusan
Junior Member
 
Location: spain

Join Date: Nov 2011
Posts: 7
Default

Hi SImon,

I am really desiring now to be working with Drosophila or something similar...
I applied what you told me:

samtools view s_1_sequence_clipped_tophat.bam | htseq-count -s no -i ID - TbruceiTreu927_TriTrypDB-3.3.4.gff > count.txt


Warning: Skipping read 'HWUSI-EAS582_211:1:106:1372:1342', because chromosome 'GeneDB|Tb927_01_v4', to which it has been aligned, did not appear in the GFF file

And after I stopped the execution of the program:

^CError: 'itertools.chain' object has no attribute 'get_line_number_string'
[Exception type: AttributeError, raised in count.py:198]

Maybe it was not a good idea to eliminate the fasta part...
Dedeusan is offline   Reply With Quote
Old 12-12-2011, 02:46 AM   #19
Simon Anders
Senior Member
 
Location: Heidelberg, Germany

Join Date: Feb 2010
Posts: 994
Default

Well, it is a only warning that told you that a single read got skipped. I suppose you can ignore that. If you have contigs in your reference to which the GFF file does not assign any exons it is completely expected to get a few of these warnings.
Simon Anders is offline   Reply With Quote
Old 12-12-2011, 03:33 AM   #20
Dedeusan
Junior Member
 
Location: spain

Join Date: Nov 2011
Posts: 7
Default

Well, SImon, this is what I get:


apidb|exon_Tb09.160.0220-1 0

And over 12.000 similar, and:

no_feature 0
ambiguous 0
too_low_aQual 0
not_aligned 0
alignment_not_unique 11678480

But at the txt document I couldn't see anything else about the 11678480 aligments. Did it worked?? Can I see the aligments somewhere?

And Soooo sorry for disturbing you so much. THanks for your attention!
Sandra
Dedeusan is offline   Reply With Quote
Reply

Thread Tools

Posting Rules
You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts

BB code is On
Smilies are On
[IMG] code is On
HTML code is Off




All times are GMT -8. The time now is 10:57 PM.


Powered by vBulletin® Version 3.8.9
Copyright ©2000 - 2020, vBulletin Solutions, Inc.
Single Sign On provided by vBSSO