Hi all,
I have been using HTSeq and DESeq for my RNA-Seq pipeline at UCLA, and everything works well except for one thing: I am having trouble counting reads with HTSeq for Arabidopsis thaliana. I have tried two different GTF files so far, and both produce the same error, a ValueError about mismatched quotes in the attribute string (full traceback below).
I've looked through the GTF and haven't been able to find any mismatched quotes or anything else out of the ordinary, except that the attribute column has slightly different fields (it has no gene_biotype, but does have p_id and tss_id). A sample of the GTF file I'm using is included below, after the traceback.
Has anyone run into the same problem? Following someone else's suggestion, I've tried two different GTFs, one from Ensembl Plants and another from iGenomes, and both give the same error.
Any input or suggestions would be greatly appreciated. I posted about this in another thread, but that was a somewhat different issue, so I've started a new one.
Traceback (most recent call last):
  File "python_scripts/sam_to_gene_array_2.py", line 80, in <module>
    main()
  File "python_scripts/sam_to_gene_array_2.py", line 41, in main
    for feature in gtf:
  File "/u/home/mcdb/arturj/.local/lib/python2.6/site-packages/HTSeq-0.5.3p3-py2.6-linux-x86_64.egg/HTSeq/__init__.py", line 215, in __iter__
    ( attr, name ) = parse_GFF_attribute_string( attributeStr, True )
  File "/u/home/mcdb/arturj/.local/lib/python2.6/site-packages/HTSeq-0.5.3p3-py2.6-linux-x86_64.egg/HTSeq/__init__.py", line 168, in parse_GFF_attribute_string
    raise ValueError, "The attribute string seems to contain mismatched quotes."
ValueError: The attribute string seems to contain mismatched quotes.
1 protein_coding exon 3631 3913 . + . gene_id "AT1G01010"; transcript_id "AT1G01010.1"; exon_number "1"; gene_name "ANAC001"; p_id "P21642"; seqedit "false"; transcript_name "AT1G01010.1"; tss_id "TSS22540";
1 protein_coding CDS 3760 3913 . + 0 gene_id "AT1G01010"; transcript_id "AT1G01010.1"; exon_number "1"; gene_name "ANAC001"; p_id "P21642"; protein_id "AT1G01010.1"; transcript_name "AT1G01010.1"; tss_id "TSS22540";
1 protein_coding start_codon 3760 3762 . + 0 gene_id "AT1G01010"; transcript_id "AT1G01010.1"; exon_number "1"; gene_name "ANAC001"; p_id "P21642"; transcript_name "AT1G01010.1"; tss_id "TSS22540";
1 protein_coding CDS 3996 4276 . + 2 gene_id "AT1G01010"; transcript_id "AT1G01010.1"; exon_number "2"; gene_name "ANAC001"; p_id "P21642"; protein_id "AT1G01010.1"; transcript_name "AT1G01010.1"; tss_id "TSS22540";
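For anyone who wants to double-check a GTF rather than scan it by eye, here is a minimal sketch of a quote-balance check. It is not HTSeq's own parsing logic, only a rough independent test. It assumes the file is tab-separated with the attributes in the ninth column. The function names (find_suspicious_gtf_lines, check_gtf_file) are made up for illustration.

```python
def find_suspicious_gtf_lines(lines):
    """Return (line_number, reason) pairs for lines that look malformed.

    Assumption: a well-formed GTF line has 9 tab-separated columns and an
    even number of double quotes in the attribute (ninth) column.
    """
    suspicious = []
    for line_number, line in enumerate(lines, start=1):
        line = line.rstrip("\r\n")
        # Skip blank lines and comment/header lines.
        if not line or line.startswith("#"):
            continue
        fields = line.split("\t")
        if len(fields) < 9:
            suspicious.append((line_number, "fewer than 9 tab-separated columns"))
            continue
        # An odd number of double quotes means at least one is unpaired.
        if fields[8].count('"') % 2 != 0:
            suspicious.append((line_number, "odd number of double quotes"))
    return suspicious


def check_gtf_file(path):
    """Read a GTF file from disk and return its suspicious lines."""
    with open(path) as handle:
        return find_suspicious_gtf_lines(handle)
```

Calling check_gtf_file on the GTF path and printing the result would list the line numbers worth inspecting. An empty list would suggest the quotes are balanced and the problem lies elsewhere, for example in how the columns are separated.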