SEQanswers

Old 04-04-2012, 08:03 AM   #1
kamsen
Junior Member
 
Location: Europe

Join Date: Mar 2012
Posts: 3
Default HTSeq dealing with "*" qualities

Hi everyone,

I started using HTSeq a couple of days ago and now encountered a problem. Maybe someone knows a workaround.

I am iterating over a SAM file and can't find a solution for this error:
(also described here http://seqanswers.com/forums/showthread.php?t=12091)

Quote:
ValueError: 'seq' and 'qualstr' do not have the same length.
The alignment is from Bowtie2 and lacks the quality string (the file has only a "*" in that field, although the complete read sequence is there).

Like: blaaaaa ACTACTATCTAC * blaaaaa


Since I have a lot of files, I can't filter them beforehand, because I do not want to touch those big files twice.

thanks in advance.

EDIT:
I am using the latest release of HTSeq.



regards

Last edited by kamsen; 04-04-2012 at 08:06 AM.
Old 04-10-2012, 01:37 PM   #2
maubp
Peter (Biopython etc)
 
Location: Dundee, Scotland, UK

Join Date: Jul 2009
Posts: 1,541
Default

Sounds like a bug in HTSeq - as discussed in the linked thread, the SAM/BAM file format explicitly allows the sequencing qualities to be omitted (which in SAM is represented with the * character).

Have you contacted the HTSeq authors?

P.S. Saying you use the latest version isn't as helpful as stating the actual version number, since people may read this thread later on.
Old 04-10-2012, 11:53 PM   #3
Simon Anders
Senior Member
 
Location: Heidelberg, Germany

Join Date: Feb 2010
Posts: 993
Default

Yes, that's a limitation of HTSeq. Fixing this has been on my to-do list for a while; sorry that it's still not done.
Old 04-11-2012, 02:01 PM   #4
kamsen
Junior Member
 
Location: Europe

Join Date: Mar 2012
Posts: 3
Default

Just a few remarks to close this topic:

1) I was talking about version 0.5.3p3
2) I made a quick and dirty workaround in the code (__init__ module, line 537) which worked for me. If somebody encounters this problem, one could simply take the line from the .sam file and either create 0 qualities or read in the original ones. After that, the conversion to the Alignment format works again.
3) Thanks anyway for your nice package Simon!

regards
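
For anyone who wants to try the idea from point 2 without patching HTSeq itself, it can be sketched as a small stream filter. This is only an illustration, not the change that was made inside HTSeq: the names `fill_missing_qualities` and `filter_stream` are hypothetical, the code is Python 3, and the placeholder "!" (Phred 0 in the Phred+33 encoding) is an assumption standing in for "0 qualities".

```python
def fill_missing_qualities(sam_line, placeholder="!"):
    """Return the SAM line with a '*' QUAL field replaced by placeholder qualities.

    placeholder is an assumption: '!' encodes Phred 0 in Phred+33.
    """
    if sam_line.startswith("@"):
        return sam_line  # header lines pass through unchanged
    fields = sam_line.rstrip("\n").split("\t")
    # SEQ is column 10 and QUAL is column 11 (indices 9 and 10).
    if len(fields) >= 11 and fields[10] == "*" and fields[9] != "*":
        fields[10] = placeholder * len(fields[9])
    return "\t".join(fields) + "\n"


def filter_stream(lines):
    """Yield corrected lines one at a time, so large files are never held in memory."""
    for line in lines:
        yield fill_missing_qualities(line)
```

Used as `sys.stdout.writelines(filter_stream(sys.stdin))` in a small script, this could sit in a pipe between the aligner output and the counting step, so each big file is still read only once. The placeholder qualities carry no information and should not be used for anything that depends on base quality.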
Old 05-16-2012, 06:19 AM   #5
NicoBxl
not just another member
 
Location: Belgium

Join Date: Aug 2010
Posts: 263
Default

Quote:
Originally Posted by Simon Anders
Yes, that's a limitation of HTSeq. Fixing this has been on my to-do list for a while; sorry that it's still not done.
Hi Simon,

Have you fixed this bug? I have the same problem with TopHat 2.0.0 BAM files.

Code:
samtools view -h -o out.sam in.bam
htseq-count out.sam annotation.gtf > htseq_out.txt
gives me

Code:
100000 GFF lines processed.
200000 GFF lines processed.
283699 GFF lines processed.
Error occured in line 36 of file out.sam.
Error: ("'seq' and 'qualstr' do not have the same length.", 'line 36 of file out.sam')
[Exception type: ValueError, raised in _HTSeq.pyx:765]
Old 05-22-2012, 08:09 AM   #6
Simon Anders
Senior Member
 
Location: Heidelberg, Germany

Join Date: Feb 2010
Posts: 993
Default

I've just fixed this. In HTSeq 0.5.3p4, SAM files with "*" in the quality field are accepted. Sorry that this took a while.
Old 05-22-2012, 10:35 PM   #7
NicoBxl
not just another member
 
Location: Belgium

Join Date: Aug 2010
Posts: 263
Default

Thanks Simon, it worked great.
Old 05-23-2012, 06:51 AM   #8
fishinabarrel
Junior Member
 
Location: United States

Join Date: Apr 2011
Posts: 6
Default

Dear Simon,

You are my hero.
Just ran into this problem yesterday. And by this morning a solution was already in place.
I owe you a beer.
Old 05-23-2012, 06:59 AM   #9
fishinabarrel
Junior Member
 
Location: United States

Join Date: Apr 2011
Posts: 6
Default

I should also add that I hit the qual problem with HTSeq-0.5.3p3 installed; after installing HTSeq-0.5.3p4, all was well.
Old 06-06-2012, 06:09 AM   #10
dharan
Junior Member
 
Location: UK

Join Date: Jan 2012
Posts: 7
Default Problem is still seen in HTSeq - v0.5.3p5

Dear Simon,

I installed the latest version of HTSeq (HTSeq-0.5.3p5.tar.gz) to solve the problem, but the error still persists for me.

I am still facing this error:
Error: ("'seq' and 'qualstr' do not have the same length.", 'line 2671032 of file ..)
[Exception type: ValueError, raised in _HTSeq.pyx:765]

Can you please help me out?

Thanks,
Dharanya
Old 06-06-2012, 06:17 AM   #11
maubp
Peter (Biopython etc)
 
Location: Dundee, Scotland, UK

Join Date: Jul 2009
Posts: 1,541
Default

It would be nice if the HTSeq error message included the two unmatched lengths - but can you show us what line 2671032 of your input file is? This may not be due to the * for missing qualities at all, but a real error in the data.
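
Until the error message reports the lengths itself, they can be checked by hand. A minimal Python 3 sketch follows. The helper names `qual_report` and `report_line` are hypothetical, and it assumes the line number in the error counts all lines of the file, header lines included, which I have not verified.

```python
def qual_report(sam_line):
    """Summarise the SEQ and QUAL fields (columns 10 and 11) of one SAM record."""
    fields = sam_line.rstrip("\n").split("\t")
    seq, qual = fields[9], fields[10]
    missing = qual == "*"
    return {
        "seq_len": len(seq),
        "qual_len": len(qual),
        "qual_missing": missing,  # True when qualities were omitted with '*'
        "consistent": missing or len(seq) == len(qual),
    }


def report_line(path, line_number):
    """Return the report for a 1-based line number, or None if the file is shorter."""
    with open(path) as handle:
        for number, line in enumerate(handle, start=1):
            if number == line_number:
                return qual_report(line)
    return None
```

If `qual_missing` is true, the record is valid SAM and the complaint comes from the parser; if it is false and `consistent` is also false, the data itself is broken.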
Old 06-06-2012, 06:56 AM   #12
dharan
Junior Member
 
Location: UK

Join Date: Jan 2012
Posts: 7
Default

Hi,

Here is the line from that file:


HWI-ST790:1:1101:1261:140607#ACTTGA 329 contig_126150 342 3 100M * 0 0 GTCCAGGTTGGTGGACCTCTCAATCATGTTGTCACCCTCAAACCCAGAGATGGGGACGAAGGGAACCTTGTTAGGGTTGTAGCCGACCTTCTTCAGGTAG * AS:i:-7 XN:i:0 XM:i:2 XO:i:0 XG:i:0 NM:i:2 MD:Z:3A67C28 YT:Z:UU NH:i:2 CC:Z:contig_223383 CP:i:208 HI:i:0

Cheers,
Dharanya
Old 06-06-2012, 07:01 AM   #13
maubp
Peter (Biopython etc)
 
Location: Dundee, Scotland, UK

Join Date: Jul 2009
Posts: 1,541
Default

Can you double check which HTSeq you are using? Perhaps an older copy is taking precedence in your PATH, or the update didn't install properly.
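
One way to see which copy Python actually imports is sketched below. This is a generic Python 3 helper with a hypothetical name, `describe_module`; it assumes the package exposes a `__version__` attribute and reports "unknown" if it does not.

```python
import importlib


def describe_module(name):
    """Return the version and file location of an importable module, or None."""
    try:
        module = importlib.import_module(name)
    except ImportError:
        return None  # not installed for this interpreter
    return {
        # Assumption: the package defines __version__; many do, not all.
        "version": getattr(module, "__version__", "unknown"),
        "path": getattr(module, "__file__", "unknown"),
    }
```

Calling `describe_module("HTSeq")` with the same interpreter that runs the counting script should show whether the path points at the new install or at a leftover older one. Note that the command-line script is found through PATH while the module is found through Python's own search path, so the two can disagree.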
Old 06-06-2012, 07:09 AM   #14
dharan
Junior Member
 
Location: UK

Join Date: Jan 2012
Posts: 7
Default

Maybe there is a problem with the installation. I will go through it again and let you know if there are still any problems.
Thanks
Old 06-06-2012, 07:10 AM   #15
chadn737
Senior Member
 
Location: US

Join Date: Jan 2009
Posts: 392
Default

As an aside, the latest version of Tophat 2 no longer has the "*" qualities problems.
Old 06-07-2012, 06:28 AM   #16
dharan
Junior Member
 
Location: UK

Join Date: Jan 2012
Posts: 7
Default

Quote:
Originally Posted by chadn737
As an aside, the latest version of Tophat 2 no longer has the "*" qualities problems.
The new TopHat version (v2.0.3) has now solved the problem that showed up in HTSeq.

Thanks a lot..!!
Old 06-08-2012, 05:18 AM   #17
phred
Member
 
Location: Ireland

Join Date: May 2012
Posts: 11
Default

I am having the same issue with HTSeq dealing with alignments from Tophat 2.

I installed HTSeq/0.5.3p5 but the issue persists. The alignments were done using Tophat 2.0.0

Code:
Error occured in line 36 of file R13a_m_accepted_hits.sam.
Error: ("'seq' and 'qualstr' do not have the same length.", 
'line 36 of file R13a_m_accepted_hits.sam')
[Exception type: ValueError, raised in _HTSeq.pyx:765]
I would like to avoid having to re-align all of my samples with a newer version of Tophat. Any suggestions?

Incidentally, when I was checking the installation as suggested above, the following appears with the v0.5.3p5 of HTSeq:


Code:
>htseq-count
....

>Released under the terms of the GNU General
>Public License v3. Part of the 'HTSeq' framework, version 0.5.3p3.
I presume the footnote was just not updated with the new release?
Old 06-08-2012, 05:47 AM   #18
dharan
Junior Member
 
Location: UK

Join Date: Jan 2012
Posts: 7
Default

Hi
As far as I know, the only option is to rerun TopHat with the new version (v2.0.3). I think the only problem is the handling of the * qualities in the SAM files, and that has been resolved in the latest version.
Old 06-11-2012, 12:13 AM   #19
EGrassi
Member
 
Location: Turin, Italy

Join Date: Oct 2010
Posts: 66
Default

Quote:
Originally Posted by phred
Code:
>htseq-count
....

>Released under the terms of the GNU General
>Public License v3. Part of the 'HTSeq' framework, version 0.5.3p3.
I presume the footnote was just not updated with the new release?

Same here. Re-aligning all the reads with the new TopHat would be cumbersome, so I'll try to dig into the Python code and find a workaround...
Old 06-11-2012, 01:22 AM   #20
Simon Anders
Senior Member
 
Location: Heidelberg, Germany

Join Date: Feb 2010
Posts: 993
Default

Sorry, it seems we made some mix-up between version 0.5.3p4 and 0.5.3p5. Essentially, p5 undid some fixes in p4, including the one for "*" qualities. Now, there is version 0.5.3p6, which should clean up this mess. Please let me know if you still have problems.
Tags
bowtie2, htseq, quality, sam
